Data integration is the process by which data and information from disparate and often incompatible sources is brought together by transforming the data into a single unified view, usually within a new application that offers additional reporting designed for users outside of the usual user base of the source data. Data integration also goes under the name Enterprise Information Integration (EII).
The process of data integration is complex and over time has evolved into very sophisticated tools available as standalone software from third party vendors, or as modules within programming languages that natively query the original database. Two broad methods of achieving data integration have been developed, data warehousing and mediated schema, each with their own advantages and disadvantages that should be considered when evaluating available tools.
In data warehousing, the data from multiple sources is extracted, transformed, and loaded (ETL) into a single new database which can be queried by the end user and reporting configured to their specific need. Date warehousing has the advantage that query time is reduced, but suffers from the disadvantage that data in the warehouse may be out of date, and in some applications even a few seconds might render the query irrelevant.
Mediate Schema is a more modern approach where the original source data is accessed for every query, and no single data warehouse ever exists. Data integration tools that use this approach setup multiple wrappers around the source data which are queried in real time and the results fed to a virtual database (the EDI client) where data is transformed for one time use and presented to the user. In some cases, caching may be employed to reduce query times for non time-sensitive data.
Which date integration tool to use and rollout across the organization is unfortunately not a simple decision, with each vendor having their own strengths and expertise, and at the same time numerous weaknesses. A common evaluation method involves matching source data complexity with vendor expertise, and ranking vendors according to relative ease and cost effectiveness of customizing unsupported formats and platforms.
In selecting a data integration vendor, IT administrators should be aware that the industry is undergoing a series of mergers and takeovers as vendors try to position themselves for maximum gain heading in the future. Several of the larger and more established players regularly hunt for smaller niche vendors to takeover which could result in unplanned licensing issues for organizations as well as increased costs of ownership.
The data integration tools market is being driven by customer demand more than ever before as the benefits of EII become more accepted outside of IT. Significantly, cost control is being seen as one of the reasons for the increasing acceptability of data integration tools, notably because organizations are no longer willing to pay for staff time in extracting, merging, and reporting data from multiple sources.
Tools for data integration continue to develop and the next decade promises to see the traditional market for niche products and custom solutions replaced with general tools capable of interfacing with almost every data format.