What is Data Integration?

Data integration involves taking data from two or more databases, sometimes within the same company, and producing meaningful reports from the merged data. Data integration can be thought of as one of the holy grails of IT, and at the same time a major headache for most IT professionals.

The vast majority of databases used in corporations are proprietary in nature, designed by the supplier to suit the needs of their querying software, not for integration with other data systems, leading to difficulties creating a report that combines two sets of data, for example sales data from a spreadsheet or customer relationship database, with banking reconciliation data from the accounts software.

On the face of things the example above sounds simple enough that manually comparing the two disparate reports would give the complete picture, but the example is too simplistic, there could be many other reports that are too complicated for a quick scan of the separate results, and this is where data integration becomes the solution.

What is Data Integration? digital globe

IT specialists with database programming skills are required, who write specific applications that allow for querying multiple source data sets, often this will be a patch to existing software, but in many cases could be a completely separate piece of software tailor written for the end user.

More often these days IT departments are requiring compatibility between databases before implementing new software across the corporation, adding to the license expenses and requiring a long term commitment to maintaining the custom software thru version changes for all the source databases.

In the online world data integration is being driven by the open source community who have developed OpenID allowing members of social networks to login to many proprietary networks using a single login and password, and carry their personal data to that new network.

Computer software that does use data integration to produce new and unique reports from two or more databases are the type used in keyword analysis in the search engine optimization market where specialist software or websites query databases from the major online advertising agencies.

The corporate world is no different, some of the biggest Fortune 500 companies were created thru mergers and acquisitions of companies using different software and databases, and data integration is essential to producing accurate and reliable financial reports.

One of the most popular methods of making date integration happen with large databases is to create a new database from running queries on the original databases, and then writing specialist front-end software for the new database which queries only the new database to produce the reports needed.

This technique is known as data warehousing and can be a simpler and more cost effective approach because regular exports from the original database can be scheduled, and then imported into the warehouse. With proprietary software this approach reduces licensing issues and possible corruption of databases.

Data Integration Tools

Data integration is the process by which data and information from disparate and often incompatible  sources is brought together by transforming the data into a single unified view, usually within a new application that offers additional reporting designed for users outside of the usual user base of the source data. Data integration also goes under the name Enterprise Information Integration (EII).

The process of data integration is complex and over time has evolved into very sophisticated tools available as standalone software from third party vendors, or as modules within programming languages that natively query the original database. Two broad methods of achieving data integration have been developed, data warehousing and mediated schema, each with their own advantages and disadvantages that should be considered when evaluating available tools.

In data warehousing, the data from multiple sources is extracted, transformed, and loaded (ETL) into a single new database which can be queried by the end user and reporting configured to their specific need. Date warehousing has the advantage that query time is reduced, but suffers from the disadvantage that data in the warehouse may be out of date, and in some applications even a few seconds might render the query irrelevant.

Data Integration Tools Data Integration

Mediate Schema is a more modern approach where the original source data is accessed for every query, and no single data warehouse ever exists. Data integration tools that use this approach setup multiple wrappers around the source data which are queried in real time and the results fed to a virtual database (the EDI client) where data is transformed for one time use and presented to the user. In some cases, caching may be employed to reduce query times for non time-sensitive data.

Which date integration tool to use and rollout across the organization is unfortunately not a simple decision, with each vendor having their own strengths and expertise, and at the same time numerous weaknesses. A common evaluation method involves matching source data complexity with vendor expertise, and ranking vendors according to relative ease and cost effectiveness of customizing unsupported formats and platforms.

In selecting a data integration vendor, IT administrators should be aware that the industry is undergoing a series of mergers and takeovers as vendors try to position themselves for maximum gain heading in the future. Several of the larger and more established players regularly hunt for smaller niche vendors to takeover which could result in unplanned licensing issues for organizations as well as increased costs of ownership.

The data integration tools market is being driven by customer demand more than ever before as the benefits of EII become more accepted outside of IT. Significantly, cost control is being seen as one of the reasons for the increasing acceptability of data integration tools, notably because organizations are no longer willing to pay for staff time in extracting, merging, and reporting data from multiple sources.

Tools for data integration continue to develop and the next decade promises  to see the traditional market for niche products and custom solutions replaced with general tools capable of interfacing with almost every data format.