Data Integration refers to the process of standardizing, conforming, and merging data from various sources to provide a unified view of the data. It is the heart of any Information Management solution and its most critical piece. It is also usually the most complex, most expensive, and longest-running part of a project.
The Data Integration process usually goes through the following steps:
- Extraction and staging – acquiring and moving data from source systems to a staging area
- Validation – validating that the data extracted is valid and complete
- Cleansing – interrogating the data for quality issues and applying business rules to cleanse the data
- Conforming – translating and transforming data so that data from various sources looks uniform
- Surrogate key generation – generating the target primary keys based on natural keys from the source
- Change data capture (CDC) – identifying the data that has changed since the last load
- Loading – populating the target application
- Audits and controls – ensuring that all data has been processed correctly and completely
- Exceptions and errors – identifying and publishing data issues
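The steps above can be illustrated with a minimal sketch. It is not a reference implementation: the two source systems, their field names, the counter-based surrogate keys, and the hash-based change detection are all assumptions chosen for illustration, and an in-memory dictionary stands in for the target application.

```python
import hashlib
from datetime import datetime

# Hypothetical extracts from two source systems with different conventions.
CRM_ROWS = [
    {"cust_no": "C-001", "name": " Alice ", "joined": "2021-03-05"},
    {"cust_no": "C-002", "name": "Bob", "joined": "2021-04-17"},
]
BILLING_ROWS = [
    {"customer_id": "c-003", "full_name": "carol", "signup": "17/05/2021"},
]

def conform(natural_key, name, date_text, date_format):
    """Conforming: map a source row onto one uniform shape."""
    return {
        "natural_key": natural_key.strip().upper(),
        "name": name.strip().title(),
        "joined": datetime.strptime(date_text, date_format).date().isoformat(),
    }

class SurrogateKeys:
    """Surrogate key generation: one stable integer per natural key."""
    def __init__(self):
        self._keys = {}

    def key_for(self, natural_key):
        if natural_key not in self._keys:
            self._keys[natural_key] = len(self._keys) + 1
        return self._keys[natural_key]

def row_hash(row):
    """Fingerprint of a conformed row, used here for change detection."""
    payload = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def run_load(target, keys, hashes, crm_rows, billing_rows):
    # Extraction and staging, followed by conforming.
    staged = [conform(r["cust_no"], r["name"], r["joined"], "%Y-%m-%d")
              for r in crm_rows]
    staged += [conform(r["customer_id"], r["full_name"], r["signup"], "%d/%m/%Y")
               for r in billing_rows]

    # Validation: rows without a key or a name are set aside as exceptions.
    valid = [r for r in staged if r["natural_key"] and r["name"]]
    rejected = [r for r in staged if r not in valid]

    # Change data capture: keep only rows whose hash differs from the last load.
    changed = [r for r in valid if hashes.get(r["natural_key"]) != row_hash(r)]

    # Loading: write changed rows to the target under their surrogate keys.
    for row in changed:
        target[keys.key_for(row["natural_key"])] = row
        hashes[row["natural_key"]] = row_hash(row)

    # Audits and controls: every staged row must be accounted for.
    audit = {
        "extracted": len(staged),
        "rejected": len(rejected),
        "loaded": len(changed),
        "unchanged": len(valid) - len(changed),
    }
    assert audit["extracted"] == (
        audit["rejected"] + audit["loaded"] + audit["unchanged"]
    )
    return audit

target, keys, hashes = {}, SurrogateKeys(), {}
first = run_load(target, keys, hashes, CRM_ROWS, BILLING_ROWS)
second = run_load(target, keys, hashes, CRM_ROWS, BILLING_ROWS)
print(first)
print(second)
```

Running the load twice on the same extracts shows the effect of change data capture: the first run loads every row, while the second finds nothing new to load. Real tools usually persist the key map and the hashes between runs rather than holding them in memory.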
Typically, Data Integration is done with Extract, Transform, Load (ETL) tools, although other approaches, including Extract, Load, Transform (ELT) and service-oriented architecture (SOA) using web services, are increasingly being used. It is primarily used to acquire, massage, and populate data into a data warehouse or a data mart.
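To make the difference between ETL and ELT concrete, here is a small sketch of the ELT ordering, using Python's built-in sqlite3 module as a stand-in for a data warehouse. The table names, columns, and cleansing rule are hypothetical. The raw data is loaded first, untouched, and the transformation then runs as SQL inside the target; in an ETL flow the same cleansing would happen in the tool before anything is written to the target.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Load: raw source values land in the target as-is (all text, uncleaned).
conn.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [("1", " 10.50", "us"), ("2", "7", "US "), ("3", "n/a", "de")],
)

# Transform: cleansing and conforming happen inside the target, in SQL.
# Rows whose amount does not start with a digit are filtered out here;
# a fuller pipeline would route them to an exceptions table instead.
conn.execute(
    """
    CREATE TABLE orders AS
    SELECT CAST(order_id AS INTEGER)   AS order_id,
           CAST(TRIM(amount) AS REAL)  AS amount,
           UPPER(TRIM(country))        AS country
    FROM raw_orders
    WHERE TRIM(amount) GLOB '[0-9]*'
    """
)

orders = conn.execute(
    "SELECT order_id, amount, country FROM orders ORDER BY order_id"
).fetchall()
print(orders)
```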