Data Transformation
In our experience, data poses one of the most significant obstacles
to the success of a technology project. The importance of data
activities such as mapping, extraction, transformation, and loading
is often grossly underestimated. We recommend the following:
- Gather Requirements: It is essential to identify all data elements,
as well as their interrelationships, early in the project lifecycle.
- Map Data Elements: In our opinion, the importance of data mapping
cannot be overstated. An accurate and thorough data mapping document
provides the foundation for all downstream data activities. The mapping
document also becomes an invaluable project artifact for GUI development,
interface development, reporting development, and testing.
- Identify Data Sources: Data sources often impose constraints on
resources and timelines when other organizations are brought into the
sphere of the project. Identifying these data sources early provides
greater lead time for extracting, transforming, and loading the data.
- Extract Data: Data extraction involves designing data formats,
determining transport protocols, identifying staging areas, and
creating record validation routines.
- Transform Data: Transformation is often the most complex data
activity. While we feel this complexity can be lessened by an accurate
data mapping document, transformation still presents unique challenges.
It includes de-duplication, formatting, cleansing, truncation, data
type conversion, and similar tasks, which are frequently accomplished
through a combination of programmatic solutions and manual effort.
- Load Data: Data loading involves creating data validation routines,
designing exception reports, identifying staging areas, and tuning
load performance. Performance tuning is the primary focus of this
activity.
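The record validation routines and exception reports mentioned under extraction and loading can be sketched as a function that splits incoming records into loadable rows and rejected rows with reasons. This is a minimal illustration under stated assumptions: the CSV layout, the field names `CUST_NO` and `BAL`, and the two rules (`required`, `numeric`) are hypothetical, not part of any particular tool.

```python
import csv
import io
from typing import Callable, Iterable, Optional

Record = dict[str, Optional[str]]
# A rule returns an error message, or None when the record passes.
Rule = Callable[[Record], Optional[str]]

def required(field: str) -> Rule:
    """Hypothetical rule: the field must be present and non-blank."""
    def check(rec: Record) -> Optional[str]:
        return None if (rec.get(field) or "").strip() else f"missing {field}"
    return check

def numeric(field: str) -> Rule:
    """Hypothetical rule: the field must parse as a number."""
    def check(rec: Record) -> Optional[str]:
        try:
            float(rec.get(field) or "")
        except ValueError:
            return f"non-numeric {field}: {rec.get(field)!r}"
        return None
    return check

RULES: list[Rule] = [required("CUST_NO"), numeric("BAL")]

def validate(records: Iterable[Record]) -> tuple[list[Record], list[tuple[int, str]]]:
    """Split records into loadable rows and an exception report of (line, reasons)."""
    valid: list[Record] = []
    exceptions: list[tuple[int, str]] = []
    # Line 1 holds the CSV header; assumes no quoted fields span multiple lines.
    for line_no, rec in enumerate(records, start=2):
        errors = [msg for rule in RULES if (msg := rule(rec)) is not None]
        if errors:
            exceptions.append((line_no, "; ".join(errors)))
        else:
            valid.append(rec)
    return valid, exceptions

# Hypothetical extract file with one good row and two bad ones.
extract = io.StringIO("CUST_NO,BAL\n42,100.50\n,12\n7,abc\n")
valid, exceptions = validate(csv.DictReader(extract))
for line_no, reason in exceptions:
    print(f"line {line_no}: {reason}")
# → line 3: missing CUST_NO
# → line 4: non-numeric BAL: 'abc'
```

Collecting every failed rule per record, rather than stopping at the first, keeps the exception report useful to whoever has to correct the source data.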
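An accurate mapping document can drive much of the transformation step directly. The sketch below is a minimal Python illustration, assuming a mapping document whose rows give a source field, a target field, a type conversion, and an optional maximum length. The field names, the `FieldMapping` structure, and the rule of keeping the first record per customer ID are hypothetical. It shows cleansing (trimming whitespace), data type alteration, truncation, and de-duplication.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass(frozen=True)
class FieldMapping:
    """One row of a hypothetical data mapping document."""
    source: str                       # column name in the source extract
    target: str                       # column name in the target schema
    convert: Callable[[str], Any]     # data type alteration, e.g. int or float
    max_length: Optional[int] = None  # truncate text to fit the target column

# Hypothetical mapping rows; a real project would read these from its mapping document.
MAPPING = [
    FieldMapping("CUST_NO", "customer_id", int),
    FieldMapping("CUST_NAME", "name", str, max_length=10),
    FieldMapping("BAL", "balance", float),
]

def transform_record(raw: dict[str, str]) -> dict[str, Any]:
    """Apply every mapping row to one source record."""
    out: dict[str, Any] = {}
    for m in MAPPING:
        value = m.convert(raw[m.source].strip())  # cleansing, then type conversion
        if m.max_length is not None and isinstance(value, str):
            value = value[: m.max_length]          # truncation
        out[m.target] = value
    return out

def transform_all(rows: list[dict[str, str]], key: str = "customer_id") -> list[dict[str, Any]]:
    """Transform rows, keeping only the first record seen for each key (de-duplication)."""
    seen: set[Any] = set()
    result: list[dict[str, Any]] = []
    for raw in rows:
        rec = transform_record(raw)
        if rec[key] not in seen:
            seen.add(rec[key])
            result.append(rec)
    return result

rows = [
    {"CUST_NO": " 42 ", "CUST_NAME": "Acme Corporation", "BAL": "100.50"},
    {"CUST_NO": "42", "CUST_NAME": "Acme Corp", "BAL": "100.50"},
    {"CUST_NO": "7", "CUST_NAME": " Globex ", "BAL": "0"},
]
print(transform_all(rows))
# → [{'customer_id': 42, 'name': 'Acme Corpo', 'balance': 100.5}, {'customer_id': 7, 'name': 'Globex', 'balance': 0.0}]
```

Keeping the first record per key is only one possible survivorship rule. Duplicates that do not share a clean key usually need fuzzy matching or manual review, which is one reason transformation mixes programmatic and manual effort.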
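Load performance tuning often starts with batching. The sketch below uses Python's built-in `sqlite3` module as a stand-in target database and inserts rows with `executemany`, committing one transaction per batch instead of one per row. The `customer` table, its columns, and the batch size of 1,000 are hypothetical; the right batch size for a real target has to be measured.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator

def batches(rows: Iterable[tuple], size: int) -> Iterator[list[tuple]]:
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch

def load(conn: sqlite3.Connection, rows: Iterable[tuple], batch_size: int = 1000) -> int:
    """Insert rows in batches; returns the number of rows loaded."""
    loaded = 0
    for batch in batches(rows, batch_size):
        with conn:  # one transaction per batch: commits on success, rolls back on error
            conn.executemany(
                "INSERT INTO customer (customer_id, balance) VALUES (?, ?)", batch
            )
        loaded += len(batch)
    return loaded

# Hypothetical target table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, balance REAL)")
n = load(conn, ((i, i * 1.5) for i in range(2500)), batch_size=1000)
print(n, conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0])
# → 2500 2500
```

Larger batches reduce per-transaction overhead but make each rollback more expensive, so batch size is a tuning parameter rather than a fixed value.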
Below is a breakdown of the six data activities described above and
their relative levels of effort.
| Activity | Level of Effort (share of overall data effort) |
|---|---|
| 1. Requirements Gathering | 15% |
| 2. Data Mapping | 30% |
| 3. Data Sources | 10% |
| 4. Extraction | 10% |
| 5. Transformation | 25% |
| 6. Loading | 10% |
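Because the table expresses effort as percentages, it can be turned into an estimate once a total is known. The sketch below assumes a hypothetical total of 400 hours for the overall data effort; only the percentages come from the table.

```python
# Effort shares from the table, in percent of the overall data effort.
EFFORT_SHARE = {
    "Requirements Gathering": 15,
    "Data Mapping": 30,
    "Data Sources": 10,
    "Extraction": 10,
    "Transformation": 25,
    "Loading": 10,
}

def allocate(total_hours: float) -> dict[str, float]:
    """Split a total data-effort estimate across activities by their table shares."""
    if sum(EFFORT_SHARE.values()) != 100:
        raise ValueError("effort shares must add up to 100%")
    return {activity: total_hours * pct / 100 for activity, pct in EFFORT_SHARE.items()}

# Hypothetical total of 400 hours for the overall data effort.
for activity, hours in allocate(400).items():
    print(f"{activity:<24}{hours:>7.1f} h")
```

With 400 hours, data mapping alone receives 120 hours, more than any other activity, which reflects the emphasis the section places on the mapping document.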
In summary, we believe that understanding the importance and
complexity of data activities early in the project lifecycle pays
huge dividends in the success and eventual acceptance of the
application.