Our MDM Strategy Offerings
Recently, I put together an overview of Hub Designs’ MDM strategy offerings for a potential client. Here’s a recap.
Education
- Based on our popular “Best Practices in MDM and Data Governance” speaking engagements, presented at Oracle OpenWorld and the Oracle Applications Users Group COLLABORATE conference.
- Our workshops get business & IT professionals up to speed quickly
- You get access to the best MDM experts, and can bring your business people into the process early
Roadmap
- Based on Hub Designs’ MDM framework
- Defines where you are now, where you want to be, and over what time period
- Looks at master data management, data integration, data quality, and data governance over time
Readiness Assessment
- Looks at issues relating to politics & culture
- Performs skills assessment on people who may need training
- Examines process issues, outlining where business processes need improvement or redesign
- Investigates technology issues, detailing where essential components are not present or not able to support your upcoming MDM initiative
- Performs data profiling to discover data quality issues
Business Case
- Captures business requirements
- Identifies stakeholders and selects metrics
- Baselines current performance
- Negotiates expected benefits
- Converts to financial results
- Develops total cost of ownership
- Calculates hard-dollar ROI
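As a rough illustration of the last two steps, one common way to express hard-dollar ROI is net benefit divided by total cost of ownership. The sketch below assumes that definition; the function name and the figures are made up for the example, and a real business case may define benefits and costs differently.

```python
def simple_roi(total_benefits, total_cost_of_ownership):
    """Return ROI as a fraction: (benefits - cost) / cost.

    Assumes both figures cover the same time period and are in the same currency.
    """
    if total_cost_of_ownership <= 0:
        raise ValueError("total cost of ownership must be positive")
    return (total_benefits - total_cost_of_ownership) / total_cost_of_ownership


# Hypothetical figures: $1.5M in quantified benefits against a $1.0M TCO.
roi = simple_roi(1_500_000, 1_000_000)
```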
Software Selection
- Develops selection criteria
- Creates a weighted vendor scoring model
- Includes functionality, technology, viability, costs, services and vision
- Develops demo scripts for vendors to follow and sample data sets to give them
- Manages proof of concept (POC) process
- Assists in evaluating POC performance and scoring vendors
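A weighted vendor scoring model like the one mentioned above can be sketched in a few lines. The six criteria come from the list; the weights, the 1-5 rating scale, and the two vendors are assumptions made up for illustration, not figures from an actual engagement.

```python
# Illustrative weights per criterion (assumed values; they should sum to 1.0).
WEIGHTS = {
    "functionality": 0.30,
    "technology": 0.20,
    "viability": 0.15,
    "costs": 0.15,
    "services": 0.10,
    "vision": 0.10,
}


def weighted_score(ratings, weights=WEIGHTS):
    """Return the weighted sum of a vendor's ratings across all criteria."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(weights[c] * ratings[c] for c in weights)


# Hypothetical 1-5 ratings for two made-up vendors.
vendors = {
    "Vendor A": {"functionality": 4, "technology": 3, "viability": 5,
                 "costs": 2, "services": 4, "vision": 3},
    "Vendor B": {"functionality": 3, "technology": 4, "viability": 4,
                 "costs": 4, "services": 3, "vision": 4},
}

# Rank vendors from highest to lowest weighted score.
ranking = sorted(vendors, key=lambda v: weighted_score(vendors[v]), reverse=True)
```

Agreeing on the weights before the demos and the POC helps keep the final ranking from being tuned to a favorite vendor after the fact.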
These engagements range in length from one to twelve months, with teams varying from two to ten people, depending on the size of the company, the number of domains of master data involved, and the complexity of the politics and legacy systems in the enterprise.
If you’re interested in discussing an MDM strategy engagement like this, please contact Hub Designs at http://www.hubdesigns.com/contact_us.html. Or if you have comments on the above approaches, please let us know by commenting here.
A friend of Hub Designs is trying to fill the following MDM position. If you’re interested, information on how to contact the hiring manager is at the end of this article.
Expected Activities to be Performed
Technical Program Manager (PM) with strong communication & project mgmt skills. Ideally has a development background, but has already transitioned to PM role. Has experience on large transactional systems, is familiar with complex integrations using Web Services and Pub/Sub integration patterns, and is familiar with scale-up/out strategies and complexities. Someone who can outline strategy, plan and functional design, has enough project mgmt skills to drive those plans to closure, and has the experience to know what the risks are and how to navigate obstacles and politics.
Tasks:
- Author Scenarios and Use Cases
- Author detailed Functional Specifications
- Manage one or more feature teams (cross-functional team of Dev and QA as related to implementing feature areas) with necessary communications and managing weekly feature team meetings
- Work with Dev, Test, Architects, program leadership and other PM’s to successfully design, vet, and receive approval for specified features
- Generate weekly status reports
Feature Areas for possible focus:
- Reporting design (defining necessary design to accomplish business/audit and operational reports from a very large ODS database, which may require defining necessary data marts and/or cubes)
- Business rules for computing data quality scores on key data fields
- Disaster recovery design (extending existing DR that is already in place)
- Data retention design
| Technical / Soft Skills Required | Expertise Level (Expert / Good / Familiarity) | Remarks |
| C# | Familiarity | Not used directly. |
| SQL | Good | Limited use, but may need to query for analysis, etc. |
| XML | Good | Contracts and other aspects often use XML in definitions. |
| .Net Framework 3.5 and 4.0 | Familiarity | Not used directly |
| Web/WCF Services | Good | From design standpoint. Should have knowledge on WS concepts. |
| ASP.Net | Familiarity | |
| Data Structures | Expert | |
| Function autonomously | Expert | |
| Communication Skills (Written and Oral) | Expert | |
| VSTS | Good | We use VSTS heavily for managing scenarios, use cases, requirements and for all bugs/tasks/issues. |
| Unit Testing & Test Cases writing | Good | Must understand how this works, and help identify test cases needing to be written by QA. |
| Write and present High Level SOW for Features | Expert | These 3-7 page documents outline the requirements and business context for a feature area. |
| Write, present and manage Functional Specification Design (FSD) | Expert | Detailed functional specification (40-80 page). |
| Coordination skills – working in a highly collaborative environment | Expert | Work with Dev, Test, fellow PMs and Architects. |
| Leadership skills | Good | Ability to influence and sell designs that meet our immediate needs and our long term platform aspirations. |
| Experience designing high Performance and high scale solutions. | Good | |
Desirable Skills
| Technical / Soft Skills | Expertise Level (Expert / Good / Familiarity) | Remarks |
| Matching Technologies | Knowledge | Working with vendor or custom built matching engines. |
| MDM experience | Knowledge | Any prior experience on an MDM team can be useful. |
Duration
Expected Start Date = August 2010, Expected End Date = TBD (likely a year out)
Principals only please (no agencies). Again, please contact Dan Power via http://www.hubdesigns.com/contact_us.html, and we’ll put you in touch with the hiring manager.
What I Learned on My Summer Vacation
I recently got back from my summer vacation, a 16-day, 300-mile sailing trip with my wife and two boys. We co-organized the trip for 15 boats, all members of the Blue Water Sailing Club. We went from the Boston area, through the Cape Cod Canal, down Buzzards Bay to Rhode Island Sound, spending four days on Block Island, and stopping off at great places like Padanaram, Cuttyhunk and Newport along the way.
Continuing a tradition we started in 2008 with an article called “Lessons on MDM from My Summer Vacation”, I’ll try to sum up some things I learned along the way, and apply them to master data management and data governance where I can.
1. Be Prepared for Storms
On one passage from Red Brook Harbor to Cuttyhunk, we were hit with a nasty thunderstorm that wasn’t forecast to go through until much later in the day. Winds were clocked at 50 knots (58 miles per hour). We prepared by dousing our sails, getting our foul weather gear on, battening down the hatches, and getting the boys in safe positions down below. But when the storm hit, the rain on my face felt like needles, visibility dropped to zero, our dinghy flipped over on its towing bridle, and I had to concentrate on avoiding a buoy in the area.
The application to MDM is that, given how political these projects can be, there will be storms. So be prepared for them. Have a good crew (project team), work hard at instilling loyalty between the team members, and maintain a united front. In our case, the storm, though intense, passed quickly, and we were able to get our dinghy right side up and resume our course for Padanaram with no injuries or damage.
2. Don’t Try to Control Too Much
Co-organizing a sailing trip with 15 boats can be a bit like herding cats. Sailors are very independent by nature, and even though we had regular check-ins by radio, some people would skip them completely, and others would forget (including me). Traveling with two young children increased the chaos factor. We’ve learned to go with it a bit – it’s like riding a wave. You can’t plan every minute of every day – sometimes you’ve got to be spontaneous, put the plan aside and just see what happens.
In the MDM and data governance world, the business community as a whole, even though they may not be on your project team directly, is going to be directly affected. They’ll want to have a say in how things are done, and they’ll have good ideas for you. Don’t shut them down. Learn to listen, actually consider what they’ve got to say, and be inclusive. Have town hall meetings where the broader business community gets a chance to tell you about their concerns, where you communicate the project’s progress and milestones, and where you can reach out to them and pull them in to upcoming phases.
3. Accept the Kindness of Others
Previously, we had a 32 foot boat, but at the beginning of June, we took delivery of a 38 foot boat, which we were still getting the hang of. A couple of club members on the cruise took the time to help us get to know the systems on our new boat, and it was great to have experienced friends walking us through what was, to me, new territory. Whether it was the selector switch between water tanks, the fresh water pressure pump, the anchor wash down pump, or various other things, our friends took the time to mentor us on the ins and outs of our new boat. And on the last day of the cruise, our friend Fred remembered that our son Brendan wanted a ride in his skiff, so he came alongside as we were leaving the harbor, picked him up, and gave him the ride of a lifetime.
The application to MDM and data governance is that you should be open to mentoring within and outside the enterprise. People like sharing their experience and wisdom with others, once you’ve established a strong relationship. If you reach out and develop a network of contacts inside and outside the company, then when the stuff hits the fan, you’ll be able to call on them for help. And even when you don’t need help, you’ll find a ready group of mentors who’ll take you under their wing, to teach you the finer points of leadership skills, project management tips and tricks, communications and marketing excellence, business process redesign and organizational change management basics — all the things you’ll need to succeed in your MDM and data governance initiative.
4. Stay on Schedule
There were several times during our sailing trip when we were tempted to stay an extra day or leave a day early from one place or another. We talked it over as a group and decided to stay on schedule. Many of us had made mooring reservations at marinas with strict cancellation policies, and we would have ended up paying for those moorings even though we didn’t use them. Not a big deal in and of itself, but we asked ourselves, what’s the worst that could happen if we stuck with the original schedule? It turned out that it wasn’t that different from what would happen if we went with a changed schedule.
In the MDM and data governance world, as in any technology implementation, there are going to be unforeseen obstacles. Try to build some cushion into your project plan, so the smallest little delay doesn’t impact your critical path and delay the overall project. When you get to the point that to stay on schedule means sacrificing functionality or increasing costs (the famous “triple constraint”), the discussions start getting pretty heated. There will be many times when your project will feel like you’re herding cats too, but remember how important it is to stay on schedule. You can’t finish on time if you get behind shortly after you start.
5. Look for Those Special Moments You’ll Always Remember
There were quite a few special moments on this vacation. Shortly after we arrived in Cuttyhunk, both of my boys put on their bathing suits and dove off the boat into the harbor. They swam fearlessly from Blue Water boat to Blue Water boat, saying hello to their friends, until we had a bunch of kids in the water doing the same thing, including one little girl that had never done that before (and who made her dad very proud). That night, after going to the beach, we had a lobster bake that I organized for 33 people on the lawn overlooking the harbor. I will remember the conviviality and friendship of that dinner for a long time. And there were small moments too: body surfing with my youngest son in Westport, getting airborne in the dinghy, slogging through the passage to Block Island against 25 knot winds, foul currents and 4-6 foot seas (even the hard times can be good memories after you get through them).
For MDM and data governance practitioners, there are many rewards: the satisfaction of bringing in a challenging project on time and on budget, forging relationships with team members that will last a lifetime, learning new things and expanding professional horizons, being recognized by the company as a valuable player capable of big things, mastering MDM and data governance at a time when having those technologies on one’s resume certainly doesn’t hurt one’s career prospects, and so on. For a good look at what is involved in being a “data champion”, and the rewards involved, read “So You Want to be a Data Champion?” by my friend, Tom Carlock.
To sum it up, if you’re prepared for the inevitable storms that will come your way and don’t try to control things too much, and are open to the kindness of others while remembering the importance of staying on schedule, you’ll certainly be blessed, as I have been, with a wealth of those special moments you’ll always remember. Master data management and data governance can be challenging, but they can be very rewarding as well, both for the organizations which take on the initiatives and for the individuals who make up those teams.
MDM Community on Ning
Today, Hub Designs committed to sponsoring the MDM Community on Ning.
Recently, Ning changed its business model from providing free social networks to charging for them: $19.95 per year for educational and non-profit use, $199.95 per year for customized Ning Networks, and up to $499.95 per year for high-end social networks with integration options and more bandwidth and storage.
When I started the MDM Community back in November 2008, it was mostly a reaction to the awful state of LinkedIn Groups. Lots of spam, tons of irrelevant job postings, and very little community or sharing between MDM practitioners.
At the time, Ning was a free option, so starting the MDM Community on Ning was an easy choice. It grew gradually, and now has 295 members from 28 different countries. A lot of the different players in the MDM world are represented: Oracle (and Silver Creek Systems), IBM (and Initiate Systems), Informatica (and Siperian), D&B, Kalido, Orchestra Networks, Riversand Technologies, TIBCO (and Netrics). And a lot of large systems integration and consulting firms are represented.
Well, Ning is no longer providing its social network as a free service, but the $200 per year is a pretty reasonable investment to give MDM practitioners all over the world a vendor-neutral forum to hang out, ask questions of one another, help each other out, provide assistance, share opinions, write blog articles, update their profiles, do all of those things that people do on social networks.
At the time, there really wasn’t any other place to do all that which was ad-free, spanned all of the different flavors and vendors of MDM and data governance, and gave everyone an equal voice. I moderate the discussion forums but I try to do it with a very light hand. If anything, perhaps I should be more involved in the MDM Community and put more of my energy into growing it – and hopefully, I’ll do that now that Hub Designs has stepped up to keeping it alive on Ning.
If you haven’t already joined, please consider joining at http://mdmcommunity.ning.com/. If you’re already a member, log in there and let us know what’s on your mind.
Multiple Siperian Openings
A specialized professional services firm is looking for several experienced people for the following positions:
- Responsible for configuration of Siperian Hub, BDD and HM
- Develop logical and physical data model
- Develop Siperian design specifications
- Configure Siperian MRM/HM/BDD to meet requirements
If you’re interested, please contact Dan Power at www.hubdesigns.com/contact_us.html, and we’ll forward your message to the appropriate person.
The Hub Designs Blog welcomes the final installment of this great series by Rob DuMoulin, an information architect with more than 26 years of IT experience, specializing in master data management, database administration and design, and business intelligence.
Part 5: The Profiling Payoff
This is the final part of a five-part series, describing how data profiling benefits both IT projects and business operations. In Part One, we discussed profiling perspectives. In Parts Two, Three and Four, we introduced the value of system, entity, and attribute-level metrics. This part discusses the archival and beneficial uses of profile results.
If you have defined your corporate data profiling strategy similar to the methods discussed in the preceding parts of this series, you’ll have amassed a robust collection of metadata spanning relevant systems across your business. Although systems may be of different types and locations, the structured approach and common metrics you collected create a centralized repository of information that can be examined holistically. Ideally, this information will exist in an open-source database repository with reports made available across the enterprise. System and Entity information help planners and developers organize information strategies. Attribute-level domains, constraints, and business rules help data architects understand existing systems. Relationships and value patterns are readily available to support validation of information-related hypotheses as needed.
If you plan to design your own repository, consider adding timestamps and indicators to help you manage and present the information. To keep your repository relevant to business needs, design collection rules to be configurable. This allows you to easily ignore superfluous information or enable tests only at certain critical times. Allow initial system profiling efforts to gather a large set of metrics and store them as your baseline. As you learn about the information, you will see which tests or which data objects add no value. We geeky DBA types who understand system-level catalogs have our own scripts to do much of what was described in Parts Two, Three and Four. Those less inclined may prefer to use a third-party tool for profiling. Either way works as long as the business needs are satisfied and the entire enterprise standardizes on one approach (and thus one integrated repository).
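To make the idea of configurable collection rules and timestamped results concrete, here is a minimal sketch. The class names, field names, and the two sample metrics are all hypothetical; a real repository would persist these records in a database rather than a Python list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class CollectionRule:
    metric: str           # name of the test, e.g. "null_pct"
    enabled: bool = True  # switch off tests that turn out to add no value


@dataclass
class ProfileResult:
    system: str
    entity: str
    attribute: str
    metric: str
    value: float
    is_baseline: bool = False  # indicator marking the initial baseline run
    collected_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def run_enabled(rules, metric_funcs, system, entity, attribute, values,
                baseline=False):
    """Run only the enabled rules and return timestamped results."""
    results = []
    for rule in rules:
        if not rule.enabled:
            continue
        value = metric_funcs[rule.metric](values)
        results.append(
            ProfileResult(system, entity, attribute, rule.metric, value, baseline)
        )
    return results


# Hypothetical metrics and a made-up attribute sample.
metric_funcs = {
    "null_pct": lambda vals: 100.0 * sum(v is None for v in vals) / len(vals),
    "distinct_count": lambda vals: float(len({v for v in vals if v is not None})),
}
rules = [CollectionRule("null_pct"), CollectionRule("distinct_count", enabled=False)]
results = run_enabled(rules, metric_funcs, "crm", "customer", "region",
                      ["N", "S", None, "N"], baseline=True)
```

Because each rule carries its own switch, a test can be turned off, or turned back on for a critical period, without touching the collection code.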
You will find that collecting and maintaining this level of detail has a definite cost. Even if the collection is automated, interrogating large data sets places an overhead on production systems that may not be practical. Record and monitor profile execution metrics to identify bottlenecks or tuning opportunities. Realize that the extent of data profiling is contingent on the project phase, specific data elements, and most of all, business value. Review profiling goals on a regular basis and eliminate unnecessary and redundant checks.
How much profile history to maintain is another consideration. Even though disk is “relatively” cheap, maintaining all historical entries in a live repository may not be necessary. Consider business needs and value for historical profile information. Even consider archiving at a summarized (or less frequent) level and keep only a limited time window of statistics online.
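One way to picture this trade-off: keep a recent window of detailed statistics online and roll older entries up to a coarser summary before archiving. The sketch below assumes a 90-day window and monthly averages; both are arbitrary choices for illustration, and the function name is made up.

```python
from collections import defaultdict
from datetime import date, timedelta


def split_for_archive(history, today, keep_days=90):
    """Split (date, value) history into online detail and archived monthly averages."""
    cutoff = today - timedelta(days=keep_days)
    # Recent entries stay online at full detail.
    online = [(d, v) for d, v in history if d >= cutoff]
    # Older entries are grouped by (year, month) and averaged.
    buckets = defaultdict(list)
    for d, v in history:
        if d < cutoff:
            buckets[(d.year, d.month)].append(v)
    summary = {month: sum(vals) / len(vals) for month, vals in buckets.items()}
    return online, summary


# Made-up history of one metric (for example, a NULL percentage).
history = [
    (date(2010, 1, 5), 10.0),
    (date(2010, 1, 20), 20.0),
    (date(2010, 6, 1), 30.0),
]
online, archived = split_for_archive(history, today=date(2010, 6, 30))
```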
This discussion on data profiling was intended to broaden perceptions of what it means to a business and the value it can bring if done in a sustainable way. The blog format is not conducive to in-depth discussions, but hopefully the topics covered here spur some thoughts into how you can add value to your business by implementing some of these concepts. Use your imagination, but remember that no matter how cool it might be to collect and store some profile output, if it does not add business value to somebody, it might not be worth the overhead to continue recording it.
The Hub Designs Blog welcomes Part 4 of this series by Rob DuMoulin, an information architect with more than 26 years of IT experience, specializing in master data management, database administration and design, and business intelligence.
Part 4: Profiling Relationships and Patterns
This is part four of a five-part series describing how data profiling assists in all aspects of system development, from design through deployment.
Part One introduced different perspectives on data profiling. Part Two identified valuable system and entity metrics to track. Part Three discussed attributes. In this segment, we dive deeper into attribute relationships and pattern recognition. We also expand on the primary key identification discussion and discuss hidden relationships.
Pattern grouping provides a mask of distinct format patterns within an attribute data set and a count of the number of occurrences. Patterns give insight into the type of values found in an attribute. For example, a numeric pattern analysis may show values such as 999.99999, 99, or -.9999.
Observing distinct patterns gives insight into the maximum digits and precision, and also domains such as integer or real. Patterning a database date or date-time type yields unremarkably similar patterns for all dates. Because the database management system typically enforces the domain, date analysis provides no value and can be ignored. If dates are stored in character format, however, patterns quickly show variations in date formatting. Character patterns only have significance to a limited number of positions. It makes no sense to pattern a description field of 200 or 2000 characters. Smaller code attributes of less than 10 characters, though, do provide value. Ignore pattern profiling for character strings over 20 characters at first, then refine to shorter character strings if the results do not add value.
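Pattern grouping is simple to sketch. The version below masks every digit as 9, matching the numeric examples above. Masking uppercase letters as A and lowercase letters as a is my own assumption, as are the function names. Values longer than 20 characters are skipped, following the starting point suggested above.

```python
from collections import Counter


def pattern_mask(value):
    """Replace digits with 9 and letters with A/a; keep punctuation as-is."""
    out = []
    for ch in str(value):
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)
    return "".join(out)


def pattern_counts(values, max_len=20):
    """Count occurrences of each distinct mask, ignoring NULLs and long strings."""
    return Counter(
        pattern_mask(v)
        for v in values
        if v is not None and len(str(v)) <= max_len
    )


# Dates stored as character strings quickly reveal mixed formats.
date_patterns = pattern_counts(["2010-08-01", "08/01/2010", "2010-09-15"])
```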
In pure database theory, referential integrity (RI) is your friend. In practice, designers and software vendors often forgo RI to improve system performance on data inserts. These designers place the data quality burden on the application and do not endorse external data manipulation outside the application interfaces. In the real world, though, data corruption occurs and without RI or routine data quality checks, corruptions may not be found for a long time or not at all. Personally, I have identified over $50,000 of recent orphaned sales in a retail client resulting from deliberately disabled RI. These unreported sales were not added to the ledger and were allowed to occur for performance reasons until I found them through simple profiling. Enforcement of RI is a topic for another discussion but is mentioned here because it does identify a valid reason for data profiling.
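The kind of simple profiling that surfaces orphaned rows can be as small as one outer join. The sketch below uses an in-memory SQLite database with made-up `orders` and `order_lines` tables and no declared foreign key, standing in for a system where RI has been disabled. It is not the query used at the client described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY);
    -- No FOREIGN KEY clause: the relationship is not enforced.
    CREATE TABLE order_lines (
        line_id  INTEGER PRIMARY KEY,
        order_id INTEGER,
        amount   REAL
    );
    INSERT INTO orders VALUES (1), (2);
    INSERT INTO order_lines VALUES (10, 1, 25.0), (11, 2, 40.0), (12, 3, 99.5);
""")

# Child rows whose parent key has no match are orphans.
orphans = conn.execute("""
    SELECT c.line_id, c.order_id, c.amount
    FROM order_lines AS c
    LEFT JOIN orders AS p ON p.order_id = c.order_id
    WHERE p.order_id IS NULL
""").fetchall()

# Total value of the orphaned rows.
orphan_total = sum(amount for _, _, amount in orphans)
```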
Even in presumably good relational designs, some parent-child relationships are not enforced, for various reasons. First, interrogate the RI listed in the system catalogs to identify all enforced relationships. Reverse-engineering a system with a good modeling tool is probably the best way to do this. A harder and more valuable analysis is to identify unenforced relationships and determine the probability of the relationship if not all values are an exact match. Do this by counting all the candidate child attribute values that exist within a known parent attribute table. If all match and there is a non-trivial number of matches, there is a good probability of a non-identified relationship. A small number of mismatches could identify data quality issues.
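The counting step can be sketched as a match ratio between a candidate child attribute and a known parent key. The function name is hypothetical, and what counts as a "non-trivial" number of matches or a "small" number of mismatches is left to the analyst.

```python
def relationship_match(child_values, parent_values):
    """Return (matched_count, match_ratio) for non-NULL child values found in the parent."""
    parent = set(parent_values)
    candidates = [v for v in child_values if v is not None]
    if not candidates:
        return 0, 0.0
    matched = sum(v in parent for v in candidates)
    return matched, matched / len(candidates)


# Made-up sample: three of the four non-NULL child values exist in the parent.
matched, ratio = relationship_match(["A", "B", "A", "Z", None], ["A", "B", "C"])
```

A ratio of 1.0 over many rows suggests an unenforced relationship. A ratio just below 1.0 suggests the same relationship plus a few rows worth investigating as data quality issues.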
In Part 5, we tie all the techniques discussed in the first four parts together to show the value of a repeatable data profiling process.
The Hub Designs Blog welcomes Part 3 of this series by Rob DuMoulin, an information architect with more than 26 years of IT experience, specializing in master data management, database administration and design, and business intelligence.
Part 3: Attribute-Level Analyses
This is part three of a five-part series on data profiling.
In Part One, we took a light-hearted view of where profiling benefits an organization and in Part Two, we discussed the fundamentals of a profiling strategy. The remaining three parts discuss attributes, relationships, patterns, and how to use the combined data profiling information you collect. In this section, we introduce attributes, the lowest-level components of a profiling effort.
An attribute is simply an individual data element. Alone, an attribute has no context. The simple descriptor “Cost” tells us very little about the attribute’s true purpose and immediately drives a need for additional information, such as units (hours, Dollars, Euros…), type (weighted, unit, gross…), and use (invoice, sum, average…). Attributes therefore must be analyzed within the context of their business purpose to have meaning.
Some characteristics require business knowledge to define and others can be determined through interrogation of existing values and underlying rules of the storage medium. It takes both analyses to get a complete picture of information within a system. While assembling this puzzle, though, keep in mind that until you validate the enforcement of business rules, only assumptions can result from physical profiling or business context.
Analyses of values, domains, and constraints allow insight into use (or abuse) of an attribute. The larger the sample size, the more confidence you gain in the results. Without explicit proof of business rule enforcement, though, you must assume that a value that does not presently exist could still exist. Business rules are defined by business experts and enforced through database constraints, data type/precision, and application code. Knowing the methods of enforcement allows you to narrow a domain but not totally understand it. Profiling of actual values provides additional refinement in terms of percentage of NULL values, percentage of distinct values, minimum, maximum, and average values, top x and bottom x recurring values along with their counts, and minimum, maximum, and average data lengths.
Some attributes within a data set serve valuable purposes that are important to identify. Attributes that individually or in conjunction with others define uniqueness of the data set also may support relationships between entities. Uniqueness can be further classified as being either members of a system-enforced primary key or of a business key (outside of the defined primary key). System-enforced primary keys are relatively easy to define within a database system through interrogation of the system catalog. Business keys that exist in tables in addition to a primary key may be more difficult to identify, especially if more than one attribute is needed to define uniqueness.
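Hunting for business keys in sampled data can be sketched as a brute-force uniqueness test over column combinations. The function below is an illustration with made-up column names, not a production technique: the number of combinations grows quickly with width, and uniqueness in a sample only suggests a key; it does not prove one.

```python
from itertools import combinations


def candidate_keys(rows, columns, max_width=2):
    """Return minimal column combinations that are unique and non-NULL in every row."""
    found = []
    for width in range(1, max_width + 1):
        for combo in combinations(columns, width):
            # Skip combinations that merely extend a key already found.
            if any(set(key) <= set(combo) for key in found):
                continue
            seen = set()
            unique = bool(rows)
            for row in rows:
                value = tuple(row[c] for c in combo)
                if None in value or value in seen:
                    unique = False
                    break
                seen.add(value)
            if unique:
                found.append(combo)
    return found


# Made-up sample rows: "id" is the primary key, (store, sku) a business key.
rows = [
    {"id": 1, "store": "A", "sku": "X"},
    {"id": 2, "store": "A", "sku": "Y"},
    {"id": 3, "store": "B", "sku": "X"},
]
keys = candidate_keys(rows, ["id", "store", "sku"])
```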
Attribute-level information of interest includes: data type (size and precision), the number and percent of NULL values, column descriptions, number and percent of distinct values, and the minimum-maximum-average values and lengths. The system catalog provides some of this information, but the rest must be collected by sampling the data.
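The sampled metrics in that list are straightforward to compute once the values are in hand. Here is a minimal sketch over an in-memory list of values. The function name and dictionary keys are my own, the non-NULL values are assumed to be mutually comparable, and lengths are measured on the string form of each value.

```python
from collections import Counter


def profile_attribute(values, top_n=3):
    """Compute basic attribute-level metrics from a list of sampled values."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    lengths = [len(str(v)) for v in non_null]
    counts = Counter(non_null)
    return {
        "rows": total,
        "null_count": total - len(non_null),
        "null_pct": 100.0 * (total - len(non_null)) / total if total else 0.0,
        # Distinct non-NULL values as a percentage of all rows.
        "distinct_count": len(counts),
        "distinct_pct": 100.0 * len(counts) / total if total else 0.0,
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "min_len": min(lengths) if lengths else None,
        "max_len": max(lengths) if lengths else None,
        "avg_len": sum(lengths) / len(lengths) if lengths else None,
        "top": counts.most_common(top_n),
    }


# Made-up sample of a state-code attribute.
profile = profile_attribute(["NY", "CA", None, "NY"])
```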
Other types of attributes that may help in identifying relevancy are those that provide system-level auditing or change control. Knowing which attributes fill these roles may either allow you to (a) ignore them for profiling purposes or (b) use them to help explain versions or data anomalies.
Part 4 expands on attribute profiling with the introduction of relationships and patterns.
The Hub Designs Blog welcomes Part 2 of this series by Rob DuMoulin, an information architect with more than 26 years of IT experience, specializing in master data management, database administration and design, and business intelligence.
Part 2: Profiling the Basics
This discussion is the second of a five-part series on data profiling. In Part 1, we discussed the project roles that benefit from data profiling and how better understanding information results in more reliable information systems. Important goals of any profiling strategy include automation of metric collection and socializing results to support the differing objectives of a data-centric project.
Early in a system development life cycle, profiling helps define sources, data storage requirements, and data transformations. As a system goes into production (or if profiling is added to an existing system for quality control purposes), routine profiling is useful to audit system quality and business rule enforcement. The frequency of collection and amount of effort you expend to automate your profiling methods should be based on the ability of the organization to benefit from the profile results.
This section discusses the beginnings of a profiling effort. Information assembled here forms the foundation of other profiling activities. For this discussion, consider a Profile Group as a set of information sharing a common purpose and data management methods. Examples of profile groups include tables within a single database schema or a group of spreadsheets with the same format but each spreadsheet representing a different time slice of data.
The underlying System managing a set of information within the profile group may be a named relational database, a file system directory, or even a web site being accessed through web services. The reason we abstract information into Systems is to group the information into distinct governance methods common to the underlying information. Relevant metadata and governance methods we track at the system-level include: technical contacts, backup schedules, system descriptors, connection strings, business unit owners, and host operating systems. System-level metadata common to a profile group helps us understand and troubleshoot future analyses. This level of information also provides developers with an understanding of inherent restrictions (or freedoms) they may encounter when trying to use or integrate the information.
Entities within a profile group belong to the same system, may have a common unique identifier, and, for database entities, have the same schema owner. Typically, entities are database tables, but may also be similar files or spreadsheet tabs containing like attribute lists. For entities, we track characteristics common to all the attributes they contain. These include: row counts, entity-level descriptors, growth characteristics (size and frequency), last analyzed date, and various customized indicators such as active/inactive, existence of change data management attributes such as insert/update timestamps, and existence of audit traceability indicators such as insert/update username.
The combination of system and entity level profiling supplies the foundation for the attribute-level profiling, which is where physical information in a system resides. It also provides valuable metadata to classify information and allows for future correlation of like information across systems. Assembly and publication of entity and system level information benefits the various consumers of the information by providing a centralized “master” source of contact and context information.
In Part 3, we will dive into the attribute level analyses around data profiling.
The Hub Designs Blog welcomes a guest post by Rob DuMoulin, an information architect with more than 26 years of IT experience, specializing in master data management, database administration and design, and business intelligence.
Part 1: The Psychology of Data Profiling
Swiss psychologist Carl Gustav Jung founded the Analytical School of Psychology. His word association theories form the basis of the Myers-Briggs Type Indicator Assessment test to identify career aptitude in today’s high school students. Dr. Jung’s approach assigned personality profiles based on how an individual’s thoughts associated to various phrases. By analyzing responses, he could understand how an individual viewed the world around them and perceived themselves. Typically, subjects are asked to speak the first thought entering their minds after hearing a trigger phrase. For the following example, remember, there are no wrong answers. If I say the words “Data Profiling”, what is the first thing you think of?
If you thought of food, cats, country music, CSI NY, or residential plumbing, you are either not in IT or are an IT Manager.
If your first thought was “Quality Assurance”, you align yourself with data quality professionals having anti-social thoughts of failing test cases and sadistically reporting lazy developers for buggy code. You gleefully scour test cases looking for any evidence of truncation, missing values, non-matching codes, numeric precision errors, and inconsistent abbreviation, text, and date formatting.
If “Integration” comes first in your mind, past legacy integration projects have scarred you with a disdain for source system data quality levels. You view production apps with contempt and loathe the time it takes to track down data issues caused by system integrations. You investigate upstream sources to create detailed mappings and transformation rules. Typical debugging sessions consist of validating relationships to identify orphaned data, identifying overloaded columns (attributes containing more than one distinct data element), or fixing format errors from implied decimals.
Some of you responded with “Value Domains” or “Data Types”, indicating you are obsessive-compulsive data architects compelled to organize the world in a strict and orderly fashion with some degree of normalization, though you are not considered “normal” by your peers. Your concerns lie in understanding and regulating naming conventions, relationships, and the existence of NULL or default values, and in understanding the meaning of each data element so you can accurately identify business rules and recognize when two or more objects are related or redundant.
Lastly, if “Debugging” is the first item in your thought queue, you are a coder justifying why presumably good code is not working. Extreme paranoia has taught you to assume nothing about data quality, so you add tests to identify duplicates, validate relationships, enforce business rules, track change data capture, and provide substitute values. Your phobia of early morning phone calls causes you to add auditing to your code to inform a DBA of data issues rather than waking you up in the middle of the night.
It is truly amazing how much we can conclude from the response to one simple phrase.
As stated before, there are no wrong answers. Aside from the innocent jab at Managers and non-IT resources, we all realize the benefits of information quality and absolutely need business involvement to understand context and domains of business information. The meaning and actions of Data Profiling change both by role and by project phase. Through profiling, we are able to identify best sources of information, learn proper ways to categorize and store it, reactively identify quality issues, and proactively define business rules to prevent future issues.
Identifying what is important to profile, when and how profiling is done, and how to share our findings across business and project resources is key. Done properly, profile results integrate into a master metadata repository and are periodically refreshed through an automated process.
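To make a few of the checks mentioned by the roles above concrete (missing values, duplicates, and orphaned data), here is a minimal, tool-agnostic sketch in Python. It assumes rows are held as plain dictionaries; the function names and the decision to skip NULL foreign keys in the orphan check are choices made for this example only.

```python
from collections import Counter


def count_missing(rows, column):
    """Count rows where the column is absent, None, or blank."""
    return sum(
        1 for r in rows
        if r.get(column) is None or str(r.get(column)).strip() == ""
    )


def find_duplicates(rows, key):
    """Return key values that occur more than once, in first-seen order."""
    counts = Counter(r.get(key) for r in rows)
    return [value for value, n in counts.items() if n > 1]


def find_orphans(child_rows, fk, parent_rows, pk):
    """Return child rows whose foreign key has no matching parent key.

    NULL (None) foreign keys are skipped here; in this sketch they are
    treated as missing values rather than orphans.
    """
    parent_keys = {p[pk] for p in parent_rows}
    return [
        c for c in child_rows
        if c.get(fk) is not None and c.get(fk) not in parent_keys
    ]
```

In an automated refresh, the outputs of checks like these would be written to the metadata repository alongside the date of the run, so that quality trends can be compared over time.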
This five-part series provides a tool-agnostic approach to comprehensive data profiling, focusing on information meaning and use. The next part of the series discusses system and table-level profiling. In particular, it covers what information is important to collect at the system and table level and how the enterprise can leverage that information to help assure quality. The third part dives into attribute-level profiling and the fourth discusses attribute patterns and relationships. The final part discusses the benefits and utility of gathering profiled information into a single repository.
Modeling the MDM Blueprint – Part 6
In this series, we’ve discussed building the MDM blueprint by developing the Common Information (Part 2), Canonical (Part 3), and Operating (Part 4) models. Part 5 introduced the Reference Architecture model to apply the technical infrastructure or patterns we plan on using.
The blueprint has now moved from being computation and platform independent to one that expresses intent through the use of more concrete platform-specific models. The solution specification is now documented (independent of the functional Business Requirements) to provide shared insight into the overall design.
Now, it’s time to bring the modeling products together and incorporate them into an MDM solution specification we can use in many ways to communicate the intent of the project.
First, the MDM blueprint specification becomes the vehicle for communicating the system’s design to interested stakeholders at each stage of its evolution. The blueprint can be used by:
- Downstream designers and implementers to provide overall policy and design guidance. This establishes inviolable constraints (and a certain amount of freedom) on downstream development activities.
- Testers and integrators to dictate the correct black-box behavior of the pieces that must fit together.
- Technical managers as the basis for forming development teams corresponding to the work assignments identified.
- Project managers as the basis for a work breakdown structure, planning, allocation of project resources, and tracking of progress by the various teams.
- Designers of other systems with which this one must interoperate to define the set of operations provided and required, and the protocols for their operation, that allow the interoperation to take place.
Second, the MDM blueprint specification provides a basis for performing up-front analysis to validate (or uncover deficiencies in) design decisions and refine or alter those decisions where necessary. The blueprint could be used by:
- Architects and requirements engineers who represent the customer. The MDM blueprint specification becomes the forum for negotiating and making trade-offs among competing requirements.
- Architects and component designers as a vehicle for arbitrating resource contention and establishing performance and other kinds of run-time resource consumption budgets.
- Developers using vendor-provided products from the commercial marketplace to establish the possibilities for commercial off-the-shelf (COTS) component integration by setting system and component boundaries and establishing requirements for the required behavior and quality properties of those components.
- Architects to evaluate the ability of the design to meet the system’s quality objectives. The MDM blueprint specification serves as the input for architectural evaluation methods such as the Software Architecture Analysis Method, the Architecture Tradeoff Analysis Method (ATAM), and Software Performance Engineering (SPE), as well as less ambitious (and less effective) activities such as unfocused design walkthroughs.
- Performance engineers as the formal model that drives analytical tools such as rate schedulers, simulations, and simulation generators.
- Development product line managers to determine whether a potential new member of a product family is in or out of scope, and if out, by how much.
Third, the MDM blueprint becomes the first artifact used to achieve system understanding for:
- Technical managers, as the basis for conformance checking, for assurance that implementations have in fact been faithful to the architectural prescriptions.
- Maintainers, as a starting point for maintenance activities, revealing the areas a prospective change will affect.
- New project members, as the first artifact for familiarization with a system’s design.
- New architects, as the artifacts that (if properly documented) preserve and capture the previous incumbent’s knowledge and rationale.
- Re-engineers, as the first artifact recovered from a program understanding activity or (in the event that the architecture is known or has already been recovered) the artifact that drives program understanding activities at the appropriate level of component granularity.
Blueprint for MDM - Where this fits within a larger program
Developing and refining the MDM blueprint is typically associated with larger programs or strategic initiatives. In this last part of the series, I'll discuss where all this typically fits within a larger program and how to organize and plan this work within context.
The following diagram puts our modeling efforts within the context of a larger program, drawn from a mix of actual engagements with large, global customers. The key MDM blueprint components are highlighted with numbers representing:
- Common Information Model
- The Canonical Model
- The Operating Model
- The Reference Architecture
I have also assumed a business case exists (you have this, right?) and the functional requirements are known. Taken together with the MDM blueprint, we now have a powerful arsenal of robust information products we can use to prepare a high-quality solution specification that is relevant and can be used to meet a wide variety of needs.
Typically, use of the MDM blueprint may include:
- Identifying all necessary components and services
- Reviewing existing progress to validate (or uncover deficiencies in) design decisions; refine or alter those decisions where necessary
- Preparation of detailed planning products (Product, Organization, and Work Breakdown structures)
- Program planning and coordination of resources
- Facilitating prioritization of key requirements – technical and business
- Development of Request for Quotation, Request for Information products (make vs. buy)
- Preparing funding estimates (Capital and Operating Expense) and program budget preparation
- Understanding a vendor’s contribution to the solution and pricing accordingly (for example, repurpose as needed in contract and licensing activities and decouple supplier proprietary lock-in from the solution where appropriate)
We are also helping to ensure the business needs drive the solution by mitigating the impact of the dreaded Vendor Driven Architecture (VDA) in the MDM solution specification.
Summary
I hope you have enjoyed this brief journey through “Modeling the MDM Blueprint” and have gained something from my experience. I’m always interested in learning from others, so please let me know what you’ve encountered yourself, and maybe we can help others avoid the pitfalls and pain in this difficult, demanding work.
The difference between success and failure on an MDM journey is taking the time to model the blueprint and share this early and often with the business. This is, after all, a business project, not an elegant technical exercise. In an early reference, I mentioned Ward Cunningham’s Technical Debt concept. Recall that this metaphor means doing things the quick and dirty way sets us up with a technical debt, which is similar to a financial debt. Like a financial debt, the technical debt incurs interest payments, which come in the form of the extra effort we have to spend in future development because of the quick and dirty design choices we have made. The technical debt and resulting interest due in an MDM initiative with this kind of far-reaching impact across the enterprise is, well, unthinkable.
Take the time to develop your MDM blueprint and use this product to ensure success by clearly communicating business and technical intent with your stakeholders.





In 
among other applications. In addition to externalizing business rules locked in proprietary applications (for example, ERP or CRM), we also use design patterns defined here to communicate between different data formats. Instead of writing translators between each and every format (with potential for a combinatorial explosion), use this in combination with the
In
in which the organization operates. Expressed in business terms, this model represents a “foundation principle” or theme we can pivot around to understand each facet in the proper context. This is not easy to pull off, but will provide a fighting chance to resolve semantic differences in a way that helps focus the business on the real matters at hand. This is especially important when developing the Canonical model introduced in the next step.
