Metadata Management: An Essential Component Of Integration Projects
Tim Cianchi, Business Unit Leader at Zuhlke, looks at the impact metadata management is having on business IT systems, and considers how organisations can approach systems and information in a way that both addresses general business efficiency and integration requirements.
For as long as organisations have been recording business information there have existed a number of issues surrounding data integrity, accuracy and management that, if addressed correctly, have resulted in competitive advantage for the companies involved.
This is no secret and comes as no surprise.
In fact it is common sense that a business whose data is well-structured will demonstrate general efficiency, agility and capacity gains over less organised counterparts.
Auditing and reporting are obvious day-to-day business functions that benefit from well-structured data and optimised management systems. However, the level of control over an organisation’s business information becomes exceptionally important at a number of critical junctures, such as system integration, when the information and systems belonging to previously discrete companies or business units are required to work together.
The most common integration scenarios arise from mergers and acquisitions, replacement of existing systems, and implementation of interfaces to a new business partner or department.
In all these cases, data flows from one system (or collection of systems) into other systems. For data to be correctly processed (understood as information) in the receiving system, its elements must be correctly transformed and mapped into the expected formats.
This mapping is rarely a trivial exercise, since in most cases the systems exchanging information have been designed and built by different teams, at different times, and with reference to different standards (if any). Leaving aside technical aspects relating to message transport protocols and packaging formats, the primary problem for an integrator is to ensure that the business semantics (meaning) of the data are retained as it travels from one system to another.
We have seen many examples, both within and between organisations, where this is much harder than it might at first seem. Because IT systems have typically evolved in isolation from each other, the vocabularies used to describe data elements usually differ significantly. A common problem is that of synonyms and homonyms. Synonyms arise when the same data entity is described with different names in different systems. Homonyms arise when fields with the same name in different systems describe different data entities. A more complex problem is a mismatch between the sets of elements needed to describe a unit of business information.
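As an illustration of the synonym and homonym problem, the following minimal sketch compares two field glossaries. The system names, field names and business concepts are all hypothetical, and it assumes each field has already been tagged with the concept it represents, which in practice is the hard part.

```python
# Hypothetical field glossaries: field name -> business concept it represents.
crm_fields = {
    "cust_no": "customer identifier",
    "status": "account status",
    "postcode": "postal code",
}
billing_fields = {
    "customer_id": "customer identifier",
    "status": "payment status",
    "postcode": "postal code",
}


def find_synonyms(a, b):
    """Pairs of differently named fields that describe the same concept."""
    return sorted(
        (name_a, name_b)
        for name_a, concept_a in a.items()
        for name_b, concept_b in b.items()
        if concept_a == concept_b and name_a != name_b
    )


def find_homonyms(a, b):
    """Field names present in both systems but describing different concepts."""
    return sorted(name for name in a.keys() & b.keys() if a[name] != b[name])


synonyms = find_synonyms(crm_fields, billing_fields)
homonyms = find_homonyms(crm_fields, billing_fields)
```

Here `cust_no` and `customer_id` are reported as synonyms, while `status` is reported as a homonym because it means something different in each system.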
For example, one system might require three fields to describe an address, and the other system four fields. Further complex problems arise with references to “master data”, where each system has copies of information relating to third-party or common data, but stored using different codes, identifiers and structures. Even worse, systems often have highly conditional datasets, for example “if field A has value X, then fields B and C must be present and mean this, otherwise field D is used and means that”. These rules need to be captured and understood in order to preserve meaning. The information describing the data entities, their semantics and the rules for validating, enriching and transforming them are collectively known as “metadata”.
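The conditional rule quoted above can be captured as data rather than buried in application code. The sketch below is one possible representation, not a standard format: the rule structure and the function name `missing_fields` are assumptions made for illustration, and the field names A, B, C and D are taken from the example in the text.

```python
# Hypothetical rule format: if field A has value X, B and C are required;
# otherwise D is required.
RULE = {
    "when": ("A", "X"),
    "then_required": ["B", "C"],
    "else_required": ["D"],
}


def missing_fields(record, rule):
    """Return the required fields that are absent from the record."""
    field, expected = rule["when"]
    branch = "then_required" if record.get(field) == expected else "else_required"
    return [name for name in rule[branch] if record.get(name) is None]
```

Because the rule is plain data, it can be stored in a metadata repository, versioned, and applied by any system that needs to validate the same records.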
Historically, metadata has not been treated with the respect it deserves.
With luck, it is present in relatively complete form in the specification when a system is first proposed. During implementation it ends up embedded in a variety of forms, some of which (for example database schemas) can be easily interrogated, and many of which (such as application programs) cannot. It is not that some forms intentionally hide their metadata, but rather that there was no requirement at the time to expose it.
Following initial deployment, the system is modified through a series of change requests and bug fixes and the original specification rapidly becomes outdated and no longer corresponds to the actual implementation.
In this fairly standard application lifecycle, the metadata becomes scattered across a number of artefacts. At any point where it is necessary to reconstruct it for a given purpose, such as implementing a new interface, a costly manual exercise involving business analysts and/or developers is required. This typically results in a metadata document, perhaps a spreadsheet, which may be accurate at the time of production but is never maintained, and again rapidly becomes out of date. The whole exercise has to be repeated when the next interface request comes along or a client of the system demands to know how a particular set of information is derived.
This typical approach, where there is no explicit metadata management, has been acceptable in the past, and may remain so in the future where systems and their interfaces are simple and stable, and where the costs and risks of unmanaged metadata can be reasonably borne. However, many organisations today are coming to realise that it is critical to have an explicit and accurate representation of their data stores and flows. This is particularly true of organisations whose core business is conducted through large and complex IT systems which have grown over many years through combinations of departmental and organisational mergers, using both home-grown and packaged solutions.
Active metadata management is currently gaining traction within these kinds of organisation, for two main reasons. Firstly, the cost and time involved in repeatedly recovering metadata by hand to meet new business requirements have been recognised and found to be unacceptable, especially in the current economic climate. Secondly, awareness of, and tool support for, managing metadata have improved significantly over the past few years.
So let’s look at what it means to manage metadata.
Firstly, metadata is just data about other data and its relationships. One characteristic of metadata is that the data volumes are typically small in comparison to the data described – a few megabytes of metadata might be more than enough to describe terabytes of transaction data. On the other hand, the complexity of metadata can be high, since it has to deal with complex and conditional relationships.
A vital aspect of metadata management is governance – since the point of collecting and maintaining the metadata in the first place is to provide a “source of truth” about the systems described, it is critical to ensure the quality and currency of the metadata, and to have a governance process that describes how metadata changes are authorised. A metadata repository therefore needs both a good version control system, with the ability to highlight changes between versions, and workflow support to implement governance.
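To make the idea of highlighting changes between versions concrete, here is a minimal sketch that compares two snapshots of a metadata set. It assumes each snapshot is a flat mapping from element name to definition; the element names and types are hypothetical, and a real repository would track far richer structures.

```python
def diff_versions(old, new):
    """Summarise what changed between two versions of a metadata snapshot."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Hypothetical snapshots: element name -> declared type.
v1 = {
    "customer.id": "INTEGER",
    "customer.name": "VARCHAR(50)",
    "customer.fax": "VARCHAR(20)",
}
v2 = {
    "customer.id": "INTEGER",
    "customer.name": "VARCHAR(100)",
    "customer.email": "VARCHAR(255)",
}
changes = diff_versions(v1, v2)
```

A summary like this is the kind of input a governance workflow could present to an approver before a metadata change is authorised.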
To ensure accuracy and currency of the metadata, the ideal solution is to refresh metadata from actual deployed systems. A good metadata tool will be able to import metadata automatically (scheduled or on demand) from a number of sources, such as database and XML schemas, mapping tools, report definitions, and other application artefacts, as well as being able to cope with static sources such as spreadsheets and comma-separated files.
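As a small-scale illustration of refreshing metadata from a deployed system, the sketch below reads table and column definitions directly from a database catalogue. SQLite is used here only because it ships with Python; the function name `harvest_schema` and the sample table are assumptions, and other databases expose the same information through their own catalogue views.

```python
import sqlite3


def harvest_schema(conn):
    """Read table and column definitions from a live SQLite database."""
    metadata = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        metadata[table] = [
            {
                "name": col[1],
                "type": col[2],
                "not_null": bool(col[3]),
                "primary_key": bool(col[5]),
            }
            for col in columns
        ]
    return metadata


# Hypothetical deployed schema, created in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT)"
)
schema = harvest_schema(conn)
```

Running a harvest like this on a schedule, and comparing the result with the previous run, is one way to keep a repository in step with what is actually deployed.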
A metadata repository must allow for flexible metadata representations, and support features such as glossary building. It must support lineage capture and analysis, which allows for tracing data flows and derivations between systems. It must also have good support for querying and outputting the metadata in various forms. Lastly it must be accessible to business analysts – it is primarily a tool to make their jobs more productive.
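Lineage analysis can be pictured as a walk over a graph of derivations. The following sketch assumes lineage has already been captured as a mapping from each element to the elements it is derived from; the element names are hypothetical.

```python
from collections import deque

# Hypothetical lineage edges: target element -> elements it is derived from.
DERIVED_FROM = {
    "report.net_revenue": ["warehouse.gross_revenue", "warehouse.refunds"],
    "warehouse.gross_revenue": ["orders.amount"],
    "warehouse.refunds": ["returns.amount"],
}


def upstream_sources(element, derived_from):
    """Return every element that feeds into `element`, directly or indirectly."""
    seen = set()
    queue = deque([element])
    while queue:
        current = queue.popleft()
        for parent in derived_from.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(seen)


sources = upstream_sources("report.net_revenue", DERIVED_FROM)
```

The same traversal run in the opposite direction would answer the impact question: which downstream reports are affected if a source field changes.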
Given that an organisation understands the benefits of managing metadata, how should it go about the process of implementing a solution? The first thing to realise is the strategic nature of any solution. The main use of the metadata is to understand how information is managed between systems, rather than in any one individual application. At the very least a strategic vision needs to be in place from the outset. This must encompass the key sources and uses of the metadata, and the expected business benefits.
This vision will inform the next step, which is the selection of a suitable toolset and metadata repository. The wrong choice here can cripple an initiative, as we have seen in the past. It is vital to establish sound selection criteria that reflect the organisation’s needs, including scalability, automated import features and good usability. It may be tempting to take a product on offer from an incumbent supplier, just because it is cheap, or compatible with their software stack, but this is not a sound basis for a successful return on investment.
Following tool evaluation and selection, the next steps are to design a suitable structure for the metadata which supports the needs of the organisation, and to agree a lightweight governance process. Both of these will need tweaking as experience is gained. However, adjusting the metadata structure can become very expensive as the volume of captured metadata increases, so it is worth drawing on people with experience in this task to get reasonably close on the first attempt.
Actual implementation should follow a more tactical path, picking low-hanging fruit where it is reasonably easy both to harvest a sufficient quantity of metadata and to use it to create immediate business value. If metadata is available in a form that can be readily imported into the selected tool, this is a huge advantage. In other cases, it may be possible to build a specialised importer. An example of this might be an importer that can parse SQL statements in stored procedures and perform a static analysis to extract the data lineage.
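A very rough sketch of such an importer is shown below. It uses regular expressions rather than a real SQL parser, so it only recognises simple INSERT ... SELECT statements and would be misled by subqueries, comments or semicolons inside string literals. The table names and the function name `table_lineage` are hypothetical.

```python
import re

INSERT_RE = re.compile(r"\bINSERT\s+INTO\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def table_lineage(sql):
    """Map each INSERT target table to the tables read by the same statement."""
    lineage = {}
    for statement in sql.split(";"):
        target = INSERT_RE.search(statement)
        if target is None:
            continue  # not a statement that writes via INSERT
        sources = SOURCE_RE.findall(statement)
        lineage.setdefault(target.group(1), set()).update(sources)
    return lineage


# Hypothetical stored procedure body.
procedure_body = """
INSERT INTO sales_summary (region, total)
SELECT r.name, SUM(o.amount)
FROM orders o
JOIN regions r ON r.id = o.region_id
GROUP BY r.name;
DELETE FROM staging_orders;
"""
lineage = table_lineage(procedure_body)
```

For this input, `sales_summary` is recorded as being derived from `orders` and `regions`. A production importer would more likely build on a proper SQL grammar for the database dialect in question.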
Ideally the tool will provide sufficient “out of the box” reporting that it can be used to deliver business value to analysts, clients who need interface definitions, compliance officers and so on. Again, it may well be worth writing specialised reporting tools that deliver metadata in a form more usable by other teams or applications. For example, it is possible to generate partial report definitions or system specifications from metadata, and to include relevant glossary definitions, so that an outsourced development team can leverage these to deliver valuable applications faster.
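As a simple illustration of generating a partial specification from metadata, the sketch below renders an interface description and pulls in glossary definitions where they exist. The glossary entry, field list, interface name and output layout are all invented for the example.

```python
# Hypothetical glossary and field metadata.
GLOSSARY = {
    "customer identifier": "Unique key assigned to a customer at onboarding.",
}
FIELDS = [
    {"name": "cust_no", "type": "INTEGER", "term": "customer identifier"},
    {"name": "postcode", "type": "VARCHAR(8)", "term": None},
]


def render_spec(interface, fields, glossary):
    """Render a plain-text interface specification with glossary definitions."""
    lines = [f"Interface: {interface}"]
    for field in fields:
        definition = glossary.get(field["term"], "no glossary entry")
        lines.append(f"- {field['name']} ({field['type']}): {definition}")
    return "\n".join(lines)


spec = render_spec("customer_feed", FIELDS, GLOSSARY)
```

Because the text is generated from the repository, it can be regenerated whenever the underlying metadata changes, rather than being maintained by hand.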
As soon as sufficient experience has been gained on some typical systems, the metadata repository structure should be reviewed against the strategic vision and updated if necessary. Unfortunately, although all metadata shares the same general format (descriptions of entities and the relationships between them), we are not close to a “universal solution”. Each organisation needs to implement a metadata management solution that meets its strategic needs and delivers an appropriate cost/benefit ratio.
There is no easy answer to how organisations should approach the integration of IT systems. However, it is becoming clear that for many organisations with complex IT systems, a proactive investment in metadata management can generate immediate tangible returns, in addition to very substantial benefits over the longer term.