T E R R Y M 0 R I A R T Y
February, 1997
With the emergence of the data warehouse, metadata is finally being recognized as an essential component of the business information resource. We always knew that metadata has value. We just had a hard time finding business stakeholders who appreciate that value. Now the data warehouse has created the need for metadata. By describing the data available through the data warehouse, metadata provides the brains of the decision support environment. Metadata provides the context that transforms the warehouse data into meaningful business information. Data warehouse users-business management and executives-are the most powerful individuals in the organization. What they want, they normally get. And right now they want access to metadata. Now, for the first time, we have the opportunity to elevate metadata to the same level of importance as the business data in our corporate databases. The moment is right, and we must seize it.
Managing metadata is not a casual activity; it must be carefully planned. To manage information as a business resource, we must have a system of management controls in place to govern how information is created, maintained, used, delivered, and secured. This type of metadata management environment requires that we use the same project-oriented disciplines we use to manage the development of any new business system. The metadata management project manager must assemble a multidisciplinary team of metadata, data warehouse, and data management stakeholders to assume responsibility for establishing the warehouse metadata management environment. Project tasks include:
* Defining requirements for the metadata. that must be available to warehouse users
* Developing the warehouse metadata management architecture
* Selecting which tools should be incorporated into the metadata management infrastructure
*Developing programs that integrate and customize the selected tools to create an integrated suite that meets the organization's specific metadata management needs
* Developing and executing a training program for warehouse metadata users.
THE TEAM
Putting together the right team to manage the project and defining the role of each player is crucial to the project's interests. In a metadata project, the data warehouse project team assumes the role of business user. The data warehouse business user liaison represents the data warehouse user community in providing their metadata requirements. The data warehouse data management team (data modeler, data analyst, and database administrator) and architects represent the primary users of the metadata management environment. Their responsibility is to populate the metadata repository with the relevant descriptive data that supports the data warehouse business model, database design, source data (including transformation and integration rules), and the infrastructure comprising load rules, technology configuration (what data is available through which DBMS on which hardware platform), and data replication rules that make up the warehousing environment. The data warehouse administrator is responsible for ensuring that the warehouse is populated with high-quality data. Therefore, this person provides the metadata that describes the data quality tolerances and data validation rules and maintains data about any data quality problems detected.
The metadata management team members are the equivalent of an application project's analysts and developers. They are responsible for compiling metadata management requirements and developing the supporting metamodels; developing the warehouse metadata management architecture; defining the tool evaluation criteria; facilitating tool evaluation and selection; configuring, customizing, and implementing the metadata management infrastructure; and training the warehouse team to use the metadata management environment effectively.
Two additional stakeholders may exist in your organization: the corporate data administrator, who is responsible for the corporatewide data standards and metadata management environment, and the enterprise information modeler, who is caretaker of the organization's enterprise model. If your organization has already established metadata standards, you should determine how well they meet your warehouse metadata requirements. You may need to enhance the existing standards to accommodate warehouse concerns. Ask yourself the following questions:
*Does the current standard provide a mechanism for reusing existing corporate metadata as the basis for the warehouse's metadata or for linking warehouse metadata to the corporate version? If it doesn't, you should consider these requirements as valid for the data warehouse metadata.
* Does a corporate metadata management environment already exist within your organization?
* Does a standard suite of tools exist (for example, CASE/modeling products and repository/data dictionary) that must be incorporated into your infrastructure analysis? If so, consider how well these tools meet the data warehouse metadata management needs. Not all tools that support OLTP application development are suitable for data warehouse development.
* Can you "tend your standard tools to support essential data warehouse metadata (such as star models, data quality metrics, data mapping and transformation rules, and load statistics)?
If your organization doesn't have a corporate data administrator or enterprise information modeler, the data warehouse may serve as a catalyst for creating these corporate-level positions. You shouldn't expand your project's scope just to encompass these global metadata management perspectives. But keep in mind that the metadata management environment you create to support the warehouse may eventually serve as the basis for corporatewide use. In other words, make every effort to develop a metadata management infrastructure that meets your project's scope without precluding its eventual evolution into corporate metadata management environment.
INTEGRATION
The metadata management project is more of a systems-integration problem than an application-development effort. Few organizations build their own metadata management tools. Instead, they rely on a collection of vendor-provided products that work together to move the data from source applications into the warehouse environment. Each tool supports a discrete set of data warehouse functions that use or create a subset of the data warehouse metadata (see Figure 1). Typically, several tools require access to the same metadata. However, for the most part, each product is self-contained and provides its own metadata management facilities. Consequently, metadata definition is redundant across the data warehouse tool suite, resulting in disparate metadata. This metadata must be integrated if it is to be useful to the warehouse metadata users.

Metadata integration is similar to data warehouse data integration. Just as the data warehouse environment integrates disparate operational data, the warehouse metadata management environment must integrate disparate metadata. The metadata management infrastructure must be configured to collect disparate metadata from the source tools, integrate this metadata, and disburse it back to any tools that use it. To integrate metadata, you can apply the same data analysis and mapping techniques you used in developing the transformation rules for the data warehouse. Likewise, when creating the integrated metadata, you must address the same sourcing concerns that face any data warehouse data analyst.
Because several data warehouse tools may capture the same metadata, you must determine which tool is the most appropriate source for each metadata object. For example, the data structure definition for a relational database can be sourced from the DBMS's catalog. However, if that database's table structures are always generated from your data modeling tool, the catalog is probably a better source for the database's tables and columns because you can also extract the definitions, domain, and derivation rules.
Tools may capture different attributes about the same metadata object. You must develop a synchronization strategy to identify each tool's version of the same metadata object and consolidate their attributes into a single version. For example, if updates to your relational database are made through revisions to the data definition language and bypass the data modeling tool, then the table and column structure must be extracted from the DBMS catalog. The definitions, domains, and derivation rules that the data modeling tool holds must be matched and merged into their metadata object as a separate process. Likewise, the transformation rules that the data mapping tool holds must also be integrated into a consolidated version of the metadata object.
Melding metadata objects from disparate tools can be an incredible challenge, because the only common attribute by which to identify each version of the same metadata object is its name. If the data modeling tool holds a column's business name but doesn't record the actual database column name (not unusual when the tool doesn't generate the database schema), you may not be able to match the metadata object versions sourced from the data modeling tool and the DBMS catalog. One strategy is to enter the table and column names into the tool manually so that matching on a common identifier is possible, If different tools record the same metadata attribute, you must establish rules regarding which tool maintains the master version. Finally, metadata held as text in word processors may need to be merged into the consolidated metadata object manually. Unless this metadata has been documented with a uniform template that lends itself to one of the report-scraping programs, developing an automated approach to extract this free-form text will probably be more time-consuming and error-prone than manually reentering the text into the metadata management environment.
THETOOLS
The next challenges you face in implementing your metadata management environment are the actual extraction, merging, and storage of metadata. Remember, we're systems integrators not application developers; we want to write as little code as possible. We naturally expect our tool vendors to help us implement this aspect of the metadata management environment. Unfortunately, we currently have very few alternatives. Look for tools that support an industry standard. At this point, CASE Data Interchange Format (CDIF) is the only standard for which vendors provide any level of support. Although better standards may exist, they're useless if vendors don't implement them in their products. The CDIF standard provides a neutral format that lets you translate metadata between products. By translating to and from this neutral format, vendors can ensure that their products are able to interface with others without being affected by any changes made in another product's metadata support.
Another metadata interchange standard is also emerging. A group of data warehouse vendors who recognized the need to share metadata across the warehouse environment founded the Metadata Coalition. Most of the leading vendors in data warehouse extraction, OLAP, DBMS, and repositories are participating in this coalition. The Coalition's interface specification, known as the Metadata Interchange Format (MIF), has been ratified and is available at its Web site, www.metadata.org. MIF focuses strongly on supporting metadata that maps source data to the data warehouse database, from both a physical database and a business user's perspective. Although some have criticized the specification for being too limited, its designers didn't intend it to handle all of an organization's metadata in the initial release. The beauty of MIF is that it's simple to understand and implement, and it supports nearly all the metadata requirements of the average data warehouse business user. I highly recommend that you encourage your data warehouse tool vendors to support this standard.
If your vendors don't support either of these standards, look for tools that interface with one another. Typically, this type of interfacing is a one-way bridge by which one tool translates another tool's metadata into a format that it can import. With this approach, the vendor providing the interface assumes the risks associated with changing metamodels. If either vendor makes enhancements to its metadata management approach that impact the interface, that vendor assumes responsibility for ensuring that the bridge between the tools continues to work.
As a last resort, you may have to code the interfaces between tools yourself. If you decide to take this approach, you'll have to determine the impact on your bridge before installing any new release of a product. You'll also need programmers to maintain and enhance the bridge as changes are made in the vendor products. You should only adopt this approach if neither of the other alternatives is possible. I recommend that you make "metadata sharing" a high-priority criterion for any data warehouse product evaluation.
Finally, you must decide where to maintain the integrated metadata objects. If your objective is to support only the metadata that the data warehouse uses, you can maintain the integrated metadata objects in a read-only environment in which the integrated metadata objects are never updated directly. In this scenario, the source tools hold the master copies of the metadata they capture. Periodically, updates from the metadata sources are reapplied to the integrated metadata objects. The properties that define a single metadata object maybe distributed across the data warehouse management tools. For example, the definition of a data warehouse column may reside in the supporting CASE tool while the data warehouse extract tool maintains the transformation rules to the column's sources. Although this strategy keeps the metadata easily accessible to the tools, it requires that changes be synchronized. If the environment holding the integrated metadata objects is considered to be the master, then updates are made directly to the integrated version of a metadata object. In this situation, you must implement mechanisms for distributing updates to the modeling, data quality analysis, and data extract tools.
Traditionally, repository products, such as MSP's DataManager, R&O Inc.'s Rochade, and Platinum Technology's Repository, provide the extensibility, query, and versioning facilities necessary to manage metadata. As the importance of metadata to warehouse users becomes apparent, data warehouse extract vendors (such as Prism Solutions Inc., Carleton Corp., Apertus Technologies Inc., and ETI Solutions Inc.) are beginning to incorporate more metadata management capabilities into their products. However, a new class of tools specifically designed to manage a data warehouse's metadata is becoming available. lntellidex Systems Inc.'s Metadata Control Center is an example of this new genre of metadata management tools. It is architected to capture and integrate metadata from metadata source tools and to enable end-user access to that metadata directly or by exporting it into their OLAP or data mining tools. In its first release, Metadata Control Center captures metadata from ERwin or relational DBMS catalogs and provides an interface to the Business Objects OLAP product. It is also one of the first products able to import metadata according to the Metadata Coalition Interface Standard. Near-term releases of the tool will support metadata importing from Carleton's Passport, the integration of metadata from ERwin and the relational DBMS catalogs, and the ability to export using the MIF standard. The Intellidex Metadata Control Center also lets end users sample warehouse data to construct their own view or data mart, which they can then schedule for downloading into the chosen environment.
Although configuring your metadata management environment may currently feel like building an airplane with bubble gum and rubber bands, there's hope that we'll finally get the tools we need to properly manage metadata. With data warehouse business users clamoring for access to their metadata, our vendors are cooperating to provide the integrated environment necessary to capture and deliver that metadata. Our time is now, and we must seize the moment if we are going to be successful in our metadata management objectives.
Terry Moriarty, president of Inastrol, a San Francisco-based information management consultancy, specializes in customer relationship information and metadata management. Her common business models have been used as the basis of customer models for companies within the financial services, telecommunication, software/hardware technology manufacturing, and retail consumer product industries. You can reach her at terry@inastrol.com.