Terry Moriarty
Searching for the Right Data Modeling Tool
6/98
Copyright, 1997
My son, who is studying to become an Architect, is becoming proficient with one of the leading Computer-aided Design (CAD) applications. In describing the benefits of this software, he said that the notation depicted on a blueprint represents a special language. Each symbol is intended to communicate a specific aspect of the building being designed. The entire blueprint represents a model of that building which allows the architect, designers and builders to clarify the interrelationships of the components which are used to construct the building. But only those people who are "in the know" understand the diagram. To the uneducated, a blueprint can be gibberish. He enthusiastically endorsed the CAD software, stating that it was much easier to develop and change the models than the hand-drawn approach he had been taught in previous classes. Furthermore, the diagrams are easier to share in electronic format than the huge rolls of paper that he normally lugged back and forth to school.
The approaches to developing models for information systems has closely paralleled the stages through which the architecture and building industry has evolved. The methodologies and notations used to develop information models, such as data models, data flow diagrams, functional decompositions and state transition diagrams, made their debut in the 1970s. To be frank, these approaches were not rigorously employed because the hand-drawn diagrams were too difficult maintain. Changes in the business requirements or application design often moved too rapidly to keep the supporting documentation in synch. Just as CAD revolutionized how building specifications are developed, the advent of Computer-aided System Engineering (CASE) has had a major impact on how information systems are developed. The ability exists to maintain complex models in software products that support the migration from business requirements analysis through application design. While we may complain that the perfect CASE tool has yet to be developed, I dont know of anyone who is willing to return to the days where PASE (Pencil-Aided System Engineering) was the norm. The question is no longer should CASE products be used, but which one best meets your organizations needs.
Tools that support data modeling and database design have outpaced their process modeling counterparts. Products exist that support the full life cycle for data analysis from business data requirements modeling through database design including database generation. In addition, many products employ reverse-engineering techniques to transform a database into its corresponding physical database model.
With so many products available that support data modeling, the challenge is in selecting the right one. The following topics serve as the evaluation criteria that I use in selecting data modeling software.
1.Quality Graphic Presentations
The quality of the diagrams produced by the data modeling tool is the first feature I evaluate. The diagrams to which we devote so much of our time exist for one reason, as a mechanism for communicating with other people. The computer doesn't need the diagrams to identify normalization anomalies, generate code for a database or perform an impact analysis. The diagrams are essential for use by the business people in validating that their business rules have been properly represented and by developers to assist in gaining knowledge about the requirements for the systems they design. If a data modeling tool is rated low in the quality of its diagrams, I immediately eliminate it from further consideration.
Given the importance of diagrams, a data modeling tool should provide access to the maximum graphics abilities, such as color, fonts, and object sizing. These features can be used to enhance the diagrams ability to communicate important aspects of the model. For example, the entities and relationships on a data model can be color-coded to identify the business areas subject areas; the primary entity in a view can be easily focused upon by increasing its object size and using a thicker pen; the attributes which represent the foreign keys propagated from related entities can be highlighted through color and font size.
Another aspect of graphic presentations is the ability to present alternate views of the same diagram. A data model can be cut into subset views. The user can control what information is displayed on the diagram. For example, in some situations, it is best to include the entity's description in its box. Under other circumstances, we want to see the attributes listed. The ability to easily switched between these various perspectives should be provided.
2. Ease of Use
If I judge the tools diagrams to be adequate, I then assess how easy it is to use. For me, a tool is easy to use if it can adapt to my preferred style of interaction. Some modelers prefer the graphic approach in which the model is initially specified by drawing the diagram. However, I prefer to enter the model, textually, with the diagrams generated by the tool, as much as possible. Once the tool has generated the initial diagram, I can then refine the diagram layout manually. I look for tools which can support both interface modes.
A tool should be very forgiving of mistakes. The ability to move attributes from one entity to another, reroute relationships lines easily and change object names with no loss of information is essential. Unfortunately, I've encountered too many tools that use the delete/reinsert approach for correcting mistakes. This approach is often accompanied with the loss of textual documentation associated with the deleted object. Any product that requires the re-entry of data to correct mistakes does not qualify as a productivity tool.
The tool should provide meaningful error messages, with options for correcting the error, when possible. A robust help facility, both on-line and in hardcopy format, adds greatly to a product's ease of use.
I pay particular attention to the facilities provided to support relationship lines. I find that the most time consuming aspect of creating a quality diagram involves the layout of the models symbols. Configuring the diagram so that the relationship lines are easy to follow with minimal line cross-overs, can be a challenging activity. Drawing the model is simple. Creating a diagram that is pleasing to the eye is an entirely different matter. I look for tools that give me the maximum control over the relationship lines. I should be able to connect it to any spot on the entitys box. As entities are moved about the diagram, the tool should be able to automatically route lines through a path that avoids other relationship line cross-overs and travels around entities, rather than through them.
3. Meta-data Management Facilities
While the diagrams are the primary communication vehicles during the modeling sessions, the model is not complete without its textual documentation. An essential feature of any data modeling tool is the robustness of its meta-data management facilities. The meta-data capture and maintenance facilities should be seamlessly integrated with the tools drawing environment. Typically, double-clicking on a diagrams symbol will bring up the documentation window for the associated meta-object. Many tools only provide the ability to enter large blocks of text, as the supporting meta-data. I prefer products which provide a rich set of properties which allow me to capture information that is meaningful to each individual meta-object class. The accompanying report writer facility should allow me to specify which properties are incorporated to each report.
While corporations tend to select one business analysis and application development methodology as their standard, most find it necessary to augment the methodology, primarily in the area of documentation standards and naming standards. Documentation standards include the properties that are captured about the individual objects included in the model and the format of analysis and development deliverables. Naming standards deal with the rules for constructing names for the analysis and development objects which attempt to ensure uniqueness and, occasionally, ownership. Documentation and naming standards often reflect the personality and culture of the corporation. Therefore, the enforcement of corporate standards is often the area that requires the greatest degree of customization within the supporting toolset.
Most methodologies consist of a rich set of rules for constructing proper diagrams, in terms of the types of objects that can appear on the diagram, the meaning of each symbol of the notation and how the diagram objects can interrelate. Unfortunately, similar care has not always been taken in what information should be documented about those objects, beyond name and description. While most CASE vendors provide the ability to capture additional properties about a specific methodology object, I find that none have anticipated all the information that I want to capture about those same objects. Therefore, the ability to extend the set of properties that can be captured about the meta-objects in the methodology is a very desirable feature. Otherwise, other means for capturing this information must be used, such as the corporate repository or a word processor. Every time a different tool must be used to fully document a single methodology meta-object, the more time is required to re-integrate this information when the documentation that supports the diagrams is generated.
Similarly, naming standards are an important aspect in Data Administration's efforts to control the proliferation of alias names for the same business fact. Since it is less time consuming to re-use an existing field definition instead of defining the field each time the business concept is encountered, naming standards are an important productivity tool in application development.
Naming standards normally included the use of a standard abbreviation list that is used in constructing data element or column names. These standards exist because many languages limit the length of names that can be assigned. Although the process of developing the standard field name from an attribute name is quite simple, few CASE tools incorporate this feature. Yet abbreviating names is one of the more time consuming and boring tasks that a developer faces. I found that an average of 1 minute is required to manually construct a valid column name, using a paper-based standard abbreviation list. Once I automated the process, less than 3 seconds was spent abbreviating the name. An added bonus was the automatic generation of a list of terms for which abbreviations needed to be developed. Through the automation of naming standards, I saw a 2000% increase in productivity. Yet none of the CASE tools that I've encountered have included the ability to construct standard names.
The tool should provide the ability to easily navigate through the linkages between model components. For example, double-clicking on an attribute in the entitys attribute list should bring up that attributes meta-data. Ideally, the selected symbol on the displayed diagram should change to the meta-object whose meta-data window is currently displayed.
The ability to produce customized reports in a variety of formats is essential. A report writer should be provided which can be used to create reports "your way". You should be able to specify which meta-object types and properties are included in the report, as well as the formatting of the layout, including fonts and colors. The ability to generate matrices and indented lists that show the relationships between model components is also feature to look for. An important feature of any report generator is the ability to select which model components to be included on a specific report. A variety of selection criteria should be supported, such as an entitys neighborhood, for a specific set of subject areas, by attribute usage or those model components that were changed in the last modeling session. You should be able to save your custom report specifications and selection criteria. Some tools provide the ability to specify complete deliverable packages that can be used to generate documentation consistently for any project. Tools that do not provide their own report writer, should provide access to their databases through a third party report writer product.
The tool should support an open architecture. Minimally, an import/export facility should be provided to allow the movement of data between other application development productivity products. The format of the import/export file layout must be published. Ideally, the tool will support industry standards, such as CDIF and the Meta-data Coalition Interface Specification.
4. Model on the Intranet
Today, the intranet is one of the primary technologies in use for delivering information to an organizations users. Consequently, the ability to publish a models diagrams and meta-data to the web has become an essential feature. I look for all the capabilities provided through the CASE tools native interface: click-able diagrams that bring up the relevant meta-data window for the selected symbol and hyper-links that allow me to navigate between the model components. In addition, I want to exploit the webs ability to support threaded feedback and discussions on the model. The internet provides a vehicle which can allow users across the organization who would never have been exposed to the models to view and critique them. I want to encourage this type of discussion around the models content. Therefore, facilities should be available to collect, assimilate and respond to model-related messages.
A fully-functional web-based client which can be used to develop and maintain models is also a desirable feature.
5. Application Component Generation
A CASE tool is just a documentation tool if it does not have the ability to generate application components their specification recorded in the CASE. One problem that has always plagued developers is the fact that applications tend to get out of synch with their documentation. All too often, changes are made to the applications components that never get reflected back into the documentation supporting that component. As one of my associates stated: "To ensure that the documentation matches the code, you must generate the code from the documentation".
Any data modeling tool should be able to generate database component for your target technology, such as:
Since many companies develop for multiple platforms, the ability to generate database components for different technologies from a single specification is required.
6. Full Life Cycle Support
The Zachman Information Systems Architecture (ISA) framework (NEED REFERENCE TO ZACHMANS ARTICLE HERE) is a powerful approach for understanding the complexities associated with implementing an information system. I look for data modeling products which are part of an integrated tool suite which provides support for the entire Zachman Information Systems Architecture by providing automatic links across the columns and automated assistance as one row is transformed into the next perspective.
For example, I would like to be able to automatically cut of view from the data model, based on the attributes that were linked into the dataflows defined in a process's data flow diagram. I would like to have a list automatically generated of attributes defined in dataflows that have not been yet been assigned to some entity in the data model. I want the tool to provide alerts of inconsistencies across a row, as well as within a specific cell, of Zachman's matrix.
Furthermore, when the time comes to transform the models of one row into the next row, I would like as much automated support as possible. For example, several available products exist that make the first cut at transforming a model which is in Business Normal Form into a first cut 3rd Normal Form Conceptual Model. DBAs could use a product that recommends alternative database designs based on information captured about access paths information and performance volumes and statistics. Likewise, a tool with the ability to maintain the transformation links from business data model to conceptual data model to the denormalized database design would be highly favored within the data management community.
7. Enforcement of Methodologies
If a product advertises that it supports a specific analysis and design methodology, then it should enforce the rules defined by that methodology. Enforcement should be implemented in a manner that aids, not punishes, the user. For example, suppose I'm using a tool that supports IDEF1X for data modeling. I place two Independent Entities in the model and define an identifying relationship between these two entities. The tool should ensure that the child entity in this relationship is a Dependent Entity.
In the punishment approach to enforcement, the tool would display a message chastising me for defining an identifying relationship between two independent entities. I'm expected to correct the error by changing the child entity to a dependent entity or changing the relationship to a non-identifying relationship. Often, this type of enforcement is done after the fact through a rules enforcement option that is executed at the discretion of the user.
On the other hand, a tool is a valuable assistant if it catches the methodology error in real time and provides the options for rectifying the problem. Upon selection, the tool does all the processing required to implement the choice, thus ensuring that the rules of the methodology are adhered to. Both approaches satisfy the requirement that the methodology be enforced. However, the assistant approach has the tool perform the work to correct all the rules to display the model correctly. Since it can perform these corrections faster than I can, my productivity is increased and the quality of the model is preserved.
If a tool supports multiple methodologies, then it must ensure that it fully enforces the rules for each methodology. Furthermore, it should not allow notation from the different methodologies to be intermixed. I don't want to see Chen relationship diamonds and Information Engineering crow's feet in the same model. The algorithm used by the tool to convert diagrams from one methodology to another must be documented, complete and, ideally, customizable. For example, a tool may implement the following rules to convert a Chen model into a relational data model:
If a tool claims to support relational modeling, then the propagation of foreign keys should be automatic. Nothing annoys me more that spending time manually linking the same attribute to all its related entities, when the tool has all the information needed to perform this propagation. Attributes that represent the foreign keys propagated from another entity should be noted as such in the model.
While I believe that strong enforcement of the methodology rules is important, there are occasions where this requirement can be relaxed. On occasion, I develop diagrams that are primarily for presentation purposes. These tend to be simple drawings whose purpose is to convey some important, high level concepts in which the precision of the methodology notation can get in the way of the message. For these types of diagrams, I consider the ability to turn off rule enforcement essential.
8. Model Management Capabilities
A tool that only supports one model at a time is fine for an organization that only builds stand alone applications which never interfaces with other application. Each application represents a self-contained island. Personally, Ive yet to encounter this type of application portfolio. While program code may not be shared across applications, data most definitely is. In this situation, the data model represents to first point of where the need for data sharing and integration across business areas is recognized. Consequently, a robust data modeling product must support the need to re-use model components across multiple business analysis projects.
The ability to partition models into chunks that can be worked on by different analysts for different business areas must be supported. Likewise, a data modeling tool should be able to compare different versions of models and identify how the model components diverge. Finally, the ability to merge models must exist. If the data modeling tool does not supports these types of model management facilities, then it should be able to integrate into a separate model management facility that provides these types of capabilities.
9. Data Reverse Engineering Capabilities
Any tool ranks high in my book when it eliminates the need to rekey information that already exists electronically. Therefore an important productivity feature for any data modeling tool is the ability to reverse engineer data structures from existing applications. Companies rarely develop entirely new, stand-alone applications. In most cases, they are replacing an existing application that must co-exist with the other applications in the portfolio. Therefore, the ability to bring those existing data structures into the forward-engineering development environment as the basis for developing interface and conversion specifications is required.
While many CASE tools provided the ability to reverse engineer data structures, the data structure components were usually brought into the tool's encyclopedia as entities and attributes, so that their structure could be diagrammed as a data model. I prefer a product that choses to populate its meta environment with meta-objects appropriate for the Technology row of Zachman's framework, using tables and columns. The product should then be able to takes the next step in data reverse engineering, by transforming the physical database model into a business normal form model. For example, associative entities are transformed into many to many relationships.
My data modeling tool evaluation criteria represents, to a degree, my wish list for the perfect product. At this point, no single tool satisfies all my requirements. But when I reflect on the functionality provided by those initial products that were released a decade ago, our vendors have shown the ability to constantly improve their CASE tools. While my perfect tool may not exist yet, I believe that many vendors are listening. Over the years, they have each produced a product that is closer to my version of nirvana. Its definitely becoming difficult to say which data modeling CASE tool is my current favorite.
Terry Moriarty, president of Inastrol, a San Francisco-based information management consultancy, specializes in customer relationship information and metadata management. Her common business models have been used as the basis of customer models for companies within the financial services, telecommunication, software/hardware technology manufacturing, and retail consumer product industries. You can reach her at terry@inastrol.com.