T E R R Y     M O R I A R T Y

Seven criteria for evaluating which tools are best for you

Reverse-Engineering Tools:
How to Pick Them

June, 1997

Reverse engineering is the process of decomposing an application into its components and identifying the linkages among them. It is performed on the data structure specifications of the files and databases of the applications that provide source data to the data warehouse. Associated programs can also be subjected to reverse-engineering analysis to extract business rules embedded in their code. The types of business rules you can hope to expose that are of interest to a data warehouse sourcing effort include:

* Valid values as represented by 88-level specifications in a Cobol data structure definition

* Initial values from the VALUE clause in a Cobol data structure definition

* Derivation rules found either in COMPUTE statements or by tracing through computational statements (such as ADD or SUBTRACT)

* Data dependency rules represented through logic embedded in IF ... THEN ... ELSE, PERFORM, and DO ... WHILE constructs.
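To make the first two rule types concrete, here is a minimal Python sketch of how a scanner might pull initial values and 88-level valid values out of a data structure definition. The copybook fragment, the regular expressions, and the function name `extract_rules` are all invented for illustration; a real tool would use a full Cobol parser and would also handle constructs this sketch ignores, such as THRU ranges and continuation lines.

```python
import re

# Hypothetical copybook fragment, invented for this example.
COPYBOOK = """
       01  CUSTOMER-RECORD.
           05  CUST-STATUS        PIC X       VALUE 'A'.
               88  CUST-ACTIVE                VALUE 'A'.
               88  CUST-CLOSED                VALUE 'C' 'X'.
           05  CUST-BALANCE       PIC 9(7)V99 VALUE ZERO.
"""

# One data description entry per line: level number, name, remaining clauses.
ENTRY = re.compile(r"^\s*(\d{2})\s+([A-Z0-9-]+)(.*?)\.\s*$")
# A quoted literal or a bare word such as ZERO.
LITERAL = re.compile(r"'([^']*)'|([A-Z0-9-]+)")


def extract_rules(source):
    """Return (initial_values, valid_values) found in a copybook fragment."""
    initial, valid = {}, {}
    parent = None  # most recent data item that is not an 88-level entry
    for line in source.splitlines():
        match = ENTRY.match(line)
        if not match:
            continue
        level, name, rest = match.groups()
        parts = re.split(r"\bVALUE\b", rest, maxsplit=1)
        values = []
        if len(parts) == 2:
            values = [quoted or word for quoted, word in LITERAL.findall(parts[1])]
        if level == "88":
            # An 88-level entry names one or more valid values of its parent.
            if parent is not None:
                valid.setdefault(parent, {})[name] = values
        else:
            parent = name
            if values:
                initial[name] = values[0]
    return initial, valid


initial, valid = extract_rules(COPYBOOK)
```

In this sketch, `initial` maps each data element to its VALUE clause, and `valid` maps each data element to its condition names and the values they stand for.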

If your objective is to reengineer an application, additional reverse engineering is required. You must analyze the entire application to understand:

* The dependencies among processes, as represented by the application's jobs

* The flow of data among the application programs and other applications, as represented by the datasets used in the jobs and by the linkage parameters passed to other programs through CALL statements

* Disparate data definitions that refer to the same data element with multiple names

* The external representation of data delivered through the application's screen or form specifications and reports.
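As a rough illustration of the program-to-program flow described above, the following Python sketch scans source text for static CALL statements, records the linkage parameters passed, and then asks which programs reach a given subprogram directly or indirectly. The program names, source snippets, and helper names are hypothetical, and dynamic calls (CALL through a data name) are deliberately ignored.

```python
import re
from collections import defaultdict

# Hypothetical program sources keyed by program name.
SOURCES = {
    "ORDENTRY": "CALL 'CUSTVAL' USING CUST-REC.\nCALL 'PRICING' USING ORDER-REC.",
    "CUSTVAL": "CALL 'DATEUTIL' USING WS-DATE.",
    "PRICING": "CALL 'DATEUTIL' USING WS-DATE.",
    "DATEUTIL": "",
}

# Static calls only: CALL 'literal' optionally followed by USING parameters.
CALL = re.compile(r"\bCALL\s+'([A-Z0-9-]+)'(?:\s+USING\s+([A-Z0-9 -]+))?")


def build_call_graph(sources):
    """Map each calling program to a list of (callee, linkage parameters)."""
    graph = defaultdict(list)
    for program, text in sources.items():
        for callee, params in CALL.findall(text):
            graph[program].append((callee, params.split()))
    return dict(graph)


def programs_reaching(graph, target):
    """Programs that call `target` directly or through other programs."""
    found = set()
    changed = True
    while changed:
        changed = False
        for caller, calls in graph.items():
            if caller in found:
                continue
            if any(callee == target or callee in found for callee, _ in calls):
                found.add(caller)
                changed = True
    return found


graph = build_call_graph(SOURCES)
affected = programs_reaching(graph, "DATEUTIL")
```

Here `affected` holds every program that would need review if the interface of the hypothetical DATEUTIL routine changed.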

This level of application knowledge is also necessary to support changes impacting the entire application portfolio, such as a Year 2000 project or a project to improve an organization's application-management processes using a management standard, such as the SEI Capability Maturity Model, as a guideline. In the latter case, the organization must accumulate metrics that provide an understanding of the complexity of the application's program code.

A number of tools are currently available to automate much of the reverse-engineering process. These tools scan data structure definitions and program code as a way of populating a database with individual application components and the relationships among them. Examples of the types of tools that support varying degrees of reverse engineering include:

* Repository products (such as Viasoft's Rochade, Manager Software Products' DataManager, and Platinum's Repository)

* CASE tools (such as Logic Works' ERwin, Popkin Software's System Architect, and Powersoft's S-Designor)

* Data warehouse source extraction products (such as Prism Solutions' Warehouse Manager, Evolutionary Technology International's ETI-Extract Tool Suite, Carleton's Passport, and Apertus's Enterprise/Integrator)

* Tools specifically built to support reverse engineering (such as Viasoft's Existing Systems Workbench, Adpac's PM/SS, IBM's Redeveloper, and Micro Focus's Revolve)

* Metadata directories (such as Intellidex's Data Warehouse Control Center and Logic Works' Universal Directory).

Given this wealth of automated support for reverse engineering, it can be a challenge to select the appropriate products for your environment. In this column, I discuss seven criteria for evaluating which products best meet your specific needs. The criteria are:

* The technologies the tool supports

* The scope of the tool's input source

* The output options the tool provides

* The documentation options the tool provides

* The types of business rules the tool can extract

* The extent of the tool's conceptual model-extraction capabilities

* The kinds of program complexity metrics the tool can calculate.

The examples I give when discussing these criteria are slanted toward the IBM MVS Cobol environment, because the application portfolios for many organizations have been implemented using this technology. The criteria should serve as equally good guidelines for choosing reverse-engineering tools for applications implemented on other technology platforms. However, with these other platforms, your choice of vendors may be more limited than with IBM MVS Cobol.

 

TECHNOLOGIES SUPPORTED

Not all products have the ability to reverse engineer all programming languages or operating systems. The environments supported often fall into the following categories:

* IBM mainframe operating systems (such as MVS and VM)

* Unix-based applications

* Specific programming languages (such as Cobol, PL/1, C/C++, embedded SQL, triggers, stored procedures, and JCL)

* Relational database specifications (such as Oracle, DB2, Sybase, and Access)

* Nonrelational DBMS specifications (such as IMS DB, IDMS, and dBase)

* Teleprocessing monitors (such as IMS DC and CICS).

You should be looking for a tool suite that supports all of the technologies of interest to your project. For example, your data warehouse project probably doesn't need to reverse engineer the jobs from its source applications. However, you may find it valuable to extract business rules from the program code.

For the other types of projects that can benefit from reverse engineering, the full range of reverse-engineering capabilities for all the technologies used by the applications being analyzed should be part of your evaluation criteria.

 

SCOPE OF INPUT SOURCE

The scope of sources you plan to draw from is an important consideration when selecting a reverse-engineering product. The most common approaches to sourcing inputs for reverse engineering include:

 

Applicationwide. With this approach, the sourcing begins with the application's job-control library. The complete interaction of the application's components is inventoried as a unit. This approach is probably the most desirable, because an entire application can be decomposed at once. Tools that take such a global approach to reverse engineering usually identify the interrelationships among programs. They can trace the movement of data between the application's jobs and programs. In fact, they can subject the entire application portfolio to reverse-engineering analysis to identify how data flows across application boundaries. At the same time, this approach lets you zero in on any subset of an application and analyze the components that are relevant to a specific set of functions.
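The idea of starting from the job-control library can be sketched in a few lines of Python. The JCL below is a made-up, heavily simplified example, and the rule used to classify dataset usage (DISP=SHR means read, anything else means write) is an assumption made only to keep the sketch short; real tools interpret DISP, cataloged procedures, and overrides far more carefully.

```python
import re

# Hypothetical job-control library contents (two small jobs).
JCL = """\
//DAILYEXT JOB (ACCT),'EXTRACT'
//STEP1    EXEC PGM=CUSTEXT
//CUSTIN   DD DSN=PROD.CUSTOMER.MASTER,DISP=SHR
//CUSTOUT  DD DSN=PROD.CUSTOMER.EXTRACT,DISP=(NEW,CATLG)
//DAILYLD  JOB (ACCT),'LOAD'
//STEP1    EXEC PGM=DWLOAD
//LOADIN   DD DSN=PROD.CUSTOMER.EXTRACT,DISP=SHR
"""

JOB = re.compile(r"^//(\S+)\s+JOB\b")
EXEC = re.compile(r"^//\S*\s+EXEC\s+PGM=(\w+)")
DD = re.compile(r"^//\S+\s+DD\s+.*?DSN=([A-Z0-9.]+).*?DISP=\(?(\w+)")


def scan_jcl(text):
    """Return (job, program, dataset, mode) tuples for every DD statement."""
    usage = []
    job = program = None
    for line in text.splitlines():
        if m := JOB.match(line):
            job, program = m.group(1), None
        elif m := EXEC.match(line):
            program = m.group(1)
        elif m := DD.match(line):
            # Simplifying assumption: SHR is a read, everything else a write.
            mode = "read" if m.group(2) == "SHR" else "write"
            usage.append((job, program, m.group(1), mode))
    return usage


def cross_job_flows(usage):
    """Datasets written by one job and read by a different job."""
    writers = {(ds, job) for job, _, ds, mode in usage if mode == "write"}
    flows = set()
    for job, _, ds, mode in usage:
        if mode == "read":
            for written_ds, writer in writers:
                if written_ds == ds and writer != job:
                    flows.add((writer, ds, job))
    return sorted(flows)


usage = scan_jcl(JCL)
flows = cross_job_flows(usage)
```

Each entry in `flows` is a (producing job, dataset, consuming job) triple, which is the raw material for the kind of cross-job data-flow picture described above.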

 

Individual programs or copybooks. Although tools designed for this approach may be able to extract data directly from an application's source code or copybook libraries, they actually parse the input from one program or copybook at a time. You must specify which programs or copybooks you want to process. The tools may have no support for cross-program analysis, so they may provide reverse-engineering results only from the perspective of a single program or copybook.

 

The relational database catalog. With this approach, reverse-engineering tools connect directly to the catalog to import the available data. The tools can usually extract usage statistics maintained in the catalog, as well as relationships between the database components.
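A small Python sketch can show what reading a catalog looks like in practice. It uses SQLite purely as a convenient stand-in, because its catalog (`sqlite_master` plus the `table_info` and `foreign_key_list` pragmas) is available from the standard library; the catalogs of DB2, Oracle, or Sybase are organized differently. The two tables and the function name `read_catalog` are invented for the example.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        cust_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        cust_id  INTEGER NOT NULL REFERENCES customer(cust_id)
    );
""")


def read_catalog(conn):
    """Collect columns and foreign keys for every table in the catalog."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    catalog = {}
    for table in tables:
        # table_info rows: cid, name, type, notnull, dflt_value, pk
        columns = [(c[1], c[2])
                   for c in conn.execute(f"PRAGMA table_info('{table}')")]
        # foreign_key_list rows: id, seq, table, from, to, ...
        foreign_keys = [(f[3], f[2], f[4])
                        for f in conn.execute(f"PRAGMA foreign_key_list('{table}')")]
        catalog[table] = {"columns": columns, "foreign_keys": foreign_keys}
    return catalog


catalog = read_catalog(conn)
```

Each foreign key is reported as (referencing column, referenced table, referenced column), which is exactly the relationship information a reverse-engineering tool needs to draw a data model.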

 

DBMS data definition language (DDL) and teleprocessing monitor control language. Tools that use this approach scan the database's component specifications (such as SQL, IMS PCBs, and segment descriptions) or telecommunication control language in the same manner as other tools would parse source code or copybooks.

The ability to scan SQL DDL can be beneficial when your database is held in a relational DBMS to which your reverse-engineering tool can't connect. If you can generate the SQL DDL from that database's catalog, you can still reverse engineer the database design.

 

OUTPUT OPTIONS

Consider carefully how the various tools store the results of reverse engineering. Several alternatives exist:

 

Reports only. Although precanned reports are valuable as a starter set of deliverables, they may prove insufficient in the long run. If reports are the only output your tool provides, you won't be able to query the results to analyze perspectives other than those dictated by the tool.

 

In a relational DBMS. Tools that store reverse-engineering results in a relational database that supports access through SQL or some other general-purpose query language will let you manually enhance these results with definitions and additional business rules. You'll also be able to create customized reports and conduct impact analyses that the tool doesn't support directly.
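To illustrate why queryable results matter, the sketch below stores a handful of components and "uses" relationships in a relational database and runs an ad hoc impact analysis with a recursive SQL query. The schema (`component`, `uses`), the sample rows, and the function `impact_of` are assumptions made for this example, not the layout of any particular product; SQLite stands in for whatever DBMS the tool writes to.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE component (name TEXT PRIMARY KEY, kind TEXT NOT NULL);
    CREATE TABLE uses (user_name TEXT NOT NULL, used_name TEXT NOT NULL);

    INSERT INTO component VALUES
        ('CUST-ID', 'field'), ('CUSTREC', 'copybook'),
        ('CUSTVAL', 'program'), ('ORDENTRY', 'program'), ('DAILYJOB', 'job');

    -- Each row says: user_name depends on used_name.
    INSERT INTO uses VALUES
        ('CUSTREC', 'CUST-ID'), ('CUSTVAL', 'CUSTREC'),
        ('ORDENTRY', 'CUSTVAL'), ('DAILYJOB', 'ORDENTRY');
""")

# Walk the "uses" relationship upward from the changed component.
IMPACT = """
    WITH RECURSIVE affected(name) AS (
        SELECT user_name FROM uses WHERE used_name = ?
        UNION
        SELECT u.user_name FROM uses u JOIN affected a ON u.used_name = a.name
    )
    SELECT c.name, c.kind
    FROM affected a JOIN component c ON c.name = a.name
    ORDER BY c.name
"""


def impact_of(db, name):
    """Every component that directly or indirectly uses `name`."""
    return db.execute(IMPACT, (name,)).fetchall()


impacted = impact_of(db, "CUST-ID")
```

A question such as "what is touched if CUST-ID changes format?" becomes a single query, even if the vendor never anticipated it in a packaged report.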

 

In a repository. If your organization is committed to migrating to a repository-based metadata management environment, then the reverse-engineering results should be output into a format that you can import into your repository product. The ability to bypass the import step through a direct connection to the repository is a highly desirable feature.

 

DOCUMENTATION OPTIONS

The tools you choose should be able to generate a variety of meaningful reports that can assist the analysts in understanding the application that has been reverse engineered. Your tool suite should offer the following types of documentation.

 

Predefined reports. Reverse-engineering tool vendors have a great deal of experience in conducting application awareness analysis. They have a good idea of which reports are most useful in gaining knowledge about the applications being studied. Therefore, a fully documented starter set of analysis reports should come with the tools.

 

Diagrams. As we all know, a picture can be worth a thousand words. Diagrams that depict the links between an application's components can be of the greatest benefit to analysts attempting to gain a broad understanding of the application quickly. Your reverse-engineering tool suite should be able to generate these diagrams, preferably through your favorite CASE tool. The diagrams should include:

* Data models of the physical data structures

* Data model views to support the views defined in the SQL DDL or as embedded SQL in application code

* Job flow diagrams and hierarchy charts

* Program flow diagrams and hierarchy charts

* Program logic flowcharts.

 

Report-writing facilities. Although precanned reports provided by the vendor are a good starting point for analyzing your reverse-engineering results, they are probably insufficient for your specific analysis needs. Therefore, your tool suite should let you create custom reports. If the reverse-engineering results are held in a database or can be imported into your repository, then the report-writing facilities of those environments should be adequate. If not, a report-writing facility should rank high in your evaluation criteria.

 

A standard format that can be imported into other tools. Other products in your metadata management environment will need access to the reverse-engineering results. It's important that your tool suite be able to generate the results in a format that these other tools can accept. Ideally, all of your tools should support one or more of the industry standards (such as CDIF or Metadata Coalition Interface Standard). However, if they don't, you should make sure that they can produce files in a format that can be imported into your other metadata management tools (such as CASE tools and data mapping and extraction products).

 

TYPES OF BUSINESS RULES EXTRACTED

 

Many different types of business rules are embedded within an application's code and data structure definitions. Your reverse-engineering tool suite should be able to extract as many of these business rule types as possible. Some of the business rule types embedded within application code and data structure are:

 

Identifiers, candidate keys, and available access paths. With most database applications, the table identifiers can be easily determined from the DDL. Candidate keys will be specified as indexes that must be unique. All other indexes and views identify the available access paths to the table. Finding the identifiers and access paths to nondatabase files is more problematic. If the files are implemented in a structure such as VSAM, then the Access Method Services control statements can be used to find the identifiers and alternate access indexes to the file.

 

Table associations and cardinality. The rules that define relationships among tables are easy to extract from relational databases that use the "foreign key" construct in their SQL DDL. However, if this construct is not available, the integrity rules that enforce relationships among tables may be found in triggers, stored procedures, or application code. Likewise, for nonrelational databases, the rules that enforce relationships among data files are embedded in code.

Name matching can also be used to identify relationships between tables or files. This approach can be effective if the application's column and field names were developed according to a naming standard. Data elements (whether a column in a database or a field in a file or the program's private storage) with the same name and size are assumed to represent foreign keys. Ideally, your reverse-engineering tool should highlight these potential foreign-key relationships.
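The name-matching heuristic is simple enough to sketch in Python. The element inventory below is hypothetical, and the sketch adds one assumption that goes beyond the description above: it only proposes a relationship when the matching element is the identifier of the other table, so that the direction of the candidate foreign key is known.

```python
from collections import defaultdict

# Hypothetical inventory: (table or file, element name, size, is identifier).
ELEMENTS = [
    ("CUSTOMER", "CUST_ID", 10, True),
    ("CUSTOMER", "CUST_NAME", 40, False),
    ("ORDERS", "ORDER_ID", 12, True),
    ("ORDERS", "CUST_ID", 10, False),
    ("INVOICE", "CUST_ID", 8, False),  # same name, different size
]


def candidate_foreign_keys(elements):
    """Propose (child table, element, parent table) by name and size match."""
    identifiers = defaultdict(list)
    for table, name, size, is_identifier in elements:
        if is_identifier:
            identifiers[(name, size)].append(table)

    candidates = []
    for table, name, size, is_identifier in elements:
        if is_identifier:
            continue
        for parent in identifiers.get((name, size), []):
            if parent != table:
                candidates.append((table, name, parent))
    return candidates


candidates = candidate_foreign_keys(ELEMENTS)
```

In this sample, ORDERS.CUST_ID is flagged as a potential foreign key to CUSTOMER, while INVOICE.CUST_ID is not, because its size differs. A mismatch like that is itself worth reporting to an analyst, since it may indicate a disparate definition of the same data.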

The cardinality rules in a relational database will be expressed through the SQL INSERT, UPDATE, and DELETE rules. The comparable rules for nonrelational databases are enforced through application code.

Your reverse-engineering tool suite should be able to identify associations among the application's tables and files. It should also document how the rules were implemented (for example, through SQL constructs or application code).

 

Data element domain. The rules governing a data element's domain can be held in a number of different ways:

* Format statement (such as Cobol picture clause or column data type)

* Valid values (such as 88-level statements or a scan of the actual values held in the database)

* Initial values (such as value clause)

* Code that validates the data element (such as statements in the procedure division, triggers, or stored procedures).

88-level statements can be particularly useful in determining a data element's valid values. The 88-level statements provide insight into which values are of importance to the business process the program supports. These statements are used not only to identify the individual values that the data element can assume but also to identify sets of values that are treated as a group in controlling program behavior. However, the program's data structure definition may not contain an 88-level specification for all the element's valid values but only for those values of interest to the program. The full range of valid values may only be obtained by merging the 88 values from each program's specification for the data structure or through manual entry. A tool with name-tracing capabilities that identifies potentially disparate specifications for the same data can help determine the complete set of valid values.
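Merging the 88-level values found in different programs is essentially a set union that remembers where each value came from. The following Python sketch assumes the per-program values have already been extracted; the program names, element names, and values are invented for illustration.

```python
# Hypothetical per-program 88-level values for the same data element.
SPECS = {
    "BILLING": {"CUST-STATUS": ["A", "C"]},
    "MARKETING": {"CUST-STATUS": ["A", "P"]},
}


def merge_valid_values(specs):
    """Map element -> value -> set of programs that define that value."""
    merged = {}
    for program, elements in specs.items():
        for element, values in elements.items():
            for value in values:
                merged.setdefault(element, {}).setdefault(value, set()).add(program)
    return merged


merged = merge_valid_values(SPECS)
all_values = sorted(merged["CUST-STATUS"])
```

Keeping the source programs alongside each value lets an analyst see that, in this sample, only the hypothetical MARKETING program knows about status "P", which is a useful prompt for follow-up questions.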

 

Data dependency rules. Much of a program's logic exists to ensure the integrity of relationships between data elements. For example, in the following business rule, a dependency exists between the value set for the customer market segment (premier) and the average balance maintained in the customer's checking account: "A customer whose average checking balance is greater than $5,000 is automatically considered a premier customer."

In most cases, the application responsible for enforcing such business rules does so through application code. Your reverse-engineering tool suite should be able to identify this type of data dependency business rule. The tool should also extract how the application enforces the business rule.
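As a deliberately naive illustration, the Python sketch below looks for one narrow pattern: an IF that tests one data element, immediately followed by a MOVE that sets another. The Cobol fragment, the data names, and the regular expression are assumptions invented for this example; real tools rely on parsing and data-flow analysis rather than pattern matching, and would catch many forms this sketch misses.

```python
import re

# Hypothetical fragment enforcing the premier-customer rule.
COBOL = """
    IF AVG-CHECKING-BAL > 5000
        MOVE 'PREMIER' TO CUST-MARKET-SEGMENT
    END-IF.
"""

# IF <element> <operator> <operand> MOVE <value> TO <element>
DEPENDENCY = re.compile(
    r"\bIF\s+([A-Z0-9-]+)\s*(>=|<=|>|<|=)\s*(\S+)\s+"
    r"MOVE\s+(\S+)\s+TO\s+([A-Z0-9-]+)"
)


def find_dependencies(source):
    """Report which element's value depends on a test of another element."""
    rules = []
    for tested, operator, operand, value, target in DEPENDENCY.findall(source):
        rules.append({
            "target": target,
            "value": value.strip("'"),
            "depends_on": tested,
            "condition": f"{operator} {operand}",
        })
    return rules


rules = find_dependencies(COBOL)
```

The output records both the rule (the segment depends on the balance) and how it is enforced (a conditional MOVE in program code), which are the two things the tool is expected to document.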

 

Controlling business processes. Other business rules control how a business process executes. For example, the following business rule controls a Place New Order business process: "A customer with five unpaid orders outstanding cannot place a new order."

The order cannot be placed if the customer's outstanding unpaid orders satisfy the "too many outstanding orders" limit. As with data dependency rules, support for these types of rules is usually embedded in an application's program code. Applications using relational databases may use triggers or stored procedures to implement such rules. Your reverse-engineering tool suite should recognize that these business rules exist in the application and provide documentation about how these rules are implemented.

 

DERIVING A CONCEPTUAL MODEL

One of the more valuable aspects of reverse engineering is its extraction of the conceptual model that the physical database supports. This process attempts to uncover the underlying business model that existed before the application's original data model was subjected to the normalization and database denormalization processes.

A reverse-engineering tool can assist in this effort by performing the following steps:

* Translating column, field, and table names into business terminology. This translation is possible if the names were constructed using a standard abbreviation list. The cryptic technology names can be reverse translated from the abbreviations to their corresponding business terms.

* Identifying disparate data names that refer to the same business concept. Some techniques that reverse-engineering tools use to accomplish this task include:

1. Matching names from data structure specifications for the same file by position within the record layout and format

2. Tracing data transfers from one data element name to another as the data is manipulated by the program.

* Transforming tables that represent the resolution of many-to-many relationships back into relationships with attributes, as appropriate.
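Two of the steps above, name translation and matching by position, can be sketched briefly in Python. The abbreviation list, record layouts, and function names are hypothetical; an actual tool would draw the abbreviation list from the organization's naming standard.

```python
# Hypothetical standard abbreviation list.
ABBREVIATIONS = {
    "CUST": "Customer", "ACCT": "Account", "BAL": "Balance",
    "AVG": "Average", "NBR": "Number", "DT": "Date",
}


def business_name(technical_name, abbreviations=ABBREVIATIONS):
    """Expand each part of a technical name using the abbreviation list."""
    parts = technical_name.replace("_", "-").split("-")
    return " ".join(
        abbreviations.get(part.upper(), part.capitalize())
        for part in parts if part
    )


def match_by_position(layout_a, layout_b):
    """Pair differently named fields with the same offset, length, and format.

    Each layout is a list of (name, offset, length, picture) tuples that
    describe the same physical file.
    """
    by_position = {(off, length, pic): name
                   for name, off, length, pic in layout_b}
    pairs = []
    for name, off, length, pic in layout_a:
        other = by_position.get((off, length, pic))
        if other is not None and other != name:
            pairs.append((name, other))
    return pairs


# Two hypothetical record layouts for the same file.
LAYOUT_A = [("CUST-NO", 0, 10, "X(10)"), ("CUST-NM", 10, 40, "X(40)")]
LAYOUT_B = [("CUSTOMER-ID", 0, 10, "X(10)"), ("CUST-NM", 10, 40, "X(40)")]

expanded = business_name("CUST-ACCT-BAL")
synonyms = match_by_position(LAYOUT_A, LAYOUT_B)
```

Parts that are missing from the abbreviation list pass through unexpanded (for example, `business_name("AVG_CHK_BAL")` leaves "Chk" as is), which gives the analyst a ready-made list of abbreviations still to be resolved.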

 

CALCULATING COMPLEXITY METRICS

If your organization's objective is to migrate to processes consistent with a managed approach to the application portfolio, then your reverse-engineering tool must be able to calculate metrics to determine a program's complexity. For a Cobol program, examples of the types of metrics that will indicate complexity include the number of GO TO statements and entrance points, the levels of nesting used, the number of data elements defined in the program, and the number of data elements the program uses. By measuring each program's complexity, you can identify which ones are more prone to break when the application undergoes maintenance or enhancements. You can use these measurements to determine which programs to rewrite and which applications are ripe for reengineering. Likewise, complexity metrics can be used in assigning work to programmers: The higher the complexity rating, the more senior the programmer.
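A few of these metrics can be approximated with very little code, as the Python sketch below suggests. The program fragment is invented, only three of the metrics are computed, and the nesting count assumes that every IF is closed with an explicit END-IF; period-terminated IF statements would need additional handling.

```python
import re

# Hypothetical procedure division fragment.
PROGRAM = """
       PROCEDURE DIVISION.
       MAIN-PARA.
           IF CUST-STATUS = 'A'
               IF AVG-BAL > 5000
                   GO TO PREMIER-PARA
               END-IF
           END-IF.
           GO TO EXIT-PARA.
       ENTRY 'ALTSTART'.
"""

# IF and END-IF as whole words, not as parts of hyphenated data names.
IF_TOKEN = re.compile(r"(?<![A-Z0-9-])(END-IF|IF)(?![A-Z0-9-])")


def complexity_metrics(source):
    """Count GO TO statements, ENTRY points, and the deepest IF nesting."""
    go_to_count = len(re.findall(r"\bGO\s+TO\b", source))
    entry_points = len(re.findall(r"^\s*ENTRY\b", source, re.MULTILINE))

    depth = max_depth = 0
    for token in IF_TOKEN.findall(source):
        if token == "IF":
            depth += 1
            max_depth = max(max_depth, depth)
        else:  # END-IF closes the innermost open IF
            depth = max(0, depth - 1)

    return {
        "go_to_count": go_to_count,
        "entry_points": entry_points,
        "max_if_nesting": max_depth,
    }


metrics = complexity_metrics(PROGRAM)
```

Computed across a whole portfolio, even crude counts like these make it possible to rank programs and to spot the outliers that deserve a closer look.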

 

LET'S TALK TOOLS

The seven criteria I've discussed should serve as guidelines for choosing the best reverse-engineering tools for your needs. In my next column, we'll take the next step: looking at some specific tools.

Terry Moriarty, president of Inastrol, a San Francisco-based information management consultancy, specializes in customer relationship information and metadata management. Her common business models have been used as the basis of customer models for companies within the financial services, telecommunication, software/hardware technology manufacturing, and retail consumer product industries. You can reach her at terry@inastrol.com.