T E R R Y      M O R I A R T Y

A review of four early tools for automating business rule extraction

Getting Your Business Rules Automatically

October, 1997

In the past two columns, I have examined the concept of reverse engineering as a mechanism for extracting the business rules buried within our application portfolios. In "Mining for Metadata" (May 1997), I laid the foundation by identifying the types of projects that can benefit from reverse-engineering techniques. The follow-up article, "Reverse-Engineering Tools: How to Pick Them" (June 1997), provided criteria for evaluating products that automate a significant portion of the reverse-engineering effort. This article completes the trilogy by examining four products that are representative of the types of reverse-engineering tools currently available.

These reverse-engineering products have been around for a long time. In the past, application programmers responsible for code maintenance were their target market. These tools provided a maintenance workbench to support impact analysis and code modifications and offered a program-centric view of the application. However, with the growing number of Year 2000 projects focused on fixing date fields across corporate application portfolios, reverse-engineering products are now enjoying a renaissance. The new products provide facilities to inventory the application components and seek out the data elements that represent dates. Once these elements are found, these tools' code maintenance facilities let you add the necessary logic to correct the century problem. Likewise, these tools can pinpoint where you need to make changes in your data structures.

I'm interested in a data-centric approach to application reverse engineering that lets you find all the code touching a single data element. I want to understand its domain in terms of valid values and format, derivation rules, and any logic that can affect the values that element can assume. As business users become more sophisticated in how they use the data warehouse, they are demanding access to this type of metadata about a data element. In many cases, their questions have changed from "What does this data element mean?" to "Why does this specific element have this specific value for this specific entity instance?"

I recently had an experience that illustrates how important it is for such users to understand the business rules that their operational business environment enforces: I purchased a file cabinet for my office, but when it was delivered, we couldn't open the drawers because the lock was broken. The delivery person recorded "scratched and dented" as the reason I refused the delivery. My immediate reaction was that anyone at this company wanting to know why furniture was being refused was probably getting an analysis erroneously skewed toward "scratched and dented." A curious business analyst will question why a large proportion of the "furniture refusal reason" is "scratched and dented." If all the relevant metadata for that element is available, the analyst may see that "scratched and dented" is the default value on the delivery-refusal transaction screen. Everyone in operations knows that this value will get through the business system without questions from management. And since it's faster to leave the default value in than to record the correct value for each delivery-refusal transaction, it's easy to see why the default might be such a popular option.

Providing this type of metadata (such as the fact that default values exist) to the knowledge worker through the data warehouse can be a daunting task. In many cases this data may not exist, and where it does exist, it probably hasn't been maintained over time. Finding the business rules buried in an application can be an expensive proposition, as the sizing parameters in Table 1 illustrate. I developed these parameters over the years by collecting statistics on the amount of time data analysts spend defining data for a number of projects. My hope is that reverse-engineering tools can significantly lower the sizing parameters for the last two categories.

[Table 1: sizing parameters; image 9710-1.jpg not reproduced]

THE EVALUATION

The four products I looked at are Viasoft's Existing Systems Workbench (ESW) 4.2, ADPAC's PM/SS version C 305 M1, Microfocus Revolve 4.1, and Regenisys Assess:R 1.0. I didn't have my own private Cobol application to use in evaluating these products. Consequently, my evaluations are based on demos and discussions with the vendors.

The scenario I used to frame the discussion with each vendor was taken from a requirement that I've encountered in a number of data warehouse projects:

Requirement: The business has provided a report that is currently being generated from the mainframe marketing-support system. It now wants to use the data to generate a report through the data warehouse.

For each field on the report, you need to backstep through the applications to find all the logic impacting that field, such as valid values, default values, conditional processing, and derivation rules. I wanted to assemble a complete specification package and the relevant code that shows:

* How the field is populated

* Screens and data files from which the field is sourced

* Other fields that the field is dependent upon and how they are manipulated independent of the programs containing the code.

I wanted the specifications to present the logic in execution order, not in the order in which the code appears in the programs. Finally, I wanted the ability to drill through to the program so that the actual code lines can be viewed.

EXECUTION ENVIRONMENT

PM/SS and ESW run in the mainframe environment. Their advantage is that the inventorying and parsing of the application's code execute in the environment where the code libraries reside. Revolve and Assess:R execute in a LAN or standalone PC environment, requiring that the code be downloaded into directories, which can be an extremely time-consuming task. Several questions come into play when the execution environment is LAN- or workstation-based:

* Do you have storage capacity to accommodate the code for large applications?

* Does the tool provide facilities to enable the download?

* How transparent are those facilities to the analyst?

* Are the necessary files automatically downloaded when the analyst references them through the tool, or do they have to be downloaded before the tool can conduct analysis?

LAN-based products can take full advantage of the GUI features data analysts have come to expect from their analysis tools. Revolve takes full advantage of those graphical capabilities through its interactive diagrams, and Regenisys gives you a Web-based solution by providing an HTML interface to the application's code. However, ESW and PM/SS prove that a GUI can be implemented within the IBM text-oriented environment.

My ideal reverse-engineering product would employ a client/server architecture in which the extracting and parsing of the application occurs in the environment that holds the application's libraries but the presentation is done through a GUI front-end. The analysis of the application would be done in the environment that is most efficient and completely transparent to the people using the tool. None of the products reviewed here have adopted this type of architecture.

INVENTORY APPROACH

The first step in a reverse-engineering effort is to inventory the application components. A robust tool allows the inventory to start at different levels (such as at the job or program level) and find all the related components. For example, to inventory an entire application, the production job library serves as the source, in which case the tool should identify all the procedures, datasets, programs, databases, data elements, screens, and reports, as well as the relationships that exist between them. However, if only a single program is to be analyzed, the tool should be able to extract only the components that directly touch that program.

All four products are able to support both of these approaches. What distinguishes them is their treatment of missing components (anything that is referenced but doesn't have a corresponding entry in the library). In order to conduct its analysis, ESW requires that the program be compilable and the source for all copybooks be available. While PM/SS, Revolve, and ESW provide a report of missing components, only PM/SS provides the user with some level of control over how the tool behaves when it encounters a missing component. Revolve highlights missing components in its diagrams through special shading of the missing-component symbols.
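To make the idea of a missing-component report concrete, here is a minimal Python sketch. It does not show how any of the four products works internally; it only assumes that copybooks are referenced through simple `COPY name.` statements. The program names, the copybook names, and the `find_missing_copybooks` helper are all hypothetical.

```python
import re

# Simplified on purpose: matches "COPY NAME" at the start of a line and
# ignores REPLACING clauses, comment lines, and continuation lines.
COPY_PATTERN = re.compile(r"^\s*COPY\s+([A-Z0-9-]+)", re.IGNORECASE | re.MULTILINE)


def find_missing_copybooks(programs, copybook_library):
    """Return {program: sorted copybooks referenced but absent from the library}."""
    available = {name.upper() for name in copybook_library}
    missing = {}
    for program_name, source in programs.items():
        referenced = {name.upper() for name in COPY_PATTERN.findall(source)}
        gaps = sorted(referenced - available)
        if gaps:
            missing[program_name] = gaps
    return missing


# Hypothetical inventory: two programs, one copybook actually in the library.
programs = {
    "MKTRPT01": "       COPY CUSTREC.\n       COPY SALESREC.\n",
    "MKTRPT02": "       COPY CUSTREC.\n",
}
copybook_library = ["CUSTREC"]

report = find_missing_copybooks(programs, copybook_library)
print(report)
```

A real inventory would also have to follow job control, screen maps, and database definitions, which is where the commercial tools earn their keep.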

RESULT PRESENTATION

The primary purpose of these reverse-engineering tools is to provide analysts with an awareness of the application's components, structure, and interaction. Therefore, it is important to carefully evaluate the types of analysis that can be conducted and the presentation of analysis results. All four provide a starter set of reports and diagrams. However, since none of the products anticipate my specific analysis and reporting needs, the option to define customized reports is very important. Only PM/SS and ESW provide a report writer through which users can specify their own reports. Microfocus stated that it provides consulting support for Revolve users to assist in customizing reports.

I also look at a tool's ability to interoperate outside of its own boundaries through import and export facilities. Specifically, I expect any tool that is the source or user of metadata to be able to integrate into my metadata management environment. The ability to interact with standard repositories (such as Platinum's Repository, Viasoft's Rochade, or Manager Software Products' Data Manager) is high on my criteria list. Likewise, I expect a tool to support industry metadata-sharing standards, such as CDIF or the Metadata Coalition's interface standard.

Both ESW and PM/SS can interface with the major repository products and can export their data to DB2. Neither Revolve nor Assess:R has this capability.

BUSINESS RULES EXTRACTED

Finally, I looked at each product's ability to meet my data-centric business rule extraction requirements. They all provide the ability to trace a data element through the application programs. They identify aliases for a given data element and can collect the corresponding information to a certain degree. None of them explicitly tells you what the entire set of valid values for a data element is, that the highlighted lines of code represent the algorithm for calculating the data element's value, or that a conditional statement provides the logic for determining how the application responds to certain values of a given data element. However, they do provide you with most of the information you need to make these determinations yourself. In addition, they provide a notepad or annotation facility that you can use to collect related lines of code, the tool's interpretation of that code, and any personal notes. I used these facilities as the basis for documenting a data element's business rules.
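As a rough illustration of what a data-centric trace has to collect, the Python sketch below groups the lines that touch one data element into default value, valid values, assignments, and conditional logic. It is a toy under stated assumptions, not a description of any reviewed product: the Cobol fragment, the `REFUSAL-REASON` element, and the `trace_element` helper are all hypothetical, the valid values are assumed to appear as 88-level condition names directly under the element, and aliases (REDEFINES, group moves) are ignored.

```python
import re


def trace_element(source, element):
    """Group the line numbers that touch `element` into rough rule categories."""
    name = re.escape(element.upper())
    # Cobol names contain hyphens, so \b alone is not a safe word boundary.
    mention = re.compile(rf"(?<![A-Z0-9-]){name}(?![A-Z0-9-])")
    assigned = re.compile(rf"\b(?:TO|COMPUTE)\s+{name}(?![A-Z0-9-])")
    data_item = re.compile(r"(\d\d)\s+(\S+)")
    findings = {"default": [], "valid_values": [], "assignment": [], "conditional": []}
    in_definition = False
    for number, line in enumerate(source.splitlines(), start=1):
        text = line.strip().upper()
        item = data_item.match(text)
        if item:  # a data description entry such as "05  NAME  PIC X."
            level, item_name = item.group(1), item.group(2).rstrip(".")
            if level == "88":
                if in_definition:
                    findings["valid_values"].append(number)
            else:
                in_definition = item_name == element.upper()
                if in_definition and " VALUE " in text:
                    findings["default"].append(number)
            continue
        in_definition = False
        if not mention.search(text):
            continue
        if text.startswith(("IF ", "EVALUATE ")):
            findings["conditional"].append(number)
        elif assigned.search(text):
            findings["assignment"].append(number)
    return findings


# Hypothetical fragment, loosely modeled on the delivery-refusal example.
SOURCE = """\
       05  REFUSAL-REASON      PIC X(2) VALUE 'SD'.
           88  SCRATCHED-DENTED    VALUE 'SD'.
           88  BROKEN-LOCK         VALUE 'BL'.
       05  REFUSAL-DATE        PIC 9(8).
           IF REFUSAL-REASON = SPACES
               MOVE 'SD' TO REFUSAL-REASON
           END-IF.
"""

findings = trace_element(SOURCE, "REFUSAL-REASON")
print(findings)
```

Even in this toy, the output is only a list of candidate lines. Deciding that line 1 is a default and that lines 5 and 6 quietly reinforce it remains the analyst's judgment, which matches my experience with the tools.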

I found Assess:R's decision-vector diagrams to be quite useful in understanding the conditions that impacted a program's behavior. While Assess:R organizes related paragraphs into decision-vector diagrams for the entire program, its companion product Extract:R lets you refine the diagram with a specific criterion. For example, Figure 1 shows the code for calculating the cost of a car sold in the state of Alabama. Figure 2 provides the associated decision vector.

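[Figure 1: code for calculating the cost of a car sold in the state of Alabama; image 9710-2.jpg not reproduced]

[Figure 2: the associated decision vector; image 9710-3.jpg not reproduced]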

There are facilities that I would like to see a reverse-engineering tool support that these program-oriented tools do not provide. For example, these products cannot reverse engineer databases, parse SQL Data Definition Language, or extract metadata from relational database catalogs. Therefore, they cannot extract business rules represented as identifiers, foreign key relationships, or optional and mandatory constraints. They can parse the structure of tables, views, and the associated columns by representing them as data structures to the Cobol application. But since they do not provide any graphical representation of the Cobol data structures, you must look for other products to support these needs. Most data modeling products provide the ability to reverse engineer databases. However, few parse Cobol data structures, with the notable exceptions of Popkin's System Architect and Cayenne's product suite. System Architect has a facility for importing Cobol definitions into its repository, but it doesn't automatically generate a diagram representing the data structure's organization. Cayenne does both.
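To show what extracting these rules from a database catalog can look like, here is a minimal Python sketch. It uses SQLite's catalog pragmas purely as a stand-in, since none of the products discussed here works this way and a DB2 catalog would be queried differently. The `customer` and `delivery` tables and the `catalog_rules` helper are hypothetical.

```python
import sqlite3


def catalog_rules(connection, table):
    """Read identifier, mandatory/optional, and foreign-key rules for one table."""
    rules = {"identifier": [], "mandatory": [], "optional": [], "foreign_keys": []}
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk).
    # The table name is interpolated directly, so pass trusted names only.
    for _cid, name, _type, notnull, _default, pk in connection.execute(
        f"PRAGMA table_info({table})"
    ):
        if pk:
            rules["identifier"].append(name)
        # Assumption: identifier columns are treated as mandatory.
        if notnull or pk:
            rules["mandatory"].append(name)
        else:
            rules["optional"].append(name)
    # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...).
    for row in connection.execute(f"PRAGMA foreign_key_list({table})"):
        rules["foreign_keys"].append((row[3], row[2], row[4]))
    return rules


connection = sqlite3.connect(":memory:")
connection.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE delivery (
        delivery_id    INTEGER PRIMARY KEY,
        customer_id    INTEGER NOT NULL REFERENCES customer(customer_id),
        refusal_reason TEXT
    );
    """
)

rules = catalog_rules(connection, "delivery")
print(rules)
```

The catalog gives you these structural rules almost for free, which is why their absence from the program-oriented tools stands out.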

While you can use a combination of tools to gain a total view of your organization's data, no facilities exist to easily integrate the program and database perspectives of the same data element.

To a certain extent, I am doing the program reverse-engineering tools an injustice by evaluating their ability to extract business rules from a data-centric perspective. They were designed to provide application awareness, but I was looking for data awareness. My objective is to understand the data, not change the application. So, while none of these tools presents the business rules for a data element in the format I was hoping for, they do provide mechanisms to drill through the application so that you can locate most of the related business rules. These products can assist data analysts in collecting and reporting a data element's business rules, but, for the most part, they won't tell you that those lines of code enforce a specific type of business rule. So it's left up to you to provide that judgment. However, given the choice between extracting the business rules from an application manually or using one of these reverse-engineering tools, I would go with the automated approach every time.

Terry Moriarty, president of Inastrol, a San Francisco-based information management consultancy, specializes in customer relationship information and metadata management. Her common business models have been used as the basis of customer models for companies within the financial services, telecommunication, software/hardware technology manufacturing, and retail consumer product industries. You can reach her at terry@inastrol.com.