Conceptual Data Modeling in an Object-Oriented Process
by Scot Becker

ABSTRACT
Object-Oriented (OO) software development processes (iterative and component driven) are being used more frequently in the industry. Each component iteration generally goes through the following phases: Analysis, Design, Construction, Verification, and Deployment. In an OO process, business rules tend to get captured during the analysis and design phases.

Ideally, the components are specified such that they are relatively independent of each other. In actuality, rigorous analysis of data and rule dependencies between components may reveal that some components are not truly independent of each other.

During the analysis phase, use cases are used to create an analysis class diagram that will then specify a component. Traditional use case analysis is often more narrative and less specific, which results in use cases that are inherently weak at capturing business rules in a consistent manner. On the other hand, if a very robust analysis is done, use cases with an excruciating amount of detail can be produced. While many rules can be captured and documented in this manner, from a data perspective rules are not easily verified and many rules can be easily missed. Such OO analysis approaches also tend to be heavily design-centric, particularly at the application architecture level. Further, such use cases are often heavily biased towards UI and other process/implementation (and often, design) concerns. While this is important, necessary, and better than past software development processes, it is at a level of abstraction that is often lower than the conceptual level.

Typically, the analysis artifacts are then used as inputs into the design phase that will generate, among other artifacts, design class diagrams. However, OO processes are neither formal nor rigorous; success is dependent on the skills and experience of the analysts and designers as the resulting constraints are not easily verified. In addition, the OO class diagrams (from a data perspective) are inherently rooted at the logical level and not at the conceptual level.

In this paper, we'll introduce how Object-Role Modeling (ORM) can overcome the above weaknesses in the OO approach and actually improve the quality of the resulting artifacts.

1. INTRODUCTION
The trend towards using object-oriented (OO) techniques to design and develop software is steadily increasing. This trend is often looked upon as the dawn of a new age where software will be easier to design, develop, maintain, and subsequently upgrade with new functionality. Others will contend that the OO way of looking at development is nothing new. This latter group of people will aver that the OO approach is simply the "waterfall" approach used in the previous decades with merely the addition of smaller waterfalls and a few "new" diagrams.

While this is a compelling -- and heated -- discussion, this paper will not attempt to resolve this debate. This paper will, however, try to address the same old issues that have always plagued software development: quality, accuracy, rigor, consistency, speed of software delivery, and user support.

While the OO approach, due to the nature of its many diagrams, tends to improve the resulting analysis and design artifacts as a whole, it is fundamentally lacking in a key area: data. The OO approach tends to be overly process centric. While this approach is results oriented, tends towards greater reuse of code, and yields an overall application consistency, it also neglects key issues centering on data integrity and quality. Thus, and it should come as no surprise, if the supporting data is bad, even the best user interface and middle tiers will not keep the software project alive.

This paper will reveal how to introduce a relatively old way of looking at data into this new way of looking at application analysis and design. As it turns out, this formal method of data modeling does not compete with nor detract from the benefits of an OO approach. Rather, this method actually improves the OO process. As supporting evidence, this paper was formulated after a year of applying these techniques at a major Minnesota manufacturing company with positive results.

2. A BRIEF OVERVIEW OF OBJECT-ORIENTED (OO) PROCESSES

2.1. COMPONENTS
The typical OO approach centers around two key concepts. The first concept is the notion of a component. A component is a portion of the application that is divided in such a way that each component is relatively independent of the next. Components bundle data and the processes acting upon that data in such a way that they yield greater reuse. The resulting application is then constructed in such a way that it merely calls the components when it needs the data and functionality which the components provide. Keeping like data and functionality centered upon the notion of a component tends towards greater reuse and smaller code that is easier to maintain. Further, a component can be improved upon (upgraded, or fixed) at any time with little interference to the surrounding application. As a final note, from a data perspective a component is usually divided amongst groups/clusters of related tables. From a process perspective, a component is usually centered on core workflows.

Components are typically determined by making rough-cut models of the system as a whole (the typical models used are covered in a later section). These models are not specific enough to begin design, but they identify key process flows and data clusters. In addition, these rough-cut models will aid in the identification of any architecture constraints that will need to be addressed as well as provide some idea of the scope of the project as a whole (which leads to better estimates of time, materials, etc.). It is in this manner that the resulting components are (presumably) correctly identified and divided such that they are independent. Further, in practice this is largely the case. However, the details of these models are not specific enough to ensure 100% encapsulation of the components. The technique addressed in the second part of this paper will ensure the components are (virtually -- not accounting for small amounts of error that will exist no matter how perfect your approach is) 100% encapsulated.

2.2. ITERATIONS
The second key concept of the OO approach is the notion of iterative development. Each version of each component will step through the phases of the iteration (details of the phases are covered in the following sections). In fact, it often happens that some part of the application is actually deployed while the remaining functionality of the application is still in the various phases of development. This produces an exciting and unusual phenomenon: the users get (part of) the application faster.

This component driven approach is different from the typical "waterfall" approach in which the entire application will move through each phase before proceeding to the next. In this manner, the (version of the) application is not delivered until it has passed through all phases. Further, it is difficult to begin work on subsequent versions of the application (with presumably increased functionality) until the preceding application versions are completed and delivered.

2.2.1. ITERATION PHASES
The phases of an iteration are generally: analysis, design, construction, verification, and deployment. Each preceding phase of an iteration is crucial to the success of the next. Without proper analysis, the resulting design will be incorrect. Without proper design, the resulting code will be incorrect. Without proper coding, the testing will fail, and so on.

2.2.1.1. ANALYSIS
The analysis phase of an iteration is used to determine the functionality, requirements, interface specifications, and business rules of the component. Typically, use cases and analysis class diagrams (detailed later) are constructed to elaborate the result of the analysis and verify those results with the users. The analysis phase is intended to merely state the requirements without having any design bias. This is good; in this manner analysis identifies the problem and the constraints such that any design that meets those constraints and provides a solution to the problem is right on target. If one thinks about it, why would the users need to worry about how the data is represented in the underlying tables and application classes, or whether the application is using .Net or J2EE architecture? The answer is simply that they don't need to know (nor do they care).

The analysis phase is completed once the design team has enough information to design the implementation of the component. It is also interesting to note that the analysis requirements map directly into verification tests (i.e. does the resulting application adhere to the constraints and rules specified by analysis and the users?).

2.2.1.2. DESIGN
The design phase of an iteration is the technical solution to the artifacts presented by the analysis. In the design phase, the details of the architecture, the overall system constraints (i.e. performance, compatibility, etc.), the user interface, the data structures, and overall application cohesiveness and consistency are fretted over. The design is completed once the implementation (construction) team has enough information to write the code to make the application work.

2.2.1.3. CONSTRUCTION

The construction phase of an iteration is the actual implementation of the design. If proper analysis and design were performed, the construction phase is relatively straightforward. The construction phase is complete once the construction team has handed off enough code to begin verification.

2.2.1.4. VERIFICATION
The verification phase of an iteration is the testing of the requirements, constraints, business rules, and other specifications of a component against the actual component that was constructed. Verification is complete once the component can be deployed.

2.2.1.5. DEPLOYMENT
The deployment phase of an iteration is the actual use of the component. Deployment does not necessarily mean that the users have access to it. It may be deployed in such a manner that other components can interface to it or be a subset of the functionality of an overall application release.

2.2.1.6. ITERATING A COMPONENT
If -- at any phase -- a component is deemed to be fundamentally incorrect, the component is immediately "iterated" (the component is versioned and starts the iteration all over again beginning with analysis). Likewise, if the scope (functionality) of a component is altered, the component is simply versioned and begins a new iteration. Each iteration is divided such that subsequent component versions (or entirely independent components) can be worked on in a pipeline fashion (Figure 1); once a component version has completed a phase, the next component (or version of the same component) may enter that same phase. In this manner, the overall functionality of the application is delivered piece by piece.
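
The pipeline described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that the text does not make: every phase takes one time step, exactly one component version enters analysis per step, and no component is sent back to analysis. The component names and the function `pipeline_schedule` are hypothetical.

```python
PHASES = ["analysis", "design", "construction", "verification", "deployment"]

def pipeline_schedule(components):
    """Return, for each time step, a dict of component -> phase it occupies."""
    total_steps = len(components) + len(PHASES) - 1
    schedule = []
    for step in range(total_steps):
        active = {}
        for position, component in enumerate(components):
            # The component at position i enters analysis at step i.
            phase_index = step - position
            if 0 <= phase_index < len(PHASES):
                active[component] = PHASES[phase_index]
        schedule.append(active)
    return schedule

# Hypothetical component versions, listed in the order they enter analysis.
schedule = pipeline_schedule(["Orders v1", "Billing v1", "Orders v2"])
for step, active in enumerate(schedule):
    print(step, active)
```

At any step, no two components occupy the same phase, and the first component is deployed while later ones are still being designed or built.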



Figure 1: How components move through an iteration.

2.3. TYPICAL USE OF DATA MODELING TECHNIQUES IN AN OO PROCESS
The OO process typically lacks in one important detail: persistence of the data. The data structures are -- by definition -- object oriented. If the target Database Management System (DBMS) is also object oriented, mapping the persistent application classes to persistent OODBMS classes is straightforward. However, if the target DBMS is a Relational Database Management System (RDBMS), as is presently the typical case, the mapping is not so straightforward. Because of this, the traditional data modeling techniques of creating a Logical Data Model (LDM) and Physical Data Model (PDM) are often employed. Because the data structures are defined by the design phase, and are needed for the construction phase, the LDM and PDM are typically defined during the handoff between design and construction. In addition, because mapping the OO classes to tables and columns specifies the persistence of the data, there is seldom much value in having an LDM differing in any significant way from a PDM. In other words, when data modeling is performed in this fashion, one is only mapping classes to tables. Therefore, the only usefulness of an LDM in this scenario is to use "Business" names rather than the physical names usually subject to length and abbreviation standards and constraints. In this manner, the typical deliverables of a data modeler for each component is the realization of the design in persistent data structures and some sort of mapping specifications that detail which member of which class maps to which column of which table. Thus, the data modeler has little impact on the quality and accuracy of the analysis and design.
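
To make concrete how mechanical this class-to-table mapping is, the following Python sketch generates a table definition directly from a class. The class `Customer`, the type mapping `SQL_TYPES`, and the function `create_table_sql` are all hypothetical, and the SQL column types are assumptions rather than the conventions of any particular RDBMS.

```python
from dataclasses import dataclass, fields

# Assumed mapping from member types to SQL column types (illustrative only).
SQL_TYPES = {int: "INTEGER", str: "VARCHAR(255)", float: "REAL", bool: "BOOLEAN"}

@dataclass
class Customer:  # hypothetical persistent class from a design class diagram
    customer_id: int
    name: str
    credit_limit: float

def create_table_sql(cls, key):
    """Emit a CREATE TABLE statement with one column per class member."""
    columns = []
    for member in fields(cls):
        column = f"{member.name} {SQL_TYPES[member.type]}"
        if member.name == key:
            column += " PRIMARY KEY"
        columns.append(column)
    return f"CREATE TABLE {cls.__name__.lower()} ({', '.join(columns)})"

ddl = create_table_sql(Customer, key="customer_id")
print(ddl)
```

Nothing in this translation checks whether the class was a sound data structure in the first place. Each member becomes a column, which is exactly the limitation described above.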

The author wonders why an OO Process performed in this manner needs a skilled data architect at all; this task could be easily accomplished by someone who is able to read a class diagram and write the SQL on the target database. Further, the use of expensive Computer Aided Software Engineering (CASE) tools to generate the SQL seems rather moot. The author isn't the only one with this opinion; the maker of one of the most popular OO Design CASE tools recently released a set of UML stereotypes that simply express the persistent classes of a class diagram as "tables" and generates the SQL needed to implement them on the target platform. If the reader is expressing any hesitation with this concept, s/he is correct, but the author will address these points shortly.

2.4. OO ARTIFACTS (UML)
In continuing our overview of the OO Process approach, it is necessary to briefly describe the key OO artifacts produced during the analysis and design phases of an iteration. The wildly popular OO syntax (and it is a syntax, not a method) known as the Unified Modeling Language (UML) is often employed to graphically express the analysis and design artifacts.

2.4.1. USE CASES
Use cases are the UML's primary way of specifying -- during the analysis phase of an iteration -- the user's perspective of process, constraints, business rules, and other requirements. The core use case diagram is often called a context diagram. In this diagram, stick figures are used to represent actors (such as user types/roles/classifications, external systems, etc.) and ovals represent the use cases. Association lines are then drawn between the actors and the use cases they call/instantiate/use. Also note that use cases can use other use cases (via uses or extends stereotypes).

While the context diagram is useful for visualizing the behavior of the system as a whole, and is a brilliant representation of the reuse of the system pieces, the "devil is in the details": the use case document. The use case document usually contains all of the business rules, attributes, constraints, requirements, assumptions, and exceptions of the use case. As there may be more than one way to perform the same task, these variants are broken out into separate flows within the same use case. Then, for each flow, the actual path of behavior is usually mapped out in a table fashion with the actor's behavior on one side and the system behavior/response on the other. Attributes are usually specified only in general terms (i.e. "the actor supplies their User ID and Password") and any constraints on those attributes - static or dynamic - are then textually specified. In order to limit repetition of the same requirements/constraints/business rules, use cases are often broken up such that a given rule is written once and used (called) often.

The biggest benefit to using use cases is that they are written in a natural language. Their format is usually easy to explain and read, and therefore users can easily validate them. Paradoxically, the biggest weakness of the use case is the fact that it is just a document. There is no formal method for gathering, specifying, or verifying the requirements. Inconsistency, inaccuracy, and vague rules can run rampant unless the analyst is extremely careful. This fact leads to two interesting results, which will be elaborated upon in a following section on the weakness of the OO approach.

2.4.2. CLASS DIAGRAMS
Class diagrams are employed to graphically express classes, their data structures (such as members and/or dependent classes), and the encapsulated process that acts upon the data (methods). Communication between classes (roughly analogous to data model relationships) is expressed via associations while subtypes are expressed via inheritance. From a pure data-oriented perspective, persistent class diagrams are roughly equivalent to Entity Relationship (ER) diagrams with methods and multi-valued attributes. In mapping persistent class diagrams to ER structures, one needs to account for other relationships types such as composition, aggregation, and dependencies, but that is really beyond the scope of this paper.

The most obvious difference between a class diagram and an ER model is the use of all sorts of non-persistent classes, interfaces, dependencies, and methods. These elements are crucial to proper application design and implementation of the chosen architecture, but utterly irrelevant to proper data design.

Because of their inherent detail, class diagrams are seldom fully detailed in the analysis phase. Thus, most of the specification of class diagrams is performed during the design phase.

2.4.3. OTHER UML DIAGRAMS
The UML contains many other diagram types useful for specifying various parts of the system. Interaction diagrams, for example, detail the behavior of and communication between classes. In fact, interaction diagrams are probably the third most commonly used UML diagram type. Other diagram types will specify states, deployment plans, architecture, and activity. Because this paper is concerned largely with the static constraints expressed by use cases and class diagrams, the author will not elaborate any further on these diagram types.

2.5. BENEFITS OF AN OO APPROACH
It is hopefully evident by now what the benefits of an OO approach are; they include: greater re-use, more detailed specification of system constraints and processes, easier communication and verification with the users/stakeholders, and a clearer distinction between what the problem is (analysis) and how the problem will be solved (design and construction). If one were to specify in one word what the biggest advantage of using the OO approach is, that word would be "process"; conversely, if one were to specify the biggest drawback to the OO approach, that word would be "data".

2.6. WEAKNESSES OF AN OO APPROACH (FROM A DATA PERSPECTIVE)
The largest weakness to the OO approach is its lack of rigor, verification and completeness when expressing static data constraints. The approach is largely process-centric and is usually implementation biased (focusing on how to solve the problem, not what the problem is).

2.6.1. CLASS DIAGRAMS
As previously discussed, class diagrams, when abstracted down to the persistent data storage layer, are little more than ER diagrams with behavior and multi-valued attributes. Because of this, they are subject to the same criticisms previously reserved for ER diagrams: lack of attribute-level rule specification, un-natural ways of expressing dependencies amongst data, tendency towards un-normalized schemas, inflexibility when it is time to change the system, and ease of introducing errors that are correct as far as the syntax is concerned. All of these criticisms can be addressed via traditional data modeling techniques (specifically, by using a technique that is not so concerned with the entity-attribute classification of system elements).

2.6.2. SPECIFYING COMPONENTS
The primary weakness in the OO process way of specifying components is that only "rough-cut" models are used. Once those components enter the analysis and design phases, one may discover that they are not so independent after all. This is largely due to incomplete models when specifying the components in the first place.

2.6.3. ANALYSIS PHASE
Due to the nature of the analysis artifacts, the analysis phase can be as vague or as elaborate as the analyst wishes. Further, since the details of the analysis are expressed as text, the analyst has the most flexibility of all the iteration phases in performing their tasks. This flexibility, however, can also pose some problems.

2.6.3.1. OVERLY VAGUE USE CASES
Use cases can be constructed such that they are overly vague. They may not contain any or all of the attributes needed by the system, they may not express any or all of the constraints on those attributes, and they may not describe all of the desired functionality of the system. Designers tend to like this sort of use case, by the way, in that design is free to implement the system as they need to; in other words, they are told what the problem is, not how to solve it. The author would counter, however, that if the designers do not have an accurate picture of what the system is supposed to do, any resulting design is inherently flawed.

2.6.3.2. OVERLOADED USE CASES
The converse of the above style of use cases is the "overloaded" use case. In such a use case, the analyst tries to express every single constraint, requirement, and business rule imaginable. Such overloaded use cases tend to include: user interface specifications, attribute data typing, interface requirements when sending attributes between systems, static attribute rules such as mandatory fields, subset, equality, and exclusion constraints, specification as to how the attribute is formatted on the user interface, etc. While this is indeed a noble effort, such use cases tend to attempt to express things that really aren't analysis concerns (again, what is the problem vs. how to solve the problem). Further, they tend to be inconsistent in how they express all of these numerous constraints and they lack the rigor in ensuring all constraints have indeed been documented and subsequently verified by the users.

2.6.3.3. DANGERS OF IMPLEMENTATION (OR DESIGN) CENTRIC ANALYSIS
By now, the dangers of the implementation or design centric approach should be obvious. Without using any rigor in expressing the underlying data structures and constraints, the application may be doomed to failure due to bad data that is otherwise acceptable to the system and the constraints that said system specifies.

2.6.4. LACK OF FORMAL TECHNIQUES
All of the criticisms previously elaborated can be overcome by the addition of a formal process as to how to specify the artifacts. To date, none (that the author is aware of) has been published from a purely OO standpoint. The UML does have a mechanism for expressing anything that its syntax cannot: the note. Specified as text, or in some syntax such as the Object Constraint Language (OCL), the note field can contain anything the modeler wishes to express. That is indeed good, but the question remains as to whether or not the analysts and designers realized those additional constraints were needed at all. The lack of a formal method to determine whether those rules do indeed exist means such errors are easy to introduce and propagate down to the design, construction, and verification phases.

3. A BRIEF OVERVIEW OF OBJECT-ROLE MODELING (ORM)
Now that we have fully explored what is meant by an OO process and have elaborated on what OO's benefits and problems are, it's time to discuss how to improve the process.

The technique the author is proposing to improve the quality of the OO Process is that of Object-Role Modeling (ORM). ORM has been in use since the 1970s (about as long as the more common ER style of data modeling). It is not within the scope of this paper to fully explain the syntax of the ORM language. Rather, the author will attempt to explain the key concepts of the method in comparison to the OO process, and suggest how to incorporate ORM's features into that process. The reader is encouraged to explore the work in the reference section for more detailed information on ORM's syntax.

The main difference between ORM and ER is that ORM makes no distinction as to whether a model element is an entity or an attribute - they are just objects that play roles with other objects (associations that are also known as "facts"). In this manner, constraints can be expressed freely and easily across the objects and the roles that they play. In addition, ORM is expressed in a completely natural language (for example, English sentences). Thus, ORM facts are readily extracted from and verified by the users. Further, ORM makes use of "data use cases" which include real data in the model for easy verification of structures and constraints. In fact, with the aid of CASE tools, most of the constraints can be completely derived and verified from a significant set of sample data.
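
As a rough illustration of how a constraint might be proposed from sample data, the Python sketch below searches a sample population for the smallest combinations of roles that contain no duplicate values. This is an assumption about how such tool support could work, not a description of any particular CASE tool. The function `candidate_uniqueness` and all of the sample rows are invented for illustration.

```python
from itertools import combinations

def candidate_uniqueness(population, arity):
    """Return the minimal role combinations with no duplicates in the sample."""
    found = []
    for size in range(1, arity + 1):
        for roles in combinations(range(arity), size):
            # Skip combinations that merely extend one already found.
            if any(set(smaller) <= set(roles) for smaller in found):
                continue
            projected = [tuple(row[i] for i in roles) for row in population]
            if len(projected) == len(set(projected)):
                found.append(roles)
    return found

# Invented sample rows of (movie, rating, country); roles are numbered 0, 1, 2.
sample = [
    ("Movie A", "R", "United States"),
    ("Movie A", "18", "United Kingdom"),
    ("Movie B", "PG", "United States"),
    ("Movie B", "PG", "United Kingdom"),
    ("Movie C", "R", "United States"),
]
candidates = candidate_uniqueness(sample, arity=3)
print(candidates)
```

A sample can only suggest a constraint; it cannot prove one. With too few rows, coincidental uniqueness shows up as a candidate, which is why the sample needs to be significant and why the users still have to confirm the result.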

When an ER style abstraction of the model is helpful (and at various stages of the process, the more succinct ER notation has many benefits), an ER model may be completely derived from the ORM model. In this manner, the derived ER model may be easily compared to persistent class diagrams, but this topic will be discussed in greater detail later.

ORM is defined by discovering easily verbalized facts (for example, as recited by the user) about the Universe of Discourse (UoD, also known as the system domain, area of interest, etc.). For example, consider the fact: "the Movie named 'Pulp Fiction' received the Rating described by 'R - Restricted' in the Country identified by the name 'United States'". The capitalized words are the object types, the qualifiers (such as "named") are the reference modes of the object types (i.e. how do you distinguish one movie from another?), and the remainder of the sentence is the predicate, which indicates the roles the object types play. Thus, the above fact instance can be generalized into the fact type: "Movie(name) received Rating(description) in Country(name)". The predicate of this sentence is then "... received ... in ..." where the ellipses (...) indicate the "object holes" in a "mixfix" notation. In the mixfix notation, the sentence may be rearranged such that the objects may appear in any order. Such rearranged readings are often referred to as "inverse" or "alternate" readings. Note that because an object type may appear in any order within the corresponding predicate, this mixfix notation works for any language (which may place verbs at the end of the sentence, for example).
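
The idea of a predicate with object holes can be sketched in Python by treating each reading as a template with numbered placeholders. The function `verbalize` and the `{0}`-style hole syntax are conveniences of this sketch, not ORM notation, and the alternate reading shown is only one possible rearrangement.

```python
def verbalize(object_types, reading, values):
    """Fill the numbered object holes of a reading with typed values."""
    parts = [f"{obj_type} '{value}'" for obj_type, value in zip(object_types, values)]
    return reading.format(*parts)

object_types = ("Movie", "Rating", "Country")
fact = ("Pulp Fiction", "R - Restricted", "United States")

# Primary reading and one alternate reading of the same ternary fact type.
primary = verbalize(object_types, "{0} received {1} in {2}", fact)
alternate = verbalize(object_types, "in {2}, {0} received {1}", fact)
print(primary)
print(alternate)
```

Because each hole is numbered rather than positional, the same fact instance can be read back in whichever word order suits the language or the audience.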

The number of roles in a fact type is known as the "arity" of the fact (in the case of the above fact type, the arity is three). Note that in ORM, facts may be of any arity (not so in many styles of ER which mandate binary relationships) as long as the fact is "elementary" (also said to be "atomic", which means it cannot be broken down into smaller facts without some information loss). The above fact is elementary and we cannot express it in any "smaller" way without information loss. For example, "Movie(name) received Rating(description)" would lose the information that a movie is released in many countries and that each country may have a different rating system. An example of a fact that is not elementary would be "the Movie named 'Pulp Fiction' starred the Person named 'John Travolta' and was directed by the Person named 'Quentin Tarantino'". In this case, the use of the word "and" is a giveaway that the fact may be divided into the fact types: "Movie(name) starred Person(name)" and "Movie(name) was directed by Person(name)" without any loss of information. Also note that a fact may have an arity of one (a "unary" fact) such as "the Person named 'AJ Durham' is eccentric". Such unary facts are often implemented as Boolean logical fields (true or false, the person either plays the role or they don't).
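
The notion of information loss can be illustrated in Python by splitting a sample population into two smaller fact populations and joining them back together. If the join produces rows that were never stated, the split has lost information. The function `splits_losslessly` is hypothetical, it tries only one particular split, and apart from the names quoted in the text all rows are invented sample data. Elementarity is a property of the fact type, so a sample can demonstrate a lossy split but cannot prove that a split is always safe.

```python
def splits_losslessly(population):
    """True if ternary rows (a, b, c) equal the join of their (a, b) and (a, c) projections."""
    ab = {(a, b) for a, b, _ in population}
    ac = {(a, c) for a, _, c in population}
    rejoined = {(a, b, c) for a, b in ab for a2, c in ac if a == a2}
    return rejoined == set(population)

# (movie, rating, country): splitting pairs each rating with every country.
ratings = [
    ("Movie A", "R", "United States"),
    ("Movie A", "18", "United Kingdom"),
]

# (movie, star, director): the "and" in the sentence joins two independent facts.
credits = [
    ("Pulp Fiction", "John Travolta", "Quentin Tarantino"),
    ("Pulp Fiction", "Actor X", "Quentin Tarantino"),
]

print(splits_losslessly(ratings))  # rejoining invents ratings that were never stated
print(splits_losslessly(credits))  # rejoining reproduces the original rows
```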

Due to the nature of attribute-free models and since facts may be of any arity, it is easy to express virtually any style of static constraint upon the model. Such constraints are of the following types:

Mandatory - the object must play this role

Mandatory disjunction - the object must play one of a series of roles (but it can also play more than one)

Uniqueness - a specific object (or combination of objects) may appear only once in a set

Subset - if an object plays role x, it must also play role y

Equality - if an object plays role x, it must also play role y, and vice versa

Exclusion - an object either plays role x or role y, but not both

Frequency - an object plays a role a minimum of x times and/or a maximum of y times

Value - allowable values of an object type, for example, the object type "Sex(code)" may have allowable values "M" and "F"
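
Each of the constraint types listed above can be read as a simple check against a sample population. The Python sketch below makes that reading explicit. It assumes that a fact population is a list of tuples and that a role is identified by its position in the tuple. The function names and the sample data are hypothetical, and the checks are a simplification of the full ORM semantics.

```python
from collections import Counter

def players(population, role):
    """The set of objects that appear in the given role (tuple position)."""
    return {row[role] for row in population}

def mandatory(objects, population, role):
    return set(objects) <= players(population, role)

def mandatory_disjunction(objects, roles):
    """roles is a list of (population, role) pairs; each object plays at least one."""
    return set(objects) <= set().union(*(players(p, r) for p, r in roles))

def unique(population, role):
    column = [row[role] for row in population]
    return len(column) == len(set(column))

def subset(pop_x, role_x, pop_y, role_y):
    return players(pop_x, role_x) <= players(pop_y, role_y)

def equality(pop_x, role_x, pop_y, role_y):
    return players(pop_x, role_x) == players(pop_y, role_y)

def exclusion(pop_x, role_x, pop_y, role_y):
    return players(pop_x, role_x).isdisjoint(players(pop_y, role_y))

def frequency(population, role, low, high):
    counts = Counter(row[role] for row in population)
    return all(low <= n <= high for n in counts.values())

def value(population, role, allowed):
    return players(population, role) <= set(allowed)

# Invented sample data: Person(name) has Sex(code); two unary facts about persons.
people = {"Ann", "Bob", "Cy"}
has_sex = [("Ann", "F"), ("Bob", "M"), ("Cy", "M")]
is_employee = [("Ann",), ("Bob",)]
is_contractor = [("Cy",)]

print(mandatory(people, has_sex, 0), unique(has_sex, 0), value(has_sex, 1, {"M", "F"}))
print(exclusion(is_employee, 0, is_contractor, 0))
```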

These orthogonal constraint types may also be combined to form more complex constraints. For example, an exclusion constraint may be combined with a mandatory disjunction to specify that an object must either play role x or role y but not both. As another example, consider the fact types: "Person(name) works for Department(name)", "Department(name) manages Project(TLA)", and "Person(name) works on Project(TLA)". The constraint that a person may only work on projects that are managed by his/her department is merely a complex instance of the ORM subset constraint. This constraint is not possible in any variation of ER modeling or in the UML class diagram syntax, for example, without introducing intermediate (and unnatural) structures.
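
The department and project constraint can be checked against sample data as follows. In this Python sketch the "works on" population must be a subset of the join of "works for" and "manages". The names, project codes, and the function `subset_violations` are invented for illustration.

```python
# Invented sample populations for the three fact types.
works_for = {("Ann", "Sales"), ("Bob", "Research")}
manages = {("Sales", "CRM"), ("Research", "LAB")}
works_on = {("Ann", "CRM"), ("Bob", "CRM")}

def subset_violations(works_on, works_for, manages):
    """Return (person, project) pairs not covered by the person's own department."""
    allowed = {
        (person, project)
        for person, department in works_for
        for managing_department, project in manages
        if department == managing_department
    }
    return sorted(works_on - allowed)

violations = subset_violations(works_on, works_for, manages)
print(violations)
```

Here Bob works for Research but is recorded as working on a project managed by Sales, so that one row is reported as a violation.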

As a side note, ORM also has specifications for subtypes (inheritance) and various ring constraints (applied when an object plays a role with itself, for example, "Person(name) reports to Person(name)").

In addition to all of the benefits of ORM's syntax, it is performed by using a rigorous and complete process known as the Conceptual Schema Design Procedure [Halpin].

3.1. THE CONCEPTUAL SCHEMA DESIGN PROCEDURE (CSDP)
The Conceptual Schema Design Procedure (CSDP) is a series of steps that encompass verbalization, application of constraints, model validation, specialization and generalization, and various model transformations. The steps of the CSDP are:

1) Transform familiar information examples into elementary facts, and apply quality checks.
 
2) Draw the fact types, and apply a population check.

3) Check for entity types that should be combined, and note any arithmetic derivations.

4) Add uniqueness constraints, and check arity of fact types.

5) Add mandatory role constraints, and check for logical derivations.

6) Add value, set comparison, and subtyping constraints.

7) Add other constraints and perform final checks.

Correctly applying the steps of the CSDP (the details of which are outside of the scope of this article) ensures that the resulting model is correct, validated, consistent, verified by the users, and populated with sample data that conforms to the constraints. As an added benefit, the resulting model is also fully normalized, as the elementary nature of the facts finds all functional dependencies and thus can derive a fully normalized logical model.
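
A rough sense of why elementary facts lead to normalized tables can be had from the following Python sketch. It groups fact types by the roles spanned by their uniqueness constraint: facts that are unique on a single object type become columns of that object's table, and facts whose uniqueness spans every role become a table of their own. This is a heavily simplified assumption about the mapping, ignoring mandatory roles, subtypes, and fact types with more than one uniqueness constraint. The function `group_into_tables` and the "was born on" fact type are invented for illustration.

```python
def group_into_tables(fact_types):
    """Map each uniqueness key (a tuple of object types) to its dependent columns."""
    tables = {}
    for object_types, unique_roles in fact_types:
        key = tuple(object_types[i] for i in unique_roles)
        dependents = [t for i, t in enumerate(object_types) if i not in unique_roles]
        tables.setdefault(key, []).extend(dependents)
    return tables

# Each entry: (object types in role order, roles covered by the uniqueness constraint).
fact_types = [
    (("Person", "Date"), (0,)),        # Person was born on Date
    (("Person", "Department"), (0,)),  # Person works for Department
    (("Person", "Project"), (0, 1)),   # Person works on Project (many to many)
]

tables = group_into_tables(fact_types)
print(tables)
```

Every non-key column ends up in a table whose key functionally determines it, which is the essence of why the derived logical model comes out normalized.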

3.2. ORM COMPARED TO CLASS DIAGRAMS
ORM's attribute-free nature overcomes many of the problems previously identified with class diagrams. Thus, the main differences between ORM and class diagrams are:

ORM contains no attributes or entities, only objects playing roles

ORM can specify "attribute" level constraints that cannot be expressed by the native class diagram syntax

Class diagrams contain behavior (methods)

ORM, when properly applied, is virtually free of errors (in normalization and/or uniqueness for example) that can easily creep into class diagrams

The persistent class diagram can be completely derived from the ORM model

3.3. ORM COMPARED TO USE CASES
Use cases are best at defining dynamic constraints and other process-specific concerns, while ORM adequately defines the data structures and the static constraints that apply to the data population. In practice, one finds that in some areas use cases and ORM may capture the same rules. But this overlap will be addressed in the section on incorporating ORM into the OO Process. Both methods are similar in that they rely on the use of a natural language that is easily expressed and verified by the users.

3.4. BENEFITS AND WEAKNESSES OF ORM
This brief overview of ORM concludes by addressing ORM's strengths and weaknesses (as compared to the OO Process). The strengths of ORM (which happen to complement OO's weaknesses) are: ORM uses a formal, rigorous process; it is easy to verify via data use cases and a natural language; ORM contains additional "attribute level" constraints; ORM can easily be expressed in many model transformations; ORM is a natural way to express facts and their constraints; ORM can be used for both a "problem centric" analysis and a "solution centric" design; and ORM contains the significant amount of detail that is needed by design but is not needed by analysis (data types, allowable values, other data-centric concerns, etc.).

The weakness of ORM (from an OO perspective) is that it does not inherently consider process when defining the data structures. It is the opinion of the author that defining the process that acts upon the data is important. However, unless the process defines the static rules that apply to the set of data instances, process has no bearing on what the correct persistent data structure should be (other implementation constraints, such as performance, may indeed have an impact on the persistent data structure, but that is a different issue than process vs. data).

To reiterate: process is important, and OO artifacts do a fine job of capturing those processes. The author agrees that they should be used to specify dynamic behavior, architecture considerations, and other implementation concerns. As we will see in the next section, ORM can be used in tandem with OO techniques without sacrificing any of OO's strengths, instead complementing OO's weaknesses and resulting in a better system process overall.

4. INTEGRATING ORM INTO OO PROCESSES
This section will illustrate how to insert the ORM technique into the OO Process to maximize the quality of the artifacts (and indeed the system as a whole). This section will conclude with some evidence of this combination of techniques and the benefits reaped from using this technique at a major manufacturing company.

4.1. ORM'S ROLE IN DETERMINING COMPONENTS
Recall that components are discovered in the very early stages of the project by the creation of rough-cut models used to isolate attributes and processes into relatively independent groupings. Further recall that the use of generalized use cases and simple class diagrams, as typically applied during component analysis, does not always achieve the goal of fully encapsulated components.

Using ORM at the component definition stage along with the rough-cut class diagrams and use cases will improve the accuracy of the initial component divisions. ORM can be used to quickly model the important object types and the roles that they play. Often, and in a similar manner, the rough-cut class diagram will attempt to accomplish the same goal. When both the rough-cut ORM model and the class diagrams are completed, one can simply derive a logical schema from the ORM schema and compare it to the persistent class diagrams. If both models agree, then components can be easily divided out (after considering behavior described by the use cases, of course). If they do not agree, one model may have discovered a dependency that the other did not (and the author would wager that it was the use of ORM that discovered the dependency).
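The comparison described above can be sketched roughly as follows, assuming both the ORM-derived logical schema and the persistent class diagram have been reduced to simple name-to-columns mappings. The schemas, column names, and the function schema_differences are hypothetical and chosen only to show the idea.

```python
def schema_differences(orm_schema, class_schema):
    """Report, per table or class, the columns that only one model contains."""
    report = {}
    for name in sorted(set(orm_schema) | set(class_schema)):
        orm_cols = set(orm_schema.get(name, ()))
        class_cols = set(class_schema.get(name, ()))
        if orm_cols != class_cols:
            report[name] = {
                "only_in_orm": sorted(orm_cols - class_cols),
                "only_in_classes": sorted(class_cols - orm_cols),
            }
    return report


# Hypothetical logical schema derived from the rough-cut ORM model.
orm_schema = {
    "Order": ["order_nr", "customer_nr", "warehouse_code"],
    "Customer": ["customer_nr", "customer_name"],
}

# Hypothetical persistent classes from the rough-cut class diagram.
class_schema = {
    "Order": ["order_nr", "customer_nr"],
    "Customer": ["customer_nr", "customer_name"],
}

differences = schema_differences(orm_schema, class_schema)
```

Here the ORM-derived schema shows that Order depends on a warehouse, a dependency the class diagram missed and one that may affect how the components are divided.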

4.2. ORM IN TANDEM WITH THE OO APPROACH
Now that the components are identified and divided out, the project team is ready to begin to run the components through the iterations. ORM has a direct impact on three of the phases of each iteration: Analysis, Design, and Construction. It is also useful to note that since each phase of the iteration feeds the next, ORM has an indirect impact on the subsequent phases (rules and constraints documented in the ORM model, for example, should become test requirements during the verification phase).

4.2.1. ORM IN THE ANALYSIS PHASE
ORM is best applied during the analysis phase in tandem with the use case analysis. The ORM analyst should team up with the use case analyst while conducting requirements gathering sessions with the user community and subject matter experts. The ORM analyst would then capture the data constraints that ORM expresses, and the use case analyst would document the dynamic constraints and general processes.

Once both the use cases and ORM schema have been validated and verified by the users and subject matter experts, the rules, constraints, and other requirements gathered may be grouped together (in a requirements documentation tool, for example) and handed off to the design team. It may also be useful for the ORM analyst to derive a logical schema that can be used as an initial persistent class model by the design team.

4.2.2. ORM IN THE DESIGN PHASE
During the design phase, it is best for the ORM analyst to leave the design team alone while they work out the class diagrams and implementation concerns and generate the class and interaction diagrams such that they meet the requirements specified by the analysis artifacts. Once the design team has completed their models, it is useful to compare their results with the initial use cases and ORM schema to ensure all constraints and requirements have been met.

During this comparison, one often discovers that the persistent class model differs from the logical model derived from the ORM. This may occur for several reasons:

The designer, because of other constraints, generalized the functionality but still met the requirements (for example, created a generalized "data driven" structure whose persistent model looks quite different from the specialized ORM derived logical model). In this case the ORM model may need to be adjusted to correspond with the design model. Remember that there are many ways to model a situation, and transformations can and do need to be performed. Further remember that the initial ORM model illustrated an analysis view of the world (what the problem is) and the design view of the world (how to solve the problem) may look different. Those differences must then be agreed upon and implemented. In this manner, it is likely that two ORM schemas will exist: one for the analysis view, and one for the design (and implementation) view. This difference is nothing new to data analysts who often construct "conceptual" models, "logical" models, and "physical" models to account for needed changes between the perspectives.

The designer omitted some constraints or requirements. In this case, the design needs to be adjusted such that it entails the missing requirements.

The designer, during the process of specifying the design, discovered requirements that the analysts missed (or the requirements were added on after the analysis phase was completed). When this happens, the added functionality is either stripped from the design (and possibly noted for a later iteration of the component) or added to (synchronized with) the analysis artifacts.

Note that in this manner, there exists a "checks and balances" system between the designers and the analysts. The author has found such teamwork to be very healthy for project success.

4.2.3. ORM IN THE CONSTRUCTION PHASE
ORM's role in the construction phase, particularly with the aid of CASE tools, is fairly trivial. The design-oriented ORM schema can be used to derive the physical structure of the database schema that the application will be constructed against. The analysis ORM schema may be a useful input to the construction team members if they desire any sort of knowledge as to why the design is the way it is (i.e., what prompted the design structures).
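To give a feel for how mechanical this derivation is, here is a small sketch that turns a table description into a CREATE TABLE statement. It assumes the design-oriented schema has already been reduced to column triples; the table, column names, data types, and the function create_table_sql are hypothetical, and a real CASE tool would generate far more (foreign keys, check constraints, indexes).

```python
def create_table_sql(table, columns, primary_key):
    """Build a CREATE TABLE statement from (name, SQL type, mandatory) triples."""
    lines = []
    for name, sql_type, mandatory in columns:
        # A mandatory role constraint becomes NOT NULL in this sketch.
        null_clause = "NOT NULL" if mandatory else "NULL"
        lines.append(f"    {name} {sql_type} {null_clause}")
    # The uniqueness constraint chosen as the reference scheme becomes the key.
    lines.append(f"    PRIMARY KEY ({', '.join(primary_key)})")
    return f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n);"


ddl = create_table_sql(
    "Employee",
    [
        ("employee_nr", "INTEGER", True),
        ("employee_name", "VARCHAR(60)", True),
        ("department_code", "CHAR(4)", False),
    ],
    ["employee_nr"],
)
print(ddl)
```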

Note that typically during the construction phase, the ORM analyst, usually being one of the few data folks on the team, is also tasked with physical database implementation concerns like abbreviations, naming conventions, and standardizing the resulting schema into meta-data repositories and the like.

4.3. CASE STUDY RESULTS
As mentioned before, this paper is the result of applying the techniques contained herein at a major Minnesota manufacturing company. When the author first joined the project team (which, incidentally, consisted of very skilled OO designers and analysts), the data modeling activity was centered on generating physical schema from design's persistent class diagrams, mostly with some success but with a couple of notable failures. Over time, the author worked more and more towards the analysis phase and introduced the techniques contained in this paper. As a result, the author is pleased to report the following results:

Not only were more business rules captured, but they were also more precise and the users easily verified them.

Components moved more quickly through the analysis phase. Further, because the analysis was more precise, there was also a noticeable speed-up in the design phase, and thus in the subsequent phases as well.

Use cases could focus on the general user requirements and process, not on the data model details (such as attributes, typing, constraints, etc.); this broadened the scope of the analysis effort and also resulted in a clearer set of requirements and user preferences without sacrificing any detail.

Users could communicate just as easily with ORM's natural language based modeling as they could with the narrative format of use cases.

The process of verifying the resulting logical data design against the use cases and user requirements was practically trivial.

The ORM analyst made an efficient liaison between the analysts, designers, and the users.

The resulting data schemas were more likely to be robust and correct.

The logical data model (LDM) derived from the analysis ORM diagram made an efficient and effective input into the design process.

Using CASE tools and leveraging ORM's natural language, importing requirements into the company's requirements gathering software could be easily automated.
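As a rough illustration of the last result, constraints on a fact type can be turned into natural language sentences that are ready to load into a requirements tool. The sentence patterns, the sample fact type, and the function verbalize below are assumptions made for this sketch and do not reproduce the output of any particular CASE tool.

```python
def verbalize(subject, predicate, obj, unique=False, mandatory=False):
    """Return constraint sentences for the subject role of a binary fact type."""
    sentences = []
    if unique:
        # Uniqueness constraint on the subject's role.
        sentences.append(f"Each {subject} {predicate} at most one {obj}.")
    if mandatory:
        # Mandatory role constraint on the subject's role.
        sentences.append(f"Each {subject} {predicate} some {obj}.")
    return sentences


# Hypothetical fact type: Employee works in Department.
requirements = verbalize(
    "Employee", "works in", "Department", unique=True, mandatory=True
)
```

Each generated sentence can be stored as a separate requirement, which keeps the requirements tool and the model expressed in the same words.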

5. CONCLUSION
This paper detailed the use of an OO process in the practice of software development along with its inherent weaknesses. In addition, this paper briefly covered a rigorous and formal data modeling technique and how it can be incorporated into the OO process with improved results as illustrated by the previous case study.

Thus, it has been demonstrated that the use of ORM in an OO process greatly improves the quality of the resulting analysis and design artifacts, speeds up the overall software development process, and better supports quality assurance. In summary, it is important to reiterate that the use of ORM in an OO process does not detract from nor sacrifice any of the many benefits of using an OO process in the first place. ORM simply counters OO weaknesses while working in tandem to analyze, design, and construct the desired application. The result is a better system.

6. REFERENCES
Halpin, Terry, Information Modeling and Relational Databases, 2001, Morgan Kaufmann


This revised article was the basis of a user group lecture, the basis of a presentation at the 2001 DAMA International Symposium, and also appeared as a two-part series in the Journal of Conceptual Modeling, Issue 18 and Issue 19.