An Argument for the Use of ER Modeling
by Scot Becker
A RAMBLING INTRODUCTION
The other day I was doing a bunch of repetitive tasks in an Entity-Relationship (ER) model -- this time, I'll be nice and not mention the (shoddy) tool I was using, but I will say that this is the only time in recent memory where I haven't trashed Designer. You can probably guess what I was doing: making sure multiple instances of the same attribute had the same physical name, checking name consistency, checking datatype and domain consistency, putting in audit columns on each table, tapping the space bar to the beat of Carl Weathersby, etc.
Extremely bored with all of my cutting and pasting, I let my mind wander a bit (well, more than usual, anyway...). Various things flashed through my mind over the hour or so I spent plodding along, and then...
I thought of calculus.
You see, I have a friend who - after apparently deciding that he hasn't given nearly enough of his time and money in pursuit of a college education - has entered a graduate program. As part of his program, he first has to clear a few "prerequisites" (read: 'thinly disguised enrollment fees'), calculus being one of them. Being a shell-shocked veteran of six mind-numbing quarters of calculus myself, I sympathized with my friend. And then it struck me:
I was back in calculus.
In calculus, the root of the technique (for lack of a better word) is the derivative. There are a couple of ways to find the derivative of a function. One involves plugging the function into the limit definition of the derivative and evaluating the limit. While this sounds easy enough in principle, for more complicated functions it can be an algebraic nightmare, as you have to perform all sorts of mechanics to arrive at the solution (the derivative of the function). Back when I was in calculus, this is where I always screwed up: never on the calculus-oriented topics, always on the algebra and arithmetic. Anyway, the whole process is time consuming, rooted in the mechanics, error prone, and - most of all - extremely annoying.
(Anyone see where I'm going with this yet?)
After a week or so -- and an exam, of course -- of finding derivatives this way, your calculus instructor will play a cruel joke: s/he'll teach you the easy way to get the same answer. This new method (the shortcut rules, such as the power rule and the 'chain rule', for those of you who are 'real' geeks) is fast, accurate, easy, and uses simple mechanics. At first you are delighted, and astounded that something that had seemed so hard was, in fact, very easy. However, your astonishment quickly turns to agitation as you realize how much time and effort you had wasted on the 'hard way'.
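For anyone who has mercifully forgotten, here is a small worked example (chosen for illustration) of the two methods: the derivative of f(x) = x^3 found the hard way, from the limit definition, and the easy way, from the power rule, followed by one application of the chain rule to a composite function.

```latex
\begin{aligned}
\text{Hard way: } f'(x) &= \lim_{h \to 0} \frac{(x+h)^3 - x^3}{h}
  = \lim_{h \to 0} \frac{3x^2 h + 3x h^2 + h^3}{h} \\
  &= \lim_{h \to 0} \left(3x^2 + 3x h + h^2\right) = 3x^2 \\[4pt]
\text{Easy way: } \frac{d}{dx}\, x^n &= n x^{n-1}
  \;\Rightarrow\; \frac{d}{dx}\, x^3 = 3x^2 \\[4pt]
\text{Chain rule: } \frac{d}{dx}\, (2x+1)^3 &= 3(2x+1)^2 \cdot 2 = 6(2x+1)^2
\end{aligned}
```

Even for a cubic, the hard way needs a binomial expansion and a cancellation; the chain-rule case would be far messier by the limit definition.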
Someone (probably some cocky long-haired punk who always sits in the back row, like my friend) in your class is bound to ask the instructor why derivatives are taught this way. The standard response is that the only way you will ever truly understand calculus is to know the hard way of finding a derivative.
If you're anything like me, you probably thought that was a silly justification.
(In this case, a silly justification of the 'I-had-to-learn-it-the-hard-way-and-so-will-you' law of academics, but I digress.)
(Heh... he says as if this article were anything more than one long digression anyway....)
For those of you just skimming my ramblings about higher mathematics: I'm about to come to my point.
THE POINT
ER modeling is just like that first method: it's time consuming, rooted in the mechanics (tediously cutting and pasting repeated physical names, definitions, abbreviations, etc.), error prone, and annoying. If you are like most data modelers and architects out there, this is what you use on a daily basis, and have for quite some time. Because of this, you probably have elaborate, inconsistent documentation standards to -- ironically -- ensure consistency, data dictionaries so large you could use them to prop up a car axle, and padded project schedules to allow for a whole lot of copying and pasting.
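To make the chore concrete, here is a minimal Python sketch of the kind of consistency check an ER modeler often ends up scripting (or doing by eye): collect every attribute that appears in several entities and flag the ones whose copies disagree on datatype. The model format, a dict mapping hypothetical entity names to (attribute, datatype) pairs, is an assumption for illustration, not any particular tool's export format.

```python
from collections import defaultdict

# Hypothetical ER model: each entity lists its attributes as (name, datatype).
# The same logical attribute is copied into several entities by hand.
ER_MODEL = {
    "customer": [("customer_id", "INTEGER"), ("description", "VARCHAR(100)"), ("timestamp", "TIMESTAMP")],
    "order": [("order_id", "INTEGER"), ("description", "VARCHAR(80)"), ("timestamp", "TIMESTAMP")],
    "invoice": [("invoice_id", "INTEGER"), ("timestamp", "DATE")],
}


def find_inconsistencies(model):
    """Return {attribute: {datatype: [entities]}} for attributes whose copies disagree."""
    usages = defaultdict(lambda: defaultdict(list))
    for entity, attributes in model.items():
        for name, datatype in attributes:
            usages[name][datatype].append(entity)
    # Keep only attributes that appear with more than one datatype.
    return {name: dict(types) for name, types in usages.items() if len(types) > 1}


if __name__ == "__main__":
    for name, types in sorted(find_inconsistencies(ER_MODEL).items()):
        print(name, types)
```

Against the sample model it flags `description` (VARCHAR(100) vs VARCHAR(80)) and `timestamp` (DATE in `invoice`). Note that a script like this only finds drift after the fact; nothing in the model itself prevents it.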
And then I come along, and inform you of -- nay, taunt you with -- an easier way.
When using Object-Role Modeling (ORM), you define an object just once.
Not once every time you want to use it
...
... once.
In that single object, you define its semantic domain (datatype, allowable values, etc.), its physical name, abbreviation standards and all sorts of other naming issues, its definitions, its notes, and anything else you need to say about that object.
Then you place that object wherever you want ...
... as many times as you want.
If you need to change anything about that object (say, for example, that the physical datatype now needs to be CHAR(2) instead of CHAR(1)), you just change it in one spot, and the change propagates to every instance of that object, because each instance refers back to that one definition.
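A minimal Python sketch of the define-once idea, assuming a hypothetical in-memory model rather than any real ORM tool's API: each object type lives in exactly one `ObjectType` instance, and every fact that uses it holds a reference to that instance instead of a copy of its properties. The physical name and datatype are looked up only when column definitions are generated.

```python
from dataclasses import dataclass


@dataclass
class ObjectType:
    """The single definition of an object type (hypothetical, simplified)."""
    name: str           # conceptual name
    physical_name: str  # abbreviated column name, defined once
    datatype: str       # physical datatype, defined once
    definition: str = ""


@dataclass
class FactType:
    """A fact in which some table's subject plays a role with a shared object type."""
    table: str          # hypothetical: the table this fact ends up in
    value: ObjectType   # a reference to the shared definition, not a copy


def column_ddl(fact: FactType) -> str:
    # Physical details are read from the shared definition at generation time.
    return f"{fact.table}.{fact.value.physical_name} {fact.value.datatype}"


status_code = ObjectType("Status Code", "status_cd", "CHAR(1)", "Current status of the row")
facts = [FactType(t, status_code) for t in ("customer", "order", "invoice")]

print([column_ddl(f) for f in facts])
status_code.datatype = "CHAR(2)"       # change it in one spot...
print([column_ddl(f) for f in facts])  # ...and every usage reflects it
```

The first list shows CHAR(1) for all three tables and the second shows CHAR(2) for all three, even though only one line changed.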
Agitated yet?
Hang on, it gets worse.
In ER, now that you have copied and pasted the definition for "timestamp" and other oft-repeated columns for the thousandth time, you're still not done.
Nope, now you probably have to normalize your schema (more mechanics!). You see, there is nothing in ER that captures the higher-level normal forms such as BCNF, 4NF, and 5NF (see the references for more info on normalization and ORM).
In ORM, you get a normalized database at no extra charge. ORM's use of the elementary fact ensures that no functional or multivalued dependencies will violate normalization (1NF, 2NF, 3NF, EKNF, BCNF, and 4NF). And ORM's rich constraint language lets you capture many (albeit not all) of the constraints that ER modelers typically treat as 5NF concerns (semantic rules).
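A heavily simplified Python sketch of why elementary facts tend to map to normalized tables. It loosely follows the grouping step of the relational mapping (Rmap) procedure described in the ORM literature, but it handles only binary fact types with either a single-role or a spanning uniqueness constraint; n-ary facts, subtypes, reference schemes, and mandatory/optional roles are ignored, and the fact types and names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryFact:
    """An elementary binary fact type, e.g. 'Employee speaks Language'."""
    subject: str      # object type playing the first role
    predicate: str    # reading of the fact type
    obj: str          # object type playing the second role
    functional: bool  # True: each subject has at most one obj (uniqueness on the first role)
                      # False: the uniqueness constraint spans both roles (many-to-many)


def map_to_tables(facts):
    """Group elementary facts into tables (a simplified Rmap-style pass).

    Functional facts about the same subject share one table keyed by that subject;
    each many-to-many fact gets its own table whose key is both roles.
    """
    tables = defaultdict(lambda: {"key": [], "columns": []})
    for f in facts:
        if f.functional:
            table = tables[f.subject]
            table["key"] = [f.subject]
            table["columns"].append(f.obj)
        else:
            tables[f"{f.subject}_{f.predicate}_{f.obj}"] = {"key": [f.subject, f.obj], "columns": []}
    return dict(tables)


FACTS = [
    BinaryFact("Employee", "has", "Name", functional=True),
    BinaryFact("Employee", "works_in", "Department", functional=True),
    BinaryFact("Employee", "speaks", "Language", functional=False),
    BinaryFact("Employee", "has", "Skill", functional=False),
]

for name, table in map_to_tables(FACTS).items():
    print(name, table)
```

Because 'Employee speaks Language' and 'Employee has Skill' are recorded as separate elementary facts, they map to two separate two-column tables, so the classic 4NF violation of storing every combination of an employee's languages and skills in one table never arises.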
In fact, you probably won't even think once about normalization while you are designing your database; you'll probably be too busy extolling the virtues of (not to mention boring the daylights out of your cubicle neighbors with) the fact that you don't have to manually change each conceptual occurrence of "DESCRIPTION" to "desc" on the physical side. Then, when you are generating your logical/physical diagram to see how your naming conventions simply fall into place, you'll probably stare in disbelief (with the expression of many a calculus student: jaw agape and a slight trickle of drool) at your fully normalized schema.
Liberated from the mind-numbing act of repeated attribute consistency checking (read: 'Did I update ALL occurrences of timestamp?'), and from the incredibly confusing act of determining multivalued dependencies among attributes (read: 'banging-your-head-against-a-wall'), you'll wonder why you ever drew a rectangle in the first place.
You'll never want to use that hard way again.
THE CONCLUSION
But, continue to use ER anyway; it's the only way you'll ever truly understand database modeling.
REFERENCES
Normalization and ORM
Data Schema Normalization
This is a revised version of an article originally published in Issue 10 of the Journal of Conceptual Modeling.