For the past few weeks, the students have taken over lectures in the software engineering course. My lecture [Note: from a few weeks ago] covered four chapters on object-relational mapping from Patterns of Enterprise Application Architecture. Here are the slides for anyone who is interested. Unfortunately, the off-campus student portal is not open to the public, so I cannot link to the video of the lecture.
When you really think about them, all O/R mapping approaches seem like dirty hacks. It makes sense to map classes to tables, objects to rows, and entities to primary keys, but this simple (naive?) correspondence breaks as soon as one adds complex object relationships and inheritance. To get around the fundamental object-relational impedance mismatch, one can abandon an object-oriented domain model (usually not a good idea), use an object database management system (not very widespread), or use one of several inheritance mapping patterns (this is where it gets interesting).
The three patterns explained in PoEAA are the following:
- Single Table Inheritance
  - Compress the entire hierarchy into a single table.
- Class Table Inheritance
  - Have one table for each class in the hierarchy.
- Concrete Table Inheritance
  - Have one table for each concrete class in the hierarchy.
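To make the three layouts concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The hierarchy (an abstract `Player` with `Footballer` and `Cricketer` subclasses) and every table and column name are assumptions chosen for illustration, not something taken from the slides.

```python
import sqlite3
from typing import List

# Hypothetical hierarchy: Player (abstract) -> Footballer, Cricketer.
SCHEMAS = {
    # One table for the whole hierarchy, with a discriminator column.
    "single_table": """
        CREATE TABLE players (
            id INTEGER PRIMARY KEY,
            type TEXT NOT NULL,       -- which subclass this row represents
            name TEXT NOT NULL,
            club TEXT,                -- Footballer only, NULL otherwise
            batting_average REAL      -- Cricketer only, NULL otherwise
        );
    """,
    # One table per class, abstract or not, joined on a shared key.
    "class_table": """
        CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE footballers (
            id INTEGER PRIMARY KEY REFERENCES players(id),
            club TEXT
        );
        CREATE TABLE cricketers (
            id INTEGER PRIMARY KEY REFERENCES players(id),
            batting_average REAL
        );
    """,
    # One table per concrete class; superclass fields are repeated in each.
    "concrete_table": """
        CREATE TABLE footballers (
            id INTEGER PRIMARY KEY, name TEXT NOT NULL, club TEXT
        );
        CREATE TABLE cricketers (
            id INTEGER PRIMARY KEY, name TEXT NOT NULL, batting_average REAL
        );
    """,
}


def table_names(ddl: str) -> List[str]:
    """Create the schema in a throwaway in-memory database and list its tables."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(ddl)
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()


if __name__ == "__main__":
    for pattern, ddl in SCHEMAS.items():
        print(pattern, table_names(ddl))
```

Note how the abstract `Player` gets its own table only under class table inheritance, while single table inheritance pays for its simplicity with nullable subclass columns.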
Within the application, inheritance mappers corresponding to individual domain objects convert the database representations into domain objects. I will not go into the inheritance mapper implementation or the relative advantages and disadvantages of the three mapping schemes here. The slides, book, and several websites discuss the ideas in much more detail. However, I would like to point out these interesting rules of thumb that came up during the class discussion:
- Single table inheritance works well when subclasses have few fields.
- Class table inheritance works well when both super- and subclasses have many fields.
- Concrete table inheritance works well when superclasses have few fields.
I personally prefer class table inheritance since it is the easiest to modify and one does not need to pass raw database data up and down an inheritance mapper hierarchy. It is frustrating to see "authoritative" programming manuals advocate the use of data rows, data tables, data sets, and other database-specific representations for data transfer in database-driven applications. All of these representations are basically hash tables that are completely opaque to compile-time checks. Changing a single database operation can silently break huge swaths of the application that depend on its result.
This is a huge problem in ASP.NET due to the widespread practice of binding web controls to a DataTable or DataSet and the difficulty of writing unit tests for web pages. The better alternative is to confine the DataTable, DataSet, or DataRow to a small number of inheritance mappers (ideally, just one) and pass around domain objects that are easy to verify with compile-time checks and unit tests.
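As a rough sketch of that alternative, the mapper below is the only code that touches raw rows; everything else receives a typed domain object. Python and `sqlite3` stand in for the .NET stack here, so a static checker such as mypy plays the role of the compiler. The `Player`/`Footballer` classes, the `FootballerMapper` name, and the class-table schema are all hypothetical.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Player:
    id: int
    name: str


@dataclass(frozen=True)
class Footballer(Player):
    club: str


class FootballerMapper:
    """The only class that sees raw rows; callers get Footballer objects."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def find(self, player_id: int) -> Optional[Footballer]:
        # Class table inheritance: join the superclass and subclass tables.
        row = self._conn.execute(
            "SELECT p.id, p.name, f.club "
            "FROM players AS p JOIN footballers AS f ON f.id = p.id "
            "WHERE p.id = ?",
            (player_id,),
        ).fetchone()
        if row is None:
            return None
        # Column order is fixed by the SELECT above, so a schema change
        # only has to be reconciled in this one method.
        found_id, name, club = row
        return Footballer(id=found_id, name=name, club=club)


def demo_connection() -> sqlite3.Connection:
    """Build a small in-memory database with one hypothetical footballer."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE footballers (
            id INTEGER PRIMARY KEY REFERENCES players(id),
            club TEXT NOT NULL
        );
        INSERT INTO players VALUES (1, 'Alex');
        INSERT INTO footballers VALUES (1, 'Example FC');
        """
    )
    return conn


if __name__ == "__main__":
    mapper = FootballerMapper(demo_connection())
    print(mapper.find(1))
```

Because callers depend on `Footballer.club` rather than on a column name in a generic row, a renamed column breaks one query in one mapper, and a unit test against `find` catches it.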
I find O/R mapping patterns, and design patterns in general, interesting because, when done right, they can be an elegant solution to what can easily become a nasty problem.
The Dad Says:
It should be noted that class table inheritance is the preferred methodology, regardless of the super- or subclass field footprint. The differentiator is largely one of cost or convenience (what is the gain for the additional work over an easier method) as compared to what should be proper programming architecture regardless of effort. Early in a given project, the developer has little knowledge of what the design of the mature product will look like. Thus, falling for single or concrete table inheritance eventually requires costly class rewrites.
Data and class structures are obviously opposite nouns in the context of abstraction and class interfaces. Couple that with the observation presented about ASP.NET, where direct binding of data tables to user interface elements was presented by Microsoft as a best practice (and, by the way, is still taught in most academic presentations of ASP.NET), and it is no wonder that many software architects still code to a direct correlation between a data table and their class/object structures.
Until one is willing to accept that DOMAINS and DATA are not the same, the notion of a possible fourth architecture tier that hides the 1:1 data tier/database relation from the business tier objects seems implausible. But then again, those who know me are already aware of my belief that three tiers simply don’t suffice (see facade vs. presentation tier, or, is 32 speeds really enough).
Pausing for a moment, it seems as if this dialog is placing the burden on the developer rather than the DBA. I submit that the problem stems from too many developers creating the data architecture who, like our famous lemmings, proceed to build one hundred twenty-two data tables that happen to mirror their favorite business domain objects. I had best stop before I start talking in circles….
….but speaking of circles, at what point does the lecturer become the student….
Thanks!