I have been using Spoon as part of some recent research. Spoon allows one to analyze and transform Java source code using a very elegant abstract syntax tree library and several parsing and compiling tools. Spoon assumes that one writes "processors" that act on all occurrences of a particular program element prior to compilation and execution. I required a different approach for my project: rather than transforming code prior to compilation, I needed to load classes, modify their source code, recompile the changes, and execute the modified code at runtime. I found very little online documentation describing this usage, so I posted my solution.
Reuseless
, my research advisor, introduced me to the following excellent term during a software architecture conversation:
- Reuseless
- (adj.) generalized or over-designed to the point of uselessness. The quality exhibited by software systems that have been coded for reuse where none is needed. See also: and .
I love the word and will try to work it into conversation as soon as I can. It distills the idea behind "do the simplest thing that works," advocated by extreme programming practitioners, and it reinforces the programming adage "make it work, then make it right, then make it fast," applying it to generality rather than speed.
Ant Mistake
I learned something interesting about Ant while working on a class project last Friday. The following build task:
<delete> <fileset dir="." includes="*.class" /> </delete>
is fundamentally different from:
<delete dir="."> <fileset includes="*.class" /> </delete>
Can you guess why? The former deletes the excess .class files left in the current directory after a build. The latter deletes my entire project. The project I had spent two days working on. The project that was due in six hours.
A chill passed through my body and small beads of sweat broke out on my forehead. I am sure every programmer has felt that deep sense of dread at some point in his or her career.
But wait! I still had open. Relief washed over me as I found the files intact in the local history. I saved the source files to a directory named "rescue" and began rebuilding the project.
Rookie mistake, I know. The even provides a warning. To my credit, I never willingly put class files in the root of a project (that's what the bin directory is for), but this particular time I had to use a tool that spewed class files all over the place when it was run.
I hope this post will serve as a warning to anyone who reads it.
Reflective Class Search Pattern
One of the main projects in last semester's software engineering course was writing a design pattern. My pattern, which discusses ways of using reflection to search for classes that satisfy certain criteria without initializing explicit references to instances of the classes, is called Reflective Class Search.
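The pattern write-up itself is not reproduced here, but its core idea can be sketched in a few lines of C#. This is a minimal illustration under assumed criteria (a class qualifies if it is concrete, implements a given interface, and has a public parameterless constructor); the IReportExporter interface and the exporter classes are hypothetical. The search inspects only Type metadata, and an instance is created later, only when one is actually needed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical contract that the search looks for.
public interface IReportExporter
{
    string Export(string data);
}

// Matches the interface but is abstract, so the search skips it.
public abstract class ExporterBase : IReportExporter
{
    public abstract string Export(string data);
}

public class CsvExporter : ExporterBase
{
    public override string Export(string data) => "csv:" + data;
}

public class XmlExporter : IReportExporter
{
    public string Export(string data) => "<r>" + data + "</r>";
}

// Matches the interface but has no parameterless constructor, so it is skipped.
public class TemplatedExporter : IReportExporter
{
    private readonly string _template;
    public TemplatedExporter(string template) { _template = template; }
    public string Export(string data) => string.Format(_template, data);
}

public static class ReflectiveClassSearch
{
    // Selects concrete classes in the assembly that implement TContract and
    // have a public parameterless constructor. Only Type metadata is examined;
    // no candidate is instantiated here.
    public static List<Type> FindTypes<TContract>(Assembly assembly)
    {
        return assembly.GetTypes()
            .Where(t => t.IsClass && !t.IsAbstract)
            .Where(t => typeof(TContract).IsAssignableFrom(t))
            .Where(t => t.GetConstructor(Type.EmptyTypes) != null)
            .OrderBy(t => t.Name, StringComparer.Ordinal)
            .ToList();
    }

    // Creates an instance of a found type on demand.
    public static TContract Create<TContract>(Type type)
    {
        return (TContract)Activator.CreateInstance(type);
    }
}
```

Run against the assembly containing these types, `FindTypes<IReportExporter>` returns `CsvExporter` and `XmlExporter`, skipping the abstract base and the class that needs a constructor argument.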
You can find earlier drafts of students' patterns—including my own— . They may not stay up for long now that the semester is over.
Object-Relational Mapping Patterns
For the past few weeks, the students have taken over lectures in the . My lecture [Note: from a few weeks ago] covered four chapters on from . Here are the slides for anyone who is interested. Unfortunately, the off-campus student portal is not open to the public, so I cannot link to the video of the lecture.
When you really think about them, all O/R mapping approaches seem like dirty hacks. It makes sense to map classes to tables, objects to rows, and entities to primary keys, but this simple (naive?) correspondence breaks as soon as one adds complex object relationships and inheritance. To get around the fundamental object-relational impedance mismatch, one can abandon an object-oriented domain model (usually not a good idea), use an object database management system (not very widespread), or use one of several inheritance mapping patterns (this is where it gets interesting).
The three patterns explained in PoEAA are the following:
- Single Table Inheritance
- Compress the entire hierarchy into a single table.
- Class Table Inheritance
- Have one table for each class in the hierarchy.
- Concrete Table Inheritance
- Have one table for each concrete class in the hierarchy.
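To make the three schemes concrete, here is a small C# sketch that derives the table layout each scheme would produce for a hypothetical Player hierarchy (Player is abstract; Footballer and Cricketer extend it, and Bowler extends Cricketer). The ClassInfo type and the column names are illustrative assumptions, not code from PoEAA.

```csharp
using System.Collections.Generic;
using System.Linq;

// Illustrative description of one class in a domain hierarchy.
public sealed class ClassInfo
{
    public string Name { get; }
    public ClassInfo Parent { get; }
    public bool IsAbstract { get; }
    public string[] Fields { get; }

    public ClassInfo(string name, ClassInfo parent, bool isAbstract, params string[] fields)
    {
        Name = name;
        Parent = parent;
        IsAbstract = isAbstract;
        Fields = fields;
    }

    // Fields declared by this class and all of its ancestors, root first.
    public IEnumerable<string> AllFields() =>
        (Parent == null ? Enumerable.Empty<string>() : Parent.AllFields()).Concat(Fields);
}

public static class InheritanceMapping
{
    // Single Table Inheritance: one table holding every field in the hierarchy,
    // plus a "type" discriminator column identifying each row's class.
    public static Dictionary<string, List<string>> SingleTable(string table, IEnumerable<ClassInfo> classes)
    {
        var columns = new List<string> { "id", "type" };
        columns.AddRange(classes.SelectMany(c => c.Fields).Distinct());
        return new Dictionary<string, List<string>> { [table] = columns };
    }

    // Class Table Inheritance: one table per class holding only that class's
    // own fields; the rows for one object share the same "id" across tables.
    public static Dictionary<string, List<string>> ClassTable(IEnumerable<ClassInfo> classes) =>
        classes.ToDictionary(c => c.Name, c => c.Fields.Prepend("id").ToList());

    // Concrete Table Inheritance: one table per concrete class holding both
    // inherited and declared fields; abstract classes get no table.
    public static Dictionary<string, List<string>> ConcreteTable(IEnumerable<ClassInfo> classes) =>
        classes.Where(c => !c.IsAbstract)
               .ToDictionary(c => c.Name, c => c.AllFields().Prepend("id").ToList());
}
```

For the four-class hierarchy this yields one wide players table, four narrow class tables, and three concrete tables in which the Bowler table repeats the name and batting average columns it inherits.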
Within the application, inheritance mappers corresponding to individual domain objects convert the database representations into domain objects. I will not go into the inheritance mapper implementation or the relative advantages and disadvantages of the three mapping schemes here. The slides, book, and discuss the ideas in much more detail. However, I would like to point out these interesting rules of thumb that came up during the class discussion:
- Single table inheritance works well when subclasses have few fields.
- Class table inheritance works well when both super- and subclasses have many fields.
- Concrete table inheritance works well when superclasses have few fields.
I personally prefer to use class table inheritance since it is the easiest to modify and one does not need to pass raw database data up and down an inheritance mapper hierarchy. It is frustrating to see "authoritative" programming manuals advocate the use of data rows, data tables, data sets, and other database-specific representations for data transfer in database-driven applications. All of these representations are basically hash tables that are completely opaque to compile-time checks. When one changes a single database operation, it can silently break huge swaths of the application that depend on the result.
This is a huge problem in ASP.NET due to the widespread practice of binding to a or and the difficulty in writing unit tests for web pages. The better alternative is to constrain the DataTable/Set/Row to a small number of inheritance mappers (ideally, just one) and pass around domain objects that are easy to verify with compile-time checks and unit tests.
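A minimal sketch of that alternative, assuming a single-table "players" layout with a "type" discriminator column (the table, the columns, and the Player classes are hypothetical): System.Data's DataRow appears only inside PlayerMapper, and the rest of the application receives typed domain objects that the compiler can check.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical domain classes; none of them knows about DataRow.
public abstract class Player
{
    public long Id { get; set; }
    public string Name { get; set; }
}

public class Footballer : Player
{
    public string Club { get; set; }
}

public class Cricketer : Player
{
    public double BattingAverage { get; set; }
}

// The single inheritance mapper: the only place raw rows are read.
public static class PlayerMapper
{
    // Builds the right subclass from the "type" discriminator column,
    // then fills the fields shared by every player.
    public static Player Map(DataRow row)
    {
        string type = (string)row["type"];
        Player player;
        switch (type)
        {
            case "Footballer":
                player = new Footballer { Club = (string)row["club"] };
                break;
            case "Cricketer":
                player = new Cricketer { BattingAverage = (double)row["batting_average"] };
                break;
            default:
                throw new InvalidOperationException("Unknown player type: " + type);
        }
        player.Id = (long)row["id"];
        player.Name = (string)row["name"];
        return player;
    }

    public static List<Player> MapAll(DataTable table)
    {
        var players = new List<Player>();
        foreach (DataRow row in table.Rows)
            players.Add(Map(row));
        return players;
    }
}
```

If a query later renames a column, the failure surfaces in this one mapper (and its unit tests) rather than in every page that happened to bind to the result.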
I find O/R mapping patterns—and design patterns in general—interesting because when done right they can be an elegant solution to what can easily become a nasty problem.
Software Shakespeare?
In yesterday's software engineering class, Professor Johnson, three other students, and I put on a play that illustrated several of the patterns described in chapter 14 of Domain-Driven Design. Not only was the play an interesting break from normal lectures, but it also illustrated especially well how the software implementation and business organization affect each other.
Professor Johnson played the software analyst who was trying to track down some problems in a large shipping application. Each student played a group leader responsible for a module in the system. I played the group leader responsible for the work order module. In each scene, Professor Johnson would "interview" the group leaders, asking how their modules worked and how they related to the rest of the system. The following lists the patterns described in the book and how they appeared in the play's software system.
- Continuous Integration
- In which all programmers within a software unit (or "bounded context," to use the book's terminology) work very closely to combine code and tests frequently during development. This pattern describes the approach used by all the teams in the play to some extent, but the team whose software had an extensive unit test suite and a well-defined bounded context exhibited continuous integration most strongly.
- Shared Kernel
- In which two or more bounded contexts rely on a core set of components. The two teams whose bounded contexts overlapped would have benefited greatly from following this pattern. Instead, they shared objects in an ad-hoc manner, which caused problems elsewhere in the system. Here, a change to the software organization would have probably prompted a corresponding change in the business organization.
- Anticorruption Layer
- In which a bounded context has an isolated layer that adapts an external bounded context to the domain model. Several teams exhibited this pattern. It naturally extends the Adapter and Façade design patterns from objects to architectures.
- Customer/Supplier Development
- In which a "downstream" bounded context in the customer role depends on the "upstream" bounded context in the supplier role and in which the supplier takes the customer's needs into account. In the play, one of the group leaders expressed disappointment that his team's change requests took a very long time for another group to implement. In this case, it would have been beneficial to formalize the business relationship to echo the customer/supplier relationship present in the software.
- Conformist
- In which one bounded context depends completely on another's implementation and has little effect on it. The team whose software depended heavily on several external APIs exhibited this pattern.
- Separate Ways
- In which two bounded contexts are completely independent of one another. This pattern was not exhibited explicitly in the play, probably because it was exploring a highly-connected application. However, one can easily see the parallels between a business unit and software module. Certainly two business units with no interaction whatsoever cannot build software systems that depend on each other.
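Of these patterns, the anticorruption layer is the easiest to show in code. A minimal C# sketch, assuming a hypothetical external carrier system whose records use single-letter status codes and weights in pounds stored as text: CarrierTranslator is the only class that knows that format, so the shipping domain model never sees it.

```csharp
using System;
using System.Globalization;

// Hypothetical model owned by an external bounded context (e.g., a carrier's API).
public sealed class LegacyCarrierRecord
{
    public string TrackNo { get; set; }
    public string StatusCode { get; set; }  // "T" = in transit, "D" = delivered
    public string WeightLbs { get; set; }   // pounds, stored as text
}

// Our own domain model, expressed in this bounded context's terms.
public enum ShipmentStatus { InTransit, Delivered }

public sealed class Shipment
{
    public string TrackingId { get; }
    public ShipmentStatus Status { get; }
    public decimal WeightKg { get; }

    public Shipment(string trackingId, ShipmentStatus status, decimal weightKg)
    {
        TrackingId = trackingId;
        Status = status;
        WeightKg = weightKg;
    }
}

// Anticorruption layer: translates codes, units, and formats at the boundary
// so that the external model never leaks into the domain.
public static class CarrierTranslator
{
    private const decimal KgPerPound = 0.45359237m;

    public static Shipment ToDomain(LegacyCarrierRecord record)
    {
        ShipmentStatus status;
        switch (record.StatusCode)
        {
            case "T": status = ShipmentStatus.InTransit; break;
            case "D": status = ShipmentStatus.Delivered; break;
            default:
                throw new ArgumentException("Unknown carrier status: " + record.StatusCode);
        }

        decimal pounds = decimal.Parse(record.WeightLbs, CultureInfo.InvariantCulture);
        return new Shipment(record.TrackNo.Trim(), status, Math.Round(pounds * KgPerPound, 2));
    }
}
```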
Like the extensive examples in the book, the play illustrated very well how software organization and business organization affect each other. The problems found in the software described in the play were largely due to problems present in the business organization. One team was making changes that conflicted with another team's assumptions about the domain model. Communication problems prevented the conflict from becoming explicit.
Domain-Driven Design
Note: I started writing this entry about a week and a half ago, but did not have a chance to finish and post it until today.
Last week I started reading Domain-Driven Design for my software engineering course. The book discusses common patterns that appear when designing and implementing . Like other patterns books, it does not present any mind-blowing new ideas, but instead catalogs common problems and their widely-accepted solutions.
Eric Evans, DDD's author, begins the first chapter by describing a very familiar process: he meets with the client's domain experts and based on their description of the problem, starts drawing boxes and arrows on a whiteboard. After several meetings and many revisions, a domain model starts to emerge. Evans calls this "knowledge crunching" and defines it as follows:
[Domain modelers] take a torrent of information and probe for the relevant trickle. They try one organizing idea after another, searching for the simple view that makes sense of the mass. Many models are tried and rejected or transformed. Success comes in an emerging set of abstract concepts that makes sense of all the detail. This distillation is a rigorous expression of the particular knowledge that has been found most relevant.
I have drawn all sorts of box-and-line diagrams on dozens of whiteboards, but one particular knowledge crunching session sticks in my mind. I was meeting with a domain expert early one workday. I knew very little about the area we needed to discuss, and he knew only how he wanted the user interface of the application to behave. Neither of us had a domain model in mind. After some discussion, I started drawing the UI screens on the whiteboard. The links between the boxes became user actions. With a handful of screens on the board, we were able to "find the nouns" that helped define the functional units of the domain model.
Part two of the book (chapters four through six) discusses the organization of the domain model. It seems that every programmer has his or her pet idea of what a "layered architecture" contains, and Eric Evans is no different. He places the "infrastructure layer" at the bottom, followed by the "domain layer", the "application layer", and the user interface. Most descriptions leave out the application layer, but it makes sense when one considers it the code used to drive the UI: for example, .NET code-behind files or the Java code used to drive JSP pages.
When the application layer is too thick (that is, when it contains logic that belongs in the domain layer), it can become what Evans calls a "Smart UI". This common anti-pattern is the UI analog of the transaction scripts mentioned in SAIP. I found it interesting that Evans emphasized that the Smart UI may be a valid design strategy for simple, one-off systems that do not need to scale. Evans seems to treat the Smart UI and a model-driven design as mutually exclusive. However, many real-world applications lie between the two extremes. It is possible (but obviously not ideal) to have a layered architecture in which a large chunk of domain logic has leaked into the application layer and UI. In this case, much of the development effort should focus on moving domain logic "down" into the appropriate domain objects.
The remainder of the section lists patterns used to organize the domain model. To me it read like a taxonomy of the types of objects that appear in a business system. I will not re-list all the patterns here, but there were some similarities to that I came across several months ago.
Architectural Mismatch
My software engineering class finished last week. In this post, I would like to explore some of the ideas presented in the second-to-last chapter entitled "Building Systems from Off-the-Shelf Components".
The chapter leaves the word "component" very loosely defined. In my mind, a component is a self-contained unit of reuse that provides functionality for a specific task. Contrast this with a general-purpose framework such as the Java Platform or the .NET Framework Class Library that provides a great deal of fine-grained functionality in many areas. One would not include either of these as a "component" of a larger system, but they can be used to create components such as a PDF document creator, a custom network interface, or a GUI control, to name a few examples.
The chapter provides some very idealistic (read: unrealistic) methods for determining whether a component satisfies an application's requirements. The short answer to this question is "probably not". The book says, "components that were not developed internally for your system may not meet all of your requirements." (Emphasis theirs.) This echoes some of what I wrote in my previous post regarding why product line component reuse works well. I agree with Professor Johnson that the only way to know for sure whether a component will work in a system is to actually use it in the system.
The chapter also very briefly discusses some strategies for combating this "architectural mismatch". The strategies the book lists are confusing and not particularly useful, so I will generalize the problem of architectural mismatch to the following list:
How to Write Good Software with Bad Components
Code for Replacement
As Professor Johnson said in class, "If you use a component… be prepared to get rid of it." The most common way to achieve this goal is to encapsulate a component in a custom interface. The component can be swapped out, but a well-designed interface can remain constant. Design Patterns calls this the "Adapter" pattern; SAIP calls it a "Wrapper".
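A minimal sketch of coding for replacement in C#, with hypothetical names throughout: the application depends only on IDocumentWriter, VendorReportEngine stands in for an off-the-shelf component with its own API, and swapping implementations requires no change to callers.

```csharp
using System.Text;

// The application's own interface; callers depend only on this.
public interface IDocumentWriter
{
    string Write(string title, string body);
}

// Stand-in for an off-the-shelf component with its own API (hypothetical).
public sealed class VendorReportEngine
{
    private readonly StringBuilder _buffer = new StringBuilder();
    public void BeginSection(string heading) => _buffer.Append("== ").Append(heading).Append(" ==\n");
    public void AddText(string text) => _buffer.Append(text).Append('\n');
    public string Finish() => _buffer.ToString();
}

// Adapter (wrapper): translates calls on our interface into calls on the component.
public sealed class VendorReportAdapter : IDocumentWriter
{
    public string Write(string title, string body)
    {
        var engine = new VendorReportEngine();  // fresh engine per document
        engine.BeginSection(title);
        engine.AddText(body);
        return engine.Finish();
    }
}

// A replacement implementation can be swapped in without touching callers.
public sealed class PlainTextWriter : IDocumentWriter
{
    public string Write(string title, string body) => title + "\n" + body + "\n";
}
```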
Convert Input and Output
The data a component expects is almost always different from what a target architecture uses. In this case it is common to write converters as standalone modules or in conjunction with an adapter.
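A small standalone converter, sketched in C# under assumed formats: the component is taken to expect due dates as yyyyMMdd strings and amounts in whole cents, while the application uses DateTime and decimal. Both record classes are hypothetical.

```csharp
using System;
using System.Globalization;

// Hypothetical record format expected and produced by the component.
public sealed class ComponentInvoice
{
    public string DueDate { get; set; }   // "yyyyMMdd"
    public long AmountCents { get; set; }
}

// The application's representation of the same data.
public sealed class Invoice
{
    public DateTime DueDate { get; set; }
    public decimal Amount { get; set; }
}

// Standalone converter: both directions live in one place, so a format
// change in the component touches only this class.
public static class InvoiceConverter
{
    private const string DateFormat = "yyyyMMdd";

    public static ComponentInvoice ToComponent(Invoice invoice) =>
        new ComponentInvoice
        {
            DueDate = invoice.DueDate.ToString(DateFormat, CultureInfo.InvariantCulture),
            AmountCents = (long)Math.Round(invoice.Amount * 100m, MidpointRounding.AwayFromZero)
        };

    public static Invoice FromComponent(ComponentInvoice record) =>
        new Invoice
        {
            DueDate = DateTime.ParseExact(record.DueDate, DateFormat, CultureInfo.InvariantCulture),
            Amount = record.AmountCents / 100m
        };
}
```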
Use Multiple Components
If a particular component is missing functionality, it may be possible to find another component that provides it. In this case, one may want to use the Façade pattern to provide a simple interface to the set of components.
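A sketch of a façade over two components, assuming a hypothetical geocoding component and a hypothetical map-tile component that each provide part of what the application needs. Callers see a single method and never learn how the two cooperate.

```csharp
using System;
using System.Globalization;

// Two hypothetical components, each covering part of the needed functionality.
public sealed class GeocoderComponent
{
    public (double Lat, double Lon) Lookup(string address) =>
        address == "1 Main St" ? (40.0, -88.0) : throw new ArgumentException("Unknown address");
}

public sealed class MapTileComponent
{
    public string TileFor(double lat, double lon, int zoom) =>
        string.Format(CultureInfo.InvariantCulture, "tile/{0}/{1}/{2}", zoom, lat, lon);
}

// Façade: one simple entry point over both components.
public sealed class LocationFacade
{
    private readonly GeocoderComponent _geocoder = new GeocoderComponent();
    private readonly MapTileComponent _tiles = new MapTileComponent();

    public string MapTileForAddress(string address, int zoom = 12)
    {
        var (lat, lon) = _geocoder.Lookup(address);
        return _tiles.TileFor(lat, lon, zoom);
    }
}
```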
Expect the Component to Break
It is usually very difficult to determine the failure points of an OTS component. This is especially true for closed-source components or those with spotty documentation. It is also very difficult to rigorously test a component without having written it from the ground up. For these reasons, it is often a good idea to expect a component to break and provide a general mechanism with which to recover from a component failure.
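One general recovery mechanism is a wrapper that catches a component's failures and falls back to degraded behavior. This C# sketch assumes a hypothetical spell-checking component; catching every exception keeps the example short, though a real system would probably be more selective and would log each failure.

```csharp
using System;

// Hypothetical interface the application uses for spell checking.
public interface ISpellChecker
{
    bool IsCorrect(string word);
}

// Wraps an unreliable component: any exception it throws is caught, counted,
// and answered by a fallback, so a component failure degrades the feature
// instead of crashing the application.
public sealed class FailSafeSpellChecker : ISpellChecker
{
    private readonly ISpellChecker _component;
    private readonly ISpellChecker _fallback;
    public int FailureCount { get; private set; }

    public FailSafeSpellChecker(ISpellChecker component, ISpellChecker fallback)
    {
        _component = component;
        _fallback = fallback;
    }

    public bool IsCorrect(string word)
    {
        try
        {
            return _component.IsCorrect(word);
        }
        catch (Exception)
        {
            FailureCount++;  // a real system would also log the failure here
            return _fallback.IsCorrect(word);
        }
    }
}

// Fallback that never blocks the user: it accepts every word.
public sealed class AcceptAllSpellChecker : ISpellChecker
{
    public bool IsCorrect(string word) => true;
}

// Simulated broken component, used only for demonstration.
public sealed class CrashingSpellChecker : ISpellChecker
{
    public bool IsCorrect(string word) => throw new InvalidOperationException("component crashed");
}
```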
Allow Omission
It may be possible to omit a component in certain circumstances. In this case, it makes sense to provide a means of removing components through some type of plug-in architecture, configuration setting, or conditional build. A component that has been removed completely can no longer be a point of failure.
Rewrite the Component
Finally, if all other strategies fail, one may be left with no choice but to rewrite all or part of a component. This is often a valid choice, since the true cost of using a component is always much higher than its price tag. A component always has a high learning cost, a higher chance of failure, and a high integration cost. This is why companies are often willing to spend many man-hours writing a component that could be bought for a few hundred dollars.
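Omission through a configuration setting might look like the following sketch. The analytics component, the NoAnalytics null object, and the analytics.enabled key are all hypothetical; when the component is switched off, a do-nothing implementation takes its place, so callers need no special cases.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical optional feature backed by an off-the-shelf component.
public interface IUsageAnalytics
{
    void Track(string eventName);
}

// Stand-in for the real component.
public sealed class VendorAnalytics : IUsageAnalytics
{
    public List<string> Sent { get; } = new List<string>();
    public void Track(string eventName) => Sent.Add(eventName);
}

// Null object used when the component is omitted: it does nothing,
// so it cannot fail.
public sealed class NoAnalytics : IUsageAnalytics
{
    public void Track(string eventName) { }
}

public static class AnalyticsFactory
{
    // Chooses the implementation from configuration; the component is
    // included only when "analytics.enabled" is "true" (case-insensitive).
    public static IUsageAnalytics Create(IReadOnlyDictionary<string, string> config)
    {
        bool enabled = config.TryGetValue("analytics.enabled", out var value)
                       && string.Equals(value, "true", StringComparison.OrdinalIgnoreCase);
        return enabled ? new VendorAnalytics() : (IUsageAnalytics)new NoAnalytics();
    }
}
```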
I wrote this list off the top of my head, so please feel free to add your thoughts in the comments. I hope to get a chance to read some of the books that Professor Johnson mentioned dealing specifically with architectural mismatch. Component-based development is a common and increasingly important aspect of software development. Since bad components are also unfortunately common, I think tactics for combating architectural mismatch are a vital part of a programmer’s toolbox.
Software Product Lines
Today in my software engineering class, Professor Johnson said something like, "the TA and I noticed that some of you are slowing down on your weblog posts." I have a backlog of stuff that I would like to write about, so I will take his statement as a hint to start posting some of it ASAP.
For this post, I would like to focus on SAIP's chapter 14, entitled "Software Product Lines". I found this chapter especially interesting because I have often told people that if and when I join industry, I envision myself in a chief architect role for a large system or suite of systems. Alternatively, if I decide to stay in academia, I see myself researching or consulting on the design of such systems.
I find product lines interesting because their architecture has many of the same concerns as single system architecture, but with larger components and more emphasis on modularity, modifiability, and reuse. For example, if one designs a component for one product, it is usually beneficial to generalize the component for use across the entire product line. This requires very careful, well-planned development which would likely yield a better product overall.
Reuse is a valid design concern for any system, but it works especially well in a product line for several reasons. First, almost everything can be reused. The book mentions reusing requirements, software elements, analysis, testing, people, and many other architectural elements. Second, all of these architectural elements fall under the assumptions and constraints of the entire set of systems. The book echoes this idea nicely when it says, “Software product lines make re-use work by establishing a very strict context for it.” Contrast this with simple code reuse, in which one uses a library or framework built elsewhere. The library may have made assumptions about its use that contradict the needs of the user, or it may be too general or too specific for the desired task.
I also find it interesting that a company’s success can be driven by the quality of the software used to drive its products. In this case, it benefits a company greatly to have a common software framework. When many of a company’s products rely on the same software, it changes the emphasis from quick, get-it-out-the-door coding to careful maintenance and incremental expansion of the common software product line.
Familiar Concepts
I like reading about software because often a writer will attach a descriptive name to a familiar concept or describe it in a new and interesting way. This week’s software engineering lectures and readings in SAIP brought up several such names and descriptions that stuck with me.
Architecture as Early Decisions
The early chapters in the book echoed the class discussion about the definition(s) of software architecture. Its main definition says the following:
The software architecture of a program or computing system is the structure or structures of the system, which comprise software elements, the externally visible properties of those elements, and the relationships between them.
This is as good a definition as any, but it did not ring as true in my mind as a passage that appeared several pages later in the book:
Software architecture represents a system’s earliest set of design decisions. These early decisions are the most difficult to get correct and the hardest to change later in the development process, and they have the most far-reaching effects.
The second definition emphasizes the thought process behind an architecture; the first describes any system whether or not any thought was put into building it. Granted, a bad architecture is still an architecture, but I prefer to focus on the good ones—those that made the right decisions early on.
Professor Johnson paraphrased this idea in class today when he said, “Architecture is the stuff you wished you did right in the beginning.”
Transaction Scripts
I am sure all programmers have encountered a particular style of programming that seems to willfully ignore object-oriented programming practices. Rather than encapsulate behavior in classes and split an application into well-defined modules or layers, a programmer will instead write many monolithic functions that often contain SQL directly in the code. This style may work well on small applications (I am certainly guilty of using it), but it breaks down as common functionality is duplicated and bugs appear.
Tuesday’s lecture described these types of programs as “transaction scripts”. This term is much more descriptive than “procedural-style programs” and easier to say than “non-object-oriented programs”.
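To make the contrast concrete, here is a small, hypothetical C# example. The transaction script version repeats its business rule in each procedure (a dictionary stands in for the database so the sketch runs on its own), while the domain model version keeps the rule in one place.

```csharp
using System;
using System.Collections.Generic;

// Transaction script style: one procedure per operation, each carrying its
// own copy of the business rule (no overdrafts). Real scripts often build SQL
// directly; a dictionary of balances stands in for the database here.
public static class AccountScripts
{
    public static void Withdraw(Dictionary<string, decimal> balances, string account, decimal amount)
    {
        if (balances[account] < amount)
            throw new InvalidOperationException("Insufficient funds");
        balances[account] -= amount;
    }

    // A second script that needs the same rule has to repeat it.
    public static void Transfer(Dictionary<string, decimal> balances, string from, string to, decimal amount)
    {
        if (balances[from] < amount)
            throw new InvalidOperationException("Insufficient funds");
        balances[from] -= amount;
        balances[to] += amount;
    }
}

// Domain model style: the rule lives once, inside the object it protects.
public sealed class Account
{
    public decimal Balance { get; private set; }

    public Account(decimal opening) { Balance = opening; }

    public void Withdraw(decimal amount)
    {
        if (Balance < amount)
            throw new InvalidOperationException("Insufficient funds");
        Balance -= amount;
    }

    public void Deposit(decimal amount) => Balance += amount;

    public void TransferTo(Account target, decimal amount)
    {
        Withdraw(amount);  // reuses the rule instead of duplicating it
        target.Deposit(amount);
    }
}
```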
Software Business Cycle
A common theme in the early lectures and the first chapter of the book involved the interaction between the architecture of a system and the organization that develops the system. For example, an organization that implements a client-server architecture will probably have two teams: a client development team and a server development team. That is, the module divisions in an architecture will usually define the divisions in an organization.
Obviously, the influence can travel in the opposite direction as well. The people and expertise in an organization will affect the final architecture. For example, a group of PHP/MySQL experts will probably not design an architecture around .NET and Microsoft SQL Server (though a good architecture may not even specify the implementation language or database).
The book calls this interaction the “Architecture Business Cycle”. It is a good thing to keep in mind for when I eventually enter industry, and something that I have been insulated from in my previous programming jobs.
Unrelated
I found it interesting that with a quick search on I was able to find the that the book publishers chose for the . It seems like an odd, random choice. Why is a person standing alone in the entryway at night? Why did the publishers crop out the fountain, leaving what appears to be a strange discoloration in the sky?
Building Architecture vs. Software Architecture
Yesterday's software engineering class asked the question, "What is a software architect?" The class offered all sorts of answers, most involving some type of technical leadership. To me, the architect sets up the framework in which the other developers work. He or she makes the module- and application-spanning decisions that guide (and in some cases limit) the decisions of other developers.
The architect's main goal when making these decisions is to ensure the overall quality of the final product. However, "quality" can be defined using any number of criteria: maintainability, security, speed, flexibility, reliability, etc. Professor Johnson refers to these as the "-ity" words. All involve tradeoffs, and some may indeed be mutually exclusive. The architect must decide which of the particular "-ities" to focus on and control the tradeoffs based on the client, business, or product needs.
We also discussed some of the many differences between architects of buildings and architects of software. The point that stuck in my mind was this: many years ago, a building architect used to be the designer, engineer, and on-site technical lead of a project. Christopher Wren (who rebuilt the churches of London after the fire of 1666) and Washington Roebling (who directed the construction of the Brooklyn Bridge from his sickbed via his wife) both seem to meet these criteria. This description sounds very similar to how we describe a software architect today.
As time went by, the roles seemed to diverge. An "architect" became responsible for the aesthetic design of a building, the "engineer" became responsible for making the design work, and the on-site technical lead became any number of contractors and subcontractors. Of course there will always be a great deal of overlap and interaction between these roles, but the differences certainly exist.
Software architecture is still such a young field that I cannot help speculating that it will one day undergo a similar divergence. Professional software architects love trying to make programming “more like engineering”, but I think many changes need to occur before that can happen. To illustrate, a building architectural firm can easily send a design to an architectural engineering firm to prepare technical blueprints, and the engineering firm can easily send the blueprints to any number of contractors. (I realize this is a gross simplification; bear with me.) In software architecture, however, it is very difficult to create a transferable, adaptable “blueprint” of a software system for others to implement. Formal specifications, UML, design/architectural patterns, and other tools try to meet this need, but they still have a long way to go.
Despite this speculation, I believe that a software architect will always remain closer to code (or some other problem-solving abstraction) than a building architect is to a hammer. Certainly part of this belief is personal in that I really like code. However, the larger part is based on the observation that software architecture is fundamentally different from any other constructive profession:
- The final product can be duplicated infinitely. This means that there is no need to “mail the blueprint to all the contractors”. One can just email the executable.
- Code is the current “top” of many layers of abstractions. Software architecture can grow (and is growing) to accommodate new, more powerful layers of abstractions such as architectural patterns or large, interconnected modules.
- The tools are constantly changing. Software architecture will use different tools, but these tools will still require a technical lead for the same reasons projects currently need a technical lead when writing code.
For these reasons, I believe that software architecture will evolve not toward greater specialization, as happened in building architecture, but toward greater generality to encompass new abstractions and tools.
An NUnit Test Suite Implementation
During my last two programming jobs and my senior year software engineering project, I became the unit testing evangelist on my development teams. I designed or helped design three unit test suite architectures and wrote many of the actual unit tests as I developed production code. The following article outlines the implementation of a hypothetical test suite with some of the best features of the test suite architectures that I used.
Update: I submitted this article to . It is a bit easier to read over there since the site has automatic code coloring.
Introduction
In this article I will describe a scalable unit test suite for use on a database-driven .NET application. The suite will define sample generators used to easily create dummy data for tests, and it will use test fixture inheritance for increased scalability and to allow for easy testing of common functionality.
I will focus on testing the business logic of a sample application. This is usually located in the "middle" of an application and is often called the business logic layer (BLL). It uses the data access layer (DAL) to mediate data transfer to and from the database and drives the behavior of one or more user interfaces (UI) with which the user interacts or in which data is displayed.
This article assumes knowledge of .NET and C#, but does not require experience with unit testing or NUnit in particular.
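Before getting to the sample application, here is a compressed, NUnit-free sketch of the two ideas above. It assumes an in-memory stand-in for the database and uses hypothetical names (InMemoryPersistentObject, TestAddress, Samples, PersistentObjectFixture); it is meant only to show the shape of sample generators and fixture inheritance, not the suite described in this article.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for PersistentObject: a static dictionary
// plays the role of the DAL and database so the sketch runs on its own.
public abstract class InMemoryPersistentObject
{
    private static readonly Dictionary<long, InMemoryPersistentObject> Store =
        new Dictionary<long, InMemoryPersistentObject>();
    private static long _nextUid = 1;

    public long UID { get; private set; } = long.MinValue;

    // Assigns a UID on the first save, then stores the object under it.
    public void Save()
    {
        if (UID == long.MinValue) UID = _nextUid++;
        Store[UID] = this;
    }

    public void Delete() => Store.Remove(UID);

    public static bool Exists(long uid) => Store.ContainsKey(uid);
}

public sealed class TestAddress : InMemoryPersistentObject
{
    public string StreetAddress { get; set; }
    public string City { get; set; }
    public string State { get; set; }
    public string Zip { get; set; }
}

// Sample generator: one call returns a valid object whose data differs
// from every previously generated sample.
public static class Samples
{
    private static int _counter;

    public static TestAddress NewAddress()
    {
        int n = ++_counter;
        return new TestAddress
        {
            StreetAddress = n + " Sample Street",
            City = "Sampleton",
            State = "IL",
            Zip = (60000 + n).ToString()
        };
    }
}

// Base fixture: common tests are written once against the base class and
// inherited by a small fixture per persistent type. In NUnit, the derived
// classes would carry [TestFixture] and these methods [Test].
public abstract class PersistentObjectFixture<T> where T : InMemoryPersistentObject
{
    protected abstract T CreateSample();

    public void SaveAssignsUid()
    {
        T sample = CreateSample();
        sample.Save();
        if (sample.UID == long.MinValue)
            throw new Exception("Save() did not assign a UID");
    }

    public void DeleteRemovesRecord()
    {
        T sample = CreateSample();
        sample.Save();
        sample.Delete();
        if (InMemoryPersistentObject.Exists(sample.UID))
            throw new Exception("Delete() left the record in place");
    }
}

// The only code needed to run every inherited test against TestAddress.
public sealed class AddressFixture : PersistentObjectFixture<TestAddress>
{
    protected override TestAddress CreateSample() => Samples.NewAddress();
}
```

Adding a new persistent type then costs one short fixture class, while the shared tests grow in one place.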
BLL Implementation
In this sample application, classes in the BLL that implement database operations inherit from a base class called PersistentObject. This class defines the following interface[1]:
public abstract class PersistentObject {
    protected long _uid = long.MinValue;
    /// <summary>
    /// The unique identifier of this object
    /// in the database
    /// </summary>
    /// <remarks>
    /// Set when Fill() or Save() is called
    /// </remarks>
    public long UID {
        get { return _uid; }
    }
    /// <summary>
    /// Save this object's data to the database.
    /// </summary>
    public abstract void Save();
    /// <summary>
    /// Fill this object with data fetched from the
    /// database for the given UID
    /// </summary>
    /// <param name="uid">The unique identifier of
    /// the record to fetch from the database</param>
    public abstract void Fill(long uid);
    /// <summary>
    /// Remove this object from the database
    /// </summary>
    public abstract void Delete();
    /// <summary>
    /// Fetches an object of the given type and with the
    /// given UID from the database
    /// </summary>
    /// <typeparam name="ConcreteType">
    /// The type of object to fetch
    /// </typeparam>
    /// <param name="uid">
    /// The unique identifier of the object in the database
    /// </param>
    public static ConcreteType Fetch<ConcreteType>(long uid)
        where ConcreteType : PersistentObject, new() {
        ConcreteType toReturn = new ConcreteType();
        toReturn.Fill(uid);
        return toReturn;
    }
}
Say, for example, the application must save some client data and a client address that can be used elsewhere in the application. The BLL would therefore need to contain Address and Client classes derived from PersistentObject.
public class Address : PersistentObject {
    private string _streetAddress = null;
    private string _city = null;
    private string _state = null;
    private string _zip = null;
    public string StreetAddress {
        get { return _streetAddress; }
        set { _streetAddress = value; }
    }
    public string City {
        get { return _city; }
        set { _city = value; }
    }
    public string State {
        get { return _state; }
        set { _state = value; }
    }
    public string Zip {
        get { return _zip; }
        set { _zip = value; }
    }
    public override void Save() {
        // Call DAL to save fields
        // ...
    }
    public override void Fill(long uid) {
        // Call DAL to fill fields
        // ...
    }
    public override void Delete() {
        // Call DAL to delete object
        // ...
    }
    /// <summary>
    /// Utility function that returns the Address with
    /// the given UID
    /// </summary>
    public static Address Fetch(long addressUID) {
        return PersistentObject.Fetch<Address>(addressUID);
    }
}
Client is similar, except it contains a property that returns the Client's Address object.
public class Client : PersistentObject {
private string _firstName = null;
private string _lastName = null;
private string _middleName = null;
private long _addressUID = long.MinValue;
private Address _addressObject;
// ...
public long AddressUID {
get { return _addressUID; }
set { _addressUID = value; }
}
/// <summary>
/// On-demand property that returns this Client's
/// Address based on the current value of AddressUID
/// </summary>
public Address Address {
get {
if (AddressUID == long.MinValue) {
_addressObject = null;
}
else if (_addressObject == null
|| AddressUID != _addressObject.UID) {
_addressObject = new Address();
_addressObject.Fill(AddressUID);
}
return _addressObject;
}
}
// ...
}
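The on-demand Address property goes back to the DAL only when AddressUID changes. The sketch below is a simplified, self-contained stand-in. The FillCount counter and the fake Fill() body are hypothetical and replace the unspecified DAL call, so the caching behavior can be observed directly:

```csharp
using System;

// Simplified stand-in for the BLL Address class.
public class Address
{
    // Hypothetical counter of simulated database round trips.
    public static int FillCount = 0;

    public long UID { get; private set; } = long.MinValue;

    public void Fill(long uid)
    {
        FillCount++; // a real implementation would call the DAL here
        UID = uid;
    }
}

public class Client
{
    private Address _addressObject;

    public long AddressUID { get; set; } = long.MinValue;

    // Same logic as Client.Address above: reload only when the UID changes.
    public Address Address
    {
        get
        {
            if (AddressUID == long.MinValue)
            {
                _addressObject = null;
            }
            else if (_addressObject == null || AddressUID != _addressObject.UID)
            {
                _addressObject = new Address();
                _addressObject.Fill(AddressUID);
            }
            return _addressObject;
        }
    }
}
```

Repeated reads of Address return the same cached object. Changing AddressUID triggers exactly one new Fill().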
To save new client data, the user would do something like the following:
// Create the address that the client will link to
Address newAddress = new Address();
newAddress.StreetAddress = StreetAddressInput.Text;
newAddress.City = CityInput.Text;
newAddress.State = StateInput.Text;
newAddress.Zip = ZipInput.Text;
// Save the address to the database
newAddress.Save();
// Create the client
Client newClient = new Client();
newClient.FirstName = FirstNameInput.Text;
newClient.MiddleName = MiddleNameInput.Text;
newClient.LastName = LastNameInput.Text;
// Link to the address
newClient.AddressUID = newAddress.UID;
// Save the client to the database
newClient.Save();
And to retrieve client data elsewhere in the application, the user would do something like the following:
Client existingClient = Client.Fetch(clientUID);
Address clientAddress = existingClient.Address;
Unit Testing Background
The BLL implementation outlined above is relatively standard. One can verify its behavior in any number of ways. The simplest but least robust is to test the UI. Since the UI depends on the BLL, one could conceivably verify the application by clicking through web pages or dialog boxes by hand. But what if the application has multiple UIs? This method is slow, difficult to repeat, and prone to human error, and it may miss bugs. It may also promote bad programming practice: a naïve coder may fix a symptom in the UI rather than the root cause in the BLL. This is not to say that we should omit UI testing, just that we should not rely on it to verify business logic.
A better option would be to create a simple driver program that calls the BLL method under development. This option would certainly be easier to repeat, but it may be difficult to save drivers for later or run all existing drivers to verify that nothing is broken.
This is where unit tests come in. One can think of a unit test as a simple driver program that one would probably write anyway. The unit testing framework organizes the tests, provides tools that make writing tests easier, and allows one to run tests in aggregate.
Test Suite Implementation
Since this article discusses a .NET application, I will use the NUnit framework in the example test suite. NUnit provides several features that make writing and running tests very easy.
It is most intuitive to create a test fixture (that is, a class containing a series of tests) for each class in the BLL. So, in keeping with the example, we will have ClientTest and AddressTest classes in the example test suite. These basic test fixtures will need to verify that data is added to the database, retrieved, edited, and deleted correctly. We often need to create dummy objects, so these test fixtures will also include some sample generators.
Finally, we do not want to have to repeat common test code across many different test fixtures, so we will test the common database operations in a PersistentObjectTest class from which ClientTest and AddressTest both inherit.
I will explain the construction of PersistentObjectTest in parts. First, the class declaration:
/// <summary>
/// Abstract base class for test fixtures that test
/// classes derived from BLL.PersistentObject
/// </summary>
/// <typeparam name="PersistentObjectType">
/// The type of BLL.PersistentObject that the derived
/// class tests
/// </typeparam>
public abstract class PersistentObjectTest<PersistentObjectType>
where PersistentObjectType : PersistentObject, new() {
This shows that PersistentObjectTest is a generic type parameterized by the type of object that its derived class tests. That type must derive from PersistentObject and must have a public parameterless constructor (the new() constraint). This lets us create sample generators and other utilities in a type-safe, generic manner:
#region Sample Generators
/// <summary>
/// Returns a dummy object
/// </summary>
/// <param name="saveStatus">
/// Indicates whether the returned dummy object should
/// be saved to the database or not
/// </param>
public PersistentObjectType GetSample(SampleSaveStatus saveStatus) {
PersistentObjectType toReturn = new PersistentObjectType();
FillSample(toReturn);
if (saveStatus == SampleSaveStatus.SAVED_SAMPLE) {
toReturn.Save();
// Check Save() postconditions...
}
return toReturn;
}
/// <summary>
/// Fills the given object with random data
/// </summary>
/// <param name="sample">
/// The sample object whose fields to fill
/// </param>
/// <remarks>
/// Should be overridden and extended in
/// derived classes
/// </remarks>
public virtual void FillSample(PersistentObjectType sample) {
// nothing to fill in the base class
}
/// <summary>
/// Asserts that all fields in the given objects match
/// </summary>
/// <param name="expected">
/// The object whose data to check against
/// </param>
/// <param name="actual">
/// The object whose fields to test
/// </param>
/// <remarks>
/// Should be overridden and extended in
/// derived classes
/// </remarks>
public virtual void AssertIdentical
(PersistentObjectType expected, PersistentObjectType actual) {
Assert.AreEqual(expected.UID, actual.UID,
"UID does not match");
}
#endregion
GetSample() simply returns a dummy object. The implementations of FillSample() and AssertIdentical() are delegated to the derived classes. These three methods are used by other test fixtures to create and test sample objects. The base class uses them to verify the basic database operations in the following test methods:
#region Data Tests
/// <summary>
/// Tests that data is sent to and retrieved from
/// the database correctly
/// </summary>
[Test()]
public virtual void SaveAndFetch() {
PersistentObjectType original =
GetSample(SampleSaveStatus.SAVED_SAMPLE);
PersistentObjectType fetched =
PersistentObject.Fetch<PersistentObjectType>(original.UID);
// verify that the objects are identical
AssertIdentical(original, fetched);
}
/// <summary>
/// Tests that editing an existing object works correctly
/// </summary>
[Test()]
public virtual void EditAndFetch() {
PersistentObjectType modified =
GetSample(SampleSaveStatus.SAVED_SAMPLE);
// edit fields
FillSample(modified);
// save edits
modified.Save();
// make sure edits were reflected in the database
PersistentObjectType fetched =
PersistentObject.Fetch<PersistentObjectType>(modified.UID);
AssertIdentical(modified, fetched);
}
/// <summary>
/// Tests that deletion works correctly.
/// </summary>
/// <remarks>
/// Expects data retrieval to fail
/// </remarks>
[Test(),
ExpectedException(typeof(DataNotFoundException))]
public virtual void Delete() {
PersistentObjectType toDelete =
GetSample(SampleSaveStatus.SAVED_SAMPLE);
long originalUID = toDelete.UID;
toDelete.Delete();
// expect failure because the object does not exist
PersistentObject.Fetch<PersistentObjectType>(originalUID);
}
#endregion
With PersistentObjectTest doing the heavy lifting, the concrete test classes need only define how to fill a sample object and how to check if two sample objects are identical. They can also define additional sample generators, utility functions, and test methods as needed.
[TestFixture()]
public class AddressTest : PersistentObjectTest<Address> {
public override void FillSample(Address sample) {
base.FillSample(sample);
Random r = new Random();
string[] states = {"IL", "IN", "KY", "MI"};
sample.City = "CITY" + DateTime.Now.Ticks.ToString();
sample.State = states[r.Next(0, states.Length)];
sample.StreetAddress = r.Next().ToString() + " Anywhere Street";
sample.Zip = r.Next(0, 100000).ToString("00000");
}
public override void AssertIdentical(Address expected, Address actual) {
base.AssertIdentical(expected, actual);
Assert.AreEqual(expected.City, actual.City,
"City does not match");
Assert.AreEqual(expected.State, actual.State,
"State does not match");
Assert.AreEqual(expected.StreetAddress, actual.StreetAddress,
"StreetAddress does not match");
Assert.AreEqual(expected.Zip, actual.Zip,
"Zip does not match");
}
}
[TestFixture()]
public class ClientTest : PersistentObjectTest<Client> {
public override void FillSample(Client sample) {
base.FillSample(sample);
sample.FirstName = "FIRST" + DateTime.Now.Ticks.ToString();
sample.MiddleName = "MIDDLE" + DateTime.Now.Ticks.ToString();
sample.LastName = "LAST" + DateTime.Now.Ticks.ToString();
sample.AddressUID = new AddressTest().GetSample
(SampleSaveStatus.SAVED_SAMPLE).UID;
}
public override void AssertIdentical(Client expected, Client actual) {
base.AssertIdentical(expected, actual);
Assert.AreEqual(expected.FirstName, actual.FirstName,
"FirstName does not match");
Assert.AreEqual(expected.MiddleName, actual.MiddleName,
"MiddleName does not match");
Assert.AreEqual(expected.LastName, actual.LastName,
"LastName does not match");
Assert.AreEqual(expected.AddressUID, actual.AddressUID,
"AddressUID does not match");
}
}
ClientTest’s sample generator uses AddressTest.GetSample() to create a dummy Address when filling a dummy sample Client. This general pattern is used often in this type of test suite. Any test that needs a dummy object simply calls the appropriate sample generator.
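Because the DAL is left unspecified, these fixtures cannot pass as written. The following sketch keeps the same template-method structure but swaps in a hypothetical in-memory FakeDb for the database and plain exceptions for NUnit's Assert, so it runs on its own. It assumes SampleSaveStatus is an enum with the SAVED_SAMPLE and UNSAVED_SAMPLE values used in the text, and it trims Address down to one field:

```csharp
using System;
using System.Collections.Generic;

public enum SampleSaveStatus { SAVED_SAMPLE, UNSAVED_SAMPLE }

public class DataNotFoundException : Exception
{
    public DataNotFoundException(string message) : base(message) { }
}

// Hypothetical in-memory stand-in for the database: UID -> column values.
public static class FakeDb
{
    public static readonly Dictionary<long, Dictionary<string, string>> Rows =
        new Dictionary<long, Dictionary<string, string>>();
    public static long NextUid = 1;
}

public abstract class PersistentObject
{
    public long UID { get; protected set; } = long.MinValue;
    public abstract void Save();
    public abstract void Fill(long uid);

    public static T Fetch<T>(long uid) where T : PersistentObject, new()
    {
        T toReturn = new T();
        toReturn.Fill(uid);
        return toReturn;
    }
}

public class Address : PersistentObject
{
    public string City { get; set; }

    public override void Save()
    {
        if (UID == long.MinValue) UID = FakeDb.NextUid++; // first save acts as an insert
        FakeDb.Rows[UID] = new Dictionary<string, string> { ["City"] = City };
    }

    public override void Fill(long uid)
    {
        Dictionary<string, string> row;
        if (!FakeDb.Rows.TryGetValue(uid, out row))
            throw new DataNotFoundException("No row with UID " + uid);
        UID = uid;
        City = row["City"];
    }
}

public abstract class PersistentObjectTest<T> where T : PersistentObject, new()
{
    public T GetSample(SampleSaveStatus saveStatus)
    {
        T toReturn = new T();
        FillSample(toReturn);
        if (saveStatus == SampleSaveStatus.SAVED_SAMPLE) toReturn.Save();
        return toReturn;
    }

    public virtual void FillSample(T sample) { }

    public virtual void AssertIdentical(T expected, T actual)
    {
        if (expected.UID != actual.UID) throw new Exception("UID does not match");
    }

    // Shared data tests: every concrete fixture gets these for free.
    public void SaveAndFetch()
    {
        T original = GetSample(SampleSaveStatus.SAVED_SAMPLE);
        AssertIdentical(original, PersistentObject.Fetch<T>(original.UID));
    }

    public void EditAndFetch()
    {
        T modified = GetSample(SampleSaveStatus.SAVED_SAMPLE);
        FillSample(modified); // overwrite fields with fresh sample data
        modified.Save();
        AssertIdentical(modified, PersistentObject.Fetch<T>(modified.UID));
    }
}

// The concrete fixture only says how to fill and compare an Address.
public class AddressTest : PersistentObjectTest<Address>
{
    public override void FillSample(Address sample)
    {
        base.FillSample(sample);
        sample.City = "CITY" + Guid.NewGuid().ToString("N");
    }

    public override void AssertIdentical(Address expected, Address actual)
    {
        base.AssertIdentical(expected, actual);
        if (expected.City != actual.City) throw new Exception("City does not match");
    }
}
```

Calling new AddressTest().SaveAndFetch() runs the base-class test logic against Address, even though AddressTest declares no test methods of its own.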
When running tests, NUnit looks for any classes marked with the attribute [TestFixture()]. It creates an instance of the class and runs any methods marked with the attribute [Test()]. The [ExpectedException()] attribute tells NUnit that the given method should throw the given exception. The test code itself uses NUnit’s Assert object to verify that expected properties hold.
Any test fixture that inherits from an abstract base class also “inherits”[2] any test methods. Therefore, AddressTest, a concrete test fixture, inherits the SaveAndFetch(), EditAndFetch(), and Delete() test methods from PersistentObjectTest. Note that a derived class can override these test methods if, for example, its corresponding BLL class does not support deleting:
[Test()]
public override void Delete() {
Assert.Ignore("This object does not support deleting");
}
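Footnote 2 explains that this "inheritance" is really reflection. The toy runner below is not NUnit. It uses a hypothetical TestAttribute and inspects only the attributes declared on the most-derived version of each method, mimicking the behavior the footnote describes. It shows why a derived fixture picks up the base class's test methods and why an override has to repeat [Test()]:

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical marker attribute standing in for NUnit's [Test].
[AttributeUsage(AttributeTargets.Method)]
public class TestAttribute : Attribute { }

public abstract class BaseFixture
{
    [Test] public virtual void SaveAndFetch() { }
    [Test] public virtual void EditAndFetch() { }
    [Test] public virtual void Delete() { }
}

public class AddressFixture : BaseFixture
{
    // SaveAndFetch() is inherited unchanged.

    // Overridden without repeating [Test]: invisible to the runner below.
    public override void EditAndFetch() { }

    // Overridden and re-marked: still discovered.
    [Test] public override void Delete() { }
}

public static class ToyRunner
{
    // Returns the sorted names of the test methods found on a fixture type.
    public static List<string> Discover(Type fixtureType)
    {
        var found = new List<string>();
        // GetMethods() includes public methods inherited from base classes;
        // for an overridden virtual, only the most-derived override is returned.
        foreach (MethodInfo m in fixtureType.GetMethods(BindingFlags.Public | BindingFlags.Instance))
        {
            // inherit: false => only attributes declared on this exact method count.
            if (m.IsDefined(typeof(TestAttribute), false))
                found.Add(m.Name);
        }
        found.Sort(StringComparer.Ordinal);
        return found;
    }
}
```

Here Discover(typeof(AddressFixture)) finds Delete and SaveAndFetch. EditAndFetch disappears because its override dropped the attribute.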
Inheritance
Now that we have the basic test suite implemented, say the requirements change and we need to add a class representing a preferred client that receives discounts and special credit. First we will create a PreferredClient class derived from Client:
public class PreferredClient : Client {
private double _discountRate = 1;
private decimal _accountCredit = 0.00M;
//...
public override void Save() {
base.Save();
// call DAL to save this object's fields
}
//...
}
Next, we must create a PreferredClientTest test fixture derived from ClientTest. But this causes a problem: ClientTest inherits from PersistentObjectTest<Client>, but we need PreferredClientTest to inherit indirectly from PersistentObjectTest<PreferredClient> so that PersistentObjectTest’s methods use the correct type of object. The solution is to move the generic signature “down the hierarchy” to ClientTest:
/// <summary>
/// Generic tester for classes derived from Client
/// </summary>
public class ClientTest<DerivedClientType>
: PersistentObjectTest<DerivedClientType>
where DerivedClientType : Client, new() {
public override void FillSample(DerivedClientType sample) {
base.FillSample(sample);
sample.FirstName = "FIRST" + DateTime.Now.Ticks.ToString();
sample.MiddleName = "MIDDLE" + DateTime.Now.Ticks.ToString();
sample.LastName = "LAST" + DateTime.Now.Ticks.ToString();
sample.AddressUID = new AddressTest().GetSample
(SampleSaveStatus.SAVED_SAMPLE).UID;
}
public override void AssertIdentical
(DerivedClientType expected, DerivedClientType actual) {
base.AssertIdentical(expected, actual);
Assert.AreEqual(expected.FirstName, actual.FirstName,
"FirstName does not match");
Assert.AreEqual(expected.MiddleName, actual.MiddleName,
"MiddleName does not match");
Assert.AreEqual(expected.LastName, actual.LastName,
"LastName does not match");
Assert.AreEqual(expected.AddressUID, actual.AddressUID,
"AddressUID does not match");
}
}
But we need to keep the non-generic tester so Client's tests will still run:
/// <summary>
/// Non-generic tester for base Client type
/// </summary>
[TestFixture()]
public class ClientTest : ClientTest<Client> {
// add Client-specific tests as needed
}
Finally, we define PreferredClientTest in terms of the generic version of ClientTest:
[TestFixture()]
public class PreferredClientTest : ClientTest<PreferredClient> {
public override void FillSample(PreferredClient sample) {
base.FillSample(sample);
Random r = new Random();
// some random dollars and cents
sample.AccountCredit = (decimal)r.Next() + .25M;
sample.DiscountRate = r.NextDouble();
}
public override void AssertIdentical
(PreferredClient expected, PreferredClient actual) {
base.AssertIdentical(expected, actual);
Assert.AreEqual(expected.AccountCredit, actual.AccountCredit,
"AccountCredit does not match");
Assert.AreEqual(expected.DiscountRate, actual.DiscountRate,
"DiscountRate does not match");
}
}
Note that the FillSample() and AssertIdentical() methods simply extend their base class counterparts. One can easily see how this type of expansion can continue as the application grows; it is simply a matter of adding a subclass and implementing the appropriate methods.
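Stripped of NUnit and the DAL, the type-level trick fits in a few lines. In this sketch the classes are hypothetical minimal versions, and only the generic structure mirrors the text. C# tells ClientTest apart from ClientTest<T> by their number of type parameters, which is why both names can coexist:

```csharp
using System;

public abstract class PersistentObject { public long UID { get; set; } }

public class Client : PersistentObject { public string FirstName { get; set; } }

public class PreferredClient : Client { public decimal AccountCredit { get; set; } }

public abstract class PersistentObjectTest<T> where T : PersistentObject, new()
{
    public T GetSample()
    {
        T sample = new T();
        FillSample(sample);
        return sample;
    }

    public virtual void FillSample(T sample) { }
}

// Generic tester: fills the Client fields of any Client-derived type.
public class ClientTest<TClient> : PersistentObjectTest<TClient>
    where TClient : Client, new()
{
    public override void FillSample(TClient sample)
    {
        base.FillSample(sample);
        sample.FirstName = "FIRST";
    }
}

// Non-generic tester for the base Client type.
public class ClientTest : ClientTest<Client> { }

// Extends the Client filling with PreferredClient-specific fields.
public class PreferredClientTest : ClientTest<PreferredClient>
{
    public override void FillSample(PreferredClient sample)
    {
        base.FillSample(sample);
        sample.AccountCredit = 1.25M;
    }
}
```

Because T is PreferredClient all the way up the chain, new PreferredClientTest().GetSample() returns a PreferredClient with both the Client and PreferredClient fields filled, and no casts are needed.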
Drawbacks
Primary Keys
This hypothetical test suite makes one glaring assumption: that PersistentObject is a valid base class for real-world classes. The assumption is most apparent in the Fetch() and Fill() methods, which always take a long as the unique database identifier. Real-world databases are often not designed so that every table has a single bigint primary key (if only!). One can get around this problem by expanding the generic signatures of PersistentObjectTest and PersistentObject.Fetch() to include the type of the derived class's unique identifier.
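A minimal sketch of that suggestion, with hypothetical names throughout (Country, Order, CountryTest, and a hard-coded lookup that stands in for the DAL), adds a key type parameter to both PersistentObject and the test base class:

```csharp
using System;

// The key type becomes a type parameter instead of a fixed long.
public abstract class PersistentObject<TKey>
{
    public TKey UID { get; protected set; }
    public abstract void Fill(TKey uid);

    public static T Fetch<T>(TKey uid) where T : PersistentObject<TKey>, new()
    {
        T toReturn = new T();
        toReturn.Fill(uid);
        return toReturn;
    }
}

// A table keyed by a string code rather than a bigint.
public class Country : PersistentObject<string>
{
    public string Name { get; private set; }

    public override void Fill(string uid)
    {
        UID = uid;
        Name = uid == "US" ? "United States" : "Unknown"; // stands in for a DAL lookup
    }
}

// A table keyed by a Guid.
public class Order : PersistentObject<Guid>
{
    public override void Fill(Guid uid) { UID = uid; }
}

// The test base class carries the key type as well.
public abstract class PersistentObjectTest<T, TKey> where T : PersistentObject<TKey>, new()
{
    public T FetchSample(TKey key) { return PersistentObject<TKey>.Fetch<T>(key); }
}

public class CountryTest : PersistentObjectTest<Country, string> { }
```

The cost is one more type argument at each call site, for example PersistentObject<string>.Fetch<Country>("US").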
Dummy Data Overload
Because it depends on sample generators, this form of test suite creates a large amount of dummy data in the database. This is acceptable, since a large part of testing a database-driven application is verifying that data is saved and retrieved correctly. However, it means that the development team must have a dedicated testing database server that is regularly reset to a known state, so that dummy data does not overshadow valid data. Also, because sample generators call other sample generators (as ClientTest does with AddressTest), two classes that each require a saved sample of the other could send generation into an endless cycle that would very quickly bring the database (not to mention the call stack) to its knees.
Randomness
The implementation I have outlined assumes that random dummy data will suffice for most tests that use the generated objects. In other words, the consumer of a sample object must ensure that the generated object meets the desired preconditions. Bounds on randomness can often be achieved with parameterized sample generators such as the following:
/// <summary>
/// Return a client with one of the given first names
/// </summary>
/// <param name="firstNames">
/// The list of possible first names
/// </param>
public static Client GetBoundedSample
(string[] firstNames, SampleSaveStatus saveStatus) {
Client toReturn = new ClientTest().GetSample(SampleSaveStatus.UNSAVED_SAMPLE);
Random r = new Random();
toReturn.FirstName = firstNames[r.Next(0, firstNames.Length)];
if (saveStatus == SampleSaveStatus.SAVED_SAMPLE) {
toReturn.Save();
}
return toReturn;
}
However, there is no general, easily implemented way for the sample generators to control randomness or to return a bounded, exhaustive list of all possible samples. In fact, exhaustive test generation is an active area of research.
Conclusion
The hypothetical test suite architecture that I have outlined is useful for testing tiered, database-driven applications in which reasonable, random sample data is often needed. By using test fixture inheritance and sample generators, it becomes very easy to expand the test suite as the application grows. It also reduces the amount of code needed to test the most important aspect of a database-driven application: that data travels to and from the database correctly. Variations on this testing implementation have performed well for several .NET applications ranging from several dozen to several thousand classes.
Footnotes
- In reality, Save, Fill, and Delete would usually wrap protected overridable methods like DoSave, DoFill, and DoDelete. This would allow the base class to define common pre- and post-database operation steps while leaving the derived class to handle its own data. Also, Delete would usually set an “Ignore” flag rather than completely remove the data from the database. Regardless, we can ignore those complications in this article. Just assume that a derived class would override Save, Fill, and Delete in the obvious manner if the class supports the appropriate database operation.
- This is not true inheritance. NUnit uses reflection to find any methods marked with the [Test()] attribute, regardless of where the method occurs in the class hierarchy. Also, overriding a test method does not retain the [Test()] attribute.
Files
- nunittestsuite.zip – The Visual Studio 2005 solution with the sample code. It will compile and run, but the tests will not pass since the DAL is left unspecified.
Software Engineering Course
I am taking CS527, UIUC's graduate-level software engineering course, this semester. It is taught by Professor Johnson, who cowrote the classic Design Patterns book.
One interesting feature of the course is that we must keep a journal of the presentations we attend and the books we read for the course. Professor Johnson recommends keeping the journal as a weblog. He mentioned eventually setting up an aggregator, but right now he is just keeping a list of the students' weblogs. I will be keeping my journal under this weblog's CS527 category (RSS, Atom). You can probably expect more software architecture-oriented posts in the near future.
Weblogs and a well-known expert for a teacher... I expect good things from this class.
Scientific Visualization = Pretty Pictures
This semester I took a course in scientific visualization. It was my first graphics course as well as my first graduate-level course. It covered methods for processing and visualizing scientific data such as medical scans, physical simulations, and mathematical equations. The best part about the course is that we got to make pretty pictures.
For the first project, we implemented three forms of vector visualization. The vector field we used was the electric field created by a positively charged object at the center of the space surrounded by four negatively charged objects. First, we rendered the streamlines, which represent the paths of particles along the field:
Second, we rendered the time surface, which is the expanding (or contracting) surface created by a set of points at a particular point in time:
Third, we rendered the stream tube, which is the "tube" whose borders are defined by the streamlines. This type of visualization shows whether a vector field diverges or converges.
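All three renderings start from streamlines, which are traced by repeatedly stepping a point along the local field direction. The sketch below uses fixed-size forward Euler steps in 2D with unitless point charges. The project's actual integrator, step size, and charge layout are not given in the post, so treat these as assumptions:

```csharp
using System;
using System.Collections.Generic;

public struct Charge
{
    public double X, Y, Q;
    public Charge(double x, double y, double q) { X = x; Y = y; Q = q; }
}

public static class Streamlines
{
    // Field at (x, y) from point charges, Coulomb constant dropped: E = sum of q * r / |r|^3.
    public static void Field(Charge[] charges, double x, double y, out double ex, out double ey)
    {
        ex = 0; ey = 0;
        foreach (Charge c in charges)
        {
            double dx = x - c.X, dy = y - c.Y;
            double r2 = dx * dx + dy * dy;
            double r3 = r2 * Math.Sqrt(r2);
            ex += c.Q * dx / r3;
            ey += c.Q * dy / r3;
        }
    }

    // Traces a streamline with fixed-length forward Euler steps along the normalized field.
    public static List<(double X, double Y)> Trace(Charge[] charges, double x, double y,
                                                    double step, int maxSteps)
    {
        var points = new List<(double X, double Y)> { (x, y) };
        for (int i = 0; i < maxSteps; i++)
        {
            Field(charges, x, y, out double ex, out double ey);
            double mag = Math.Sqrt(ex * ex + ey * ey);
            // Stop at a stagnation point or on top of a charge (NaN/infinite field).
            if (mag < 1e-12 || double.IsNaN(mag) || double.IsInfinity(mag)) break;
            x += step * ex / mag;
            y += step * ey / mag;
            points.Add((x, y));
        }
        return points;
    }
}
```

For a configuration like the one described, one would pass a positive charge at the origin and four negative charges around it. A higher-order integrator such as fourth-order Runge-Kutta follows curved lines more accurately at the same step size.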
For the second project, we implemented a surface rendering algorithm called marching tetrahedrons, which is a particular form of isosurface extraction. We rendered two datasets:
First, the electron density of a methane molecule:
Second, a prepared dataset of someone's head:
We used the same two datasets for the third project, but we rendered them using volume rendering with ray casting instead. The basic idea is that one shoots a ray out of every pixel of the image and accumulates color for each sample point along the ray as it passes through the data.
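The color accumulation is typically done with front-to-back compositing, in which each sample is weighted by its own opacity and by the transparency left in front of it. A minimal sketch for one ray, assuming the samples have already been mapped to (color, opacity) pairs by some transfer function (grayscale only, to keep it short):

```csharp
using System;

public static class RayCaster
{
    // samples[i] = (color, alpha), ordered from the eye outward, values in [0, 1].
    // Front-to-back compositing:
    //   C += (1 - A) * a_i * c_i
    //   A += (1 - A) * a_i
    public static (double Color, double Alpha) Composite((double Color, double Alpha)[] samples,
                                                         double earlyStop = 0.99)
    {
        double color = 0, alpha = 0;
        foreach (var s in samples)
        {
            double weight = (1 - alpha) * s.Alpha;
            color += weight * s.Color;
            alpha += weight;
            if (alpha >= earlyStop) break; // early ray termination: nothing behind is visible
        }
        return (color, alpha);
    }
}
```

Stopping once the accumulated opacity nears 1 skips samples that could no longer change the pixel.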
Note that one can see both the brain and the skull in the following renders of the head data.
For the term project, students had to prepare a project proposal, implement the proposed visualization, and then write a 12-page paper on the project.
All in just two weeks.
We were "encouraged" to work in groups for this project, so Marc and I decided to work together. Our project proposal had three parts:
- Get some real medical data to visualize. In the previous three projects we had only used calculated data (as in the case of the electrical field and methane molecule) or prepared datasets (as in the case of the head).
- Render more than one surface. The surface rendering project only rendered one surface.
- Use the view orientation, color, and other parameters from the surface rendering to create a volume rendering. In all the projects, we had to type in parameters by hand or, in the case of my volume rendering project, create an initialization file with the parameters. We wanted to make the process easier.
We initially planned to use the CT or MRI scans of my leg implant. I called up my surgeon, but we found out that my scans had taken place before the hospital went completely digital. My surgeon's assistant, who went above and beyond searching for data for us to use, learned that a man with a soft-tissue tumor was getting an MRI one day last week. She got the technicians to burn a CD of the data, and I traveled to the office the next day to pick it up.
The data was ideal for the project. It was in a standard binary format called DICOM, which stands for Digital Imaging and Communications in Medicine. I read up on the standard and wrote a reader in about two days. Here are some of the raw MRI slices direct from the machine:
While I was working on the DICOM reader, Marc was working on the user interface. Once those two parts were more or less functional, we started working on the rendering portions of the application. He took care of the surface rendering, greatly extending the project two code such that it could respond better to user input and render up to three translucent surfaces. Rendering three surfaces was more involved than it might appear because the triangles that form the surface must be drawn from front to back for lighting and transparency to work. Also, we needed to calculate the camera position and orientation for the volume rendering to work.
I built the volume rendering section such that it could take the view parameters from the surface section, make some good guesses about the volume rendering parameters, and output a ray cast image. Unlike project three in which rays traveled in parallel, we had to cast diverging rays to make the perspective appear the same as in the surface rendering.
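The difference between the two setups is how each pixel's ray direction is computed. A minimal sketch, assuming a pinhole camera at the origin looking down the negative z axis with a given vertical field of view (the post does not describe the project's actual camera math):

```csharp
using System;

public static class CameraRays
{
    // Parallel (orthographic) rays, as in project three: one direction for every pixel.
    public static readonly (double X, double Y, double Z) ParallelDirection = (0.0, 0.0, -1.0);

    // Perspective rays: each ray leaves the eye and passes through its pixel's center,
    // so neighboring rays diverge.
    public static (double X, double Y, double Z) PerspectiveDirection(
        int px, int py, int width, int height, double verticalFovDegrees)
    {
        double aspect = (double)width / height;
        double tanHalf = Math.Tan(verticalFovDegrees * Math.PI / 360.0); // tan(fov / 2)
        // Pixel center mapped to [-1, 1] (y up), then scaled onto the image plane at z = -1.
        double x = (2.0 * (px + 0.5) / width - 1.0) * aspect * tanHalf;
        double y = (1.0 - 2.0 * (py + 0.5) / height) * tanHalf;
        double len = Math.Sqrt(x * x + y * y + 1.0);
        return (x / len, y / len, -1.0 / len); // unit-length direction
    }
}
```

The center pixel's ray coincides with the parallel direction. Rays toward the image edges tilt outward, which produces the same perspective foreshortening as the surface renderer.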
We managed to iron out the last glaring bugs during the final stress-filled day before the due date. We traded off on writing the paper and programming up into the final minutes. Amazingly, we turned in the final CD and paper printout ten minutes before the midnight due date.
Despite having a ridiculously short time to implement the project, I think the final pictures speak for themselves:
Here we used the methane data to test the multiple surfaces. The first image is the GUI with the surface rendering; the second is the volume rendering.
Here is a rendering of one of the MRI data sets. It is looking at the back of the patient's thigh.
I feel that the following images made the project. The first picture is the MRI slice, the second shows the surface rendering, and the final picture shows the volume rendering. The tumor is obvious in the MRI slice and surface rendering. It is a little less visible in the volume rendering, but it is still there.
I have posted the application executable here if anyone would like to try it out. I have disabled the head and MRI data because the data files are so large. You can play with the surfaces of the methane data by adjusting the sliders on the right. Click and drag on the surface render window to rotate the molecule. Right-click and drag to zoom in or out. You can create a volume rendering by clicking the "Render" button.
A word of warning: I am ashamed to admit that the volume rendering leaks memory like crazy, so you may not want to run it more than two or three times before restarting the application. It will very likely crash if you render much more than that. Also, the volume renderings can take anywhere from one to five minutes, depending on your CPU speed and how much memory you have.
Semester Projects Part Three: The Ant Project
For some context, see parts one and two of my semester project rundown.
In the rare periods during which I was not thinking about the software engineering project, I worked on the semester project for my programming languages course. It was the most unusual programming assignment I have ever worked on. I got to play around with environment simulation and programming language design, among other things. It was a very interesting and enjoyable project, but I would have enjoyed it more had the software engineering project not consumed all of my time.
The project idea came from the 2004 ICFP Programming Contest. We had to "design an ant colony that will bring the most food particles back to its anthill, while fending off ants of another species." Ants were finite state machines governed by a simple machine language consisting of eight simple commands. To create a winning ant, we had to write a compiler that would take in a high-level ant language that we designed and output a series of ant commands. The twist was that for the class project we had to write our compiler in Scheme.
I had some experience with functional programming techniques, but I had never used a "real" functional programming language. Most of the course covered the Lambda Calculus, so the instructor wanted to give us applied experience with both the Lambda Calculus and functional programming by having us write a language compiler in Scheme.
The class had only six students, which made for very intimate lectures. Imagine six people sitting in the first three rows of a 90-person classroom. There were two women in the class. This is an astronomically high percentage for a CS course. The class split into two ant teams. At the end of the semester we would pit our ants against each other and see who won. About halfway through the semester, one of the women dropped the course. This cut our female attendance in half and reduced my team to myself and a guy named Phil who was on one of the software engineering mapping teams. I am glad he was not on the other Del.icio.us team. Having two software engineering students on a team was difficult enough; it would have been worse had we been competing against each other in the other class.
The first draft of our language looked like a normal procedural language containing loops, conditionals, variables, and so on. I found out later that the other team went in the same direction. Phil and I, however, felt there had to be a better way to specify ant behavior.
Our second draft abandoned the procedural approach. We instead designed a language that could specify stimulus/response pairs, similar to the way we imagined real ants might think. Each stimulus "handler" specified a series of actions to take in response to the ant sensing something such as a wall, food, or another ant. The syntax expanded on the state-based nature of the ant neurology by allowing the programmer to specify superstates that could contain stimulus/response handlers as well as any number of substates. Handlers could jump between states based on certain stimuli. For example, sensing food could jump from the "searching" state to the "found_food" state. A state would loop indefinitely, performing its default actions, until some stimulus caused the ant to jump to a different state.
The following is an example of our final language. It exhibits several features of the language such as macro expansion, marker expansion, default actions, substates, and various stimuli. The syntax looks very Scheme-like, but this was simply to make it easier to read and parse.
(default
  (random forager soldier))
(macro SEE-FOOD
  (move)
  (pickup))
(marker TRAIL (1))
(state forager
  (state searching-for-food
    (sense ahead rock
      (turn left))
    (sense ahead food
      (move)
      (pickup)
      (jump return-home))
    (default
      (move)
      (mark TRAIL)))
  (state return-home
    ; do stuff...
    )
  (default
    (jump searching-for-food)))
(state soldier
  ; do stuff...
  )
The initial default state is executed first. It randomly splits the ants into two main types: soldiers and foragers. Macro and marker definitions, such as the ones that follow the initial default state, are automatically expanded by the compiler wherever they are used. The state hierarchy also defines the scope of macro and marker definitions: a macro or marker is valid in the state that defines it as well as in all of that state's substates. Because the two definitions here are in the "root" state, they are valid for the entire program. The forager state contains the searching-for-food and return-home substates. A forager ant will initially fall into the searching-for-food state. The ant will move and mark until it senses either a rock or food. If it senses a rock, it will turn left and resume searching. If it senses food, it will move onto the food and pick it up (the same actions that SEE-FOOD defines), then jump to the return-home state. The marker in this case does not do anything. A more complex program like the one we used in the competition could use markers to create trails and send messages to other ants.
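The scoping rule described above is ordinary lexical scoping over the state tree. The compiler itself was written in Scheme. The following C# sketch only illustrates the lookup rule, using a hypothetical StateScope class with macros stored as lists of action names: a name resolves in the current state first, then in each enclosing state up to the root.

```csharp
using System;
using System.Collections.Generic;

public class StateScope
{
    private readonly Dictionary<string, List<string>> _macros =
        new Dictionary<string, List<string>>();

    public string Name { get; }
    public StateScope Parent { get; }

    public StateScope(string name, StateScope parent = null)
    {
        Name = name;
        Parent = parent;
    }

    public void DefineMacro(string name, params string[] actions)
    {
        _macros[name] = new List<string>(actions);
    }

    // Walk outward from this state to the root; the innermost definition wins.
    public List<string> LookupMacro(string name)
    {
        for (StateScope s = this; s != null; s = s.Parent)
        {
            List<string> body;
            if (s._macros.TryGetValue(name, out body)) return body;
        }
        return null; // undefined in this state and all of its ancestors
    }

    // Expands a sequence of actions, replacing macro names with their bodies.
    public List<string> Expand(IEnumerable<string> actions)
    {
        var result = new List<string>();
        foreach (string action in actions)
        {
            List<string> body = LookupMacro(action);
            if (body != null) result.AddRange(Expand(body));
            else result.Add(action);
        }
        return result;
    }
}
```

Sibling states cannot see each other's definitions. This sketch does not guard against a macro that expands to itself.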
This is a very stupid ant and far less complex than the ant we ran against the other team at the end of the semester. However, one can easily see how the state hierarchy and stimulus/response design can be used to define very complex and useful behavior.
Phil and I had our language design pretty early in the semester, but as software engineering picked up, our compiler and ant fell further and further down in the priority list. I am sorry to say we finished the last 75% of the project in the three days before the final presentation and competition. We put the finishing touches on our ant program during the hour before the class started.
Our final ant had forager and guard states. We made it so that there were three foragers for every guard. Foragers wandered around looking for food. Whenever one found food, it would carry its food along a directed path back to the home anthill. Other foragers, when they found a food trail, would follow it in the opposite direction to the food. Guards stayed on the home anthill and guarded the food that the foragers brought back.
The competition took place on a simulator that a classmate found. We lost, but I do not feel this was due to any weakness in the language. Given time, we could have created an incredibly powerful ant.