www.BrettDaniel.com

Testing a Testing Tool Part Three: ReAsserting ReAssert

This is the third in a series of posts in which I discuss the challenges I encountered when testing ReAssert. I already showed how I used tests as their own input and automatically deployed ReAssert for my own use. Here, I combine both aspects by demonstrating how ReAssert can repair its own unit tests.

ReAsserting ReAssert

Tests break when the system under test evolves in ways that invalidate the assumptions encoded in the tests. ReAssert addresses this problem by making it easier to update tests to reflect the changed behavior. Like any other complex piece of software, ReAssert itself has evolved, making it susceptible to the same problem that it attempts to solve. On several occasions, a change to ReAssert broke its unit tests. It is natural to ask whether ReAssert could repair them.

Recall from the first post in this series that ReAssert's unit tests have two parts: a failing test method marked with the @Test annotation and its expected repair marked with the @Fix annotation. When such a test breaks, it means that the @Fix method must change to reflect ReAssert's actual output.

Here is a real example of a time when ReAssert's evolution caused tests to break. An early version of ReAssert lacked the ability to trace an expected value back to its declaration. Instead, ReAssert would simply overwrite the expected side of a failing assertion. The following code (similar to the example used in the first post) shows the @Test and @Fix methods that verified this early behavior.

@Test
public void testString() {
  String expected = "expected";
  String actual = "actual";
  assertEquals(expected, actual);
}

@Fix("testString")
public void fixString() {
  String expected = "expected";
  String actual = "actual";
  assertEquals("actual", actual);
}

This is probably not what the developer would expect. Overwriting the expected side removes the use of the expected variable. This makes the test harder to understand and might cause a compiler warning since the variable is not used anywhere else. Such a repair could also cause other tests to fail if the assertion was located in a utility method called from multiple places. Indeed, this behavior confused several participants in the user study that Vilas and I performed to evaluate ReAssert.

I changed ReAssert so that it would instead replace the initial value of a variable used on the expected side of a failing assertion. Even though I wrote tests to verify this behavior before making the change, the change still caused many existing tests to break. The example above broke, and it was necessary to update the @Fix method in the following way:

@Fix("testString")
public void fixString() {
  String expected = "actual";
  String actual = "actual";
  assertEquals(expected, actual);
}
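The following toy sketch makes the difference between the two strategies concrete. It assumes a drastically simplified statement model: the Stmt, Decl, and AssertEq types and the repair method are hypothetical and are not ReAssert's code, which works on real Java syntax. The sketch traces the expected side of a failing assertEquals back to a local variable declaration and rewrites that variable's initializer. When no declaration exists, it falls back to the older behavior of overwriting the expected side. It needs Java 16 or later for records and pattern matching.

```java
import java.util.ArrayList;
import java.util.List;

public class RepairSketch {
    // Toy stand-ins for Java statements (hypothetical, for illustration only).
    interface Stmt {}
    record Decl(String type, String name, String init) implements Stmt {}
    record AssertEq(String expected, String actual) implements Stmt {}

    /**
     * Repairs the first assertEquals in body. actualValue is the value
     * observed on the actual side when the assertion failed.
     */
    static List<Stmt> repair(List<Stmt> body, String actualValue) {
        List<Stmt> out = new ArrayList<>(body);
        for (int i = 0; i < out.size(); i++) {
            if (!(out.get(i) instanceof AssertEq a)) continue;
            // Newer behavior: trace the expected variable back to its declaration.
            for (int j = 0; j < i; j++) {
                if (out.get(j) instanceof Decl d && d.name().equals(a.expected())) {
                    out.set(j, new Decl(d.type(), d.name(), actualValue));
                    return out;
                }
            }
            // Older behavior, kept as a fallback: overwrite the expected side.
            out.set(i, new AssertEq(actualValue, a.actual()));
            return out;
        }
        return out; // no assertion to repair
    }

    public static void main(String[] args) {
        List<Stmt> testString = List.of(
            new Decl("String", "expected", "\"expected\""),
            new Decl("String", "actual", "\"actual\""),
            new AssertEq("expected", "actual"));
        repair(testString, "\"actual\"").forEach(System.out::println);
    }
}
```

Applied to the body of testString, the sketch rewrites only the first declaration and leaves the assertion as assertEquals(expected, actual). This matches the updated @Fix method shown just before it.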

Since I had the ReAssert plugin installed as described in the second post in this series, I wanted it to automate repairs like the one above. Doing so proved challenging because the process was so self-referential: ReAssert used ReAssert's result to repair a test that (as described in part one) triggered ReAssert. Don't worry if that sounds confusing; it is. The following diagram illustrates the process more clearly:

[Diagram: ReAsserting ReAssert]

To avoid confusion, I will refer to the "upper" and "lower" instances of ReAssert. The upper instance is triggered when I tell the plugin to repair a failing @Test-and-@Fix method pair. The upper instance executes the test under JUnit, which, via FixChecker (my custom test runner), invokes the lower instance of ReAssert. The lower instance "repairs" the body of the @Test method and saves the result in memory. Finally, the upper instance copies this result into the body of the @Fix method and outputs the repaired source code.

But what prevents the lower instance from triggering infinite recursion? After all, the lower instance of ReAssert invokes JUnit, which runs the test with FixChecker, which repairs the test with ReAssert, which invokes JUnit, and so on. FixChecker breaks this cycle by ensuring that only one instance of itself is active at a time. This allows the lowermost instance of JUnit to execute the @Test method normally.
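A process-wide flag is one simple way to get this "only one active instance" behavior: the outermost runner sets it, and nested runners check it. The sketch below works under that assumption and is not FixChecker's actual code. ReentrancyGuardDemo and runTest are hypothetical names, and the nested call only simulates the lower ReAssert re-running the test under JUnit.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ReentrancyGuardDemo {
    // Hypothetical flag: true while a FixChecker-like runner is active.
    private static final AtomicBoolean ACTIVE = new AtomicBoolean(false);

    /** Describes how a test is handled at the given nesting depth. */
    static String runTest(String testName, int depth) {
        // Only the outermost runner manages to flip the flag from false to true.
        if (!ACTIVE.compareAndSet(false, true)) {
            // Another runner is already active: run the @Test method normally.
            return "plain JUnit run of " + testName + " at depth " + depth;
        }
        try {
            // Outermost runner: "repair" the test, which re-runs it under JUnit.
            return "repair of " + testName + " -> " + runTest(testName, depth + 1);
        } finally {
            ACTIVE.set(false); // clear the flag even if the repair throws
        }
    }

    public static void main(String[] args) {
        System.out.println(runTest("testString", 0));
    }
}
```

compareAndSet makes the check and the set a single atomic step. The finally block guarantees that a later top-level run starts with the flag cleared.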

This experience with ReAssert reinforced my belief that meta-execution is an ideal way to test and improve software development tools. Not only does the developer discover bugs that would otherwise impact users, but executing a program on itself can indicate how easily one can extend the tool's behavior. In ReAssert's case, meta-execution not only uncovered many bugs but also led to several improvements in the internal design of the tool.

I think ReAssert's meta-repair capability is one of the most interesting aspects of the tool. Unfortunately, I didn't have room to describe it in the paper, which is why I wanted to write this series of weblog posts.

Testing a Testing Tool Part Two: Build and Deploy Local Eclipse Plugin

In my previous post, I discussed the first of several challenges I encountered when testing ReAssert. In this, the second of three articles in the series, I will describe how I automatically deployed the tool for my own use.

Build and Deploy Local Eclipse Plugin

When developing software tools, it is good practice to eat one's own dog food by using the tool oneself. It is one of the easiest ways to uncover bugs and improve usability. I ate ReAssert's dog food by using it to repair tests in other research projects and (as I'll describe in the next post) ReAssert itself.

ReAssert is implemented as an Eclipse plugin. The easiest way to deploy such a plugin is to place it in Eclipse's plugins directory.[1] When Eclipse starts, it automatically loads all plugins in the directory. If there is more than one version of a particular plugin, Eclipse loads the one with the highest version number.
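Eclipse's runtime is based on OSGi, which orders bundle versions in two stages. It first compares the major, minor, and micro components numerically, then compares the qualifier as a plain string. The simplified comparator below is a sketch of that rule, not org.osgi.framework.Version itself. It shows why a fixed-width date-and-time qualifier such as 201002231921 works well: string order then matches chronological order, so the highest version is also the newest build. The list of installed versions in main is made up for illustration.

```java
import java.util.List;

public class BundleVersionOrder {
    /** Compares versions of the form major.minor.micro[.qualifier]. */
    static int compare(String a, String b) {
        String[] x = a.split("\\.", 4);
        String[] y = b.split("\\.", 4);
        for (int i = 0; i < 3; i++) {
            int c = Integer.compare(part(x, i), part(y, i));
            if (c != 0) return c;
        }
        // Qualifiers compare as strings, so fixed-width timestamps sort by time.
        return qualifier(x).compareTo(qualifier(y));
    }

    static int part(String[] p, int i) { return i < p.length ? Integer.parseInt(p[i]) : 0; }

    static String qualifier(String[] p) { return p.length > 3 ? p[3] : ""; }

    public static void main(String[] args) {
        List<String> installed = List.of(
            "0.3.0.201002231921", "0.3.0.201002221045", "0.2.9.201003010900");
        System.out.println(installed.stream().max(BundleVersionOrder::compare).orElseThrow());
    }
}
```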

To avoid installing ReAssert's plugin manually, I set up the normal build process to automatically update the version number and copy the plugin to the appropriate place. A single Ant build script handles the entire process.

The main challenge lies in accessing Eclipse's build metadata from the script. Each Eclipse plugin holds this metadata in a file called MANIFEST.MF, which contains things like the plugin's name, its version number, and the other plugins it depends on. For example, here is part of ReAssert's:

Manifest-Version: 1.0
Bundle-ManifestVersion: 2
Bundle-Name: edu.illinois.reassert.plugin
Bundle-SymbolicName: edu.illinois.reassert.plugin;singleton:=true
Bundle-Version: 0.3.0.201002231921
Bundle-Activator: edu.illinois.reassert.plugin.ReAssertPlugin
Require-Bundle: org.eclipse.ui,
 org.eclipse.core.runtime,
 ...
Bundle-Vendor: University of Illinois at Urbana-Champaign

ReAssert's build script first updates this file with a new build number based on the current date and time. Then, it reads the version and build number by converting MANIFEST.MF into an Ant property file. Finally, it bundles the plugin using these values.

Here are the relevant pieces of the script that implements this process:

  1. Set the date and time with tstamp, then replace the build number in the manifest with the date and time.
    <tstamp /> <!-- set ${DSTAMP} and ${TSTAMP} -->
    <replaceregexp 
        file="META-INF/MANIFEST.MF" 
        match="Bundle\-Version: ([0-9]+\.[0-9]+\.[0-9]+)\.([0-9]+)"
        replace="Bundle-Version: \1.${DSTAMP}${TSTAMP}" />
  2. Convert MANIFEST.MF into manifest.properties, which Ant can read.[2]
    <copy file="META-INF/MANIFEST.MF" tofile="manifest.properties" />
    <replace file="manifest.properties">
      <replacefilter token=":=" value="=" />
      <replacefilter token=":" value="=" />
      <replacefilter token=";" value="" />
    </replace>
    <property file="manifest.properties"/>
  3. Set the JAR file name using the properties, then bundle the plugin's JAR file.
    <property 
        name="plugin.jar" 
        value="${dist.dir}/${Bundle-Name}_${Bundle-Version}.jar" />
    <jar 
        destfile="${plugin.jar}"
        manifest="META-INF/MANIFEST.MF">
      <fileset dir="${bin.dir}" />
      <fileset dir="." includes="${lib.dir}/**/*" />
      <fileset dir="." includes="META-INF/MANIFEST.MF" />
      <fileset dir="." includes="plugin.xml" />
    </jar>
  4. Copy the JAR to Eclipse's plugins directory. The eclipse.home property is set when the script runs within Eclipse.
    <copy file="${plugin.jar}" todir="${eclipse.home}/plugins/" />
    <echo>Restart Eclipse to enable plugin</echo>
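Here is the same logic as steps 1 through 3, written in Java instead of Ant. This is a minimal sketch that assumes the manifest has already been loaded into a string. It reuses the replaceregexp pattern from step 1; Java writes the group reference as $1 where Ant writes \1. In place of step 2's text substitution, it parses the manifest with the standard java.util.jar.Manifest class. The class and method names are illustrative, and the sample manifest is abbreviated from the one above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class StampManifest {
    // Same pattern as the replaceregexp step in the Ant script.
    static final String PATTERN =
        "Bundle-Version: ([0-9]+\\.[0-9]+\\.[0-9]+)\\.([0-9]+)";

    /** Step 1: replace the build-number qualifier with a yyyyMMddHHmm stamp. */
    static String stamp(String manifest, LocalDateTime now) {
        String ts = now.format(DateTimeFormatter.ofPattern("yyyyMMddHHmm"));
        return manifest.replaceFirst(PATTERN, "Bundle-Version: $1." + ts);
    }

    /** Steps 2 and 3: read the manifest attributes and derive the JAR name. */
    static String jarName(String manifest) throws IOException {
        Manifest m = new Manifest(new ByteArrayInputStream(
            manifest.getBytes(StandardCharsets.UTF_8)));
        Attributes main = m.getMainAttributes();
        return main.getValue("Bundle-Name") + "_" + main.getValue("Bundle-Version") + ".jar";
    }

    public static void main(String[] args) throws IOException {
        String mf = "Manifest-Version: 1.0\n"
            + "Bundle-Name: edu.illinois.reassert.plugin\n"
            + "Bundle-Version: 0.3.0.201002231921\n";
        String stamped = stamp(mf, LocalDateTime.of(2010, 3, 1, 9, 5));
        System.out.println(jarName(stamped));
        // prints edu.illinois.reassert.plugin_0.3.0.201003010905.jar
    }
}
```

Passing the clock in as a parameter keeps the stamping step deterministic and easy to test. A real build would pass LocalDateTime.now().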

It is a pretty straightforward process, but it is useful for keeping my installed version of ReAssert up to date.

I have also seen a similar process scaled up to an entire company. The company had an automated nightly build process that posted the plugin to an Eclipse update site on the local intranet. Every morning, developers would update to the latest plugin, and any bugs they found while using the tool went straight into the bug tracking system.

Notes

  1. One can also install plugins through Eclipse's Update Manager with a plugin update site, but I felt this was overkill for ReAssert since it was such a simple plugin.
  2. Adapted from this article describing how to build a simple plugin.

Testing a Testing Tool Part One: Tests as Test Inputs

I wrote ReAssert to make it easier to maintain unit tests. Ironically, I encountered several challenges when testing ReAssert itself. First, ReAssert acts on source code, so I created a test framework that made it easy to build input programs and check ReAssert's output. Second, I ate my own dogfood by deploying the tool on my local machine. Finally, I combined both aspects by ReAsserting ReAssert itself.

This is the first of what I expect to be three posts in which I discuss these challenges.

Tests as Test Inputs

ReAssert transforms source code. Given the source code of a failing unit test, it outputs a transformed test that passes. Testing program transformation tools is difficult because writing input programs, passing them to the tool, and checking the output requires a lot of effort. Developers often automate the process by saving inputs and expected outputs to the filesystem or including them as string literals in their unit tests. Tests pass the input file contents or string literal to the tool and then verify that the tool's output exactly matches the expected output.

Both approaches are exceptionally common but have several disadvantages. First, files make failures hard to debug, since it can be difficult to figure out which files correspond to which tests. Second, strings can make test code very verbose, and one has to worry about line breaks, escape characters, and character encoding. Strings are also opaque to the IDE, so they lack helpful features like syntax highlighting and automatic formatting. Finally, both approaches require that the tool's output match the expected output exactly, byte for byte, whitespace and all. Such strict matching is rarely necessary when checking source code and can make tests very fragile. As soon as one changes the tool's pretty-printer, every test can break even if the program contents remain the same (which is actually one of the problems that ReAssert aims to solve).

I wanted ReAssert's unit tests to make testing as simple as possible while avoiding the problems caused by input files or string literals. The solution I built relies on the fact that ReAssert acts on unit tests. The test itself serves as the input to ReAssert, and another method in the same test class represents the expected output. To implement this idea, I extended JUnit's default behavior with a custom test runner called FixChecker and a new @Fix method annotation.

Here is an example: say I want to test that ReAssert replaces the initial value of a string used in a failing assertion. The failing test would look something like the following:

@Test
public void testString() {
  String expected = "expected";
  String actual = "actual";
  assertEquals(expected, actual);
}

ReAssert should repair the test by replacing the "expected" string with "actual". To verify this behavior, I create a second method annotated with @Fix whose body contains the expected repair.

@Fix("testString")
public void fixString() {
  String expected = "actual";
  String actual = "actual";
  assertEquals(expected, actual);
}

Then, I tell JUnit to use FixChecker by annotating the test class with JUnit's @RunWith annotation. FixChecker intercepts JUnit's normal result when a test fails. It then "repairs" the test and checks that the repair matches the body of the @Fix method. If not, or if no @Fix method exists, then the runner reports that the test fails. Otherwise, it reports that the test passes. In a sense, each @Test-and-@Fix method pair acts like assertEquals, with the repaired test on the actual side and the @Fix method on the expected side.
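Ordinary reflection is enough to pair a test with its expected repair. The sketch below assumes @Fix is a runtime-retained method annotation whose value names the test it repairs. That annotation declaration, the StringTest class, and fixFor are all hypothetical reconstructions. The JUnit @Test and @RunWith annotations are left out so the example runs without JUnit on the classpath.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Optional;

public class FixLookup {
    // Hypothetical @Fix declaration: value() names the test being repaired.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Fix { String value(); }

    // Stand-in test class; the real one would be annotated @RunWith(FixChecker.class).
    static class StringTest {
        public void testString() { }

        @Fix("testString")
        public void fixString() { }
    }

    /** Finds the method whose @Fix annotation names the given test, if any. */
    static Optional<Method> fixFor(Class<?> testClass, String testName) {
        for (Method m : testClass.getDeclaredMethods()) {
            Fix fix = m.getAnnotation(Fix.class);
            if (fix != null && fix.value().equals(testName)) {
                return Optional.of(m);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(fixFor(StringTest.class, "testString").map(Method::getName).orElse("no @Fix"));
        System.out.println(fixFor(StringTest.class, "testOther").map(Method::getName).orElse("no @Fix"));
    }
}
```

Naming the test in the annotation's value, rather than relying on a method-naming convention, lets a fix method have any name.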

FixChecker does not write its repairs back to the source file. Instead, it holds the modified source code in memory and compares it against the parsed source code of the @Fix method. In this way, the comparison ignores differences in source code formatting, and one can use either qualified or unqualified class names. Also, since both the test and the @Fix method are normal methods, they can reuse aspects of the surrounding test class, and both receive the full support of the IDE.
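The JDK's own parser can approximate a formatting-insensitive comparison. This sketch parses both snippets with javac's Compiler Tree API (com.sun.source.util.JavacTask) and compares their pretty-printed forms. It relies on javac's tree toString() printing a canonical layout, which is an implementation detail rather than a documented contract. It is not necessarily how FixChecker parses code. Unlike FixChecker's comparison, it does not reconcile qualified and unqualified class names, which would require name resolution. It needs a full JDK at run time, since ToolProvider.getSystemJavaCompiler() returns null on a bare JRE.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.util.JavacTask;
import java.io.IOException;
import java.net.URI;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class FormatInsensitiveCompare {
    /** Wraps a source string so javac can read it as a compilation unit. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String code) {
            super(URI.create("string:///Snippet.java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Parses (without compiling) and pretty-prints the source in javac's layout. */
    static String canonical(String source) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(
            null, null, null, null, null, List.of(new StringSource(source)));
        StringBuilder out = new StringBuilder();
        for (CompilationUnitTree unit : task.parse()) {
            out.append(unit); // toString() pretty-prints the parsed tree
        }
        return out.toString();
    }

    static boolean sameCode(String a, String b) throws IOException {
        return canonical(a).equals(canonical(b));
    }

    public static void main(String[] args) throws IOException {
        String fix = "class T { void f() { String expected = \"actual\"; assertEquals(expected, actual); } }";
        String repaired = "class T {\n  void f() {\n    String expected =\n        \"actual\";\n"
            + "    assertEquals(expected,actual);\n  }\n}";
        System.out.println(sameCode(fix, repaired));
    }
}
```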

FixChecker also provides two other useful features. First, it is smart enough to ignore tests that pass and lack an @Fix method; it forwards them to JUnit unchanged. This allows me to mix ReAssert tests with standard unit tests. Second, I can test cases in which ReAssert is expected to fail by marking unrepairable tests with @Unfixable, another new annotation that FixChecker knows to look for.
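Taken together, the rules above amount to a small decision table, which the sketch below encodes as a single hypothetical function. The text does not spell out two of the cases, so the sketch treats them as assumptions. First, a passing test that does have an @Fix method is reported as a failure. Second, an @Unfixable test passes only when ReAssert produces no repair. Plain string equality stands in for the parsed-code comparison described earlier.

```java
public class FixCheckerDecision {
    enum Verdict { PASS, FAIL }

    /**
     * Hypothetical reporting rule for one test. fixBody is null when the test
     * has no @Fix method; repairedBody is null when ReAssert produced no repair.
     */
    static Verdict decide(boolean testPassed, boolean unfixable,
                          String fixBody, String repairedBody) {
        if (testPassed) {
            // Passing tests without @Fix are forwarded to JUnit unchanged.
            // Assumption: a passing test that has an @Fix method is an error.
            return fixBody == null ? Verdict.PASS : Verdict.FAIL;
        }
        if (unfixable) {
            // Assumption: @Unfixable succeeds only if no repair was produced.
            return repairedBody == null ? Verdict.PASS : Verdict.FAIL;
        }
        if (fixBody == null || repairedBody == null) {
            return Verdict.FAIL; // no @Fix method, or nothing to compare
        }
        // Stand-in for comparing parsed code: exact string equality.
        return repairedBody.equals(fixBody) ? Verdict.PASS : Verdict.FAIL;
    }

    public static void main(String[] args) {
        String fix = "String expected = \"actual\";";
        System.out.println(decide(true, false, null, null));  // ordinary passing test
        System.out.println(decide(false, false, fix, fix));   // repair matches @Fix
        System.out.println(decide(false, false, null, fix));  // failing test without @Fix
        System.out.println(decide(false, true, null, null));  // @Unfixable, no repair
    }
}
```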

@Test
@Unfixable
public void testIgnoreAssertFail() {
  fail();
}

Several aspects of this framework are applicable to other program transformation tools. Even when a tool does not act solely on unit tests, it is often useful to write its test inputs as real application code. Also, custom test runners can be exceptionally powerful and allow one to tailor tests to many contexts.

In later posts, I'll describe how I automatically deployed ReAssert for my own use, and how I used the tool to repair its own unit tests.

ReAssert at ASE 2009

In my previous post, I wrote about ReAssert, the tool I built to automatically fix broken unit tests. Yesterday I received notification that the paper describing the tool was accepted to ASE 2009.

This is the same paper mentioned in my crunch time analysis and typography request.

Here is the (working) abstract:

Developers often change software in ways that cause tests to fail. When this occurs, developers must determine whether failures are caused by errors in the code under test or in the test code itself. In the latter case, developers must repair failing tests or remove them from the test suite. Fixing tests is time consuming but beneficial, since removing tests reduces a test suite's ability to detect regressions. Fortunately, simple program transformations can repair many failing tests automatically.

We present ReAssert, a novel technique and tool that suggests repairs to failing tests' code which cause the tests to pass. Examples include replacing literal values in tests, changing assertion methods, or replacing one assertion with several. If the developer chooses to apply the repairs, ReAssert modifies the code automatically. Our experiments show that ReAssert can repair many common test failures and that its suggested repairs correspond to developers' expectations.

The conference will be held in Auckland, New Zealand. I am excited to travel overseas to present my work. Vilas, my coauthor, and Yun-Young, my officemate, are both from New Zealand and are eager to visit home.

Update September 8, 2009

I have posted the final version of the paper and updated the ReAssert homepage.

Update November 30, 2009

The conference presentation went very well, and I received many insightful questions and a great deal of feedback from other attendees. Here are the presentation slides. I told the story of Alice the software developer, as in the previous ReAssert post. The presentation starts with a picture of Alice adapted (as permitted by its Creative Commons license) from xkcd #662.

Alice the software developer. Adapted from xkcd #662 and used in my ReAssert presentation.

I had originally planned to draw a picture in my normal cartoon style, but decided instead to use something simpler. The xkcd picture turned out to be a good choice; it made the audience laugh, and one attendee mentioned Scott McCloud's assertion from Understanding Comics that a simple face makes it easier for the audience to identify with a character.

ReAssert: Suggesting Repairs for Broken Unit Tests

For the past year or so, I have been researching how software tests fail and the ways in which developers fix the failures. There are many interesting problems within this general theme, but I have most recently focused on the following familiar scenario:

Alice is a developer at a large software company. She works on the company's flagship product and spends over half of her time writing unit tests to verify her code and document her assumptions. She is not alone in this respect; the company requires that functional changes and bug fixes have corresponding unit tests to prevent regressions. As a result, the product's unit test suite achieves exceptionally high coverage.

One day, the project manager informs Alice that a key requirement has changed. The changed requirement violates many assumptions encoded in the test suite, so several dozen tests fail after Alice modifies the software. Now Alice has a choice: should she remove the failing tests, since they no longer reflect the correct behavior of the software, or should she attempt to repair them, which would require tedious and time-consuming manual editing?

Developers often have to make a similar choice. When tests fail due to problems with test code rather than the system under test, it is undoubtedly beneficial to fix the broken tests, since removing tests reduces a test suite's ability to detect regressions. However, developers may not take the time to fix the broken tests. For example, while working on the refactoring paper, my colleagues and I found that many of Eclipse's refactoring tests were commented out, marked as ignored, or, most bizarrely, bypassed using "if (true) return;".

To solve this problem, I have been exploring ways of reducing the effort required to fix broken unit tests. Doing so would make developers less fearful of "deep" changes, allow them to write more detailed tests, and most importantly, provide time for more important work.

As a first step toward this goal, I developed a tool called ReAssert that automatically suggests changes to test code that are sufficient to make tests pass. Earlier this week I released a public beta. I welcome anyone reading this to download it from the ReAssert project homepage and try it out. Please contact me if you have any comments, questions, ideas for improvement, or bug reports.