
Testing a Testing Tool Part Three: ReAsserting ReAssert

This is the third in a series of posts in which I discuss the challenges I encountered when testing ReAssert. I already showed how I used tests as their own input and automatically deployed ReAssert for my own use. Here, I combine both aspects by demonstrating how ReAssert can repair its own unit tests.

ReAsserting ReAssert

Tests break when the system under test evolves in ways that invalidate the assumptions encoded in the tests. ReAssert addresses this problem by making it easier to update tests to reflect the changed behavior. Like any other complex piece of software, ReAssert itself has evolved, making it susceptible to the same problem that it attempts to solve. There have been several times when a change to ReAssert broke its unit tests. It is natural to ask whether ReAssert could repair them.

Recall from the first post in this series that ReAssert's unit tests have two parts: a failing test method marked with the @Test annotation and its expected repair marked with the @Fix annotation. When such a test breaks, it means that the @Fix method must change to reflect ReAssert's actual output.

Here is a real example of one time that ReAssert's evolution caused tests to break. An early version of ReAssert lacked the ability to trace an expected value back to its declaration. Instead, ReAssert would simply overwrite the expected side of a failing assertion. The following code (similar to the example used in the first post) shows the @Test and @Fix methods that verified this early behavior.

@Test
public void testString() {
  String expected = "expected";
  String actual = "actual";
  assertEquals(expected, actual);
}
@Fix("testString")
public void fixString() {
  String expected = "expected";
  String actual = "actual";
  assertEquals("actual", actual);
}

This is probably not what the developer would expect. Overwriting the expected side removes the use of the expected variable. This makes the test harder to understand and might cause a compiler warning since the variable is not used anywhere else. Such a repair could also cause other tests to fail if the assertion was located in a utility method called from multiple places. Indeed, this behavior confused several participants in the user study that Vilas and I performed to evaluate ReAssert.
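
To make the utility-method problem concrete, here is a hypothetical example (Greetings.greet is an invented stand-in for code under test, not something from ReAssert's test suite):

// Hypothetical helper shared by two tests; Greetings.greet stands in for
// the code under test.
private static void assertGreeting(String actual) {
  String expected = "hello";
  assertEquals(expected, actual);
}

@Test
public void testEnglishGreeting() {
  assertGreeting(Greetings.greet("en")); // fails once greet("en") returns "hi"
}

@Test
public void testDefaultGreeting() {
  assertGreeting(Greetings.greet());     // still returns "hello"
}

If the early version of ReAssert repaired the failure in testEnglishGreeting by rewriting the helper's assertion to assertEquals("hi", actual), then testDefaultGreeting would immediately start failing.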

I changed ReAssert such that it would instead replace the initial value of a variable used on the expected side of a failing assertion. Even though I wrote tests to verify this behavior prior to making the change, it still caused many tests to break. The example above broke, and it was necessary to update the @Fix method in the following way:

@Fix("testString")
public void fixString() {
  String expected = "actual";
  String actual = "actual";
  assertEquals(expected, actual);
}

Since I had the ReAssert plugin installed (as described in the second post in this series), I wanted it to automate repairs like the one above. Doing so proved challenging because the process was so self-referential: ReAssert used ReAssert's result to repair a test that, as described in part one, itself triggered ReAssert. Don't worry if that sounds confusing because it is. The following diagram illustrates the process more clearly:

[Diagram: ReAsserting ReAssert]

To avoid confusion, I will refer to the "upper" and "lower" instances of ReAssert. The upper instance is triggered when I tell the plugin to repair a failing @Test-and-@Fix method pair. It executes the test under JUnit, which—via FixChecker, my custom test runner—invokes the lower instance of ReAssert. The lower instance "repairs" the body of the @Test method and saves the result in memory. Finally, the upper instance copies this result into the body of the @Fix method and outputs the repaired source code.

But what prevents the lower instance from introducing an infinite recursive loop? After all, the lower instance of ReAssert invokes JUnit, which runs the test with FixChecker, which repairs the test with ReAssert, which invokes JUnit, and so on. FixChecker breaks this loop by ensuring that only one instance of itself is active. This allows the lowermost instance of JUnit to execute @Test normally.
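
The diagram does not show how that guard works internally, but a minimal sketch of the idea, assuming a single-threaded test run, might look like the following (this is my illustration, not FixChecker's actual code):

public final class SingleInstanceGuard {

  // Set while an instance of the repair logic is running.
  private static boolean active = false;

  // Runs the repair logic unless another instance is already active, in
  // which case the plain @Test method runs and the recursion bottoms out.
  public static void runGuarded(Runnable repairLogic, Runnable plainTest) {
    if (active) {
      plainTest.run();
      return;
    }
    active = true;
    try {
      repairLogic.run();
    } finally {
      active = false;
    }
  }
}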

This experience with ReAssert reinforced my belief that meta-execution is an ideal way to test and improve software development tools. Not only does the developer discover bugs that would otherwise impact users, but executing a program on itself can indicate how easily one can extend the tool's behavior. In ReAssert's case, meta-execution not only uncovered many bugs but also led to several improvements in the internal design of the tool.

I think ReAssert's meta-repair capability is one of the most interesting aspects of the tool. Unfortunately, I didn't have room to describe it in the paper, which is why I wanted to write this series of weblog posts.

Testing a Testing Tool Part One: Tests as Test Inputs

I wrote ReAssert to make it easier to maintain unit tests. Ironically, I encountered several challenges when testing ReAssert itself. First, ReAssert acts on source code, so I created a test framework that made it easy to build input programs and check ReAssert's output. Second, I ate my own dogfood by deploying the tool on my local machine. Finally, I combined both aspects by ReAsserting ReAssert itself.

This is the first of what I expect to be three posts in which I discuss these challenges.

Tests as Test Inputs

ReAssert transforms source code. Given the source code of a failing unit test, it outputs a transformed test that passes. Testing program transformation tools is difficult because writing input programs, passing them to the tool, and checking the output requires a lot of effort. Developers often automate the process by saving inputs and expected outputs to the filesystem or including them as string literals in their unit tests. Tests pass the input file contents or string literal to the tool and then verify that the tool's output exactly matches the expected output.
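
As a hypothetical illustration of the string-literal style, such a test might look like the following, where Tool.transform is an invented stand-in for a transformation tool's entry point:

@Test
public void transformsFailingAssertion() {
  // Input program and expected output, both embedded as string literals.
  String input =
      "public void testString() {\n" +
      "  String expected = \"expected\";\n" +
      "  String actual = \"actual\";\n" +
      "  assertEquals(expected, actual);\n" +
      "}\n";
  String expectedOutput =
      "public void testString() {\n" +
      "  String expected = \"actual\";\n" +
      "  String actual = \"actual\";\n" +
      "  assertEquals(expected, actual);\n" +
      "}\n";
  // Byte-for-byte comparison: any change in formatting breaks the test.
  assertEquals(expectedOutput, Tool.transform(input));
}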

Both approaches are exceptionally common but have several disadvantages. First, files make failures hard to debug, since it can be difficult to figure out which file(s) correspond to which test(s). Second, strings can make test code very verbose, and one has to worry about line breaks, escape characters, and character encoding. Strings are also opaque to the IDE, so they lack helpful features like syntax highlighting and automatic formatting. Finally, both approaches require that the tool's output exactly match the expected output byte for byte, whitespace and all. Such strict matching is rarely necessary when checking source code and can make tests very fragile. As soon as one changes the tool's pretty-printer, every test can break even though the program contents remain the same (which is actually one of the problems that ReAssert aims to solve).

I wanted ReAssert's unit tests to make testing as simple as possible while avoiding the problems caused by input files or string literals. The solution I built relies on the fact that ReAssert acts on unit tests. The test itself serves as the input to ReAssert, and another method in the same test class represents the expected output. To implement this idea, I extended JUnit's default behavior with a custom test runner called FixChecker and a new @Fix method annotation.

Here is an example: say I want to test that ReAssert replaces the initial value of a string used in a failing assertion. The failing test would look something like the following:

@Test
public void testString() {
  String expected = "expected";
  String actual = "actual";
  assertEquals(expected, actual);
}

ReAssert should repair the test by replacing the "expected" string with "actual". To verify this behavior, I create a second method annotated with @Fix whose body contains the expected repair.

@Fix("testString")
public void fixString() {
  String expected = "actual";
  String actual = "actual";
  assertEquals(expected, actual);
}

Then, I tell JUnit to use FixChecker by annotating the test class with JUnit's @RunWith annotation. FixChecker intercepts JUnit's normal result when a test fails. It then "repairs" the test and checks that the repair matches the body of the @Fix method. If not, or if no @Fix method exists, then the runner reports that the test fails. Otherwise, it reports that the test passes. In a sense, the @Test-and-@Fix method pair act like assertEquals with the repaired test on the actual side and the @Fix method on the expected side.
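
Putting the pieces together, the whole test class looks roughly like this (the class name here is just for illustration; @Fix and FixChecker come from my framework):

@RunWith(FixChecker.class)
public class StringRepairTest {

  @Test
  public void testString() {
    String expected = "expected";
    String actual = "actual";
    assertEquals(expected, actual);
  }

  @Fix("testString")
  public void fixString() {
    String expected = "actual";
    String actual = "actual";
    assertEquals(expected, actual);
  }
}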

FixChecker's repairs do not change the source code directly. Instead, it holds the modified source code in memory and compares it against the parsed source code of the @Fix method. In this way, the comparison ignores differences in source code formatting, and one can use either qualified or unqualified class names. Also, since both the test and the @Fix are normal methods, they can reuse aspects of the surrounding test class, and both receive the full support of the IDE.

FixChecker also provides two other useful features. First, it knows to leave ordinary tests alone: a passing test with no @Fix method is simply forwarded to JUnit unchanged. This allows me to mix ReAssert tests with standard unit tests. Second, I can test cases in which ReAssert is expected to fail by marking unrepairable tests with @Unfixable, another new annotation that FixChecker recognizes.

@Test
@Unfixable
public void testIgnoreAssertFail() {
  fail();
}

Several aspects of this framework are applicable to other program transformation tools. Even when a tool does not act solely on unit tests, it is often helpful to test it against real application code. Also, custom test runners can be exceptionally powerful and allow one to tailor tests to many contexts.
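
For readers who have not written one, a custom runner can be as small as a subclass of JUnit's BlockJUnit4ClassRunner. The sketch below is not FixChecker; it only logs each test method before delegating to JUnit's default behavior, but the same hooks are where a runner like FixChecker intercepts results:

import org.junit.runner.notification.RunNotifier;
import org.junit.runners.BlockJUnit4ClassRunner;
import org.junit.runners.model.FrameworkMethod;
import org.junit.runners.model.InitializationError;

// A minimal custom runner: it logs each test method, then delegates to
// JUnit's default behavior.
public class LoggingRunner extends BlockJUnit4ClassRunner {

  public LoggingRunner(Class<?> testClass) throws InitializationError {
    super(testClass);
  }

  @Override
  protected void runChild(FrameworkMethod method, RunNotifier notifier) {
    System.out.println("Running " + method.getName());
    super.runChild(method, notifier);
  }
}

A test class opts in with @RunWith(LoggingRunner.class), exactly as ReAssert's tests opt in to FixChecker.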

In later posts, I'll describe how I automatically deployed ReAssert for my own use, and how I used the tool to repair its own unit tests.

Beware Mutable Data Points (Updated)

In my previous post, I mentioned that one of the benefits of JUnit Theories is that they decouple test inputs (data points) from test implementation (Theories). However, this benefit comes at a price: since data points may be reused across several Theories, the way in which one defines mutable data points can cause surprising behavior.

To illustrate the problems that can occur with mutable data points, I will reuse the Counter example from the previous post. Counters are obviously mutable because calling increment increases a counter's value by one.

Say I have two Theories: incrementTheory, which is described in the previous post, and equalIncrementTheory, which verifies that two (un)equal counters remain (un)equal after incrementing both. Both theories mutate the counters passed in as arguments.

@Theory
public void incrementTheory(Counter toIncrement) {
    System.out.println("incrementTheory(" + toIncrement + ")");
    int oldValue = toIncrement.getValue();
    toIncrement.increment();
    int newValue = toIncrement.getValue();
    assertEquals(oldValue + 1, newValue);
}

@Theory
public void equalIncrementTheory(Counter c1, Counter c2) {
    System.out.println("equalIncrementTheory(" + c1 + ", " + c2 + ")");
    boolean wereEqual = c1.equals(c2);
    c1.increment();		
    c2.increment();
    assertEquals(wereEqual, c1.equals(c2));
}

There are four ways in which I can define data points that JUnit will pass to these Theories. In the previous post, I used the @DataPoints annotation to mark a method that returns an array of Counter objects. Each element of the array is a single data point. I could alternatively have used the @DataPoint (no "s") annotation on a method that returns a single counter, or used either annotation to mark static fields in the same manner.

The following list shows each of these four alternatives, followed by the output from the print statements included in the theories shown above. In each case there are two data points of type Counter initialized to the values 0 and 5.

  1. @DataPoint on a field holding a single value
    @DataPoint 
    public static Counter ZERO = new Counter(0);
    @DataPoint 
    public static Counter FIVE = new Counter(5);
    
    outputs
    incrementTheory(Counter(0))
    incrementTheory(Counter(5))
    equalIncrementTheory(Counter(1), Counter(1))
    equalIncrementTheory(Counter(3), Counter(6))
    equalIncrementTheory(Counter(7), Counter(4))
    equalIncrementTheory(Counter(8), Counter(8))
    
  2. @DataPoints on a field holding an array
    @DataPoints
    public static Counter[] COUNTERS = {
        new Counter(0),
        new Counter(5),
    }; 
    
    outputs
    incrementTheory(Counter(0))
    incrementTheory(Counter(5))
    equalIncrementTheory(Counter(1), Counter(1))
    equalIncrementTheory(Counter(3), Counter(6))
    equalIncrementTheory(Counter(7), Counter(4))
    equalIncrementTheory(Counter(8), Counter(8))
    
  3. @DataPoint on a method that returns a single value
    @DataPoint
    public static Counter zero() {
        return new Counter(0);
    }
    @DataPoint
    public static Counter five() {
        return new Counter(5);
    }
    
    outputs
    incrementTheory(Counter(5))
    incrementTheory(Counter(0))
    equalIncrementTheory(Counter(5), Counter(5))
    equalIncrementTheory(Counter(5), Counter(0))
    equalIncrementTheory(Counter(0), Counter(5))
    equalIncrementTheory(Counter(0), Counter(0))
    
  4. @DataPoints on a method that returns an array
    @DataPoints
    public static Counter[] counters() {
        return new Counter[] {
            new Counter(0),
            new Counter(5),
        };
    }
    
    outputs
    incrementTheory(Counter(0))
    incrementTheory(Counter(5))
    equalIncrementTheory(Counter(0), Counter(0))
    equalIncrementTheory(Counter(1), Counter(5))
    equalIncrementTheory(Counter(5), Counter(0))
    equalIncrementTheory(Counter(6), Counter(5))
    

Three things are surprising about this output. First, even though the alternatives start with the same data points, their output differs. Second, the data points started with values 0 and 5, but the values 1, 3, 4, 6, 7, and 8 all appear in various places. Indeed, in alternatives 1 and 2, the values 0 and 5 are never found in equalIncrementTheory! Third, alternative 3 is the only one that gives the "expected" output in which both theories are executed with all combinations of 0 and 5.

All of these issues are caused by mutable data points. One Theory execution (meaning a single call to incrementTheory or equalIncrementTheory) can affect later Theory executions.

To understand why this happens, it is necessary to examine the "lifespan" of each data point instance. Fields such as those in alternatives 1 and 2 are declared static and initialized when the class is loaded, so the data point instances they hold live across all theory executions. For alternative 3, JUnit calls the zero and five methods once before each theory execution, so each data point instance lives through only one method call. For alternative 4, JUnit calls the counters method once per Theory parameter and, in some cases, again partway through the combinations, so the returned arrays live across multiple executions of a single theory; the pseudocode below shows exactly when each array is created, reused, and replaced.

To make these distinctions clearer, the following pseudocode illustrates the entire test run for each alternative. It shows the points at which JUnit creates each data point instance (represented as a variable definition), the data point values (shown in comments), how they are passed to Theories (represented as method calls), and where they are mutated (the increment calls).

  1. @DataPoint on a field holding a single value. Note that ZERO and FIVE live for the entire test run and are continually mutated.
    ZERO = new Counter(0)
    FIVE = new Counter(5)
    incrementTheory(ZERO) // 0 
        ZERO.increment() // 1
    incrementTheory(FIVE) // 5
        FIVE.increment() // 6
    equalIncrementTheory(ZERO, ZERO) // 1, 1
        ZERO.increment() // 2
        ZERO.increment() // 3
    equalIncrementTheory(ZERO, FIVE) // 3, 6
        ZERO.increment() // 4
        FIVE.increment() // 7
    equalIncrementTheory(FIVE, ZERO) // 7, 4
        FIVE.increment() // 8
        ZERO.increment() // 5
    equalIncrementTheory(FIVE, FIVE) // 8, 8
        FIVE.increment() // 9
        FIVE.increment() // 10
    
  2. @DataPoints on a field holding an array. Like the previous alternative, the COUNTERS array lives for the entire test run and its elements are continually mutated.
    COUNTERS = new Counter[] { new Counter(0), new Counter(5) }
    incrementTheory(COUNTERS[0]) // 0
        COUNTERS[0].increment() // 1
    incrementTheory(COUNTERS[1]) // 5
        COUNTERS[1].increment() // 6
    equalIncrementTheory(COUNTERS[0], COUNTERS[0]) // 1, 1
        COUNTERS[0].increment() // 2
        COUNTERS[0].increment() // 3
    equalIncrementTheory(COUNTERS[0], COUNTERS[1]) // 3, 6
        COUNTERS[0].increment() // 4
        COUNTERS[1].increment() // 7
    equalIncrementTheory(COUNTERS[1], COUNTERS[0]) // 7, 4
        COUNTERS[1].increment() // 8
        COUNTERS[0].increment() // 5
    equalIncrementTheory(COUNTERS[1], COUNTERS[1]) // 8, 8
        COUNTERS[1].increment() // 9
        COUNTERS[1].increment() // 10
    
  3. @DataPoint on a method that returns a single value. JUnit calls the zero and five methods once for each Theory argument. This is analogous to continually assigning temporary variables.
    tmp1 = five()
    incrementTheory(tmp1) // 5
        tmp1.increment() // 6
    tmp2 = zero()
    incrementTheory(tmp2) // 0
        tmp2.increment() // 1
    tmp3 = five()
    tmp4 = five()
    equalIncrementTheory(tmp3, tmp4) // 5, 5
        tmp3.increment() // 6
        tmp4.increment() // 6
    tmp5 = five()
    tmp6 = zero()
    equalIncrementTheory(tmp5, tmp6) // 5, 0
        tmp5.increment() // 6
        tmp6.increment() // 1
    tmp7 = zero()
    tmp8 = five()
    equalIncrementTheory(tmp7, tmp8) // 0, 5
        tmp7.increment() // 1
        tmp8.increment() // 6
    tmp9 = zero()
    tmp10 = zero()
    equalIncrementTheory(tmp9, tmp10) // 0, 0
        tmp9.increment() // 1
        tmp10.increment() // 1
    
  4. @DataPoints on a method that returns an array. JUnit calls the method once for each Theory argument, then loops through all combinations of the returned arrays' values. In the case of equalIncrementTheory, it reuses the left argument across two Theory executions while the right argument changes.
    tmp1 = counters() // { 0, 5 }
    incrementTheory(tmp1[0]) // 0
        tmp1[0].increment() // 1
    incrementTheory(tmp1[1]) // 5
        tmp1[1].increment() // 6
    tmp2 = counters() // { 0, 5 }
    tmp3 = counters() // { 0, 5 }
    equalIncrementTheory(tmp2[0], tmp3[0]) // 0, 0
        tmp2[0].increment() // 1
        tmp3[0].increment() // 1
    equalIncrementTheory(tmp2[0], tmp3[1]) // 1, 5
        tmp2[0].increment() // 2
        tmp3[1].increment() // 6
    tmp4 = counters() // { 0, 5 }
    equalIncrementTheory(tmp2[1], tmp4[0]) // 5, 0
        tmp2[1].increment() // 6
        tmp4[0].increment() // 1
    equalIncrementTheory(tmp2[1], tmp4[1]) // 6, 5
        tmp2[1].increment() // 7
        tmp4[1].increment() // 6
    

This behavior certainly violates The Principle of Least Astonishment, but is it necessarily undesirable? Theories should test behavior common to many data points, including those that may or may not have been mutated elsewhere, so perhaps it is a good thing that alternatives 1, 2, and 4 "create" data points that the user did not originally plan for. However, common practice dictates that unit tests should be independent of each other and deterministically repeatable. Alternatives 1, 2, and 4 violate both principles: one Theory execution can affect another, and since Theory ordering is nondeterministic, there is no guarantee that re-running a Theory will yield the same result.

Therefore, when writing Theories that operate on mutable data points, I most often use and recommend the third alternative. That way each Theory execution uses new data point instances, making Theories independent of each other and deterministic.

Update (September 29, 2009)

A reader emailed with the observation that the issues surrounding mutable versus immutable objects are not specific to JUnit. Immutable objects make a program easier to reason about since changing a value in one place cannot affect another place that reads the value. The same is true in JUnit: using immutable objects causes all four ways of defining data points to produce equivalent behavior. However, using immutable objects may not be feasible or preferable, so one must be aware of how mutable objects are initialized and used. As the above article shows, this task is more difficult with JUnit Theories since it is not always obvious when data points are (re)initialized and how they flow across multiple Theories.
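
For example, an immutable variant of Counter (hypothetical; not the class used in these posts) would return a new instance from increment instead of modifying the receiver. The Theories would have to be rewritten to use the returned value, but no data point could ever carry state from one Theory execution into another:

// Hypothetical immutable variant of Counter: increment returns a new
// instance instead of modifying the receiver.
public final class ImmutableCounter {
    private final int value;

    public ImmutableCounter(int value) {
        this.value = value;
    }

    public ImmutableCounter increment() {
        return new ImmutableCounter(value + 1);
    }

    public int getValue() {
        return value;
    }
}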

JUnit Theories

This semester I am overseeing two undergraduate senior theses. The two students are working on a project involving JUnit Theories. Theories are a very useful feature of JUnit, but they have not been widely adopted since they are still experimental and not documented very extensively. The project's short-term goal is to address this problem by writing a suite of Theories for use as benchmarks in testing research. In the longer term, we hope to apply knowledge gained by writing Theories to other research projects and find areas in which Theories can be improved.

This post describes what Theories are and what they do. In future posts, I hope to write about why I find them interesting and how they enable more complex testing tasks.

"Theories in practice: Easy-to-write specifications that catch bugs" defines Theories in the following way1:

[Theories] are partial specifications of program behavior. Theories are written much like test methods, but are universally quantified: all of the theory’s assertions must hold for all arguments that satisfy the assumptions... A theory can be viewed in several ways. It is a universally quantified ("for-all") assertion, as contrasted to an assertion about a specific datum. It is a generalization of a set of example-based tests. It is a (possibly partial) specification of the method under test.

To understand what this definition means, it is easiest to walk through a simple example. Say we have a simple Counter class whose increment method increases a stored integer value by one each time it is called.

public class Counter {
    private int value;

    public Counter(int init) {
        this.value = init;
    }

    public void increment() {
        value = value + 1;
    }
		
    public int getValue() {
        return value;
    }
}

We wish to test that incrementing always increases a counter's value by one. The standard way to test this functionality is to write an example-based unit test that creates a Counter, increments it a few times, and asserts that the incremented values are correct.

@Test 
public void testIncrement()	{
    Counter c = new Counter(3);
    c.increment();
    assertEquals(4, c.getValue());
    c.increment();
    assertEquals(5, c.getValue());
}

This is a useful test, but it only verifies that a single counter initialized to three is incremented correctly. It would be good to test additional counters initialized to many different values. Doing so using example-based testing requires multiple test methods or testing objects in a loop.
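
For comparison, the loop-based version might look like the following; it works, but the test itself has to choose and manage the inputs:

@Test
public void testIncrementManyValues() {
    // The inputs are hard-coded into the test rather than supplied by JUnit.
    int[] initialValues = { 0, 1, -1, 42, 1000 };
    for (int init : initialValues) {
        Counter c = new Counter(init);
        c.increment();
        assertEquals(init + 1, c.getValue());
    }
}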

Theories provide an elegant alternative that complements example-based tests. From the test writer's point of view, Theories are just like normal unit tests but with one or more parameters [2]. To test increment, one can write a Theory that accepts a Counter object, increments it, then asserts that the value has increased by one. This is more general than an example-based test because it verifies a property common to all counters, regardless of how they were initialized. In the following code, the incrementTheory method implements such a Theory.

@RunWith(Theories.class)
public class CounterTheories {
   
    @DataPoints 
    public static Counter[] data() {
        return new Counter[] {
            new Counter(0),
            new Counter(1),
            new Counter(-1),
            new Counter(Integer.MIN_VALUE),
            new Counter(Integer.MAX_VALUE), // overflows when incremented
        };
    }

    @Theory
    public void incrementTheory(Counter toIncrement) {
        int oldValue = toIncrement.getValue();
        assumeTrue(Integer.MAX_VALUE != oldValue);
        toIncrement.increment();
        int newValue = toIncrement.getValue();
        assertEquals(oldValue + 1, newValue);
    }

    //... more theories 
}

The @RunWith(Theories.class) annotation tells JUnit that it should run all methods in the class that are annotated with @Theory. The @DataPoints annotation marks methods that return values that JUnit should supply to applicable Theories. At runtime, JUnit matches the values returned from data point methods or fields to appropriate Theory parameters [3]. In the example above, it sees that Counter objects are produced by data and consumed by incrementTheory, so it executes incrementTheory once for each element in the array returned from data.

Theories decouple test inputs from test implementation. Data points are automatically reused across multiple theories (even in subclasses), making it easier to write new tests. Adding a data point often provides a value that a test writer may not have originally considered, thus improving all Theories that use the data point.
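
For instance, adding a second Theory to the same class requires no extra wiring; the invented example below automatically runs against every Counter returned by data:

@Theory
public void getValueIsStable(Counter counter) {
    // Reading the value twice should return the same result.
    assertEquals(counter.getValue(), counter.getValue());
}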

But certain data points may not be applicable to a particular Theory. The test writer describes which data points apply by using methods provided by the org.junit.Assume class. Assumptions are similar to normal assertions, except that they cause a Theory to skip certain data points rather than fail. In our example, incrementTheory should not increment a counter whose value is equal to Integer.MAX_VALUE since the value would overflow. Therefore, incrementTheory uses assumeTrue to check for this special case.

This summary and example briefly describe the basics of Theories but gloss over how Theories work internally and lack practical advice, such as how to write "good" theories. I hope that the undergrads and future weblog posts can explore these topics and deeper research issues further.

Notes

  1. See also an early paper on Theories and the initial announcement of their inclusion as part of an experimental project called Popper.
  2. .NET offers a similar feature but calls it—perhaps more descriptively—parameterized unit tests. JUnit uses this term to mean test classes that are instantiated with input data.
  3. Internally, JUnit finds data points whose declared types derive from the declared types of Theory parameters. It does not currently box and unbox primitive data points. I submitted a simple patch that fixes the problem, but it has not yet been accepted.

Eclipse JUnit Music Box

I wrote an Eclipse plugin that turns Eclipse's built-in JUnit runner into a music box. The following video demonstrates the plugin:

Each test class is assigned one of seven chords in the key of C major. The assignment is deterministic, so a particular sequence of tests will play the same "song". Passing test methods play a pleasing arpeggio, while failing tests play an ugly dissonant chord. The time each test method takes to execute determines the speed of the music. If more than one test class runs, then the music resolves to the tonic at the end of the session.
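
As a rough sketch of the idea (not the plugin's actual code), the following class uses the standard javax.sound.midi API to play a pleasant arpeggio followed by a dissonant cluster; the note choices are illustrative, not the plugin's exact chords:

import javax.sound.midi.MidiChannel;
import javax.sound.midi.MidiSystem;
import javax.sound.midi.Synthesizer;

// Rough sketch: plays a C major arpeggio (a "passing" sound) followed by a
// dissonant cluster (a "failing" sound) on the default synthesizer.
public class MidiSketch {

    public static void main(String[] args) throws Exception {
        Synthesizer synth = MidiSystem.getSynthesizer();
        synth.open();
        MidiChannel channel = synth.getChannels()[0];

        int[] arpeggio = { 60, 64, 67, 72 };  // C, E, G, C
        play(channel, arpeggio, 150);         // notes staggered in time

        int[] cluster = { 60, 61, 62, 66 };   // C, C#, D, F#
        play(channel, cluster, 0);            // notes struck together

        synth.close();
    }

    private static void play(MidiChannel channel, int[] notes, int gapMillis)
            throws InterruptedException {
        for (int note : notes) {
            channel.noteOn(note, 80);
            Thread.sleep(gapMillis);
        }
        Thread.sleep(600);
        for (int note : notes) {
            channel.noteOff(note);
        }
    }
}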

Here is the plugin (including source code). To try it out, simply save the .jar file in Eclipse's plugins directory and restart Eclipse. I tested it in Eclipse version 3.4.0 running on JDK 6. The plugin requires MIDI, so if you do not hear any sound when running JUnit tests, your computer probably lacks an appropriate MIDI device or it is configured incorrectly. Try running this simple class to test your MIDI setup.

I am not the first to think of making JUnit play music. There is a Musical JUnit project on SourceForge, but it has not been updated in three years. It also uses prewritten samples, while mine produces sound programmatically.

Update Thursday, February 25, 2010

I posted the code on BitBucket.