www.BrettDaniel.com

Beware Mutable Data Points (Updated)

In my previous post, I mentioned that one of the benefits of JUnit Theories is that they decouple test inputs (data points) from test implementation (Theories). However, this benefit comes at a price: since data points may be reused across several Theories, the way in which one defines mutable data points can cause surprising behavior.

To illustrate the problems that can occur with mutable data points, I will reuse the Counter example from the previous post. Counters are obviously mutable because calling increment increases a counter's value by one.
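
The Counter class itself is defined in the previous post and is not repeated here. For readers who do not have it at hand, the following is a minimal sketch of what the rest of this post assumes about it: a constructor taking the initial value, increment, getValue, value-based equals, and a toString that matches the output format shown later. The exact shape is my assumption for illustration, not necessarily the original class.

```java
// Hypothetical stand-in for the Counter class from the previous post.
final class Counter {
    private int value;

    public Counter(int initialValue) {
        this.value = initialValue;
    }

    // Mutates this counter: increases its value by one.
    public void increment() {
        value++;
    }

    public int getValue() {
        return value;
    }

    // Value-based equality: two counters are equal when their values match.
    @Override
    public boolean equals(Object other) {
        return other instanceof Counter && ((Counter) other).value == value;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(value);
    }

    // Produces text such as "Counter(0)".
    @Override
    public String toString() {
        return "Counter(" + value + ")";
    }
}
```

Because equals depends on a field that increment changes, two counters that are equal now may be unequal after one of them is incremented, which is exactly what equalIncrementTheory exercises.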

Say I have two Theories: incrementTheory, which is described in the previous post, and equalIncrementTheory, which verifies that two (un)equal counters remain (un)equal after incrementing both. Both theories mutate the counters passed in as arguments.

@Theory
public void incrementTheory(Counter toIncrement) {
    System.out.println("incrementTheory(" + toIncrement + ")");
    int oldValue = toIncrement.getValue();
    toIncrement.increment();
    int newValue = toIncrement.getValue();
    assertEquals(oldValue + 1, newValue);
}

@Theory
public void equalIncrementTheory(Counter c1, Counter c2) {
    System.out.println("equalIncrementTheory(" + c1 + ", " + c2 + ")");
    boolean wereEqual = c1.equals(c2);
    c1.increment();
    c2.increment();
    assertEquals(wereEqual, c1.equals(c2));
}

There are four ways in which I can define data points that JUnit will pass to these Theories. In the previous post, I used the @DataPoints annotation to mark a method that returns an array of Counter objects. Each element of the array is a single data point. I could alternatively have used the @DataPoint (no "s") annotation on a method that returns a single counter, or used either annotation to mark static fields in the same manner.

The following list shows each of these four alternatives, followed by the output from the print statements included in the theories shown above. In each case there are two data points of type Counter initialized to the values 0 and 5.

  1. @DataPoint on a field holding a single value
    @DataPoint 
    public static Counter ZERO = new Counter(0);
    @DataPoint 
    public static Counter FIVE = new Counter(5);
    
    outputs
    incrementTheory(Counter(0))
    incrementTheory(Counter(5))
    equalIncrementTheory(Counter(1), Counter(1))
    equalIncrementTheory(Counter(3), Counter(6))
    equalIncrementTheory(Counter(7), Counter(4))
    equalIncrementTheory(Counter(8), Counter(8))
    
  2. @DataPoints on a field holding an array
    @DataPoints
    public static Counter[] COUNTERS = {
        new Counter(0),
        new Counter(5),
    }; 
    
    outputs
    incrementTheory(Counter(0))
    incrementTheory(Counter(5))
    equalIncrementTheory(Counter(1), Counter(1))
    equalIncrementTheory(Counter(3), Counter(6))
    equalIncrementTheory(Counter(7), Counter(4))
    equalIncrementTheory(Counter(8), Counter(8))
    
  3. @DataPoint on a method that returns a single value
    @DataPoint
    public static Counter zero() {
        return new Counter(0);
    }
    @DataPoint
    public static Counter five() {
        return new Counter(5);
    }
    
    outputs
    incrementTheory(Counter(5))
    incrementTheory(Counter(0))
    equalIncrementTheory(Counter(5), Counter(5))
    equalIncrementTheory(Counter(5), Counter(0))
    equalIncrementTheory(Counter(0), Counter(5))
    equalIncrementTheory(Counter(0), Counter(0))
    
  4. @DataPoints on a method that returns an array
    @DataPoints
    public static Counter[] counters() {
        return new Counter[] {
            new Counter(0),
            new Counter(5),
        };
    }
    
    outputs
    incrementTheory(Counter(0))
    incrementTheory(Counter(5))
    equalIncrementTheory(Counter(0), Counter(0))
    equalIncrementTheory(Counter(1), Counter(5))
    equalIncrementTheory(Counter(5), Counter(0))
    equalIncrementTheory(Counter(6), Counter(5))
    

Three things are surprising about this output. First, even though the alternatives start with the same data points, their output differs. Second, the data points started with values 0 and 5, but the values 1, 3, 4, 6, 7, and 8 all appear in various places. Indeed, in alternatives 1 and 2, the values 0 and 5 are never found in equalIncrementTheory! Third, alternative 3 is the only one that gives the "expected" output in which both theories are executed with all combinations of 0 and 5.

All of these issues are caused by mutable data points. One Theory execution (meaning a single call to incrementTheory or equalIncrementTheory) can affect later Theory executions.

To understand why this happens, it is necessary to examine the "lifespan" of each data point instance. Fields such as those in alternatives 1 and 2 are declared static and initialized when the class is loaded. Therefore, the data point instances held by the fields live across all theory executions. For alternative 3, JUnit calls the zero and five methods anew for every argument of every theory execution, so each data point instance lives through only one execution. Alternative 4 produces arrays of data points that live across multiple executions of a single theory, but JUnit calls the method again for each theory and at certain points within a theory, so no instance survives from one theory to the next.
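
The difference between a static field and a method can be seen without JUnit at all. The following plain-Java sketch uses hypothetical names (LifespanSketch, SHARED, fresh, observeThenIncrement) and a cut-down Counter. It only mimics a Theory execution by reading a value and then mutating the counter.

```java
final class LifespanSketch {
    // Minimal mutable stand-in for the post's Counter (assumed shape).
    static final class Counter {
        private int value;
        Counter(int value) { this.value = value; }
        void increment() { value++; }
        int getValue() { return value; }
    }

    // Like alternatives 1 and 2: one instance, created once at class load.
    static final Counter SHARED = new Counter(0);

    // Like alternative 3: a new instance on every call.
    static Counter fresh() {
        return new Counter(0);
    }

    // Stand-in for a Theory execution: records the value it sees, then mutates.
    static int observeThenIncrement(Counter c) {
        int seen = c.getValue();
        c.increment();
        return seen;
    }

    public static void main(String[] args) {
        for (int run = 1; run <= 3; run++) {
            System.out.println("run " + run
                + ": shared=" + observeThenIncrement(SHARED)
                + ", fresh=" + observeThenIncrement(fresh()));
        }
    }
}
```

Running main prints shared=0, shared=1, shared=2 across the three runs, while fresh stays at 0 every time.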

To make these distinctions clearer, the following pseudocode illustrates the entire test run for each alternative. It shows the points at which JUnit creates each data point instance (represented as variable assignments), the data point values (shown in comments), how they are passed to Theories (represented as method calls), and where they are mutated (the increment calls).

  1. @DataPoint on a field holding a single value. Note that ZERO and FIVE live for the entire test run and are continually mutated.
    ZERO = new Counter(0)
    FIVE = new Counter(5)
    incrementTheory(ZERO) // 0 
        ZERO.increment() // 1
    incrementTheory(FIVE) // 5
        FIVE.increment() // 6
    equalIncrementTheory(ZERO, ZERO) // 1, 1
        ZERO.increment() // 2
        ZERO.increment() // 3
    equalIncrementTheory(ZERO, FIVE) // 3, 6
        ZERO.increment() // 4
        FIVE.increment() // 7
    equalIncrementTheory(FIVE, ZERO) // 7, 4
        FIVE.increment() // 8
        ZERO.increment() // 5
    equalIncrementTheory(FIVE, FIVE) // 8, 8
        FIVE.increment() // 9
        FIVE.increment() // 10
    
  2. @DataPoints on a field holding an array. Like the previous alternative, the COUNTERS array lives for the entire test run and its elements are continually mutated.
    COUNTERS = new Counter[] { new Counter(0), new Counter(5) }
    incrementTheory(COUNTERS[0]) // 0
        COUNTERS[0].increment() // 1
    incrementTheory(COUNTERS[1]) // 5
        COUNTERS[1].increment() // 6
    equalIncrementTheory(COUNTERS[0], COUNTERS[0]) // 1, 1
        COUNTERS[0].increment() // 2
        COUNTERS[0].increment() // 3
    equalIncrementTheory(COUNTERS[0], COUNTERS[1]) // 3, 6
        COUNTERS[0].increment() // 4
        COUNTERS[1].increment() // 7
    equalIncrementTheory(COUNTERS[1], COUNTERS[0]) // 7, 4
        COUNTERS[1].increment() // 8
        COUNTERS[0].increment() // 5
    equalIncrementTheory(COUNTERS[1], COUNTERS[1]) // 8, 8
        COUNTERS[1].increment() // 9
        COUNTERS[1].increment() // 10
    
  3. @DataPoint on a method that returns a single value. JUnit calls the zero and five methods once for each Theory argument. This is analogous to continually assigning temporary variables.
    tmp1 = five()
    incrementTheory(tmp1) // 5
        tmp1.increment() // 6
    tmp2 = zero()
    incrementTheory(tmp2) // 0
        tmp2.increment() // 1
    tmp3 = five()
    tmp4 = five()
    equalIncrementTheory(tmp3, tmp4) // 5, 5
        tmp3.increment() // 6
        tmp4.increment() // 6
    tmp5 = five()
    tmp6 = zero()
    equalIncrementTheory(tmp5, tmp6) // 5, 0
        tmp5.increment() // 6
        tmp6.increment() // 1
    tmp7 = zero()
    tmp8 = five()
    equalIncrementTheory(tmp7, tmp8) // 0, 5
        tmp7.increment() // 1
        tmp8.increment() // 6
    tmp9 = zero()
    tmp10 = zero()
    equalIncrementTheory(tmp9, tmp10) // 0, 0
        tmp9.increment() // 1
        tmp10.increment() // 1
    
  4. @DataPoints on a method that returns an array. JUnit calls the method once for each Theory argument, then loops through all combinations of the returned arrays' values. In the case of equalIncrementTheory, it reuses the left argument across two Theory executions while the right argument changes.
    tmp1 = counters() // { 0, 5 }
    incrementTheory(tmp1[0]) // 0
        tmp1[0].increment() // 1
    incrementTheory(tmp1[1]) // 5
        tmp1[1].increment() // 6
    tmp2 = counters() // { 0, 5 }
    tmp3 = counters() // { 0, 5 }
    equalIncrementTheory(tmp2[0], tmp3[0]) // 0, 0
        tmp2[0].increment() // 1
        tmp3[0].increment() // 1
    equalIncrementTheory(tmp2[0], tmp3[1]) // 1, 5
        tmp2[0].increment() // 2
        tmp3[1].increment() // 6
    tmp4 = counters() // { 0, 5 }
    equalIncrementTheory(tmp2[1], tmp4[0]) // 5, 0
        tmp2[1].increment() // 6
        tmp4[0].increment() // 1
    equalIncrementTheory(tmp2[1], tmp4[1]) // 6, 5
        tmp2[1].increment() // 7
        tmp4[1].increment() // 6
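
The trace for alternative 4 is the hardest to follow, so here is a runnable plain-Java sketch of it. It follows the pseudocode above rather than JUnit's actual implementation: the class name Alternative4Trace, the run method, and the cut-down Counter are all hypothetical, and the nested loop is simply my reading of when the arrays are refetched.

```java
import java.util.ArrayList;
import java.util.List;

final class Alternative4Trace {
    // Minimal mutable stand-in for the post's Counter (assumed shape).
    static final class Counter {
        private int value;
        Counter(int value) { this.value = value; }
        void increment() { value++; }
        @Override public String toString() { return "Counter(" + value + ")"; }
    }

    // Plays the role of the @DataPoints method: a new array on every call.
    static Counter[] counters() {
        return new Counter[] { new Counter(0), new Counter(5) };
    }

    // Same print-then-mutate shape as the Theory, minus the assertion.
    static String equalIncrementTheory(Counter c1, Counter c2) {
        String line = "equalIncrementTheory(" + c1 + ", " + c2 + ")";
        c1.increment();
        c2.increment();
        return line;
    }

    static List<String> run() {
        List<String> log = new ArrayList<>();
        Counter[] left = counters();          // fetched once for the first argument
        for (Counter c1 : left) {
            Counter[] right = counters();     // fetched again for each left element
            for (Counter c2 : right) {
                log.add(equalIncrementTheory(c1, c2));
            }
        }
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The four lines it prints match the equalIncrementTheory output listed for alternative 4: the left counter carries its mutation into the second execution, while the right array starts over each time the left element changes.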
    

This behavior certainly violates The Principle of Least Astonishment, but is it necessarily undesirable? Theories should test behavior common to many data points, including those that may or may not have been mutated elsewhere, so perhaps it is a good thing that alternatives 1, 2, and 4 "create" data points that the user did not originally plan for. However, common practice dictates that unit tests should be independent of each other and deterministically repeatable. Alternatives 1, 2, and 4 violate both principles: one Theory execution can affect another, and since Theory ordering is nondeterministic, there is no guarantee that re-running a Theory will yield the same result.

Therefore, when writing Theories that operate on mutable data points, I most often use and recommend the third alternative. That way each Theory execution uses new data point instances, making Theories independent of each other and deterministic.
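
The independence argument can also be checked outside JUnit. In the rough sketch below, two hypothetical theories named "A" and "B" each read a data point and then mutate it. A Supplier stands in for the way the data point is defined: shared() hands out one instance, like a static field, and fresh() builds a new one per call, like alternative 3. All names are mine and nothing here uses JUnit's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

final class OrderIndependenceSketch {
    // Minimal mutable stand-in for the post's Counter (assumed shape).
    static final class Counter {
        private int value;
        Counter(int value) { this.value = value; }
        void increment() { value++; }
        int getValue() { return value; }
    }

    // Runs the theories in the given order. Each one asks the supplier for its
    // data point, records the value it observes, then mutates the counter.
    // The log is sorted so that only the observations matter, not the order.
    static List<String> observe(List<String> order, Supplier<Counter> dataPoint) {
        List<String> log = new ArrayList<>();
        for (String theory : order) {
            Counter c = dataPoint.get();
            log.add(theory + " saw " + c.getValue());
            c.increment();
        }
        Collections.sort(log);
        return log;
    }

    // Field-style data point: every call to get() returns the same instance.
    static Supplier<Counter> shared() {
        Counter instance = new Counter(0);
        return () -> instance;
    }

    // Method-style data point (alternative 3): every call builds a new instance.
    static Supplier<Counter> fresh() {
        return () -> new Counter(0);
    }

    public static void main(String[] args) {
        List<String> ab = Arrays.asList("A", "B");
        List<String> ba = Arrays.asList("B", "A");
        System.out.println("shared: " + observe(ab, shared()).equals(observe(ba, shared())));
        System.out.println("fresh:  " + observe(ab, fresh()).equals(observe(ba, fresh())));
    }
}
```

With the shared instance, what each theory observes depends on which one ran first, so the two orderings disagree. With fresh instances, both orderings observe the same values.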

Update (September 29, 2009)

A reader emailed with the observation that the issues surrounding mutable versus immutable objects are not specific to JUnit. Immutable objects make a program easier to reason about since changing a value in one place cannot affect another place that reads the value. The same is true in JUnit: using immutable objects causes all four ways of defining data points to produce equivalent behavior. However, using immutable objects may not be feasible or preferable, so one must be aware of how mutable objects are initialized and used. As the above article shows, this task is more difficult with JUnit Theories since it is not always obvious when data points are (re)initialized and how they flow across multiple Theories.
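
To make the reader's point concrete, here is a small sketch of what an immutable counter might look like. ImmutableCounter is a hypothetical class, not something from the previous post: its increment returns a new object instead of changing the receiver, so a Theory written against it would have to use the returned value.

```java
final class ImmutableCounter {
    private final int value;

    ImmutableCounter(int value) {
        this.value = value;
    }

    // Returns a new counter instead of changing this one.
    ImmutableCounter increment() {
        return new ImmutableCounter(value + 1);
    }

    int getValue() {
        return value;
    }

    @Override
    public String toString() {
        return "Counter(" + value + ")";
    }

    public static void main(String[] args) {
        // One shared instance, as a static data point field would hold.
        ImmutableCounter zero = new ImmutableCounter(0);
        ImmutableCounter first = zero.increment();
        ImmutableCounter second = zero.increment();
        System.out.println(zero + " " + first + " " + second);
    }
}
```

The shared instance still reads 0 no matter how many times increment is called on it, so it no longer matters whether the data point comes from a field or a method.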
