www.BrettDaniel.com

Testing a Testing Tool (Part Two)

In my previous post, I discussed the first of several challenges I encountered when testing ReAssert. In this, the second of three articles in the series, I will describe how I automatically deployed the tool for my own use.

Build and Deploy Local Eclipse Plugin

When developing software tools, it is good practice to eat one's own dog food by using the tool oneself. It is one of the easiest ways to uncover bugs and improve usability. I ate ReAssert's dog food by using it to repair tests in other research projects and (as I'll describe in the next post) ReAssert itself.

ReAssert is implemented as an Eclipse plugin. The easiest way to deploy such a plugin is to include it in Eclipse's plugins directory1. When Eclipse starts, it automatically loads all plugins in the directory. If there is more than one version of a particular plugin, Eclipse loads the most recent.

To keep from having to install ReAssert's plugin manually, I made it such that the normal build process automatically updates version numbers and copies the plugin to the appropriate place. A single Ant build script handles the entire process.

The main challenge lies in accessing Eclipse's build metadata in the script. Each Eclipse plugin holds the metadata in a file called MANIFEST.MF. It contains things like the plugin's name, it's version number, and the other plugins it depends on. For example, here is part of ReAssert's:

Manifest-Version: 1.0
Bundle-ManifestVersion: 2
Bundle-Name: edu.illinois.reassert.plugin
Bundle-SymbolicName: edu.illinois.reassert.plugin;singleton:=true
Bundle-Version: 0.3.0.201002231921
Bundle-Activator: edu.illinois.reassert.plugin.ReAssertPlugin
Require-Bundle: org.eclipse.ui,
 org.eclipse.core.runtime,
 ...
Bundle-Vendor: University of Illinois at Urbana-Champaign

ReAssert's build script first updates this file with the build number (based on the current date and time). Then, it reads the version and build number by converting MANIFEST.MF into a Ant property file. Finally, it bundles the plugin using the values.

Here are the relevant pieces of the script that implements this process:

  1. Set the date and time with tstamp, then replace the build number in the manifest with the date and time.
    <tstamp /> <!-- set ${DSTAMP} and ${TSTAMP} -->
    <replaceregexp 
        file="META-INF/MANIFEST.MF" 
        match="Bundle\-Version: ([0-9]+\.[0-9]+\.[0-9]+)\.([0-9]+)"
        replace="Bundle-Version: \1.${DSTAMP}${TSTAMP}" />
  2. Convert MANIFEST.MF into manifest.properties that Ant can read.2
    <copy file="META-INF/MANIFEST.MF" tofile="manifest.properties" />
    <replace file="manifest.properties">
      <replacefilter token=":=" value="=" />
      <replacefilter token=":" value="=" />
      <replacefilter token=";" value="" />
    </replace>
    <property file="manifest.properties"/>
  3. Set plugin name using the properties and bundle the plugin's JAR file.
    <property 
        name="plugin.jar" 
        value="${dist.dir}/${Bundle-Name}_${Bundle-Version}.jar" />
    <jar 
        destfile="${plugin.jar}"
        manifest="META-INF/MANIFEST.MF">
      <fileset dir="${bin.dir}" />
      <fileset dir="." includes="${lib.dir}/**/*" />
      <fileset dir="." includes="META-INF/MANIFEST.MF" />
      <fileset dir="." includes="plugin.xml" />
    </jar>
  4. Copy the JAR to Eclipse's plugins directory. The eclipse.home variable is set when the script is run within Eclipse.
    <copy file="${plugin.jar}" todir="${eclipse.home}/plugins/" />
    <echo>Restart Eclipse to enable plugin</echo>

It is a pretty straightforward process but useful to keep my installed version of ReAssert up to date.

I have also seen a similar process scaled up to an entire enterprise. The company had an automated nightly build process that would post the plugin to an Eclipse Update Site on the local intranet. Every morning developers would update to the latest plugin. Any bugs they found while using the tool went straight into the bug tracking system.

Notes

  1. One can also install plugins through Eclipse's Update Manager with a plugin update site, but I felt this was overkill for ReAssert since it was such a simple plugin.
  2. Adapted from this article describing how to build a simple plugin.

Testing a Testing Tool (Part One)

I wrote ReAssert to make it easier to maintain unit tests. Ironically, I encountered several challenges when testing ReAssert itself. First, ReAssert acts on source code, so I created a test framework that made it easy to build input programs and check ReAssert's output. Second, I ate my own dogfood by deploying the tool on my local machine. Finally, I combined both aspects by ReAsserting ReAssert itself.

This is the first of what I expect to be three posts in which I discuss these challenges.

Tests as Test Inputs

ReAssert transforms source code. Given the source code of a failing unit test, it outputs a transformed test that passes. Testing program transformation tools is difficult because writing input programs, passing them to the tool, and checking the output requires a lot of effort. Developers often automate the process by saving inputs and expected outputs to the filesystem or including them as string literals in their unit tests. Tests pass the input file contents or string literal to the tool and then verify that the tool's output exactly matches the expected output.

Both approaches are exceptionally common but have several disadvantages. First, files make it difficult to debug failures, since it can be difficult to figure out which file(s) corresponds to which test(s). Second, strings can make test code very verbose, and one has to worry about linebreaks, escape characters, and character encoding. Strings are also opaque to the IDE, so they lack helpful features like syntax highlighting and automatic formatting. Finally, both approaches require that the tool's output exactly matches the expected output byte-for-byte, whitespace and all. Such strict matching is rarely necessary when checking source code and can make tests very fragile. As soon as one changes the tool's pretty-printer, it can break every test even if program contents remain the same (which is actually one of the problems that ReAssert aims to solve).

I wanted ReAssert's unit tests to make testing as simple as possible while avoiding the problems caused by input files or string literals. The solution I built relies on the fact that ReAssert acts on unit tests. The test itself serves as the input to ReAssert, and another method in the same test class represents the expected output. To implement this idea, I extended JUnit's default behavior with a custom test runner called FixChecker and a new @Fix method annotation.

Here is an example: say I want to test that ReAssert replaces the initial value of a string used in a failing assertion. The failing test would look something like the following:

@Test
public void testString() {
  String expected = "expected";
  String actual = "actual";
  assertEquals(expected, actual);
}

ReAssert should repair the test by replacing the "expected" string with "actual". To verify this behavior, I create a second method annotated with @Fix whose body contains the expected repair.

@Fix("testString")
public void fixString() {
  String expected = "actual";
  String actual = "actual";
  assertEquals(expected, actual);
}

Then, I tell JUnit to use FixChecker by annotating the test class with JUnit's @RunWith annotation. FixChecker intercepts JUnit's normal result when a test fails. It then "repairs" the test and checks that the repair matches the body of the @Fix method. If not, or if no @Fix method exists, then the runner reports that the test fails. Otherwise, it reports that the test passes. In a sense, the @Test-and-@Fix method pair act like assertEquals with the repaired test on the actual side and the @Fix method on the expected side.

FixChecker's repairs do not change the source code directly. Instead, it holds the modified source code in memory and compares it against the parsed source code of the @Fix method. In this way, the comparison ignores differences in source code formatting, and one can use either qualified or unqualified class names. Also, since both the test and the @Fix are normal methods, they can reuse aspects of the surrounding test class, and both receive the full support of the IDE.

FixChecker also provides two other useful features. First, it is smart enough to ignore tests that pass and lack an @Fix method. Instead, it forwards them along to JUnit unchanged. This allows me to mix ReAssert tests with standard unit tests. Second, I can test when ReAssert is expected to fail by marking unrepairable tests with @Unfixable, another new annotation that FixChecker knows to look for.

@Test
@Unfixable
public void testIgnoreAssertFail() {
  fail();
}

Several aspects of this framework are applicable to other program transformation tools. It is often useful to use real application code when testing even when a tool does not act solely on unit tests. Also, custom test runners can be exceptionally powerful and allow one to tailor tests to many contexts.

In later posts, I'll describe how I automatically deployed ReAssert for my own use, and how I used the tool to repair its own unit tests.

explain: Short Documentation for Unix Commands

Here is a scenario that may sound familiar to Linux users: say you want to do something in a terminal but don't know the appropriate command. A quick web search yields a number of possible solutions, but every one is a long, impenetrable command with several unexplained flags. One should avoid running unknown commands that may do something malicious (e.g. "rm -rf ~"), but it is difficult to search through each command's documentation to find the meaning of all of the flags.

I wrote a small utility program that addresses this problem. The utility, called explain, prints short documentation for a given command string, including the documentation for all flags passed to the command.

For example, say you want to copy a file only when it is newer than the destination file. This superuser.com thread suggests the following commands:

cp --update src dest
or
rsync -avz /from/where/ /to/dest/

explain shows what both of these commands actually mean:

$ explain cp --update src dest
cp - copy files and directories
    -u, --update
        copy only when the SOURCE file is  newer  than  the  destination
        file or when the destination file is missing
$ explain rsync -avz /from/where/ /to/dest/
rsync - copying tool
    -a, --archive
        This  is equivalent to -rlptgoD. It is a quick way of saying you
        want recursion and want to preserve almost everything  (with  -H
        being  a  notable  omission).   The  only exception to the above
        equivalence is when --files-from is specified, in which case  -r
        is not implied.
    -v, --verbose
        This  option increases the amount of information the daemon logs
        during its startup phase.  After the client connects,  the  dae‐
        mon’s verbosity level will be controlled by the options that the
        client used and the "max verbosity" setting in the module’s con‐
        fig section.
    -z, --compress
        With  this  option, rsync compresses the file data as it is sent
        to the destination machine, which reduces  the  amount  of  data
        being transmitted — something that is useful over a slow connec‐
        tion.

This documentation is simpler than a command's full man page, provides a more focused view of only the flags that one cares about (including their synonyms), and makes it easier to compare different commands. Thus, explain is complementary to existing Unix utilities like man, whatis and apropos.

To get explain, you can download its source code or clone its Mercurial repository. The bundled INSTALL file contains installation instructions.

Update from the comments: internally explain parses the command's man(1) page and searches for sections that look like flag documentation. This is somewhat challenging because the file contains no semantic markup and there is no "standard" format, just ad-hoc suggestions on what it should contain. These two factors are probably part of the reason why a utility like explain doesn’t already exist.

explain currently handles just one command at a time. I hope to improve the utility by explaining more complex commands with sub-commands, pipes, and redirects. I would also like it to explain non-flag arguments such as the source and destination arguments in the examples above. Finally, certain common commands use nonstandard flags that explain could probably handle as special cases. For example, it would be good if explain could show documentation for commands like "java -Xmx1024M" or "chmod +x". If you have any other feature requests or if you find any bugs, please post them on the issues page.

Over the past week, I have used explain at least once per day. I hope others find it useful, too.

Beware Mutable Data Points (Updated)

In my previous post, I mentioned that one of the benefits of JUnit Theories is that they decouple test inputs (data points) from test implementation (Theories). However, this benefit comes at a price: since data points may be reused across several Theories, the way in which one defines mutable data points can cause surprising unexpected behavior.

To illustrate the problems that can occur with mutable data points, I will reuse the Counter example from the previous post. Counters are obviously mutable because calling increment increases a counter's value by one.

Say I have two Theories: incrementTheory, which is described in the previous post, and equalIncrementTheory, which verifies that two (un)equal counters remain (un)equal after incrementing both. Both theories mutate the counters passed in as arguments.

@Theory
public void incrementTheory(Counter toIncrement) {
    System.out.println("incrementTheory(" + toIncrement + ")");
    int oldValue = toIncrement.getValue();
    toIncrement.increment();
    int newValue = toIncrement.getValue();
    assertEquals(oldValue + 1, newValue);
}
 
@Theory
public void equalIncrementTheory(Counter c1, Counter c2) {
    System.out.println("equalIncrementTheory(" + c1 + ", " + c2 + ")");
    boolean wereEqual = c1.equals(c2);
    c1.increment();		
    c2.increment();
    assertEquals(wereEqual, c1.equals(c2));
}

There are four ways in which I can define data points that JUnit will pass to these Theories. In the previous post, I used the @DataPoints annotation to mark a method that returns an array of Counter objects. Each element of the array is a single data point. I could have alternately used the @DataPoint (no "s") annotation on a method that returns a single counter, or used either annotation to mark static fields in the same manner.

The following list shows each of these four alternatives, followed by the output from the print statements included in the theories shown above. In each case there are two data points of type Counter initialized to the values 0 and 5.

  1. @DataPoint on a field holding a single value
    @DataPoint 
    public static Counter ZERO = new Counter(0);
    @DataPoint 
    public static Counter FIVE = new Counter(5);
    outputs
    incrementTheory(Counter(0))
    incrementTheory(Counter(5))
    equalIncrementTheory(Counter(1), Counter(1))
    equalIncrementTheory(Counter(3), Counter(6))
    equalIncrementTheory(Counter(7), Counter(4))
    equalIncrementTheory(Counter(8), Counter(8))
    
  2. @DataPoints on a field holding an array
    @DataPoints
    public static Counter[] COUNTERS = {
        new Counter(0),
        new Counter(5),
    };
    outputs
    incrementTheory(Counter(0))
    incrementTheory(Counter(5))
    equalIncrementTheory(Counter(1), Counter(1))
    equalIncrementTheory(Counter(3), Counter(6))
    equalIncrementTheory(Counter(7), Counter(4))
    equalIncrementTheory(Counter(8), Counter(8))
    
  3. @DataPoint on a method that returns a single value
    @DataPoint
    public static Counter zero() {
        return new Counter(0);
    }
    @DataPoint
    public static Counter five() {
        return new Counter(5);
    }
    outputs
    incrementTheory(Counter(5))
    incrementTheory(Counter(0))
    equalIncrementTheory(Counter(5), Counter(5))
    equalIncrementTheory(Counter(5), Counter(0))
    equalIncrementTheory(Counter(0), Counter(5))
    equalIncrementTheory(Counter(0), Counter(0))
    
  4. @DataPoints on a method that returns an array
    @DataPoints
    public static Counter[] counters() {
        return new Counter[] {
            new Counter(0),
            new Counter(5),
        };
    }
    outputs
    incrementTheory(Counter(0))
    incrementTheory(Counter(5))
    equalIncrementTheory(Counter(0), Counter(0))
    equalIncrementTheory(Counter(1), Counter(5))
    equalIncrementTheory(Counter(5), Counter(0))
    equalIncrementTheory(Counter(6), Counter(5))
    

Three things are surprising about this output. First, even though the alternatives start with the same data points, their output differs. Second, the data points started with values 0 and 5, but the values 1, 3, 4, 6, 7, and 8 all appear in various places. Indeed, in alternatives 1 and 2, the values 0 and 5 are never found in equalIncrementTheory! Third, alternative 3 is the only one that gives the "expected" output in which both theories are executed with all combinations of 0 and 5.

All of these issues are caused by mutable data points. One Theory execution (meaning a single call to incrementTheory or equalIncrementTheory) can affect later Theory executions.

To understand why this happens, it is necessary to examine the "lifespan" of each data point instance. Fields such as those in alternatives 1 and 2 are declared static and initialized when the class is loaded. Therefore, the data point instances held by the fields live across all theory executions. For alternative 3, JUnit calls the zero and five methods once before each theory execution, so each data point instance lives through only one method call. Alternative 4 produces arrays of data points that live across multiple executions of a single theory but are reinitialized after each element has been used in a theory at least once.

To make these distinctions more clear, the following pseudocode illustrates the entire test run for each alternative. It shows the points at which JUnit creates each data point instance (represented as variable definition), the datapoint values (shown in comments), how they are passed to Theories (represented as method calls), and where they are mutated (the increment calls).

  1. @DataPoint on a field holding a single value. Note that ZERO and FIVE live for the entire test run and are continually mutated.
    ZERO = new Counter(0)
    FIVE = new Counter(5)
    incrementTheory(ZERO) // 0 
        ZERO.increment() // 1
    incrementTheory(FIVE) // 5
        FIVE.increment() // 6
    equalIncrementTheory(ZERO, ZERO) // 1, 1
        ZERO.increment() // 2
        ZERO.increment() // 3
    equalIncrementTheory(ZERO, FIVE) // 3, 6
        ZERO.increment() // 4
        FIVE.increment() // 7
    equalIncrementTheory(FIVE, ZERO) // 7, 4
        FIVE.increment() // 8
        ZERO.increment() // 5
    equalIncrementTheory(FIVE, FIVE) // 8, 8
        FIVE.increment() // 9
        FIVE.increment() // 10
  2. @DataPoints on a field holding an array. Like the previous alternative, the COUNTERS array lives for the entire test run and its elements are continually mutated.
    COUNTERS = new Counter[] { new Counter(0), new Counter(5) }
    incrementTheory(COUNTERS[0]) // 0
        COUNTERS[0].increment() // 1
    incrementTheory(COUNTERS[1]) // 5
        COUNTERS[1].increment() // 6
    equalIncrementTheory(COUNTERS[0], COUNTERS[0]) // 1, 1
        COUNTERS[0].increment() // 2
        COUNTERS[0].increment() // 3
    equalIncrementTheory(COUNTERS[0], COUNTERS[1]) // 3, 6
        COUNTERS[0].increment() // 4
        COUNTERS[1].increment() // 7
    equalIncrementTheory(COUNTERS[1], COUNTERS[0]) // 7, 4
        COUNTERS[1].increment() // 8
        COUNTERS[0].increment() // 5
    equalIncrementTheory(COUNTERS[1], COUNTERS[1]) // 8, 8
        COUNTERS[1].increment() // 9
        COUNTERS[1].increment() // 10
  3. @DataPoint on a method that returns a single value. JUnit calls the zero and five methods once for each Theory argument. This is analogous to continually assigning temporary variables.
    tmp1 = five()
    incrementTheory(tmp1) // 5
        tmp1.increment() // 6
    tmp2 = zero()
    incrementTheory(tmp2) // 0
        tmp2.increment() // 1
    tmp3 = five()
    tmp4 = five()
    equalIncrementTheory(tmp3, tmp4) // 5, 5
        tmp3.increment() // 6
        tmp4.increment() // 6
    tmp5 = five()
    tmp6 = zero()
    equalIncrementTheory(tmp5, tmp6) // 5, 0
        tmp5.increment() // 6
        tmp6.increment() // 1
    tmp7 = zero()
    tmp8 = five()
    equalIncrementTheory(tmp7, tmp8) // 0, 5
        tmp7.increment() // 1
        tmp8.increment() // 6
    tmp9 = zero()
    tmp10 = zero()
    equalIncrementTheory(tmp9, tmp10) // 0, 0
        tmp9.increment() // 1
        tmp10.increment() // 1
  4. @DataPoints on a method that returns an array. JUnit calls the method once for each Theory argument, then loops through all combinations of the returned arrays' values. In the case of equalIncrementTheory, it reuses the left argument across two Theory executions while the right argument changes.
    tmp1 = counters() // { 0, 5 }
    incrementTheory(tmp1[0]) // 0
        tmp1[0].increment() // 1
    incrementTheory(tmp1[1]) // 5
        tmp1[1].increment() // 6
    tmp2 = counters() // { 0, 5 }
    tmp3 = counters() // { 0, 5 }
    equalIncrementTheory(tmp2[0], tmp3[0]) // 0, 0
        tmp2[0].increment() // 1
        tmp3[0].increment() // 1
    equalIncrementTheory(tmp2[0], tmp3[1]) // 1, 5
        tmp2[0].increment() // 2
        tmp3[1].increment() // 6
    tmp4 = counters() // { 0, 5 }
    equalIncrementTheory(tmp2[1], tmp4[0]) // 5, 0
        tmp2[1].increment() // 6
        tmp4[0].increment() // 1
    equalIncrementTheory(tmp2[1], tmp4[1]) // 6, 5
        tmp2[1].increment() // 7
        tmp4[1].increment() // 6

This behavior certainly violates The Principle of Least Astonishment, but is it necessarily undesirable? Theories should test behavior common to many data points, including those that may or may not have been mutated elsewhere, so perhaps it is a good thing that alternatives 1, 2, and 4 "create" data points that the user did not originally plan for. However, common practice dictates that unit tests should be independent of each other and deterministically repeatable. Alternatives 1, 2, and 4 violate both principles: one Theory execution can affect another, and since Theory ordering is nondeterministic, there is no guarantee that re-running a Theory will yield the same result.

Therefore, when writing Theories that operate on mutable data points, I most often use and recommend the third alternative. That way each Theory execution uses new data point instances, making Theories independent of each other and deterministic.

Update (September 29, 2009)

A reader emailed with the observation that the issues surrounding mutable versus immutable objects are not specific to JUnit. Immutable objects make a program easier to reason about since changing a value in one place cannot affect another place that reads the value. The same is true in JUnit: using immutable objects causes all four ways of defining data points to produce equivalent behavior. However, using immutable objects may not be feasible or preferable, so one must be aware of how mutable objects are initialized and used. As the above article shows, this task is more difficult with JUnit Theories since it is not always obvious when data points are (re)initialized and how they flow across multiple Theories.

JUnit Theories

This semester I am overseeing two undergraduate senior theses. The two students are working on a project involving JUnit Theories. Theories are a very useful feature of JUnit, but they have not been widely adopted since they are still experimental and not documented very extensively. The project's short-term goal is to address this problem by writing a suite of Theories for use as benchmarks in testing research. In the longer term, we hope to apply knowledge gained by writing Theories to other research projects and find areas in which Theories can be improved.

This post describes what Theories are and what they do. In future posts, I hope to write about why I find them interesting and how they enable more complex testing tasks.

"Theories in practice: Easy-to-write specifications that catch bugs" defines Theories in the following way1:

[Theories] are partial specifications of program behavior. Theories are written much like like test methods, but are universally quantified: all of the theory’s assertions must hold for all arguments that satisfy the assumptions...A theory can be viewed in several ways. It is a universally quantified ("for-all") assertion, as contrasted to an assertion about a specific datum. It is a generalization of a set of example-based tests. It is a (possibly partial) specification of the method under test.

To understand what this definition means, it is easiest to explain a simple example. Say we have a simple Counter class that allows one to increment an integer value every time the increment method is called.

public class Counter {
    private int value;
 
    public Counter(int init) {
        this.value = init;
    }
 
    public void increment() {
        value = value + 1;
    }
 
    public int getValue() {
        return value;
    }
}

We wish to test that incrementing always increases a counter's value by one. The standard way to test this functionality is to write an example-based unit test that creates a Counter, increments it a few times, and asserts that the incremented values are correct.

@Test 
public void testIncrement()	{
    Counter c = new Counter(3);
    c.increment();
    assertEquals(4, c.getValue());
    c.increment();
    assertEquals(5, c.getValue());
}

This is a useful test, but it only verifies that a single counter initialized to three is incremented correctly. It would be good to test additional counters initialized to many different values. Doing so using example-based testing requires multiple test methods or testing objects in a loop.

Theories provide an elegant alternative that complements example-based tests. From the test writer's point of view, Theories are just like normal unit tests but with one or more parameters.2 To test incremement, one can write a Theory that accepts a Counter object, increments it, then asserts that the value has increased by one. This is more general than an example-based test because it verifies a property common to all counters, regardless of how they were initialized. In the following code, the incrementTheory method implements such a Theory.

@RunWith(Theories.class)
public class CounterTheories {
 
    @DataPoints 
    public static Counter[] data() {
        return new Counter[] {
            new Counter(0),
            new Counter(1),
            new Counter(-1),
            new Counter(Integer.MIN_VALUE),
            new Counter(Integer.MAX_VALUE), // overflows when incremented
        };
    }
 
    @Theory
    public void incrementTheory(Counter toIncrement) {
        int oldValue = toIncrement.getValue();
        assumeTrue(Integer.MAX_VALUE != oldValue);
        toIncrement.increment();
        int newValue = toIncrement.getValue();
        assertEquals(oldValue + 1, newValue);
    }
 
    //... more theories 
}

The @RunWith(Theories.class) annotation tells JUnit that it should run all methods in the class that are annotated with @Theory. The @DataPoints annotation marks methods that return values that JUnit should supply to applicable Theories. At runtime, JUnit matches the values returned from data point methods or fields to appropriate Theory parameters.3 In the example above, it sees that Counter objects are produced by data and consumed by incrementTheory, so it executes incrementTheory once for each element in the array returned from data.

Theories decouple test inputs from test implementation. Data points are automatically reused across multiple theories (even in subclasses), making it easier to write new tests. Adding a data point often provides a value that a test writer may not have originally considered, thus improving all Theories that use the data point.

But certain data points may not be applicable to a particular Theory. The test writer describes what data points apply by using methods provided by the org.junit.Assume class. Assumptions are similar to normal assertions except they cause a Theory to skip certain data points rather than fail. In our example, incrementTheory should not increment a counter whose value is equal to Integer.MAX_VALUE since the value would overflow. Therefore, incrementTheory uses assumeTrue to check for this special case.

This summary and example briefly describes the basics of Theories but glosses over how Theories work internally and lacks practical advice like how to write "good" theories. I hope that the undergrads and future weblog posts can explore these topics and deeper research issues further.

Notes

  1. See also an early paper on Theories and the initial announcement of their inclusion as part of an experimental project called Popper.
  2. .NET offers a similar feature but calls it—perhaps more descriptively—parameterized unit tests. JUnit uses this term to mean test classes that are instantiated with input data.
  3. Internally, JUnit finds data points whose declared types derive from the declared types of Theory parameters. It does not currently box and unbox primitive data points. I submitted a simple patch that fixes the problem, but it has not yet been accepted.

ReAssert: Suggesting Repairs for Broken Unit Tests

For the past year or so, I have been researching how software tests fail and the ways in which developers fix the failures. There are many interesting problems within this general theme, but I have most recently focused on the following familiar scenario:

Alice is a developer a large software company. She works on the company's flagship product and spends over half of her time writing unit tests to verify her code and document her assumptions. She is not alone in this respect; the company requires that functional changes and bugfixes should have corresponding unit tests to prevent regressions. As a result, the product's unit test suite achieves exceptionally high coverage.

One day, the project manager informs Alice that a key requirement has changed. The changed requirement violates many assumptions encoded the test suite, so several dozen tests fail after Alice modifies the software. Now Alice has a choice: should she remove the failing tests since they no longer reflect the correct behavior of the software, or should she attempt to repair the tests, which would require tedious and time-consuming manual editing?

Developers often have to make a similar choice. When tests fail due to problems with test code rather than the system under test, it is undoubtedly beneficial to fix the broken tests, since removing tests reduces a test suite's ability to detect regressions. However, developers may not take the time to fix the broken tests. For example, while working on the refactoring paper, my colleagues and I found many of Eclipse's refactoring tests were either commented out, marked as ignored, or most bizarrely, bypassed using "if (true) return;".

To solve this problem, I have been exploring ways of reducing the effort required to fix broken unit tests. Doing so would make developers less fearful of "deep" changes, allow them to write more detailed tests, and most importantly, provide time for more important work.

As a first step toward this goal, I developed a tool called ReAssert that automatically suggests changes to test code that are sufficient to make tests pass. Earlier this week I released a public beta. I welcome anyone reading this to download it from the ReAssert project homepage and try it out. Please contact me if you have any comments, questions, ideas for improvement, or bug reports.

Eclipse JUnit Music Box

I wrote an Eclipse plugin that turns Eclipse's built-in JUnit runner into a music box. The following video demonstrates the plugin:

Each test class is assigned one of seven chords in the key of C major. The assignment is deterministic, so a particular sequence of tests will play the same "song". Passing test methods play a pleasing arpeggio, while failing tests play an ugly dissonant chord. The time each test method takes to execute determines the speed of the music. If more than one test class runs, then the music resolves to the tonic at the end of the session.

Here is the plugin (including source code). To try it out, simply save the .jar file in Eclipse's plugins directory and restart Eclipse. I tested it in Eclipse version 3.4.0 running on JDK 6. The plugin requires MIDI, so if you do not hear any sound when running JUnit tests, your computer probably lacks an appropriate MIDI device or it is configured incorrectly. Try running this simple class to test your MIDI setup.

I am not the first to think of making JUnit play music. There is a Musical JUnit project on SourceForge, but it has not been updated in three years. It also uses prewritten samples, while mine produces sound programmatically.

Update Thursday, February 25, 2010

I posted the code on BitBucket.

java.util.Random’s Magic Number 0x5DEECE66D

Java's java.util.Random class is used to produce a sequence of pseudo-random numbers. As part of a recent recreational programming project, I needed to copy a given Random object such that both the original and copy produced the same sequence of values. While hacking together a quick-and-dirty solution, I learned some interesting facts about the implementation of the Random class.

The usual way to ensure that two Random objects produce the same sequence of values is to supply the same seed values to two new instances.

long seed = ...;
Random r1 = new Random(seed);
Random r2 = new Random(seed);

However, in this particular project I had no control over the instantiation of the Random object, and I could not access the original seed. Ideally, the Random class would provide a getSeed method that one could use to create a new object, but Java's implementation does not. This omission is particularly strange since the class stores the value but hides it in a private member variable.

The seed is stored as a java.util.concurrent.atomic.AtomicLong for thread safety. A new seed value is XOR'd with a constant multiplier and truncated to 48 bits before being stored.

public class Random ... {
    ...
    private final AtomicLong seed;
    private final static long multiplier = 0x5DEECE66DL;
    private final static long mask = (1L << 48) - 1;
    ...
    synchronized public void setSeed(long seed) {
        seed = (seed ^ multiplier) & mask;
        this.seed.set(seed);
        ...
    }
}

The class perturbs the seed in this way to increase the period of the generator. That is, Random will produce a sequence of values that does not repeat for a very long time. How long? This analysis by a University of Utah mathematics professor says, "the period of the returned values should be 2^48, which is better than the 2^32 of most 32-bit generators."

But why does it use the magic number 0x5DEECE66D? The analysis says it was chosen simply because researchers determined empirically that it produces a sequence of values satisfying various randomness tests [1, 2, 3]. Also, the same number and algorithm is used in C's nrand48 function.

The rand48() family of functions generates pseudo-random numbers using a linear congruential algorithm working on integers 48 bits in size. The particular formula employed is r(n+1) = (a * r(n) + c) mod m where the default values are for the multiplicand a = 0x5deece66d = 25214903917 and the addend c = 0xb = 11. The modulo is always fixed at m = 2 ** 48. r(n) is called the seed of the random number generator.

Thus, my project could XOR the field value with 0x5deece66d to recover the current seed.

It is easy to get the value of the private variable using reflection. (To be clear, this breaks the entire concept of encapsulation, and I emphatically discourage this kind of fragile reflection in production code. Nevertheless, it worked well enough for a small personal project.)

public long getCurrentSeed(Random random) throws Exception {
    // Access private field to get the seed
    Field seedField = random.getClass().getDeclaredField("seed");
    seedField.setAccessible(true);
    AtomicLong seedFieldValue = (AtomicLong) seedField.get(random);
    // Unperturb the seed from the magic multiplier
    return seedFieldValue.get() ^ 0x5DEECE66DL;	
}

Then, one can use this seed to create a copy of the original Random object.

Random original = ...
Random copy = new Random(getCurrentSeed(original));

Calling the next* methods on either object will produce the same sequence of values. Exactly what my project needed.

Probability of Two Submarines Colliding

Two nuclear submarines collided in the Atlantic, prompting my friend Sam to wonder, "what is the probability of two submarines colliding in the ocean?" Ignoring that we now know the true probability is 1, we can estimate the probability of collision by calculating the ratio of the volume of water through which the world's fleet of submarines travels in, say, 30 years to the total volume of the earth's oceans.

Fortunately, the news article, Wikipedia, and the rest of the internet have most of the values we need:

In 30 years (2.6×105 hours), one submarine will travel about 46 km/h × 2.6×105 h ≈ 1.2×107 km. That is about 8.5×107 times its own length. One submarine displaces 15,000 tons (1.36×10-5 km3) of water, so during that time it will travel through a volume of about 1.36×10-5 × 8.5×107 = 1,178 km3. With 250 such submarines, the total volume traveled comes to 2.94×105. The ratio of that to the volume of the world's oceans is 2.94×105 / 1.3×10^9 = 2.3×10-4.

Therefore, the probability that a fleet of 250 randomly-traveling Le Triomphant-sized submarines will have at least one collision in 30 years is around 1 in 5000.

This assumes that the average displacement of all subs deployed during those 30 years is roughly equivalent to 250 Le Triomphant class-sized subs. I have no idea if that is a valid assumption because I was unable to find a definite number for how many submarines are currently active, much less the year-to-year count or average size. Does anyone want to scrape Wikipedia's list of submarine classes to aggregate the average displacement and deployment numbers by year? Also, the number of submarines has decreased since the end of the cold war.

Check my math with the following Python script.

print "Q. What is the probability of two subs accidentally colliding in the ocean?"
print "   http://www.dailymail.co.uk/news/worldnews/article-1146124/"
 
def years_to_hours(years):
    days_per_year = 365.24
    hours_per_day = 24.0
    return years * days_per_year * hours_per_day
 
def tons_to_cubic_km(tons):
    lbs_per_ton = 2000.0
    liters_per_lb = 1.0 / 2.2 # of water
    cubic_km_per_liter = 1.0e-12
    return tons * lbs_per_ton * liters_per_lb * cubic_km_per_liter
 
ocean_volume = 1.3e9 #km^3  http://hypertextbook.com/facts/2001/SyedQadri.shtml
number_of_subs = 250.0 # http://en.wikipedia.org/wiki/Nuclear_navy
travel_time = years_to_hours(30.0)
 
# Submarine Characteristics
# http://en.wikipedia.org/wiki/Le_Triomphant_(S_616)
# http://en.wikipedia.org/wiki/HMS_Vanguard_(S28)
sub_displacement = tons_to_cubic_km(15000.0)
sub_speed = 46.0 #km/h
sub_length = 0.14 #km
 
# calculation
travel_distance = sub_speed * travel_time
volume_traversed = (travel_distance / sub_length) * sub_displacement
fleet_volume = number_of_subs * volume_traversed
ratio = fleet_volume / ocean_volume
 
print "A. %g" % ratio

Update

Lucas says the following on the mailing list on which I originally posted the calculation:

nice! I don't know if the analysis is right, but it gives us an upper bound at least!

Brett's calculation is equivalent to the following: set out 250 subs that randomly teleport to cover the right volume. None of the subs ever visits space that it or any other sub has been before. Now, what's the probability of choosing a spot that a sub has been if I were to randomly teleport a new sub into the ocean after 30 years?

I imagine this will over estimate a bit.

Since the total volume of the subs is much much smaller than the volume of the ocean, I think we can safely ignore "overlapping" yet non-colliding paths. Other than that, I think it is a good characterization.

Previously

Spoon Patch

Encountered a limitation of Spoon while using it on a research project; wrote a workaround; bundled the workaround into a patch.

Null Object for java.lang.reflect.Method

The Null Object Pattern is an object-oriented programming pattern that allows one to avoid null (i.e. undefined) values by instead creating objects that do nothing.

For example, say a library defines an interface ICommand with an execute method.

interface ICommand {
    void execute();
}

If one uses null to represent undefined commands, then one would always need to check that a value is non-null prior to use.

class CommandExecutor {
    ICommand command = null;
 
    void runCommand() {
        if (command != null) {
            command.execute();
        }
    }
}

Such checks are verbose and easy to forget. Instead, it is better to define a NullCommand class that implements ICommand and whose execute method does nothing.

class NullCommand implements ICommand {
    void execute() {
        // do nothing
    }
}

Then, one can initialize undefined ICommand values to a new NullCommand, removing the need for null checks.

class CommandExecutor {
    ICommand command = new NullCommand();
 
    void runCommand() {
        command.execute();
    }
}

One limitation of the pattern is that it requires access to the application's class hierarchy. What does one do when one needs to create a null object for a class that cannot be subclassed or constructed directly?

I recently encountered this problem while working on a research project. I was writing some reflective code that passed around java.lang.reflect.Method objects. The Method class is declared final and has a non-public constructor, meaning that I could not create a corresponding NullMethod class for undefined values. Nevertheless, for the reasons described above, I wanted to avoid using null. The solution for this particular problem involved creating reflective instances of a static dummy method that adheres to the signature expected by the application.

To see how this works, let us return to the example given at the beginning of the post. Say the application uses Method objects instead of ICommand objects. In this new example, the original CommandExecutor class would look something like the following:

class CommandExecutor {
    Method method = null;
 
    void runCommand(Object instance) throws Exception {
        if (method != null) {
            method.invoke(instance);
        }
    }
}

This example assumes that the Method objects take no arguments other than the instance in which to execute the method and that the return type does not matter. I will discuss how to handle more complex method signatures later, but for now, we know our null objects should adhere to this simple signature and do nothing when invoked.

The following is the updated code using null object pattern. UNDEFINED_METHOD_IMPL defines the body of the null method, and UNDEFINED_METHOD returns its reflective instance. This reflective instance is the null object that one can use to initialize undefined values such as the method field.

class CommandExecutor {
 
    public static Method UNDEFINED_METHOD() {
        try {
            return CommandExecutor.class.getDeclaredMethod(
                "UNDEFINED_METHOD_IMPL");
        } catch // removed for space
    }
    public static void UNDEFINED_METHOD_IMPL() {
        // do nothing
    }
 
    Method method = UNDEFINED_METHOD();
 
    void runCommand(Object instance) throws Exception {
    	method.invoke(instance);
    }
}

Because UNDEFINED_METHOD_IMPL is static, the invoke call ignores the instance variable. This means the method can be invoked on any type even though it is declared in CommandExecutor.

What if the expected method signature is more complex? For example, what if the application requires methods to accept one String argument and return an object of type ICommand?

class CommandExecutor {
    ...
    void runCommand(Object instance, String arg) throws Exception {
    	ICommand command = (ICommand) method.invoke(instance, arg);
    	command.execute();
    }
}

In this case, we simply change UNDEFINED_METHOD_IMPL such that it adheres to the new signature and returns a new NullCommand. Essentially, the null object (method) returns a null object (command).

class CommandExecutor {
 
    public static Method UNDEFINED_METHOD() {
        try {
            return CommandExecutor.class.getDeclaredMethod(
                "UNDEFINED_METHOD_IMPL", 
                String.class);
        } catch // removed for space
    }
    public static ICommand UNDEFINED_METHOD_IMPL(String arg) {
        return new NullCommand();
    }
 
    Method method = UNDEFINED_METHOD();
 
    void runCommand(Object instance, String arg) throws Exception {
    	ICommand command = (ICommand) method.invoke(instance, arg);
    	command.execute();
    }
}

Note that a reflective instance of a dummy method is not a true null value since its methods still execute normally and return "real" values. For example, UNDEFINED_METHOD().getDeclaringClass() still returns the java.lang.Class for CommandExecutor. Therefore, this approach works well when invoking methods, but not when an application requires deep reflective exploration of Method objects.

This Java file contains the working example code.

A good concise explanation of the Y Combinator

A good concise explanation of the Y Combinator

The Halting Problem, Succinctly

The Halting Problem on Twitter:

Write H(P): true if any program P halts; false otherwise. Write C(P) that halts iff H(P) is false. C(C) halts? Contradiction. Thus, no H(P).

...or as written by Dr. Seuss:

No program can say what another will do.

Now, I won't just assert that, I'll prove it to you:

I will prove that although you might work til you drop,

you can't predict whether a program will stop.

Multithreaded Programming

Threads, on the other hand, are wildly nondeterministic. The job of the programmer is to prune away that nondeterminism. We have, of course, developed tools to assist in the pruning. Semaphores, monitors, and more modern overlays on threads offer the programmer ever more effective pruning. But pruning a wild mass of brambles rarely yields a satisfactory hedge.

To offer another analogy, suppose that we were to ask a mechanical engineer to design an internal combustion engine by starting with a pot of iron, hydrocarbon, and oxygen molecules, moving randomly according to thermal forces. The engineer’s job is to constrain these motions until the result is an internal combustion engine. Thermodynamics and chemistry tells us that this is a theoretically valid way to think about the design problem. But is it practical?

To offer a third analogy, a folk definition of insanity is to do the same thing over and over again and to expect the results to be different. By this definition, we in fact require that programmers of multithreaded systems be insane. Were they sane, they could not understand their programs.

From The Problem with Threads by Edward A. Lee.

Load, Modify, Recompile, and Execute Java Classes with Spoon

I have been using Spoon as part of some recent research. Spoon allows one to analyze and transform Java source code using a very elegant abstract syntax tree library and several parsing and compiling tools. Spoon assumes that one writes "processors" that act on all occurrences of a particular program element prior to compilation and execution. I required a different approach for my project. Rather than transforming code prior to compilation, I needed to load classes, modify their source code, recompile the changes, and execute the modified code at runtime. I found that there is very little online documentation describing this usage, so I posted my solution over on my research site.

Older Posts