Most Java virtual machines introduce non-determinism when executing applications. For example, the sampling mechanism that steers the optimisation system often uses time-based sampling, which can cause two executions to yield different sample sets; as a consequence, different methods may be chosen for optimisation. Clearly, this perturbs any performance measurement a benchmarker conducts. Other sources of non-determinism include the thread scheduler, the garbage collector, etc.

As we have shown in 'Statistically Rigorous Java Performance Evaluation', published at OOPSLA 2007, there are many ways in which one can deal with this non-determinism. However, we advocate using a rigorous approach, since otherwise conclusions might be misleading or even outright incorrect. A first step towards meaningful results is gathering sufficient experimental data, i.e., making sure enough runs of the benchmark have been executed.

JavaStats is a tool, written in Python, that automatically runs Java benchmarks until a sufficiently narrow confidence interval is obtained for a given performance metric.


As JavaStats is written in Python, you should have Python version 2.4 or higher installed. Additionally, JavaStats uses the SciPy toolkit for its statistical analyses. This toolkit in turn requires the NumPy library. Download and install instructions for both SciPy and NumPy are available on their respective websites.

JavaStats usage

Basically, it suffices to set up the configuration file and provide it as an argument to the main script: javastats -c <config file>. Here are all the options:

Usage: ./ -c config [options]

  -c CONFIG, --config=CONFIG
                        The name of the configuration file to use
  -s, --suites          The benchmark suites you wish to run. There should be
                        a section [suite] in the configuration file for each
                        suite you specify here.
  -m, --vms             The virtual machines you wish to use. There should be
                        a section [vm] in the configuration file for each vm
                        you specify here. For example [jikesrvm] if you use
                        --vms jikesrvm.
  -b, --benchmarks      The benchmarks you wish to run, if you decide not to
                        run all those present in the given suites. Each
                        benchmark must be tagged with the suite name, i.e.,
                        suite:benchmark (e.g., specjvm98:_213_javac).
  -i, --inputs          The input sets you wish to run, if you decide not to
                        run all those present in the configuration file for
                        each suite. Each input set must either be tagged with
                        a suite name or a benchmark name. The latter overrides
                        the former.
                        The performance class that provides the measurement
                        (e.g., /usr/bin/time) and the trace file analysis
  -h, --help            show this help message and exit

For example:

./ -c toolkit.conf --suites specjvm98 --vms sun  --inputs specjvm98:s1 --benchmarks specjvm98:_213_javac

Configuration file

There are four mandatory sections in the configuration file: general, trace, performance and stats. Additionally, there should be a section for each virtual machine you wish to use, detailing its location, the name of the executable, and any extra options or settings. The latter is required for, e.g., the Jikes RVM, where you need to set up the environment properly (pointing to the correct build etc.) before running the VM. Further, a section is required per benchmark suite you wish to execute. These sections allow you to describe a suite as a whole, or to describe individual benchmarks separately. As such, the configuration file is quite flexible.

The general format of a configuration file consists of a [section] line, followed by one or more fields of the form name:<space separated list>.

  • The general section lists the virtual machine sections and the benchmark sections that should be used in the experiments, if no command line options are given to delimit them. As such, it should have the following two fields: benchmark suites and virtual machines. The items should be given in a space-separated list, e.g., specjvm98 dacapo specjbb2000
  • The trace section provides information regarding the trace files that will be produced for each experiment. Required are a machine (may be localhost), a location (i.e., a directory), and a filename prefix.
  • The performance section contains a single field, class, which provides the name of the Python class that will specify how to acquire the desired performance numbers, and how to extract that information from the resulting trace file. Two examples are given: TimePerformance and PerfexPerformance. The former uses /usr/bin/time, the latter the perfex binary from the perfctr package.
  • The stats section ...
  • Each virtual machine section should have the name as specified in the virtual machines field of the general section. The following fields are mandatory: binary location, binary name. Optional fields include: minimum heap option, maximum heap option. If specified, they must contain a placeholder %(min size), resp. %(max size), because JavaStats will need to fill in these values.
  • Similarly, each benchmark suite section should have the name as specified in the benchmark suites field of the general section. The following fields are mandatory: location, input sizes, startup command, steady command, benchmarks. The location specifies the base directory for the suite, i.e., from this location it should be possible to execute the benchmarks given the appropriate startup command. The latter should contain placeholders for the input and for the benchmark.

Execution uses the startup command, unless the steady option is specified.
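To make the layout concrete, here is a minimal configuration sketch along the lines described above. The section and field names follow the description in this document; the paths, heap-option strings, placeholder syntax in the startup command, and the contents of the stats section are illustrative assumptions, not the tool's exact format:

```
[general]
benchmark suites: specjvm98
virtual machines: sun

[trace]
machine: localhost
location: /tmp/javastats-traces
filename prefix: trace

[performance]
class: TimePerformance

[stats]
; fields for the statistical analysis (see the stats section above)

[sun]
binary location: /usr/lib/jvm/sun/bin
binary name: java
minimum heap option: -Xms%(min size)m
maximum heap option: -Xmx%(max size)m

[specjvm98]
location: /opt/specjvm98
input sizes: s1 s100
startup command: java SpecApplication -s%(input size) %(benchmark)
benchmarks: _201_compress _213_javac
```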

Command line options

Using command line options, one can override the defaults given in the configuration file. The idea is that the configuration file lists everything you could run; quite often, however, you are only interested in running a specific subset of benchmarks, virtual machines or input sets.

  • the --suites option: run the specified suites completely, unless limited by other options
  • the --vms option: use only these virtual machines to run the experiments
  • the --benchmarks option: you must tag each benchmark specified here with the suite it belongs to, e.g., dacapo:antlr. Only these benchmarks will be executed (on all virtual machines specified, or when not specified, on all those in the configuration file, and with all input sets)
  • the --inputs option: you must tag each input with either the suite or the benchmark it should be used for. If you specify both a suite and a benchmark belonging to it, the latter has a higher precedence.
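The precedence rule for --inputs can be sketched as a small Python function. This is an illustrative re-implementation of the rule stated above, not JavaStats' actual code; the tag format suite:input / benchmark:input follows the description:

```python
def inputs_for(benchmark, suite, tagged_inputs):
    """Resolve --inputs tags for one benchmark.

    'suite:input' applies to every benchmark in the suite;
    'benchmark:input' overrides it for that specific benchmark.
    """
    split = [tag.split(":", 1) for tag in tagged_inputs]
    # benchmark-level tags take precedence over suite-level tags
    by_benchmark = [inp for (owner, inp) in split if owner == benchmark]
    if by_benchmark:
        return by_benchmark
    return [inp for (owner, inp) in split if owner == suite]
```

For example, with --inputs specjvm98:s1 _213_javac:s100, the benchmark _213_javac would run with s100 while the other SPECjvm98 benchmarks run with s1.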

Performance classes

The goal of JavaStats is to continue executing the benchmark until, for a given performance metric, a sufficiently narrow confidence interval is obtained. Of course, just running a benchmark does not cut it; one needs to specify the performance metric, such as execution time. The specifics are detailed in a Python class, derived from the Performance class. Basically, a performance class implements two methods: acquire_command and get_performance_data.

  • acquire_command(java command) will transform the command that executes the Java benchmark into a command that also gathers performance information. For example, the TimePerformance class changes <java command> into /usr/bin/time -p <java command>.
  • get_performance_data(trace file name) will parse the trace file containing the output of the transformed command, and return the value for the desired performance metric. While parsing, the trace file is checked for regular expressions that indicate something has gone wrong. If one of them matches, the None value should be returned. This value is ignored by JavaStats. When it occurs in over 1/3 of the cases, or in 3 subsequent runs, execution is halted and the user is notified of the problem.
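The failure-handling policy just described can be sketched as follows. This is an illustrative rendering of the stated rule (over 1/3 of runs failed, or 3 consecutive failures), not JavaStats' internal code:

```python
def should_halt(results):
    """results: one entry per benchmark run; None marks a failed run."""
    failures = results.count(None)
    # halt when more than a third of all runs failed ...
    if failures * 3 > len(results):
        return True
    # ... or when three consecutive runs failed
    for i in range(len(results) - 2):
        if results[i] is None and results[i + 1] is None and results[i + 2] is None:
            return True
    return False
```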


This is the implementation of the TimePerformance class:

import re

class TimePerformance(Performance):
  re_realtime  = re.compile("^real")

  def acquire_command(self, java_command):
    return "/usr/bin/time -p %s"%(java_command)

  def get_performance_data(self, trace_filename):
    f = open(trace_filename, 'r')
    for line in f.readlines():
      if self.sweep_line_for_error(line):
        return None
      # the 'real' line of /usr/bin/time -p holds the wall-clock time
      if self.re_realtime.match(line):
        return line.split(" ")[-1]

As is immediately clear, not much Python knowledge is required to understand and write this. Of course, the difficulty of writing the class largely depends on the application that delivers the desired performance metric.

Simulation mode

If you have a file containing only the performance numbers (one per line), you can use the simulation mode to determine how many experiments you require to reach a given confidence interval width.
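The stopping criterion that JavaStats evaluates can be sketched as follows. This follows the standard confidence-interval formula for the mean, using the normal approximation for n >= 30 as in the OOPSLA 2007 paper; the 2% threshold is an illustrative choice and the function names are not JavaStats' own:

```python
import math

def ci_halfwidth(samples, z=1.96):
    """Half-width of the 95% confidence interval for the mean,
    using the normal approximation (appropriate for n >= 30)."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return z * math.sqrt(variance / n)

def narrow_enough(samples, threshold=0.02):
    """True once the interval is within +/- threshold of the sample mean."""
    mean = sum(samples) / len(samples)
    return ci_halfwidth(samples) <= threshold * mean
```

In simulation mode, one would read the numbers from the file line by line, appending them to the sample list until narrow_enough holds, and report how many lines were consumed.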

Getting the software

You can download a gzipped tarball containing the javastats script, an example configuration file, and the mentioned example performance classes. Alternatively, you can pull the most recent version from a darcs repository. For the latter, use the following command:

darcs get --partial

Statistically Rigorous Java Performance Evaluation

If you use our technique or software, we kindly ask that you cite our OOPSLA 2007 paper. The bibtex entry is the following:

@article{georges2007statistically,
 author = {Andy Georges and Dries Buytaert and Lieven Eeckhout},
 title = {Statistically rigorous java performance evaluation},
 journal = {SIGPLAN Not.},
 volume = {42},
 number = {10},
 year = {2007},
 issn = {0362-1340},
 pages = {57--76},
 doi = {},
 publisher = {ACM},
 address = {New York, NY, USA},
}