
Most Java virtual machines introduce non-determinism into the execution of the applications they run. For example, the sampling mechanism that steers the optimisation system often uses time-based sampling, which can cause two executions to yield different sample sets; as a consequence, different methods may be chosen for optimisation. Clearly, this perturbs any performance measurement a benchmarker conducts. Other sources of non-determinism include thread scheduling and garbage collection.
As we have shown in 'Statistically Rigorous Java Performance Evaluation', published at OOPSLA 2007, there are many ways to deal with this non-determinism. We advocate a statistically rigorous approach, since otherwise conclusions might be misleading or even outright incorrect. A first step towards meaningful results is gathering sufficient experimental data, i.e., making sure enough runs of the benchmark have been executed.
JavaStats is a tool, written in Python, that runs Java benchmarks automatically until a sufficiently narrow confidence interval is obtained for a given performance metric.
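The stopping rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the JavaStats implementation: `measure` is a hypothetical callable standing in for one benchmark run, and the threshold and run limits are made-up values. The sketch uses SciPy's Student's t quantile when SciPy is installed and otherwise falls back to the normal approximation, which is only reasonable for larger numbers of runs.

```python
import math
import statistics

try:  # SciPy provides the Student's t quantile, suited to small samples.
    from scipy.stats import t as student_t
except ImportError:  # Assumed fallback: normal approximation from the stdlib.
    student_t = None


def half_width(samples, confidence=0.95):
    """Half-width of the confidence interval for the mean of samples."""
    n = len(samples)
    p = 0.5 + confidence / 2.0
    if student_t is not None:
        quantile = float(student_t.ppf(p, n - 1))
    else:
        quantile = statistics.NormalDist().inv_cdf(p)
    return quantile * statistics.stdev(samples) / math.sqrt(n)


def run_until_narrow(measure, threshold=0.02, min_runs=5, max_runs=30):
    """Call measure() until the interval half-width is at most
    threshold times the mean, or until max_runs is reached."""
    samples = []
    while len(samples) < max_runs:
        samples.append(measure())
        if len(samples) >= min_runs:
            if half_width(samples) <= threshold * statistics.mean(samples):
                break
    return samples


if __name__ == "__main__":
    import random
    rng = random.Random(42)
    # Hypothetical stand-in for a benchmark run: about 10 s plus noise.
    times = run_until_narrow(lambda: rng.gauss(10.0, 0.3))
    print(len(times), statistics.mean(times), half_width(times))
```

Expressing the threshold relative to the mean keeps the rule independent of the unit of the metric; an absolute width would work equally well if that is what you need.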
As JavaStats is written in Python, you need Python 2.4 or higher installed. Additionally, JavaStats uses the SciPy toolkit for its statistical analyses. This toolkit in turn requires the NumPy library. Download and install instructions for both SciPy and NumPy are available on their respective websites.
Basically, it suffices to set up the configuration file and provide it as an argument to the main script: ./javastats.py -c <config file>. Here are all the options:
Usage: ./javastats.py -c config [options]
Options:
  -c CONFIG, --config=CONFIG
                        The name of the configuration file to use
  -s, --suites          The benchmark suites you wish to run. There should be
                        a section [suite] in the configuration file for each
                        suite you specify here.
  -m, --vms             The virtual machines you wish to use. There should be
                        a section [vm] in the configuration file for each vm
                        you specify here. For example [jikesrvm] if you use
                        --vms jikesrvm.
  -b, --benchmarks      The benchmarks you wish to run, if you decide not to
                        run all those present in the given suites. Each
                        benchmark must be tagged with the suite name, i.e.,
                        <suite>:<benchmark>
  -i, --inputs          The input sets you wish to run, if you decide not to
                        run all those present in the configuration file for
                        each suite. Each input set must either be tagged with
                        a suite name or a benchmark name. The latter overrides
                        the former.
  --startup
  --steady
  -p PERFORMANCE, --performance=PERFORMANCE
                        The performance class that provides the measurement
                        (e.g., /usr/bin/time) and the trace file analysis
                        methods.
  -h, --help            show this help message and exit
For example:
./javastats.py -c toolkit.conf --suites specjvm98 --vms sun --inputs specjvm98:s1 --benchmarks specjvm98:_213_javac
There are four mandatory sections in the configuration file: general, trace, performance and stats. Additionally, there should be a section for each virtual machine you wish to use, detailing its location, the name of the executable, and any extra options or settings. The latter are required for, e.g., the Jikes RVM, where you need to set up the environment properly (pointing to the correct build etc.) before running the VM. Further, a section is required per benchmark suite you wish to execute. These sections allow you to describe a suite as a whole, or to describe individual benchmarks separately. As such, the configuration file is quite flexible.
The general format of a configuration file consists of a [section] line, followed by one or more fields of the form name:<space separated list>.
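As a sketch of this layout, the snippet below parses a small configuration with Python's standard `configparser` module. Only the "[section]" and "name: list" shape comes from the description above. The section name `jikesrvm` follows the option help, but every field name and value (`suites`, `vms`, `location`, `executable`) is a hypothetical placeholder rather than something taken from the JavaStats example configuration, and JavaStats itself may read the file differently.

```python
import configparser

# Hypothetical configuration text: all field names and values are made up
# for illustration; only the overall shape follows the description.
EXAMPLE = """\
[general]
suites: specjvm98
vms: jikesrvm sun

[jikesrvm]
location: /opt/jikesrvm
executable: rvm
"""


def load_fields(text):
    """Return {section: {name: [items]}}, splitting each value on spaces."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        section: {name: value.split() for name, value in parser.items(section)}
        for section in parser.sections()
    }


if __name__ == "__main__":
    fields = load_fields(EXAMPLE)
    print(fields["general"]["vms"])
```

Interpolation is switched off so that a literal `%` in a VM option would be read as-is rather than treated as a substitution marker.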
Benchmarks are executed in startup mode unless the --steady option is specified.
Command line options override the defaults given in the configuration file. The idea is that the configuration file lists everything you could run, while quite often you are only interested in running a specific subset of the benchmarks, virtual machines or input sets.
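The override rule can be pictured with a short sketch. Both `select` and `split_tag` are hypothetical helpers, not JavaStats functions. The `<suite>:<benchmark>` tag format comes from the option description, and rejecting names that are missing from the configuration file is an assumption based on the requirement that each selected suite or VM has its own section.

```python
def select(configured, requested=None):
    """Return the items to run.

    configured: everything listed in the configuration file.
    requested:  items named on the command line, or None to run them all.
    """
    if not requested:
        return list(configured)
    unknown = [item for item in requested if item not in configured]
    if unknown:
        # Assumption: a selection without a matching section is an error.
        raise ValueError("not in configuration file: %s" % ", ".join(unknown))
    return list(requested)


def split_tag(tag):
    """Split a '<suite>:<benchmark>' tag into (suite, benchmark)."""
    suite, _, benchmark = tag.partition(":")
    return suite, benchmark


if __name__ == "__main__":
    print(select(["sun", "jikesrvm"], ["sun"]))
    print(split_tag("specjvm98:_213_javac"))
```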
The goal of JavaStats is to keep executing the benchmark until a sufficiently narrow confidence interval is obtained for a given performance metric. Of course, just running a benchmark is not enough: one needs to specify the performance metric, such as time. The specifics are provided by a Python class derived from the Performance class. Basically, a performance class implements two methods: acquire_command, which wraps the Java command with the measurement tool, and get_performance_data, which extracts the measured value from the resulting trace file.
This is the implementation of the TimePerformance class:
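import re

class TimePerformance(Performance):
    # Matches the "real <seconds>" line printed by /usr/bin/time -p.
    re_realtime = re.compile("^real")

    def acquire_command(self, java_command):
        # Wrap the Java command so that its wall-clock time is reported.
        return "/usr/bin/time -p %s" % (java_command)

    def get_performance_data(self, trace_filename):
        f = open(trace_filename, 'r')
        try:
            for line in f:
                # A run whose trace contains an error yields no data.
                if self.sweep_line_for_error(line):
                    return None
                if self.re_realtime.search(line):
                    return line.split(" ")[-1]
        finally:
            f.close()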
As is immediately clear, not much Python knowledge is required to understand or write such a class. Of course, the difficulty of writing the class largely depends on the application that delivers the desired performance metric.
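To illustrate writing a class for another metric, here is a sketch that reports user CPU time instead of wall-clock time, relying on the "user" line that /usr/bin/time -p prints. The `Performance` base class below is only a stand-in so that the example runs on its own: its `sweep_line_for_error` check is an assumption, not the JavaStats implementation, and `UserTimePerformance` is a hypothetical name.

```python
import re


class Performance(object):
    """Stand-in for the JavaStats base class (assumed interface)."""

    def sweep_line_for_error(self, line):
        # Assumption: any line mentioning an exception marks a failed run.
        return "Exception" in line


class UserTimePerformance(Performance):
    """Hypothetical class reporting the 'user' line of /usr/bin/time -p."""

    re_usertime = re.compile(r"^user\s+(\S+)")

    def acquire_command(self, java_command):
        # Same wrapper as for wall-clock time; only the parsing differs.
        return "/usr/bin/time -p %s" % java_command

    def get_performance_data(self, trace_filename):
        with open(trace_filename, "r") as trace:
            for line in trace:
                if self.sweep_line_for_error(line):
                    return None
                match = self.re_usertime.search(line)
                if match:
                    return match.group(1)
        # No 'user' line found: report that no data is available.
        return None
```

Capturing the value with a regular expression group avoids returning the trailing newline that a plain split on spaces would leave attached.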
If you have a file containing only the performance numbers (one per line), you can use the simulation mode to determine how many experiments you require to reach a given confidence interval width.
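The idea behind such a simulation can be sketched as follows: replay the recorded measurements in order and report the first run count at which the interval is narrow enough. This is an illustration, not the JavaStats code. The names `read_numbers` and `runs_needed` and the 2% threshold are assumptions, and the quantile uses the normal approximation from the standard library; for small samples, the Student's t quantile (for instance `scipy.stats.t.ppf`) would give a wider and more appropriate interval.

```python
import math
import statistics


def read_numbers(path):
    """Read one performance number per line, skipping blank lines."""
    with open(path) as handle:
        return [float(line) for line in handle if line.strip()]


def runs_needed(values, threshold=0.02, confidence=0.95):
    """Smallest n such that the interval half-width over the first n
    values is at most threshold times their mean, or None if the
    recorded values never reach that width."""
    # Assumption: normal approximation instead of Student's t.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)
    for n in range(2, len(values) + 1):
        prefix = values[:n]
        half = z * statistics.stdev(prefix) / math.sqrt(n)
        if half <= threshold * statistics.mean(prefix):
            return n
    return None
```

A result of None means the recorded runs are not enough for the requested width, so more experiments would be needed than the file contains.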
You can download a gzipped tarball containing the javastats script, an example configuration file, and the mentioned example performance classes. Alternatively, you can pull the most recent version from a darcs repository. For the latter, use the following command:
darcs get --partial http://itkovian.net/darcs/javastats
If you use our technique or software, we kindly ask that you cite our OOPSLA 2007 paper. The BibTeX entry is the following:
@article{1297033,
  author    = {Andy Georges and Dries Buytaert and Lieven Eeckhout},
  title     = {Statistically rigorous java performance evaluation},
  journal   = {SIGPLAN Not.},
  volume    = {42},
  number    = {10},
  year      = {2007},
  issn      = {0362-1340},
  pages     = {57--76},
  doi       = {http://doi.acm.org/10.1145/1297105.1297033},
  publisher = {ACM},
  address   = {New York, NY, USA},
}