While the chip multiprocessor (CMP) has quickly become the predominant processor architecture, its continuing success largely depends on the parallelizability of complex programs. In the early 1990s, great success was achieved in extracting parallelism from the inner loops of scientific computations. General-purpose programs, however, remained out of reach due to the complexity of their control flow and data dependences. In this paper we present a tool that extracts coarse-grain parallelism from the outer loops of programs, including general-purpose programs, and helps the programmer parallelize them. This coarse-grain parallelism can be exploited efficiently on multi-cores without additional hardware support.