Due to the rise of Chip multiprocessors (CMP??s) the amount of parallel computing power has increased significantly. This is in contrast with the fact that a lot of programs are sequential and cannot exploit these parallel resources, urging the need of developing new techniques to extractparallelism from sequential programs. For this we present a new profile based technique. It works in a non-speculative way, based on data dependencies between functions and finds large chunks of code to parallelize. To achieve this, we introduce the so called interprocedural data flow graph and the data sharing graph. To test our technique we used the bzip2 program from the SPECcpu2000 benchmark suite. Our mechanism could significantly speed up the compression part (3.74 times), with a global speedup of 2.45 on a quad processor system.