As multicore chips scale to higher core counts, communication between cores becomes increasingly important. When a single application is split across multiple cores that are connected through a relatively slow network, the amount of communication required has a major effect on performance. If the application can be partitioned so that communication between threads is minimised, or if its threads can be placed on a non-uniform network with regard to their communication, a significant performance boost can be obtained. To do this effectively, however, the communication streams inside the application must be known. In this paper, we introduce a profiling tool for Java that measures data flows between methods. It constructs a communication graph, which combines a traditional call graph with data flow information. Reservoir sampling reduces the profiling overhead by a factor of 15, and we prove that this comes with only a limited decrease in accuracy. This way, we can quickly estimate communication flows, which form the critical information needed for an efficient communication-aware parallelisation.
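
To illustrate the sampling idea named above, the following is a minimal sketch of classic reservoir sampling (Algorithm R) in Java. It is not the profiling tool's implementation: the class `ReservoirSampler`, its methods `offer`, `sample` and `seen`, and the seed parameter are hypothetical names chosen for this example, and what exactly the tool samples is an assumption here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical helper (not taken from the paper's tool): keeps a uniform
 * random sample of at most `capacity` items from a stream of unknown length.
 */
final class ReservoirSampler<T> {
    private final int capacity;
    private final List<T> reservoir;
    private final Random random;
    private long seen = 0; // number of items offered so far

    ReservoirSampler(int capacity, long seed) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.reservoir = new ArrayList<>(capacity);
        this.random = new Random(seed);
    }

    /** Offers one item; it ends up in the sample with probability capacity/seen. */
    void offer(T item) {
        seen++;
        if (reservoir.size() < capacity) {
            // The first `capacity` items are always kept.
            reservoir.add(item);
            return;
        }
        // Draw an index in [0, seen); the item replaces a slot only if the
        // index falls inside the reservoir. nextDouble() is in [0, 1), so the
        // product is strictly below `seen` (approximately uniform for a sketch).
        long j = (long) (random.nextDouble() * seen);
        if (j < capacity) {
            reservoir.set((int) j, item);
        }
    }

    /** Returns a copy of the current sample. */
    List<T> sample() {
        return new ArrayList<>(reservoir);
    }

    /** Returns how many items have been offered in total. */
    long seen() {
        return seen;
    }
}
```

In a profiler of this kind, each offered item could be, for example, a record of one observed memory access, so that only a fixed number of records need to be stored and analysed regardless of how long the program runs. The ratio of `seen()` to the sample size can then be used to scale sampled counts back up to an estimate of the full data flow.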