While the number of cores in both general purpose chip-multiprocessors (CMPs) and embedded Multi-Processor Systems-on-Chip (MPSoCs) keeps rising, on-chip communication becomes more and more important. In order to write efficient programs for these architectures, it is therefore necessary to have a good idea of the communication behavior of an application. We present a communication profiler that extracts this behavior from compiled, sequential C/C++ programs, and constructs a dynamic data-flow graph at the level of major functional blocks. It can also be used to view differences between program phases, such as different video frames, which allows both input- and phase-specific optimizations to be made. Finally, PinComm can visualize inter-thread communication in parallel programs, which can help in optimizing communication behavior and spotting communication-related performance bottlenecks.