In embedded systems, achieving good performances for signal processing applications is crucial for power management. Good compilation is required to have maximal use of the available processing capabilities. Compiling for communication-exposed architectures such as ADRES, TRIPS and Wavescalar is however a complex task. Dataflow graphs are mapped on execution unit grids in order to increase the instruction-level parallelism while minimizing communication. Complex algorithms and the large number of code optimizations make debugging hard for the developer. Moreover, iterative approaches are used to optimize the compiled code quality. This paper proposes to embed functional simulators in compilers in order to enable debugging and profiling-driven iterative compilation. Debugging of optimization passes is achieved by means of functional simulators, running the original code and the transformed code. Intermediate and output values results comparison allows to verify the correctness of the optimization pass. Using embedded simulators also allows to extract code and execution characteristics convenient for iterative compilation. We present the mechanisms required to control those simulators. A case study based on the TRIPS processor demonstrates the usefulness of our approach.