The power efficiency of an HMCP depends heavily on the architecture of its processor cores, so that architecture must be chosen carefully. When comparing processor architectures for use in a many-core platform, one must evaluate not only their instructions per cycle (IPC) but also their power consumption and area. Precise power and area evaluations can only be obtained from real implementations. However, comparing processor implementations is difficult, because implementation-specific choices interfere with the measured performance. This paper proposes a methodology for making precise performance comparisons between different processor architectures. Using this methodology, it is possible to choose the best architecture for an HMCP targeting DSP applications. The methodology relies on a common architectural template from which all cores are built, and on the application of specific optimizations where relevant. To validate the methodology, three RISC cores are implemented: a single-issue core and two VLIW processors that are 3-issue and 5-issue, respectively. The implemented cores are then precisely compared on a set of DSP kernels.