Architectures and Compilers for Embedded Systems

Ever since their invention, the computing power of microprocessors has increased exponentially. According to the popular formulation of Moore's Law, their speed doubles roughly every 18 months. Similar exponential growth is found not only in computing power but also in other properties such as chip area, available transistors per chip, clock frequency, and memory size. It should be clear that trying to keep up with Moore's Law poses great challenges to computer architects and pushes the processor industry to its limits. The investments necessary to create and market a state-of-the-art processor are phenomenal, and will in the future be feasible only for the very largest industry players or consortia.
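As a rough illustration of the doubling arithmetic (a minimal sketch; the 18-month period is simply the figure quoted above), a factor of two every 18 months compounds to roughly a hundredfold increase per decade:

```c
#include <math.h>
#include <stdio.h>

/* Illustrative only: compound the 18-month doubling period quoted
 * above over a ten-year span (120 months). */
int main(void) {
    double doubling_period = 18.0;   /* months */
    double horizon = 120.0;          /* one decade, in months */
    double growth = pow(2.0, horizon / doubling_period);
    printf("growth over 10 years: about %.0fx\n", growth);  /* ~102x */
    return 0;
}
```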

While in the past the maximum speed of a processor was limited by the available number of transistors, we have now reached a point where additional transistors no longer guarantee additional speed. The number of functional units already exceeds the average instruction-level parallelism in typical code; further increasing the number of pipeline stages only pays off if control transfers can be predicted with even higher accuracy; making the on-chip caches bigger also slows them down; and massive speculative execution to hide latency adds considerable pressure to a memory hierarchy that is often already the bottleneck of the system.
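To make the limit on instruction-level parallelism concrete, here is a small hypothetical sketch in C (the function names and the factor-of-four split are ours): when every operation depends on the previous one, extra functional units sit idle, whereas restructuring the same reduction into independent chains exposes more parallelism for the hardware to exploit.

```c
#include <stddef.h>

/* One long dependency chain: each addition needs the previous sum,
 * so the loop exposes almost no instruction-level parallelism. */
double sum_serial(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four independent accumulators: the chains can run in parallel on
 * separate functional units, raising the usable parallelism. */
double sum_split(const double *a, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i;
    for (i = 0; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)   /* leftover elements */
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```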

To escape from this situation, several alternatives are being explored. One can increase the instruction-level parallelism by offering the processor an instruction mix drawn from different execution threads (multithreading); since the threads are independent, the average parallelism in the instruction stream increases. This solution is comparable to integrating a multiprocessor on a single chip. The bandwidth to the memory hierarchy can be increased by replacing the memory bus with a crossbar switch, and by connecting the crossbars of different computers through a network with high bandwidth and low latency (SCI, Myrinet, Gigabit Ethernet), one can build fast distributed systems. For certain applications, it could also be interesting to give processors a limited amount of reconfigurability: instead of a processor with dedicated instructions for image processing, communication applications, language support, etc., one can try to design processors that can be adapted to the application at hand. The decision about the functionality of part of the transistors on the processor is then made by the programmer rather than by the computer architect.
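As a minimal sketch of the multithreading idea (a hypothetical POSIX-threads example, not prescribed by the text), two independent threads give the processor two instruction streams with no data dependences between them, so their instructions can be freely interleaved:

```c
#include <pthread.h>
#include <stdio.h>

/* Each thread runs an independent computation; a multithreaded core
 * can interleave instructions from both streams, raising the average
 * instruction-level parallelism seen by the functional units. */
static void *accumulate(void *arg) {
    long *result = arg;
    long sum = 0;
    for (long i = 1; i <= 1000000; i++)
        sum += i;
    *result = sum;
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    long r1 = 0, r2 = 0;

    pthread_create(&t1, NULL, accumulate, &r1);
    pthread_create(&t2, NULL, accumulate, &r2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    printf("r1=%ld r2=%ld\n", r1, r2);
    return 0;
}
```

(Compile with -pthread.)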

Since the arrival of RISC, a processor cannot function optimally without the aid of a compiler that generates quality code; an optimising compiler can easily halve the execution time of certain programs. As processors have grown more and more complex, the demands on the compiler have increased accordingly. Many code optimisations were initially implemented as dynamic techniques (i.e. executed by the processor) to improve program execution (e.g. caches, pipelined execution, branch prediction). As soon as it became clear that the dynamic approach did not always deliver the desired results, it became the compiler's task to make appropriate changes to the code that ensure an optimal dynamic execution: compilers now perform data and code layout, insert prefetch instructions to help the cache, schedule instructions to keep the pipeline filled, and generate branch hints to assist the branch predictor.
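The sketch below illustrates two of these static helps using GCC built-ins; the example itself is ours, and these particular built-ins are one possible way to express such hints rather than the specific techniques the text prescribes.

```c
#include <stddef.h>

/* Static assistance to the hardware (GCC built-ins):
 * - __builtin_prefetch asks for data to be brought into the cache
 *   before it is needed, hiding memory latency;
 * - __builtin_expect tells the compiler which branch outcome is the
 *   common one, so it can lay out code to favour the predictor. */
long sum_positive(const long *a, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&a[i + 16]);    /* prefetch ahead */
        if (__builtin_expect(a[i] > 0, 1))     /* likely taken */
            sum += a[i];
    }
    return sum;
}
```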

From this evolution it should be clear that, as processor hardware becomes more powerful, architecture design, techniques for parallel and distributed execution, and code generation and transformation are increasingly integrated. Designing a computer system is thus more than ever an interdisciplinary task. A second observation is that the success of a hardware architecture increasingly depends on the software that must ensure the hardware is used optimally. 'Optimal' can mean that (i) programs have to be as compact as possible (e.g. in smart cards); (ii) programs have to be as fast as possible (e.g. in scientific or DSP applications); or (iii) programs have to be as energy efficient as possible (e.g. in mobile applications).
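As a small, hypothetical illustration of the trade-off between goals (i) and (ii) (the code and names are ours): the same bit-counting routine can be written compactly, which suits code-size-constrained targets such as smart cards, or with a lookup table that is larger but faster; compilers expose a similar choice through options such as GCC's -Os (optimise for size) versus -O3 (optimise for speed).

```c
#include <stdint.h>

/* Compact variant: minimal code and data, one iteration per bit. */
unsigned popcount_small(uint32_t x) {
    unsigned n = 0;
    while (x) {
        n += x & 1u;
        x >>= 1;
    }
    return n;
}

/* Speed-oriented variant: spends 256 bytes of table (filled once at
 * start-up) to answer with four loads and three additions per word. */
static uint8_t table[256];

void popcount_init(void) {
    for (unsigned i = 0; i < 256; i++)
        table[i] = (uint8_t)((i & 1u) + table[i >> 1]);
}

unsigned popcount_fast(uint32_t x) {
    return table[x & 0xFF] + table[(x >> 8) & 0xFF]
         + table[(x >> 16) & 0xFF] + table[x >> 24];
}
```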

The aim of the ACES research network is to bring together research teams working on different aspects of future computing systems in Flanders and the surrounding regions, to give visibility to our domain by organizing a few international events in Flanders each year with world-class experts, and to stimulate collaboration between the different teams.