The overhead in terms of code size, power consumption and execution time caused by the use of precompiled libraries and separate compilation is often unacceptable in the embedded world, where real-time constraints, battery life-time and production costs are of critical importance. In this paper we present our link-time optimizer for the ARM architecture. We discuss how we can deal with the peculiarities of the ARM architecture related to its visible program counter and how the introduced overhead can be eliminated to a large extent. Our link-time optimizer is evaluated in two tool chains. In the Arm Developer Suite tool chain, average code size reductions with 14.6% are achieved, while execution time is reduced with 8.3% on average, and energy consumption with 7.3%. On binaries from the GCC tool chain the average code size reduction is 16.6%, execution time is reduced with 12.3% and the energy consumption with 11.5% on average. Finally, we show how the incorporation of link-time optimization in tool chains may influence library interface design.