This paper explores hardware specialization of low power processors to improve performance and energy efficiency. Our main contribution is an automated framework that analyzes instruction sequences of applications within a domain at the loop body level and identifies exactly and partially-matching sequences across applications that can become custom instructions. Our framework transforms sequences to a new code abstraction, a Merging Diagram, that improves similarity identification, clusters alike groups of potential custom instructions to effectively reduce the search space, and selects merged custom instructions to efficiently exploit the available customizable area. For a set of 11 media applications, our fast framework generates instructions that significantly improve the energy-delay product and speed up, achieving more than double the savings as compared to a technique analyzing sequences within basic blocks. This paper shows that partially-matched custom instructions, which do not significantly increase design time, are crucial to achieving higher energy efficiency at limited hardware areas.