Embedded processors are used in numerous devices executing dedicated applications. This setting makes it worthwhile to optimize the processor to the application it executes, in order to increase its power-efficiency. This paper proposes to enhance direct-mapped data caches with automatically tuned randomized set index functions to achieve that goal. We show how randomization functions can be automatically generated and compare them to traditional set-associative caches in terms of performance and energy consumption. A 16 kB randomized direct-mapped cache consumes 22% less energy than a 2-way set-associative cache, while it is less than 3% slower. When the randomization function is made configurable (i.e., it can be adapted to the program), the additional reduction of conflicts outweighs the added complexity of the hardware, provided there is a sufficient amount of conflict misses.
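
To illustrate the general idea of a randomized set index function, the following minimal sketch compares conventional modulo indexing with a simple XOR-based index in a direct-mapped cache model. It is not the tuned function generated in this paper: the 32-byte block size, the XOR fold in `xor_index`, and all names are assumptions chosen for illustration only.

```python
# Illustrative model of a 16 kB direct-mapped cache.
# Block size and the XOR function below are assumptions, not the paper's design.
BLOCK_BITS = 5                 # 32-byte blocks (assumed)
INDEX_BITS = 9                 # 512 sets x 32 B = 16 kB
NUM_SETS = 1 << INDEX_BITS
MASK = NUM_SETS - 1


def modulo_index(block_addr):
    """Conventional index: the low-order bits of the block address."""
    return block_addr & MASK


def xor_index(block_addr):
    """Hypothetical randomized index: XOR the low index bits with the
    next group of higher address bits."""
    return (block_addr & MASK) ^ ((block_addr >> INDEX_BITS) & MASK)


def count_misses(addresses, index_fn):
    """Count misses of a direct-mapped cache using the given index function.
    Each set stores the full block address so lookups stay unambiguous."""
    lines = [None] * NUM_SETS
    misses = 0
    for addr in addresses:
        block = addr >> BLOCK_BITS
        s = index_fn(block)
        if lines[s] != block:
            misses += 1
            lines[s] = block
    return misses


# Two addresses exactly one cache size apart, accessed alternately.
trace = [0, 16 * 1024] * 100

print(count_misses(trace, modulo_index))  # both blocks map to set 0
print(count_misses(trace, xor_index))     # blocks map to different sets
```

In this access pattern, the two blocks collide on every access under modulo indexing, whereas the XOR-based index places them in different sets, so only the two cold misses remain. A different program could favor a different selection of address bits, which is the motivation for tuning the function to the application.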