Intelligent low-power devices such as portable phones, tablet computers, embedded systems and sensor networks require low-power solutions for high-performance applications. GPUs offer a highly parallel multithreaded architecture and an efficient programming model, but are power-hungry. Field-programmable gate arrays (FPGAs), on the other hand, offer a highly configurable parallel architecture and substantially better energy efficiency, but are difficult to program. An approach is presented that maps the GPU architecture and programming model onto the configuration synthesis and programming of FPGAs. Implementation details, benefits and trade-offs are discussed. In particular, the architecture, memory and communication issues that arise when porting a biomedical image application, which achieves a 20-fold speedup on a GPU, onto an FPGA accelerator are addressed.