In this paper we present a new hardware design pattern for improving memory transfers to external dynamic memory in Altera's SOPC Builder tool: the standard DMA IP core is reused for all bulk memory transfers, without the need for a CPU. The presented approach doubles the data throughput without requiring extra system resources. In addition, it allows the clock settings of the different components of the system on a programmable chip to be chosen more effectively. The benefits and limitations of this new approach are illustrated with a real-world example: a bitplane assembler for scalable wavelet-based video. With the same clock settings as the original design, the new design is 2.3 times faster and uses about 100 fewer logic elements. Applying our new approach also has a positive impact on energy consumption.