In this paper we present a novel hardware implementation of the H.264/AVC CABAC binary arithmetic decoder and context modeler that decodes one symbol per clock cycle at high clock frequencies while keeping a small hardware footprint. This is achieved by substantially reducing the latency of the central feedback loop through extensive speculative prefetching and aggressive pipelining. Synthesis results targeting state-of-the-art FPGA families show that the approach yields a fast and compact IP core, well suited to a system-on-chip (SoC) H.264/AVC implementation.
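For readers less familiar with the decoding engine, the following C sketch models one pass through the central feedback loop in software: the DecodeDecision step followed by renormalization, as specified in H.264/AVC, using the standard's rangeTabLPS and transIdxLPS tables and its variable names codIRange, codIOffset, pStateIdx and valMPS. The serial dependency is visible here. qRangeIdx depends on the range left by the previous bin, the MPS/LPS decision depends on the subtraction, and the context update and renormalization depend on that decision. A decoder that issues one symbol per cycle has to close this loop in a single clock. This is only a behavioural reference under stated assumptions. The type and function names (CabacDecoder, Context, read_bit, cabac_init, decode_decision) are hypothetical, returning zero bits past the end of the buffer is an assumption, and bypass decoding, terminate decoding and context initialization are omitted. It does not reproduce the speculative, pipelined hardware structure described in this paper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* rangeTabLPS[pStateIdx][qRangeIdx]: LPS sub-range from the H.264/AVC spec. */
static const uint8_t rangeTabLPS[64][4] = {
    {128,176,208,240},{128,167,197,227},{128,158,187,216},{123,150,178,205},
    {116,142,169,195},{111,135,160,185},{105,128,152,175},{100,122,144,166},
    {95,116,137,158},{90,110,130,150},{85,104,123,142},{81,99,117,135},
    {77,94,111,128},{73,89,105,122},{69,85,100,116},{66,80,95,110},
    {62,76,90,104},{59,72,86,99},{56,69,81,94},{53,65,77,89},
    {51,62,73,85},{48,59,69,80},{46,56,66,76},{43,53,63,72},
    {41,50,59,69},{39,48,56,65},{37,45,54,62},{35,43,51,59},
    {33,41,48,56},{32,39,46,53},{30,37,43,50},{29,35,41,48},
    {27,33,39,45},{26,31,37,43},{24,30,35,41},{23,28,33,39},
    {22,27,32,37},{21,26,30,35},{20,24,29,33},{19,23,27,31},
    {18,22,26,30},{17,21,25,28},{16,20,23,27},{15,19,22,25},
    {14,18,21,24},{14,17,20,23},{13,16,19,22},{12,15,18,21},
    {12,14,17,20},{11,14,16,19},{11,13,15,18},{10,12,15,17},
    {10,12,14,16},{9,11,13,15},{9,11,12,14},{8,10,12,14},
    {8,9,11,13},{7,9,11,12},{7,9,10,12},{7,8,10,11},
    {6,8,9,11},{6,7,9,10},{6,7,8,9},{2,2,2,2}
};

/* Next pStateIdx after an LPS; after an MPS the index increments up to 62. */
static const uint8_t transIdxLPS[64] = {
    0,0,1,2,2,4,4,5,6,7,8,9,9,11,11,12,
    13,13,15,15,16,16,18,18,19,19,21,21,22,22,23,24,
    24,25,26,26,27,27,28,29,29,30,30,30,31,32,32,33,
    33,33,34,34,35,35,35,36,36,36,37,37,37,38,38,63
};

/* Hypothetical decoder state; the two registers use the spec's names. */
typedef struct {
    const uint8_t *buf;   /* CABAC-coded slice data */
    size_t len;           /* length of buf in bytes */
    size_t bitpos;        /* index of the next bit to read */
    uint32_t codIRange;
    uint32_t codIOffset;
} CabacDecoder;

/* One context variable: probability state index plus most probable symbol. */
typedef struct {
    uint8_t pStateIdx;
    uint8_t valMPS;
} Context;

/* Read one bit MSB-first; past the end of buf, zeros are returned (assumption). */
static uint32_t read_bit(CabacDecoder *d) {
    if (d->bitpos >= d->len * 8) return 0;
    uint32_t bit = (d->buf[d->bitpos >> 3] >> (7 - (d->bitpos & 7))) & 1u;
    d->bitpos++;
    return bit;
}

/* Engine initialization: codIRange = 510, codIOffset = first 9 bits. */
void cabac_init(CabacDecoder *d, const uint8_t *buf, size_t len) {
    d->buf = buf;
    d->len = len;
    d->bitpos = 0;
    d->codIRange = 510;
    d->codIOffset = 0;
    for (int i = 0; i < 9; i++)
        d->codIOffset = (d->codIOffset << 1) | read_bit(d);
}

/* DecodeDecision followed by renormalization: one trip around the loop. */
int decode_decision(CabacDecoder *d, Context *ctx) {
    assert(d->codIOffset < d->codIRange); /* holds for conforming streams */
    /* 1. qRangeIdx depends on the range left by the previous bin. */
    uint32_t qRangeIdx = (d->codIRange >> 6) & 3u;
    uint32_t rangeLPS = rangeTabLPS[ctx->pStateIdx][qRangeIdx];
    d->codIRange -= rangeLPS;
    int binVal;
    /* 2. The offset comparison selects the MPS or LPS path. */
    if (d->codIOffset >= d->codIRange) {
        binVal = !ctx->valMPS;
        d->codIOffset -= d->codIRange;
        d->codIRange = rangeLPS;
        if (ctx->pStateIdx == 0) ctx->valMPS = (uint8_t)(1 - ctx->valMPS);
        ctx->pStateIdx = transIdxLPS[ctx->pStateIdx];
    } else {
        binVal = ctx->valMPS;
        if (ctx->pStateIdx < 62) ctx->pStateIdx++;
    }
    /* 3. Renormalize until codIRange >= 256, shifting in new bits. */
    while (d->codIRange < 256) {
        d->codIRange <<= 1;
        d->codIOffset = (d->codIOffset << 1) | read_bit(d);
    }
    return binVal;
}
```

As a usage example, initializing with a buffer whose first bytes are 0x96, 0x00 sets codIOffset to 300. A context with pStateIdx = 0 and valMPS = 0 then takes the LPS path on the first call and returns 1, flips valMPS, and leaves codIRange = 480 and codIOffset = 60 after renormalization. Each of the three numbered steps feeds the next call, which is the loop whose latency a one-symbol-per-cycle hardware design must hide.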