In this paper, we present a hardware implementation of the H.264/AVC
CABAC binary arithmetic decoder and context modeler capable of decoding one
symbol per clock cycle at high clock frequencies while maintaining a slim
hardware footprint. This was achieved by substantially decreasing the latency
of the central feedback loop through extensive use of speculative prefetching
and aggressive pipelining. Synthesis results for state-of-the-art FPGA
families show that our approach yields a fast and compact IP core, well suited
to an SoC H.264/AVC implementation.