Microprocessor design is very time-consuming because a huge design space must be explored to identify the optimal design. The detailed simulations used to explore this space are slow: for every design point of interest, hundreds of billions of instructions per benchmark need to be simulated. Statistical simulation was recently proposed as a way to cull a huge design space efficiently. Its basic idea is to collect a number of program characteristics and to generate a synthetic trace from them. Simulating this synthetic trace is extremely fast because it contains only a million instructions. We improve the statistical simulation methodology by proposing accurate memory data flow models. Specifically, we model load forwarding, delayed cache hits, and the correlation between cache misses based on path information. Our experiments using the SPECcpu2000 benchmarks show a substantial improvement over current state-of-the-art statistical simulation methods.
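
The basic idea of statistical simulation described above (measure program characteristics, then sample a short synthetic trace from them) can be illustrated with a minimal sketch. This is not the methodology proposed here: it assumes a toy trace format of `(op, missed)` pairs and only two characteristics, the instruction mix and the load miss rate. It does not model load forwarding, delayed hits, or path-based miss correlation. The names `collect_profile` and `generate_synthetic_trace` are hypothetical.

```python
import random
from collections import Counter


def collect_profile(trace):
    """Measure simple statistics from a trace of (op, missed) pairs.

    Assumption: `missed` is True only for loads that missed in the cache.
    """
    total = len(trace)
    mix = Counter(op for op, _ in trace)
    load_outcomes = [missed for op, missed in trace if op == "load"]
    miss_rate = sum(load_outcomes) / len(load_outcomes) if load_outcomes else 0.0
    return {
        "mix": {op: count / total for op, count in mix.items()},
        "load_miss_rate": miss_rate,
    }


def generate_synthetic_trace(profile, length, seed=0):
    """Sample a synthetic trace whose statistics approximate the profile."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    ops = sorted(profile["mix"])
    weights = [profile["mix"][op] for op in ops]
    synthetic = []
    for op in rng.choices(ops, weights=weights, k=length):
        # Only loads can miss in this toy model.
        missed = op == "load" and rng.random() < profile["load_miss_rate"]
        synthetic.append((op, missed))
    return synthetic


# Toy stand-in for a real trace; a real profile would come from billions of instructions.
real_trace = [("load", True), ("load", False), ("add", False), ("branch", False)]
profile = collect_profile(real_trace)
synthetic = generate_synthetic_trace(profile, length=1000)
```

The synthetic trace is far shorter than the execution it summarizes, which is where the speedup comes from. Because each instruction is sampled independently, this sketch ignores exactly the kind of correlation between events that more accurate models aim to capture.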