Paper: The Impact of Delay on the Design of Branch Predictors

Summary:
The increasing complexity of branch predictors and deeply pipelined microarchitectures lead to a delay
in branch prediction. The premise of the argument is based on studies showing that shrinking feature
sizes, larger wire delays, and shorter clock cycles will lead to multi-cycle access times for large on-chip
structures. This paper focuses on techniques that can be used to accommodate this delay. The authors
examine a caching approach, an overriding approach, and a cascading look-ahead approach. The
overriding approach uses a quick but relatively inaccurate predictor that guides instruction fetch in a
single cycle, whose prediction can be corrected by a slower but more accurate predictor that needs
multiple cycles. The cascading look-ahead scheme exploits the time between consecutive branches to
start reading the prediction tables early. Different configurations are evaluated on a simulator that
models several process technologies (250 nm to 35 nm) and determines the optimal parameters for
each technology and for different predictors. They also present results for different clocking strategies.
They demonstrate that the efficiency of a predictor depends on its delay as well as its accuracy.
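The overriding scheme summarized above can be sketched in a few lines. This is a hypothetical simulation, not the paper's model: the accuracies, penalties, and branch bias are illustrative values chosen only to show the mechanism (a fast predictor steers fetch immediately; the slow predictor finishes later and overrides it on disagreement).

```python
# Illustrative sketch of an overriding predictor. A 1-cycle predictor guides
# fetch; a slower, more accurate predictor can override it. All constants
# below are assumptions for illustration, not values from the paper.

import random

random.seed(0)

FAST_ACCURACY = 0.90      # fast single-cycle predictor (assumed)
SLOW_ACCURACY = 0.97      # slow multi-cycle predictor (assumed)
OVERRIDE_PENALTY = 2      # cycles lost restarting fetch on an override
MISPREDICT_PENALTY = 10   # cycles lost to a full pipeline flush

def simulate(num_branches):
    """Count cycles lost to overrides and to final mispredictions."""
    penalty_cycles = 0
    for _ in range(num_branches):
        outcome = random.random() < 0.6            # branch taken 60% (assumed)
        fast = outcome if random.random() < FAST_ACCURACY else not outcome
        slow = outcome if random.random() < SLOW_ACCURACY else not outcome
        if slow != fast:
            penalty_cycles += OVERRIDE_PENALTY     # slow predictor overrides fast
        if slow != outcome:
            penalty_cycles += MISPREDICT_PENALTY   # final prediction still wrong
    return penalty_cycles

print(simulate(10_000))
```

The key property the paper exploits is visible here: the common case pays no delay, and the expensive flush penalty is governed by the slow predictor's higher accuracy.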
Strengths:
- Provides insight into how complex branch predictors introduce delay into prediction.
- Highlights the important tradeoff that a highly accurate but complex predictor may still
perform worse than a faster, less accurate one.
- Provides experimentally determined configurations of pattern history tables for different
process technologies.
- Shows that overriding yields better performance than the other delay-hiding methods.
- Another important point highlighted is how branch frequency affects prediction latency.
- Offers particularly good insight into how process technology and clocking affect IPC: the
hybrid predictor achieves high accuracy but the lowest IPC at smaller technologies as its
access time increases.
- Shows that the overriding scheme works best across most process technologies and
aggressive clocking strategies.
Weaknesses:
- They assume that the BTB has constant capacity and access time. However, this may not
hold as clock rates rise and feature sizes shrink.
- Although they show that overriding works better than caching and cascading overall, they
have not addressed its utility at very large hardware budgets, i.e. hundreds of kilobytes. This
could possibly be because they only simulate
Related work:
Cited 148 times on Google Scholar.
This paper served as a foundation for the pipelined predictors introduced in subsequent
papers by Daniel Jimenez.
André Seznec then proposed the ahead pipelined architecture for branch prediction.