
[Solved]: Implementation of caches on CPUs with pipelines

Problem Detail: 

I've read that some current CPUs (e.g. the Intel i7 and the ARM Cortex-A9) have L1 cache latencies of several clock cycles while also being pipelined. Some devote multiple pipeline stages to instruction fetching.

How is the L1 cache implemented to support pipelining?

Is it, say, $k$ actual pipeline stages for decoding the address, or does it use asynchronous multiport logic with a datapath latency set to $k$ clock cycles?

Asked By : 102948239408

Answered By : 102948239408

http://caps.cs.binghamton.edu/papers/aggarwal_ics_2005.pdf

I've found this paper, which explains in the introduction that with highly associative caches the datapath to read the data, compare the tags, and drive the output becomes too long to use at high clock frequencies. The cache access is therefore split into three stages: decode the set, compare tags and read, and drive the data. The paper goes further and explains that an extra stage can be added to avoid redundant data reads, reading the data only after the correct tag has been compared and matched, which improves power usage.
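To make the three-stage split concrete, here is a minimal, purely illustrative Python sketch of a single lookup in a set-associative cache, with one function per stage. All of the sizes and names (NUM_SETS, NUM_WAYS, stage1_decode, load, ...) are assumptions for the example, not anything taken from the paper; real hardware implements these stages with SRAM arrays and parallel comparators rather than software. Reading the data array only for the way whose tag matched (stage 3) corresponds to the power-saving serial access the paper describes.

# Illustrative software model of a pipelined set-associative L1 lookup,
# split into the three stages described above. Parameters are hypothetical.
NUM_SETS = 64          # e.g. 32 KiB / (64-byte lines * 8 ways)
NUM_WAYS = 8
LINE_BYTES = 64

# cache[set_index][way] holds (tag, data) or None
cache = [[None] * NUM_WAYS for _ in range(NUM_SETS)]

def stage1_decode(addr):
    """Stage 1: decode the set index and tag from the address."""
    line_addr = addr // LINE_BYTES
    return line_addr % NUM_SETS, line_addr // NUM_SETS

def stage2_compare(set_index, tag):
    """Stage 2: read the tag array for the set and compare all ways."""
    for way in range(NUM_WAYS):
        entry = cache[set_index][way]
        if entry is not None and entry[0] == tag:
            return way              # hit: remember which way matched
    return None                     # miss

def stage3_drive(set_index, way):
    """Stage 3: read the data array of the matching way only, drive it out."""
    return cache[set_index][way][1]

def load(addr):
    # In hardware these stages overlap for different accesses (pipelining);
    # here they simply run back to back for one access.
    set_index, tag = stage1_decode(addr)
    way = stage2_compare(set_index, tag)
    return None if way is None else stage3_drive(set_index, way)

# Example: install a line, then hit on it.
s, t = stage1_decode(0x1234)
cache[s][0] = (t, b"64 bytes of line data")
print(load(0x1234))   # hit  -> b"64 bytes of line data"
print(load(0x9999))   # miss -> None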

Although not directly related to my question, the paper also describes cache set and way prediction techniques that reduce latency by parallelising the address calculation with the cache read operation; for the implementation of the prediction it refers to an article on the Alpha 21264 microprocessor: http://mixteco.utm.mx/~merg/AC/pdfs/alpha_21264.pdf

The idea is to reduce latency by not waiting for the address-calculation stages of the CPU pipeline: the address of the data to be read is predicted, the cache access starts speculatively, and the prediction is checked against the real address at the end of the operation.
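As a rough illustration of that idea (not the Alpha 21264's actual mechanism), the sketch below starts a lookup from a predicted address and only keeps the result if the prediction matches the real address once it is computed, otherwise it replays the access. It reuses the load() helper from the previous sketch; the last-address-plus-stride predictor is a hypothetical stand-in for whatever history a real design would track.

# Illustrative model of address prediction: start the access before the
# real address is known, then verify. The stride predictor is hypothetical.
last = {"addr": None, "stride": 0}

def predict_addr():
    """Guess the next address, here by assuming a constant stride."""
    return None if last["addr"] is None else last["addr"] + last["stride"]

def update_predictor(actual_addr):
    if last["addr"] is not None:
        last["stride"] = actual_addr - last["addr"]
    last["addr"] = actual_addr

def speculative_load(compute_actual_addr):
    predicted = predict_addr()
    # Start the cache access early, in parallel with address calculation.
    speculative_data = load(predicted) if predicted is not None else None
    # ... meanwhile the real address (base + offset) is computed ...
    actual = compute_actual_addr()
    update_predictor(actual)
    if predicted == actual:
        return speculative_data     # prediction correct: latency hidden
    return load(actual)             # misprediction: replay with real address

# Example: after two strided accesses the third one is predicted correctly.
for addr in (0x1000, 0x1040, 0x1080):
    speculative_load(lambda a=addr: a)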


Question Source : http://cs.stackexchange.com/questions/29584
