
Is it guaranteed that random access within a 16KB chunk will not cause cache miss?

Discussion in 'Data Oriented Technology Stack' started by TMPxyz, Jul 14, 2019.

  1. TMPxyz

    TMPxyz

    Joined:
    Jul 20, 2012
    Posts:
    765
    My knowledge of CPU architecture was pretty outdated, so I did some quick research on caches. Here are some things I learned:

    * Caches load data in units of cache lines; a cache line is 64 bytes across L1-L3 (on recent Intel CPUs)
    * Sequential and strided access patterns trigger the hardware prefetcher, so we can avoid cache misses even when accessing more than one cache line

    And here are some things I don't quite understand:

    * What's the point of the 16KB chunk? Does it mean there will be no cache misses if we randomly access within a chunk? Or will there still be up to (16KB / 64 = 256) cache misses before the whole chunk is loaded into L1 cache?
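    For concreteness, here is the arithmetic behind that 256 figure, using the chunk and cache-line sizes quoted in this thread (just a sketch of the counting, not a statement about any particular CPU):

```python
CHUNK_SIZE = 16 * 1024  # ECS chunk size in bytes
CACHE_LINE = 64         # cache line size in bytes on recent Intel CPUs

# A cold 16KB chunk spans this many cache lines; touching each line
# for the first time is one compulsory miss, so a fully cold chunk
# costs at most this many misses before it is resident in cache:
lines_per_chunk = CHUNK_SIZE // CACHE_LINE
print(lines_per_chunk)  # 256
```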
     
  2. Lurking-Ninja

    Lurking-Ninja

    Joined:
    Jan 20, 2015
    Posts:
    4,263
    No, we are not aiming for random access without cache misses. We are aiming to minimize cache misses by processing the data in sequence. That way the prefetcher will load our next chunk of data, so cache misses will be rare.
    No one promised zero cache misses under random access, AFAIK.
     
  3. Joachim_Ante

    Joachim_Ante

    Unity Technologies

    Joined:
    Mar 16, 2005
    Posts:
    4,670
    As Lurking-Ninja said, it's not about guaranteeing zero cache misses. That is impossible to achieve for a program as a whole. It is about massively reducing them, and massively reducing the bandwidth wasted on loading memory into cache that is then never consumed.

    16KB means that if you have, for example, 4 components on an entity, each component gets roughly 62 cache lines per chunk, laid out linearly. That's a huge improvement over otherwise randomly jumping around between cache lines.
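    A quick sanity check on those numbers (the drop from 64 to roughly 62 lines presumably reflects per-chunk header/bookkeeping overhead; that interpretation is my assumption, not stated in the post):

```python
CHUNK_SIZE = 16 * 1024
CACHE_LINE = 64
COMPONENTS = 4

# Ignoring overhead, each component array gets an equal slice of the chunk:
bytes_per_component = CHUNK_SIZE // COMPONENTS           # 4096 bytes
lines_per_component = bytes_per_component // CACHE_LINE  # 64 cache lines
print(lines_per_component)  # 64 -> roughly 62 once chunk overhead is subtracted
```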

    The exact 16KB chunk size was determined empirically on Megacity. Ultimately we do want it to be dynamic, at least introducing the concept of tiny & large chunks. We have definitely noticed that at both extremes (huge numbers of entities, or small numbers of entities of one archetype) efficiency could be improved.
     
  4. TMPxyz

    TMPxyz

    Joined:
    Jul 20, 2012
    Posts:
    765
    Thanks for the clarification. There are still quite a few points about the cache that I don't quite understand.

    Taking the random access within a chunk from the title as an example: if an IJobChunk randomly accesses the data in a chunk 100k times (an iterative algorithm on a graph, maybe), is it mostly right that we would only see 256 cache misses during the job execution?
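    One way to check that intuition is a toy model: if the whole chunk fits in cache and nothing evicts it mid-job, only the first touch of each distinct cache line can miss, no matter how many accesses follow. A sketch (pure counting, not a real cache simulator, and it ignores evictions caused by other data the job touches):

```python
import random

CHUNK_SIZE = 16 * 1024
CACHE_LINE = 64

random.seed(0)
resident = set()          # cache lines already loaded
misses = 0
for _ in range(100_000):  # 100k random accesses within one chunk
    addr = random.randrange(CHUNK_SIZE)
    line = addr // CACHE_LINE
    if line not in resident:  # first touch of this line -> compulsory miss
        resident.add(line)
        misses += 1

# Misses are bounded by the number of distinct lines in the chunk:
print(misses)  # at most 256
```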

    I have another question about cache prefetching (strided prefetch).

    Assume we store 100K elements (each 128 bytes) in an array. Is it right that if we loop over the array accessing an int member of each element, the cache will basically not miss, since we are accessing addresses at a fixed stride?
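    The bandwidth side of that access pattern can be quantified: with a 128-byte element, each int read lands in a different 64-byte cache line, so even if the prefetcher hides the latency, only a fraction of each loaded line is consumed. Rough arithmetic (a sketch; the element and line sizes are the ones above):

```python
N = 100_000
ELEM_SIZE = 128   # bytes per element
CACHE_LINE = 64   # bytes per cache line
INT_SIZE = 4      # bytes actually read per element

# A 128-byte stride means consecutive int reads never share a cache line:
lines_touched = N                        # one fresh line per element
bytes_loaded = lines_touched * CACHE_LINE
bytes_used = N * INT_SIZE
print(bytes_used / bytes_loaded)  # 0.0625 -> only 1/16 of loaded bytes are consumed
```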
     
  5. Ashkan_gc

    Ashkan_gc

    Joined:
    Aug 12, 2009
    Posts:
    916