How much linear memory access is enough?
58 points - last Wednesday at 1:16 PM
Comments
For example, I wonder what this test looks like if you don't randomize the chunks but instead just have the chunks in work order? If you still see the perf hit, that suggests the cost is not from the cache misses but rather the overhead of needing to switch chunks more often.
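A rough sketch of the comparison suggested above, assuming a simple streaming sum as the workload: visit the same chunks once in work order and once in a shuffled order, and compare timings. The names (`CHUNK_WORDS`, `run`) and sizes are illustrative assumptions, not taken from the linked code.

```python
import random
import time
from array import array

CHUNK_WORDS = 16_384          # 64 kB of 4-byte ints per chunk (assumption)
N_CHUNKS = 256

data = array("i", range(CHUNK_WORDS * N_CHUNKS))

def run(order):
    """Sum every chunk, visiting chunks in the given order."""
    total = 0
    for c in order:
        start = c * CHUNK_WORDS
        total += sum(data[start:start + CHUNK_WORDS])
    return total

work_order = list(range(N_CHUNKS))
shuffled = work_order[:]
random.shuffle(shuffled)

for label, order in (("work order", work_order), ("shuffled", shuffled)):
    t0 = time.perf_counter()
    total = run(order)
    dt = time.perf_counter() - t0
    print(f"{label}: {dt * 1e3:.1f} ms (checksum {total})")
```

Both orders touch exactly the same bytes, so any timing gap isolates the cost of the visiting order itself; a compiled language would show the cache effect far more faithfully than CPython, but the structure of the experiment is the same.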
Nevertheless, it's been a helpful rule of thumb not to overthink optimizations.
My concrete tasks already reach peak performance before 128 kB, and I couldn't find pure processing workloads that benefit significantly beyond a 1 MB chunk size. The code is linked in the post; it would be nice to see results on more systems.
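The kind of sweep described here can be sketched as follows: stream over the same buffer with several chunk sizes and time a trivial sum per size. The sizes and loop structure are assumptions for illustration, not the linked benchmark code.

```python
import time
from array import array

data = array("i", [1] * (1 << 22))   # 16 MB of 4-byte ints

def timed_sum(chunk_bytes):
    """Sum the whole buffer in chunks of the given byte size."""
    chunk_words = chunk_bytes // data.itemsize
    t0 = time.perf_counter()
    total = 0
    for start in range(0, len(data), chunk_words):
        total += sum(data[start:start + chunk_words])
    return total, time.perf_counter() - t0

for kb in (4, 32, 128, 1024):
    total, dt = timed_sum(kb * 1024)
    print(f"{kb:>5} kB chunks: {dt * 1e3:6.1f} ms (sum {total})")
```

On a plot of time versus chunk size, "peak performance before 128 kB" would show up as the curve flattening by that point.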
On GPU databases we sometimes go up to the GB range per "item of work" (input permitting), as it's very efficient.
I need to add having a look at your GitHub code to my TODO list...
Does it mean that if I'm doing very light processing (e.g., sums), I should move to structure-of-arrays to take advantage of the cache? But if I'm doing something very expensive, I can leave it as array-of-structures, since the computation will dominate the memory access in an Amdahl's-law analysis?
This data should tell me something about organizing my data and accessing it, right?
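The AoS-versus-SoA trade-off in the question can be sketched like this: a sum over one field reads only that field's contiguous array in the SoA layout, while the AoS layout drags the unused fields through the cache alongside it. Field names are made up for illustration, and CPython's boxed objects don't model cache lines faithfully; the layout argument holds as stated in C or NumPy.

```python
# Array-of-structures: each record carries fields the sum never uses.
N = 100_000
aos = [(i, float(i), i * 2) for i in range(N)]   # (id, value, flags)

# Structure-of-arrays: one contiguous sequence per field.
soa_values = [float(i) for i in range(N)]

def sum_aos():
    """Light processing over AoS: touches every record to read one field."""
    return sum(rec[1] for rec in aos)

def sum_soa():
    """Same light processing over SoA: reads only the needed field's array."""
    return sum(soa_values)

assert sum_aos() == sum_soa()
```

For heavy per-element work, the memory layout matters less, since compute time dominates the memory stalls either layout produces.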