C++: ways to compare cache locality improvements?

Question



8

c++ performance pointers caching benchmarking

asked by Joseph Garvin, 16 June 2009 at 21:13

3 answers


score 2 · Answer 1

You could design a benchmark specifically to bust the cache. For instance, allocate the pointed-to data blocks such that they're all guaranteed to be on different cache lines (say, by using a custom memory allocator that pads allocations out to at least a few hundred bytes). Then repeatedly iterate over a number of objects too big to fit everything in even the L2 cache (very platform-dependent, since it depends on the number of lines in cache, but 1 million would cover most architectures and only require a few hundred meg RAM total).
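To make that concrete, here is a rough sketch of such a cache-busting benchmark. Everything in it is an assumption for illustration: the names `PaddedBlock`, `Workload`, `make_workload`, `sum_values` and `ns_per_element` are made up, the 256-byte padding is just "a few hundred bytes", and a plain `std::vector` of padded structs stands in for the custom allocator mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical payload padded to 256 bytes, so the `value` fields of two
// different blocks never share a cache line (assuming lines of <= 256 bytes).
struct PaddedBlock {
    std::uint64_t value;
    char pad[256 - sizeof(std::uint64_t)];
};

// Owns the blocks plus a shuffled list of pointers to them.
// Note: copying a Workload would leave `order` pointing into the original.
struct Workload {
    std::vector<PaddedBlock> storage;
    std::vector<const PaddedBlock*> order;
};

// Builds `count` padded blocks and visits them in a random order, so the
// hardware prefetcher cannot simply stream through memory.
Workload make_workload(std::size_t count, unsigned seed = 42) {
    Workload w;
    w.storage.resize(count);
    for (std::size_t i = 0; i < count; ++i) w.storage[i].value = i;
    w.order.reserve(count);
    for (const PaddedBlock& b : w.storage) w.order.push_back(&b);
    std::mt19937 rng(seed);
    std::shuffle(w.order.begin(), w.order.end(), rng);
    return w;  // moved out; pointers into the vector's heap buffer stay valid
}

// One pass over every block through the pointer list.
std::uint64_t sum_values(const Workload& w) {
    std::uint64_t total = 0;
    for (const PaddedBlock* p : w.order) total += p->value;
    return total;
}

// Times `passes` traversals and returns average nanoseconds per element.
// `checksum` accumulates the sums so the loop cannot be optimized away.
double ns_per_element(const Workload& w, int passes, std::uint64_t& checksum) {
    assert(passes > 0 && !w.order.empty());
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < passes; ++i) checksum += sum_values(w);
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / (static_cast<double>(passes) * w.order.size());
}
```

With `make_workload(1000000)` the blocks take roughly 256 MB, which should exceed the caches on most machines. Running the same timing with a small count that fits in L1 gives a baseline to compare against.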

This will give you an upper limit on the performance gain made by the change from X to Y. But it does it by degrading the performance of X down to below any likely real-world usage. And to prove your case you need a lower-limit estimate, not an upper-limit estimate. So I'm not sure you'd achieve much, unless you discover that even this worst case still makes no significant difference and you needn't bother with the optimization.

Even if you don't aim for theoretical worst-case performance of X, any benchmark designed to exceed the cache is just picking an arbitrary point of bad performance of X, and looking to see if Y is better. It's not far off rigging the benchmark to make Y look good. It really doesn't matter how your code performs in dodgy benchmarks, except maybe for the purposes of marketing ~~lies~~ literature.

The best way to observe the real-world difference in performance is to measure a real-world client of your class. You say that "the semantics of X and Y differ in other subtle ways not related to this optimization", in which case I can only recommend that you write a class Z which differs from X only in respect of this optimization, and use that in your application as the comparison.

Once your tests attempt to represent the worst realistic use, then if you aren't seeing any difference in performance there's probably no performance gain to be had.

All that said, if it makes logical sense (that is, it doesn't make the code any more astonishing), then I would advocate minimising the number of heap allocations in C++ simply as a rule of thumb. It doesn't tend to make speed or total memory usage worse, and it does tend to simplify your resource handling. A rule of thumb doesn't justify a re-write of working code, of course.

score 8 · Answer 2

If you're working on Linux, using Cachegrind together with KCacheGrind can give you more insight into how your cache behaves.
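As a rough illustration of what you might feed to these tools, here is a small pair of functions with deliberately different access patterns. The function names and the binary name `./bench` in the comments are hypothetical, and the workflow in the comments is one common way to run Valgrind's tools, not necessarily what the answerer had in mind.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed workflow (the binary name ./bench is hypothetical):
//   valgrind --tool=cachegrind ./bench
//   cg_annotate cachegrind.out.<pid>
// For a GUI view, Callgrind can also simulate the cache:
//   valgrind --tool=callgrind --cache-sim=yes ./bench
//   kcachegrind callgrind.out.<pid>

// Sums an n-by-n matrix stored row-major, walking memory sequentially.
long long sum_row_major(const std::vector<int>& m, std::size_t n) {
    assert(m.size() == n * n);
    long long total = 0;
    for (std::size_t r = 0; r < n; ++r)
        for (std::size_t c = 0; c < n; ++c)
            total += m[r * n + c];
    return total;
}

// Same sum, but each step jumps n elements ahead, so for large n
// nearly every access lands on a different cache line.
long long sum_col_major(const std::vector<int>& m, std::size_t n) {
    assert(m.size() == n * n);
    long long total = 0;
    for (std::size_t c = 0; c < n; ++c)
        for (std::size_t r = 0; r < n; ++r)
            total += m[r * n + c];
    return total;
}
```

Calling both from `main` on a matrix that is much larger than the cache should show similar instruction counts but a noticeably higher data-read miss count for `sum_col_major` in the per-function report.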

score 0 · Answer 3

If I'm understanding your situation correctly (and please correct me if not), then it's six of one, half a dozen of the other.

In class X, you need one pointer lookup for either piece of information. In class Y, you need one lookup for the first, and two (get the first and then offset) for the second. That's sacrificing "locality" for another memory access. Compilers are still, unfortunately, very good at wasting bus time looking up words in RAM.

If it's possible, you'll get the best results by holding the two pieces of target information directly within the class in question (i.e. each as its own class member), rather than using those pointers for unnecessary indirection. Not seeing any code, that's pretty much all I can say.
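Since there's no code to go on, here is a guess at the three layouts being discussed. All of the type names (`First`, `Second`, `TwoPointers`, `OnePointer`, `Inline`) are hypothetical stand-ins, and the mapping to X and Y is only my reading of the description above.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for the two pieces of target information.
struct First  { int value = 1; };
struct Second { int value = 2; };

// One reading of X: two separate heap allocations, one pointer each.
// Each access is a single dereference, but the targets may be far apart.
struct TwoPointers {
    std::unique_ptr<First>  first  = std::make_unique<First>();
    std::unique_ptr<Second> second = std::make_unique<Second>();
    int sum() const { return first->value + second->value; }
};

// One reading of Y: a single allocation holding both pieces side by side.
// The second piece is reached by pointer plus offset.
struct Pair { First first; Second second; };
struct OnePointer {
    std::unique_ptr<Pair> pair = std::make_unique<Pair>();
    int sum() const { return pair->first.value + pair->second.value; }
};

// The suggestion above: no pointers at all, both pieces are members,
// so they live in the same memory as the owning object.
struct Inline {
    First  first;
    Second second;
    int sum() const { return first.value + second.value; }
};
```

The `Inline` version needs no heap allocation and no dereference beyond the object itself, which is the point about avoiding unnecessary indirection. Whether it's usable depends on things the question would have to tell us, such as whether the pieces are shared or polymorphic.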

At any rate, you'll get a lot more performance out of studying the algorithmic complexity of your application than you ever will from micro-optimizing two variables in a class definition. It's also a great idea to use a profiling tool to see (objectively) where your bottlenecks are (gprof is common on *nix systems). Is there a distinct reason you're looking to improve cache locality specifically?