If you had asked me whether or not it was possible to get a 2x speedup for my LazySorted project by adding a single line of code, I would have told you "No way, substantial speedups can really only come from algorithm changes." But surprisingly, I was able to do so by adding a single line using the __builtin_prefetch function in GCC and Clang. Here's the story about how adding this got me a 2x speedup.
↧