You Can't Fool the Optimizer

99 points - today at 12:14 PM

Comments

stabbles today at 1:13 PM

For people who enjoy these blogs, you would definitely like the Julia REPL as well. I used to play with this a lot to discover compiler things.

For example:

    $ julia
    julia> function f(n)
             total = 0
             for x in 1:n
               total += x
             end
             return total
           end
    julia> @code_native f(10)
        ...
        sub    x9, x0, #2
        mul    x10, x8, x9
        umulh    x8, x8, x9
        extr    x8, x8, x10, #1
        add    x8, x8, x0, lsl #1
        sub    x0, x8, #1
        ret
        ...

it shows this with nice colors right in the REPL.

In the example above, you see that LLVM figured out the arithmetic series and replaced the loop with a simple multiplication.

jagged-chisel today at 12:35 PM

I always code with the mindset “the compiler is smarter than me.” No need to twist my logic around attempting to squeeze performance out of the processor - write something understandable to humans, let the computer do what computers do.

matja today at 1:29 PM

You can fool the optimizer, but you have to work harder to do so:

    unsigned add(unsigned x, unsigned y) {
        unsigned a, b;
        do {
            a = x & y;
            b = x ^ y;
            x = a << 1;
            y = b;
        } while (a);
        return b;
    }

becomes (with armv8-a clang 21.1.0 -O3) :

    add(unsigned int, unsigned int):
    .LBB0_1:
            ands    w8, w0, w1
            eor     w1, w0, w1
            lsl     w0, w8, #1
            b.ne    .LBB0_1
            mov     w0, w1
            ret

Scene_Cast2 today at 1:11 PM

This post assumes C/C++ style business logic code.

Anything HPC will benefit from thinking about how things map onto hardware (or, in case of SQL, onto data structures).

I think way too few people use profilers. If your code is slow, profiling is the first tool you should reach for. Unfortunately, the state of profiling tools outside of NSight and Visual Studio (non-Code) is pretty disappointing.

toonewbie today at 12:35 PM

Sometimes you can fool the compiler :-)

See "Example 2: Tricking the compiler" in my blog post about O3 sometimes being slower than O2: https://barish.me/blog/cpp-o3-slower/

sureglymop today at 12:47 PM

With this one I instead wondered: If there are 4 functions doing exactly the same thing, couldn't the compiler also only generate the code for one of them?

E.g. if in `main` you called two different add functions, couldn't it optimize one of them away completely?

It probably shouldn't do that if you create a dynamic library that needs a symbol table but for an ELF binary it could, no? Why doesn't it do that?

torginus today at 1:46 PM

Awesome blog post - thanks to this I found out that you can view what the LLVM optimizer pipeline does, and which pass is actually responsible for doing which instruction.

It's super cool to see this in practice, and for me it helps putting more trust in the compiler that it does the right thing, rather than me trying to micro-optimize my code and peppering inline qualifiers everywhere.

317070 today at 12:43 PM

"The compiler" and "The optimizer" are doing a lot of the heavy lifting here in the argument. I definitely know compilers and optimizers which are not that great. Then again, they are not turning C++ code into ARM instructions.

You absolutely can fool a lot of compilers out there! And I am not only looking at you, NVCC.

Joker_vD today at 1:45 PM

Wait, why does GAS use Intel syntax for ARM instead of AT&T? Or something that looks very much like it: the destination is the first operand, not the last, and there is no "%" prefix for the register names?

amelius today at 12:47 PM

One undesirable property of optimizers is that in theory one day they produce good code and the next day they don't.

asah today at 1:11 PM

I want an AI optimization helper that recognizes patterns that could-almost be optimized if I gave it a little help, e.g. hints about usage, type, etc.

dlenski today at 1:54 PM

Today I learned that Matt Godbolt is British!

mkornaukhov today at 1:07 PM

Better tell me how to make the compiler not fool me!

raverbashing today at 1:37 PM

I'm curious what is the theoreme-proving magic behind add_v4 and if this is prior LLVM ir

daft_pink today at 12:47 PM

Is this an argument for compiled code?