Norihiro Tanaka wrote:

> Could you try above cases?

Thanks, you're observing a 2.7x performance speedup with macros on your platform and your benchmark. With the same patch, I observed only a 1.18x speedup on the same benchmark. As usual, I'm testing with AMD Phenom II X4 910e + GCC 4.9.0 + Fedora 20 + default (-O2) optimization. I'm curious about why you're observing a much bigger performance difference with macros. What platform are you using?

Anyway, an 18% speedup is still a speedup, so I looked into it. GCC 4.9.0 misses a non-obvious opportunity for function inlining. I installed a tweak (attached) that should make the inlining opportunity obvious to compilers nowadays. On my platform this gave a 28% speedup, i.e., a bit better than the macro-using patch would have.
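
The attached patch is not reproduced in this message. As a rough illustration of the macro-versus-inline trade-off discussed above, here is a minimal sketch. It assumes a hypothetical bit-test helper: the names charclass_word, WORD_BITS, TSTBIT_MACRO, tstbit_inline and count_bits are invented for this example and are not taken from the patch. The idea is only that a small function defined as 'static inline' in the same translation unit gives the compiler an obvious inlining candidate, so it can match a macro without the macro's drawbacks.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical bit-set word type, for illustration only.  */
typedef unsigned int charclass_word;
enum { WORD_BITS = sizeof (charclass_word) * CHAR_BIT };

/* Macro version: always expanded in place, but it evaluates B twice
   and does no type checking on its arguments.  */
#define TSTBIT_MACRO(b, c) \
  (((c)[(b) / WORD_BITS] >> ((b) % WORD_BITS)) & 1)

/* Function version: 'static inline' keeps the definition local to this
   translation unit and marks it as an obvious inlining candidate.
   Whether it is actually inlined remains the compiler's decision.  */
static inline bool
tstbit_inline (unsigned int b, charclass_word const *c)
{
  return ((c[b / WORD_BITS] >> (b % WORD_BITS)) & 1) != 0;
}

/* Count the set bits among the first NBITS bits of C, using either
   the macro or the inline function.  Both paths should agree.  */
static unsigned int
count_bits (charclass_word const *c, unsigned int nbits, bool use_macro)
{
  assert (c);
  unsigned int n = 0;
  for (unsigned int b = 0; b < nbits; b++)
    n += use_macro ? TSTBIT_MACRO (b, c) : tstbit_inline (b, c);
  return n;
}
```

One way to check whether the function form really is inlined on a given platform is to compile with "gcc -O2 -S" and look for a call to tstbit_inline in the assembly output. Results may differ between compiler versions, which might be part of why the measured speedups differ.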