Norihiro Tanaka wrote:
> The test case "k" is 50% faster and "l" is also about 16% faster with
> GCC 4.8.2 on my platform by two changes.

Thanks, I finally got around to looking at this and got similar performance results to yours. That __attribute__((noinline)) bothers me, though, as it's not portable and is a bit inelegant. I figured out a different way to avoid the inlining, and tweaked the commentary a bit, and so installed the attached additional patch after installing your patches.