Phoronix conclusions distort their results, shown with the example of GCC vs. LLVM/Clang On AMD's FX-8350 Vishera

Phoronix recently did a benchmark of GCC vs. LLVM on AMD hardware. Sadly their conclusion did not fit the data they showed. Actually it misrepresented the data so strongly, that I decided to speak up here instead of having my comments disappear in their forums. This post was started on 2013-05-14 and got updates when things changed - first for the better, then for the worse.

Update 3 (the last straw, 2013-11-09): In the recent most blatant attack by Phoronix on copyleft programs - this time openly targeted at GNU - Michael Larabel directly misrepresented a post from Josh Klint to badmouth GDB (Josh confirmed this1). Josh gave a report of his initial experience with GDB in a Kickstarter Update in which he reported some shortcomings he saw in GDB (of which the major gripe is easily resolved with better documentation2) and concluded with “the limitations of GDB are annoying, but I can deal with it. It's very nice to be able to run and debug our editor on Linux”. Michael Larabel only quoted the conclusion up to “annoying” and abused that to support the claim that game developers (in general) call GDB “crap” and for further badmouthing of GDB. With this he provided the straw which I needed to stop reading Phoronix: Michael Larabel is hostile to copyleft and in particular to GNU and he goes as far as rigging test results3 and misrepresenting words of others to further his agenda. I even donated to Phoronix a few times in the past. I guess I won’t do that again, either. I should have learned from the error of the german pirates and should have avoided reading media which is controlled by people who want to destroy what I fight for (sustainable free software).
Update 2 (2013-07-06): But the next went down the drain again… “Of course, LLVM/Clang 3.3 still lacks OpenMP support, so those tests are obviously in favor of GCC.” — I couldn’t find a better way to say that those tests are completely useless while at the same time devaluing OpenMP support as “ignore this result along with all others where GCC wins”…
Update (2013-06-21): The recent report of GCC 4.8 vs. LLVM 3.3 looks much better. Not perfect, but much better.

Taking out the OpenMP benchmarks (where GCC naturally won, because LLVM only processes those tests single-threaded) and the build times (which are irrelevant to the speed of the produced binaries), their benchmark had the following result:

LLVM is slower than GCC by:

  • 10.2% (HMMer)
  • 12.7% (MAFFT)
  • 6.8% (BLAKE2)
  • 9.1% (HIMENO)
  • 42.2% (C-Ray)

With these results (which were clearly visible on their result summary on OpenBenchmarking, Michael Larabel from Phoronix concluded:

» The performance of LLVM/Clang 3.3 for most tests is at least comparable to GCC «

Nobu from their Forums supplied a conclusion which represents the data much better:

» GCC is much faster in anything which uses OpenMP, and moderately faster or equal in anything (except compile times) which doesn't [use OpenMP] «

But Michael from Phoronix did not stop at just ignoring the performance difference between GCC and LLVM. He went on claiming, that

In a few benchmarks LLVM/Clang is faster, particularly when it comes to build times.

And this is blatant reality-distortion which I am very tempted to ascribe to favoritism. LLVM is not “particularly” faster when it comes to build times.

LLVM on AMD FX-8350 Vishera is faster ONLY when it comes to build times!

This was not the first time that I read data-distorting conclusions on Phoronix - and my complaints about that in their forum did not change their actions. So I hope that my post here can help making them aware that deliberately distorting test results is unacceptable.

For my work, compiler performance is actually quite important, because I use programs which run for days or weeks, so 10% runtime reduction can mean saving several days - not counting the cost of using up cluster time.

To fix their blunders, what they would have to do is:

  • Avoiding Benchmarks which only one compiler supports properly (OpenMP).
  • Marking the compile time tests explicitely, so they strongly stand out from the rest, because they measure a completely different parameter than the other tests: Compiler Runtime vs. Performance of the Compiled Binaries.
  • Writing conclusions which actually fit their results.

Their current approach gives a distinct disadvantage to GCC (even for the OpenMP tests, because they convey the notion that if LLVM only had OpenMP, it would be better in everything - which as this test shows is simply false), so the compiler-tests from Phoronix work as covert propaganda against GCC, even in tests where GCC flat-out wins. And I already don’t like open propaganda, but when the propaganda gets masked as objective testing, I actually get angry.

I hope my post here can help move them towards doing proper testing again.

PS: I write so strongly here, because I actually like the tests from Phoronix a lot. I think we need rather more than less testing and their testsuite actually seems to do a good job - when given the right parameters - so seeing Phoronix distorting the tests to a point where they become almost useless (except as political tool against GCC) is a huge disappointment to me.

  1. Josh Klint from Leadwerks confirmed that Phoronix misrepresented his post and wrote a followup-post: » @ArneBab That really wasn't meant to be controversial. I was hoping to provide constructive feedback from the view of an Xcode / VS user.« » Slightly surprised my complaints about GDB are a hot topic. I can make just as many criticisms of other compilers and IDEs.« » The first 24 hours are the best for usability feedback. I figure if they notice a pattern some of those things will be improved.« » GDB Follwup «@Leadwerks, 2:04 AM - 11 Nov 13, 2:10 AM - 11 Nov 13 and @JoshKlint, 2:07 AM - 11 Nov 13, 8:48 PM - 11 Nov 13

  2. The first-impression criticism from Josh Klint was addressed by a Phoronix reader by pointing to the frame command. I do not blame Josh for not knowing all tricks: He wrote a fair account of his initial experience with GDB (and he said later that he wrote the post after less than 24 hours of using GDB, because he considers that the best time to provide feedback) and his experience can serve as constructive criticism to improve tutorials, documentation and the UI of GDB. Sadly his visibility and the possible impact of his work on free software made it possible for Phoronix to abuse a personal report as support for a general badmouthing of the tool. In contrast the full message of Josh Klint ended really positive: Although some annoyances and limitations have been discovered, overall I have found Linux to be a completely viable platform for application development. — Josh Klint, Leadwerks 

  3. I know that rigging of tests is a strong claim. The actions of Michael Larabel deserve being called rigging for three main reasons: (1) Including compile-time data along with runtime performance without clear distinction between both, even though compile-time of the full code is mostly irrelevant when you use a proper build system and compile time and runtime are completely different classes of results, (2) including pointless tests between incomparable setups whose only use is to relativate any weakness of his favorite system and (3) blatantly lying in the summaries (as I show in this article). 


Wählen Sie hier Ihre bevorzugte Anzeigeart für Kommentare und klicken Sie auf „Einstellungen speichern“ um die Änderungen zu übernehmen.

It seems as if Michael

It seems as if Michael copied the conclusion from the other llvm vs gcc benchmark executed on Intel hardware. On (this) AMD hardware, however, this conclusion is obviously not true.

That’s possible, but it

That’s possible, but it does not explain why other tests are being skewed in favor of LLVM, too. I might interpret too much into it and see intention where there’s just carelessness (and I did not suspect much till the article I linked here). But now looking back, I think that I recognize a strategical bias against GCC. There might be a multitide of reasons for that - from benign enthusiasm for the cool new stuff to a dislike of copyleft - and I do not want to judge where it really comes from. But it leaves a bad taste in my mouth and devalues the otherwise great tests.

If I suspect that the tests are abused (intentionally or not) to further a political agenda which wants to abolish any barrier to proprietarization of free software (one of Apples goals), then I cannot recommend the articles to others just like I would not recommend the articles in the Bild-Zeitung (a big newspaper in germany which is known to distort realities, spread blatant lies , create FUD and attack unconventional people (like people wearing black…) to further its anti-progressive and neoliberal political agenda). And I am not so arrogant to think that repeated exposure to propaganda won’t influence me…

Inhalt abgleichen
Willkommen im Weltenwald!

Beliebte Inhalte news