Python 2.6a2 execution times with various compilers

Due to recent concerns with memory use and execution speed, I was curious how Python would behave with different compilers. I took Python 2.6a2 (r62288) from the Subversion repository and compiled it with the flags --with-threads --enable-unicode=ucs4 --enable-ipv6. The machine is an HP dc7700p with 1 GB of memory and an Intel Core2 6300 @ 1.86 GHz, running Ubuntu 7.10. I installed GCC 3.3.6, 3.4.6, 4.1.3 and 4.2.1 from the Gutsy repository, plus Intel CC 10.1.015. The MS Visual Studio 2008 build of Python was the MSI snapshot of 2008-04-10 from the main Python site. I ran it through Wine 0.9.46 after installing the VC2008 runtime.
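When juggling several builds like this, it helps to confirm which compiler and flags a given interpreter was actually built with. The following is a minimal sketch, not part of the original test setup: the helper name describe_build is hypothetical, and it assumes a modern Python where the sysconfig module is available (it was not yet in 2.6a2). On Windows builds the flag variables are typically None.

```python
import platform
import sysconfig


def describe_build():
    """Collect compiler and build-flag details for the running interpreter.

    describe_build is a hypothetical helper name used for illustration.
    """
    return {
        "version": platform.python_version(),
        # Free-form string, for example one starting with "GCC" or "MSC"
        "compiler": platform.python_compiler(),
        # Values recorded by configure in the Makefile; may be None
        # on platforms that do not use a Makefile-based build.
        "cflags": sysconfig.get_config_var("CFLAGS"),
        "opt": sysconfig.get_config_var("OPT"),
    }


if __name__ == "__main__":
    for key, value in describe_build().items():
        print(f"{key}: {value}")
```

Running this with each compiled interpreter gives a quick sanity check that a benchmark result is attributed to the right compiler.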

First, the various GCC versions: 3.3.6, 3.4.6, 4.1.3 and 4.2.1:

Python 2.6a2 compiled with GCC

It is good to see that the 3.4 series is faster than the 3.3 series and that the 4.2 series is faster than the 4.1 series. I am a bit worried, though, about the drop in performance of the 4.1 series compared to the 3.x series.

Next we have Python compiled with GCC 3.4.6, GCC 4.2.1, Intel CC 10.1.015 and MSC from Visual Studio 2008:

Python 2.6a2 compiled with GCC, ICC, MSC

It is nice to see that the Microsoft Visual Studio 2008 compiler produces a binary that, when run through Wine, still performs quite well compared to GCC. I am not sure whether Wine incurs a performance penalty. What is quite impressive is the performance of the Python compiled with Intel CC. If we take the fastest GCC, which is 4.2.1 at the moment, and average its 10 rounds of execution, we get 6.574 seconds. Comparing that to the ICC average of 5.412 seconds, ICC is about 21% faster (6.574 / 5.412 ≈ 1.21). Against the slowest, GCC 4.1.3 with an average of 7.002 seconds, ICC is even about 29% faster.
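The percentages above can be reproduced with a few lines of Python. This is a small sketch rather than the script used for the measurements: the function names are made up for illustration, and it reads the comma in the averages as a decimal separator (the ratios are the same either way). It also shows the other common way to state the result, as a reduction in run time, which gives a smaller number for the same data.

```python
def percent_faster(baseline_seconds, candidate_seconds):
    """How much faster the candidate is, as a speed ratio minus one.

    This is the convention that yields the 21% and 29% figures.
    """
    return (baseline_seconds / candidate_seconds - 1.0) * 100.0


def percent_time_saved(baseline_seconds, candidate_seconds):
    """How much less wall-clock time the candidate needs."""
    return (1.0 - candidate_seconds / baseline_seconds) * 100.0


# Average times over 10 rounds, taken from the text above.
averages = {
    "GCC 4.2.1": 6.574,
    "GCC 4.1.3": 7.002,
    "ICC 10.1.015": 5.412,
}

if __name__ == "__main__":
    icc = averages["ICC 10.1.015"]
    for name in ("GCC 4.2.1", "GCC 4.1.3"):
        gcc = averages[name]
        print(
            f"ICC vs {name}: "
            f"{percent_faster(gcc, icc):.1f}% faster, "
            f"{percent_time_saved(gcc, icc):.1f}% less time"
        )
```

Which convention to quote is a matter of taste, but it is worth saying which one is meant: "21% faster" and "about 18% less time" describe the same pair of measurements.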

So it seems that, for people who want to get the full performance out of Python, compiling with ICC might be quite beneficial. I also want to check how ICC has progressed performance-wise from version 8 to version 10.

The raw data can be found at

9 thoughts on “Python 2.6a2 execution times with various compilers”

  1. Ah yes, mea culpa.

    I stuck with the default options that configure puts in the Makefile, which seem to be: -fno-strict-aliasing -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes, plus the options needed for threading support with GCC.

    I did this since it is my assumption that most people will not twiddle those settings.

  2. If you do any further testing, it would be very interesting to check what can be gained by profile guided optimization (PGO).

  3. I ran a set of C++ benchmarks, testing PGO on GCC 4.2.2 and also comparing to Java. See the URL of this message for the results.

    The results of your benchmark seem odd and actually conflict with results that I have obtained previously (not documented) and seen elsewhere. I plan to add ICC to my benchmarks as well, when I have the time. Possibly also MSVC, but probably not. It will be interesting to see if the results match yours.

    In any case, I have long been searching for a comparison between MSVC and GCC, so thank you for your effort.

  4. Paddy,

    you are quite right about the valleys and hills. I guess my mind was in Friday mode.

    I’ll see if I can update or add over the weekend.

  5. Were you compiling for i686 or amd64/x86_64?

    Also, it would be interesting to see results with gcc’s PGO. I agree, though, that most people will never change the defaults that configure uses. But if we can change Python’s configure to use better options when the correct gcc version is available, we should do that.

  6. I would also suggest not using -O3 for Python, but only -O2 -fomit-frame-pointer (plus a properly selected architecture). Except for numerical/fp-heavy code (mplayer, audio/video data en/decoding), which is amenable to unrolling and increased parallelism, -O3 is almost always slower, mostly due to cache blowout.
