April 12, 2008 at 19:10
· Filed under Programming, Python
After yesterday’s post about compiler results with Python 2.6, I wanted to show how some of GCC’s architecture-specific compiler flags affect the execution of pybench. As I explained in the comments, I think most people will never even touch the flags passed to Python’s build. Nonetheless, some people asked whether I had tuned it in any way, and Pádraig Brady asked whether I had used the optimal GCC architecture flags. On my FreeBSD 7.0-STABLE machine at home (AMD Athlon(tm) 64 X2 Dual Core Processor 4600+ (2411.13-MHz K8-class CPU)) his script stated that I should pass “-m32 -march=k8 -mfpmath=sse”. My machine is fully 64-bit, so I left out the -m32 (since it will not link anyway) and used “-march=k8 -mfpmath=sse”. Using -march=native instead of k8 gave a result 0,1 seconds faster, and using -mtune=native -march=native instead of k8 gave a result 0,1 - 0,2 seconds faster.
The default option flags on my system are: -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes.
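For readers who want to check which flags their own interpreter was built with, here is a minimal sketch. It assumes a modern Python 3, where the standard `sysconfig` module exposes the build-time variables; on Python 2.6 the equivalent call lived in `distutils.sysconfig`. The choice of the three variable names and the `build_flags` dictionary are only for illustration, and on non-POSIX builds such as Windows the values may simply be `None`.

```python
import sysconfig

# Build-time variables recorded when the interpreter was compiled.
# They may be None on platforms that do not build from a Makefile.
build_flags = {}
for name in ("CC", "OPT", "CFLAGS"):
    build_flags[name] = sysconfig.get_config_var(name)

for name, value in build_flags.items():
    print(f"{name:7} = {value}")
```

On a typical Unix build, `OPT` is where an optimization level such as -O3 shows up.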
Some other comments noted that I did not use a 0-origin for my y-axis, so I have to point out two things. Firstly, given the sometimes close results, zooming out too much can eliminate detailed information (of course you have to be careful not to zoom in too much as well). Secondly, I like to make sure the graph itself is appropriately centered so that whitespace does not skew the resulting image. Being a follower of the Edward Tufte school of graphic display, I think I did reasonably well. The graphs were made with a tool called Ploticus.

I was curious how the optimization level influenced the resulting program, so I removed the -O3 option from the compiler flags. As is evident from the graph, you are looking at a bit more than a doubling of execution time (an average of 14,2 seconds versus the previous 6,6 and 6,5 seconds).

So, given the huge performance hit from merely leaving out the -O3, I was interested in how the other optimization levels worked out. Holger Hoffstätte asked me to use -O2 -fomit-frame-pointer instead of -O3. Basically the results of -O3 (average of 6,5 seconds) and -O2 -fomit-frame-pointer (average of 6,5 seconds) were equal. I could not really discern much of a speed difference from adding -fomit-frame-pointer: plain -O2 also averaged 6,5 seconds. The result of using -O1 was quite interesting. Compared with no optimization, it already improves execution speed by ~86%. From -O1 to -O2/-O3 we are looking at another increase of ~16%. From the no-optimization case to -O2/-O3, execution speed improves by ~118%.
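As a sanity check on these percentages, here is a small sketch of the arithmetic. It assumes the percentages are speed gains, that is, the slower time divided by the faster time, minus one. Successive gains of this kind multiply rather than add. The variable names are only illustrative, and the two averages are the ones quoted above.

```python
# Average pybench times in seconds, taken from the post.
no_opt_s = 14.2   # built without -O3
o3_s = 6.5        # built with -O2 or -O3

# Overall speed gain as a ratio of run times.
overall = no_opt_s / o3_s

# Stated step-wise gains: none -> -O1 (~86%), then -O1 -> -O2/-O3 (~16%).
compounded = (1 + 0.86) * (1 + 0.16)

print(f"measured overall gain:   {(overall - 1) * 100:.0f}%")
print(f"compounded stated gains: {(compounded - 1) * 100:.0f}%")
```

The two figures land within a few percentage points of each other, which is about what one would expect from rounded averages.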

I tried a profile-guided optimization build, but I have some issues with libgcov on my FreeBSD 7.0-STABLE. Apparently only a libgcov.a is provided and linking gives me a relocation warning. Thankfully I also had a GCC 4.2.4 snapshot from March installed and did a PGO build with that, but I only managed to shave off about 0,2 seconds from the average time.
Tags:
benchmark,
compiler,
edward tufte,
gcc,
python 2.6
April 11, 2008 at 16:43
· Filed under Programming, Python
Due to recent concerns with memory use and execution speed, I was curious how Python would behave with different compilers. I took Python 2.6a2 r62288 from the Subversion repository and configured it with the flags: --with-threads --enable-unicode=ucs4 --enable-ipv6. The machine is an HP dc7700p with 1 GB of memory and an Intel Core2 6300 @ 1.86GHz, running Ubuntu 7.10. I installed GCC 3.3.6, 3.4.6, 4.1.3 and 4.2.1 from the Gutsy repository, and Intel 10.1.015. The MS Visual Studio 2008 Python was the MSI snapshot of 2008-04-10 from the main Python site. I ran this through Wine 0.9.46 after installing the VC2008 runtime.
First, the various GCC versions: 3.3.6, 3.4.6, 4.1.3, 4.2.1:

It is good to see that the 3.4 series is faster than the 3.3 series and the 4.2 series is faster than the 4.1 series. I am a bit worried about the 4.1 series drop in performance compared to the 3 series though.
Next we have Python compiled with GCC 3.4.6, 4.2.1, Intel CC 10.1.015, MSC from Visual Studio 2008:

It is nice to see how the Microsoft Visual Studio 2008 compiler produces a binary that, when run through Wine, still performs quite well compared to GCC. I am not quite sure if Wine incurs a performance penalty or not. What’s quite impressive is the performance of the Intel CC compiled Python. If we take the fastest GCC, which is 4.2.1 at the moment, take the average of the 10 rounds of execution, which is 6,574 seconds, and compare that to the average of ICC, which is 5,412 seconds, we see that ICC is about 21% faster. If we take the slowest, GCC 4.1.3 with an average of 7,002 seconds, we even get a result that ICC is about 29% faster.
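To make the "about 21% faster" figure concrete, here is a small sketch of the arithmetic. It assumes "faster" means the ratio of run times minus one. The same pair of averages can also be read as a reduction in run time, which gives a smaller number. The function names are hypothetical, and the averages are the ones quoted above.

```python
def percent_faster(baseline_s, candidate_s):
    """Speed gain of the candidate over the baseline, from run times."""
    return (baseline_s / candidate_s - 1.0) * 100.0


def percent_less_time(baseline_s, candidate_s):
    """Reduction in run time of the candidate relative to the baseline."""
    return (1.0 - candidate_s / baseline_s) * 100.0


# Average pybench times in seconds, taken from the post.
icc_s = 5.412
gcc_averages = {"GCC 4.2.1": 6.574, "GCC 4.1.3": 7.002}

for name, gcc_s in gcc_averages.items():
    print(f"ICC vs {name}: "
          f"{percent_faster(gcc_s, icc_s):.0f}% faster, "
          f"{percent_less_time(gcc_s, icc_s):.0f}% less time")
```

Both readings describe the same measurement, so it helps to say which one a quoted percentage uses.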
So it seems that, for people who want to get the full performance out of Python, compiling with ICC might be quite beneficial. I want to check out how ICC progressed from version 8 to version 10 performance-wise.
The raw data can be found at http://www.in-nomine.org/~asmodai/python-pybench.txt.
Tags:
benchmark,
compiler,
gcc,
icc,
python 2.6,
visual studio
February 6, 2008 at 22:56
· Filed under Programming, Python
For the past few months there’s been a certain vibe building up in parts of the Python community. As it stands, 2008 looks set to be a stellar year for Python.
Just after New Year TIOBE reported this:
Python has been declared as programming language of 2007. It was a close finish, but in the end Python appeared to have the largest increase in ratings in one year time (2.04%). There is no clear reason why Python made this huge jump in 2007. Last month Python surpassed Perl for the first time in history, which is an indication that Python has become the “de facto” glue language at system level. It is especially beloved by system administrators and build managers. Chances are high that Python’s star will rise further in 2008, thanks to the upcoming release of Python 3.
There are a lot of really interesting developments going on. In my opinion these include (in no particular order): Babel, Bitten, Genshi, Trac, Werkzeug, WebOb.
An exciting year indeed.
Tags:
babel,
bitten,
genshi,
trac,
webob,
werkzeug
December 21, 2007 at 13:48
· Filed under Programming, Python
Armin Ronacher released Werkzeug 0.1 a little while ago. As the website for Werkzeug says: “Werkzeug is a collection of various utilities for WSGI applications. It features request and response objects as well as a powerful url dispatcher and a debugging system.”
It is on my todo list to convert my current CherryPy environment to Werkzeug as a proof of concept, to see which of the two I prefer and will ultimately use for my Japanese-Dutch dictionary project.
Tags:
cherrypy,
werkzeug,
wsgi
July 12, 2004 at 08:42
· Filed under Music, Python, Thoughts
Some of these days just start whacked.
Manager asks me to look over the CV of the guy who’s going to replace me. Sure, no problem. So I find the guy alright and now I have to help in interviewing him? Uhhh, this is funny. Not that usual in the Netherlands to do this…
Bored out of my mind, thankfully I can put the time to good use. Almost up to date on all episodes of Uzumaki Naruto.
Right now converting Amos’ script from elisp to Python. Going nicely thus far.
The weather makes me want to put on some Muse or similar bands… Although I was listening to UK Punjabi remixes in the car on my way to work…
Tags:
anime,
muse,
naruto,
tendra,
work