# Predefined macros

So with the GNU compiler you can use the preprocessor to get a list of the predefined macros:

[code lang="bash"]
$ cpp -dM /dev/null
[/code]

or if you prefer to invoke the preprocessor via gcc itself:

[code lang="bash"]
$ gcc -dM -E - < /dev/null
[/code]

This should give you a list similar to this:

[code lang="c"]
#define __DBL_MIN_EXP__ (-1021)
#define __FLT_MIN__ 1.17549435e-38F
#define __DEC64_DEN__ 0.000000000000001E-383DD
#define __CHAR_BIT__ 8
#define __WCHAR_MAX__ 2147483647
[/code]

For Microsoft’s Visual C++ compiler I have only found documentation pages that list the predefined macros.

For Intel’s C++ compiler I found a page with its predefined macros as well.

I also found an interesting page covering a lot of different compilers and the predefined macros you can use to identify them and, where available, their versions.

Edit: I also found how to do this with Clang:

[code lang="bash"]
$ clang -dD -E - < /dev/null
[/code]

# Python 2.6 compiler options results

So after yesterday’s post about some compiler results with Python 2.6 I wanted to show how some of GCC’s architecture-specific compiler flags affect the execution of pybench. As I explained in the comments, I think most people will never even touch the flags passed to Python’s build. Nonetheless, some people asked if I had tuned it in any way. Pádraig Brady asked me if I had used the optimal GCC architecture flags. On my FreeBSD 7.0-STABLE machine at home (AMD Athlon(tm) 64 X2 Dual Core Processor 4600+ (2411.13-MHz K8-class CPU)) his script said I had to pass along “-m32 -march=k8 -mfpmath=sse”. My machine is fully 64-bit, so I left out the -m32 (it will not link anyway) and used “-march=k8 -mfpmath=sse”. Using -march=native instead of k8 resulted in a 0,1 seconds faster result, and -mtune=native -march=native instead of k8 resulted in a 0,1 – 0,2 seconds faster result.

On my system the default option flags are: -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes.
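For reference, a rebuild with the architecture flags discussed above could look like the sketch below; OPT is the variable Python’s autoconf-based Makefile uses for its optimization flags, but the source directory name here is hypothetical:

```shell
# Hypothetical rebuild of Python 2.6 with custom architecture flags.
# OPT overrides the optimization flags in Python's generated Makefile.
cd Python-2.6
./configure --with-threads --enable-unicode=ucs4 --enable-ipv6
make OPT="-O3 -march=native -mfpmath=sse"
```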

Considering some other comments about how I did not use a 0-origin for my y-axis, I have to point out two things: firstly, given the sometimes close results, zooming out too much can eliminate detailed information (of course, you have to be careful not to zoom in too much either); secondly, I like to make sure the graph itself is appropriately centered so you do not get whitespace skewing the resulting image. Being a follower of the Edward Tufte school of graphical display, I think I did reasonably well. The graphs were made with a tool called Ploticus.

I was curious how the optimization level influenced the resulting program, so I removed the -O3 option from the compiler flags. As is evident from the graph, you are looking at a bit more than a doubling of the execution time (an average of 14,2 seconds versus the previous 6,6 and 6,5 seconds).

So, given the huge performance hit from merely leaving out -O3, I was interested in how the other optimization levels worked out. Holger Hoffstätte asked me to use -O2 -fomit-frame-pointer instead of -O3. The results of -O3 (average of 6,5 seconds) and -O2 -fomit-frame-pointer (average of 6,5 seconds) were basically equal; adding -fomit-frame-pointer made no discernible speed difference in either the -O1 or the -O2 case. The result of using -O1 was quite interesting: it already improves execution by ~86%. From -O1 to -O2/-O3 we are looking at another increase of ~16%, and from the no-optimization case to -O2/-O3 execution improves by ~118%.

I tried a profile-guided optimization build, but I have some issues on my FreeBSD 7.0-STABLE with libgcov. Apparently only a libgcov.a is provided and linking gives me a relocation warning. Thankfully I also had a GCC 4.2.4 snapshot from March installed and did a PGO build with it, but I only managed to shave off about 0,2 seconds on the average time.
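For completeness, a manual GCC profile-guided build roughly follows the two-pass sketch below; the flag names (-fprofile-generate/-fprofile-use) are GCC’s, but the exact build sequence and the choice of pybench as the training run are assumptions on my part:

```shell
# Pass 1: instrumented build that writes profile data (.gcda files).
make clean
make OPT="-O3 -fprofile-generate" LDFLAGS="-fprofile-generate"

# Training run: exercise the interpreter to collect profile data.
./python Tools/pybench/pybench.py

# Pass 2: remove the objects (keeping the .gcda files) and rebuild
# using the collected profiles.
find . -name '*.o' -delete
make OPT="-O3 -fprofile-use" LDFLAGS="-fprofile-use"
```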

# Python 2.6a2 execution times with various compilers

Due to recent concerns with memory use and execution speed I was curious how Python would behave with different compilers. I took Python 2.6a2 r62288 from the Subversion repository and compiled it with the flags: --with-threads --enable-unicode=ucs4 --enable-ipv6. The machine is a HP dc7700p with 1GB memory with an Intel Core2 6300 @ 1.86GHz running Ubuntu 7.10. I installed GCC 3.3.6, 3.4.6, 4.1.3, 4.2.1 from the Gutsy repository, and Intel 10.1.015. The MS Visual Studio 2008 Python was the MSI snapshot of 2008-04-10 from the main Python site. I ran this through Wine 0.9.46 after installing the VC2008 runtime.

First various GCC versions: 3.3.6, 3.4.6, 4.1.3, 4.2.1:

It is good to see that the 3.4 series is faster than the 3.3 series and that the 4.2 series is faster than the 4.1 series. I am a bit worried about the 4.1 series’ drop in performance compared to the 3.x series, though.

Next we have Python compiled with GCC 3.4.6, 4.2.1, Intel CC 10.1.015, MSC from Visual Studio 2008:

It is nice to see how the Microsoft Visual Studio 2008 compiler produces a binary that, when run through Wine, still performs quite well compared to GCC. I am not quite sure if Wine incurs a performance penalty or not. What’s quite impressive is the performance of the Intel CC compiled Python. If we take the fastest GCC, which is 4.2.1 at the moment, take the average of the 10 rounds of execution, which is 6,574 seconds, and compare that to the average of ICC, which is 5,412 seconds, we see that ICC is about 21% faster. If we take the slowest, GCC 4.1.3 with an average of 7,002 seconds, we even get a result that ICC is about 29% faster.
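As a sanity check on the arithmetic, the quoted percentages follow directly from the averages above, with “N% faster” meaning (slower/faster − 1) × 100:

```shell
# Check the quoted speed-ups from the measured averages (seconds).
awk 'BEGIN {
    icc    = 5.412   # ICC average
    gcc421 = 6.574   # fastest GCC, 4.2.1
    gcc413 = 7.002   # slowest GCC, 4.1.3
    printf "ICC vs GCC 4.2.1: %.0f%% faster\n", (gcc421 / icc - 1) * 100
    printf "ICC vs GCC 4.1.3: %.0f%% faster\n", (gcc413 / icc - 1) * 100
}'
# -> ICC vs GCC 4.2.1: 21% faster
# -> ICC vs GCC 4.1.3: 29% faster
```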

So it seems that, for people who want to get the full performance out of Python, compiling with ICC might be quite beneficial. I want to check how ICC progressed from version 8 to version 10 performance-wise.

The raw data can be found at http://www.in-nomine.org/~asmodai/python-pybench.txt.