Small touches that inspire

It's the littlest of things that can really brighten my mood when I notice them. In this case I was watching the Fallout: New Vegas DLC trailer for Honest Hearts. In the trailer you see the player with a pistol, and on at least one side of the pistol is written:

καὶ ἡ σκοτία αὐτὸ οὐ κατέλαβεν

This is Greek and is the second half of John 1:5 in the New Testament of the Bible. In English it reads: "and the darkness did not comprehend it". In my opinion, that is a great way to bring enlightenment by the bullet.

Sublime Text with 80 and 120 column rulers

For many programming languages we still like to keep our code within either 80 or 120 columns. That makes it fit easily on a printed page, and it also helps keep the code concise.

In Sublime Text you can set vertical rulers for this. Go to Preferences » User File Preferences, add rulers 80 120, and save the file.

For Sublime Text 2 it's under Preferences » Settings - User, but the configuration file is now in JSON format. You need to add "rulers": [80, 120], and you may need to append a comma after it if more configuration directives follow.

Addendum (2013): in Sublime Text 3 it is still under Preferences » Settings - User, and the file is still JSON. Simply add "rulers": [80, 120], as in the Sublime Text 2 example.
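If you'd rather not fiddle with that comma by hand, a small Python sketch can add the setting for you. It assumes the settings text is strict JSON. Sublime Text settings files may contain // comments, which Python's json module rejects. The helper add_rulers is a hypothetical one of mine, not part of Sublime Text.

```python
import json


def add_rulers(settings_text, rulers=(80, 120)):
    """Return settings_text with a "rulers" entry added or replaced.

    Assumes strict JSON (no // comments); an empty file counts as {}.
    """
    settings = json.loads(settings_text) if settings_text.strip() else {}
    settings["rulers"] = list(rulers)
    # json.dumps writes the separating commas, so none can go missing.
    return json.dumps(settings, indent=4)


print(add_rulers('{"font_size": 11}'))
```

Note that json.dumps rewrites the whole file, so any layout of your own is lost.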

Mercurial 1.7, cacerts, and FreeBSD

With the recent Mercurial 1.7 releases, HTTPS support was tightened. You are therefore bound to encounter a warning of the form: warning: bitbucket.org certificate not verified (check web.cacerts config setting).

The page http://mercurial.selenic.com/wiki/CACertificates details what to configure for various operating systems. Since I use FreeBSD, I changed my $HOME/.hgrc as follows:

[web]
cacerts = /etc/ssl/cert.pem

On OpenBSD the file should be in the same place since release 3.8. NetBSD, however, apparently does not have such a file in its base system.
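Mercurial itself is written in Python, so Python's own ssl module offers one way to sanity-check a bundle before pointing web.cacerts at it. This is a minimal sketch of my own, not something Mercurial does, and the function name cacerts_usable is hypothetical. It only checks that the file exists and yields at least one CA certificate.

```python
import os
import ssl


def cacerts_usable(path):
    """Return True if path is a PEM bundle that loads at least one CA cert."""
    if not os.path.isfile(path):
        return False
    try:
        # Loads only the certificates from path, not the system defaults.
        context = ssl.create_default_context(cafile=path)
    except (ssl.SSLError, OSError):
        # Unreadable file, or no usable certificates in it.
        return False
    return len(context.get_ca_certs()) > 0


print(cacerts_usable("/etc/ssl/cert.pem"))
```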

PyCharm and external lint tools

PyCharm already includes many of the features found in various tools for linting/checking your source code, but it also offers a way to hook up external tools. Under File > Settings there is a section called IDE Settings, and one of its headings is External Tools. Select this heading and press the Add... button in the right-hand pane to configure a new external tool.

In the Edit Tool window that appears, fill in a name (e.g. PEP8), a group name (e.g. Lint) and a description. Next, point Program to the location of the pep8.exe executable, e.g. C:\Python27\Scripts\pep8.exe. For Parameters use $FilePath$. Working directory should be filled in by default. Once done, close the window by pressing the OK button.

Pyflakes, however, has no .exe or .bat file to accompany it. You will need to add a pyflakes.bat to the Scripts directory of your Python installation, with the following contents:

@echo off
rem Use python to execute the python script having the same name as this batch
rem file, but without any extension, located in the same directory as this
rem batch file
python "%~dpn0" %*

In PyCharm you use largely the same settings as for pep8, but make sure Program points to the pyflakes batch file. Close the external tools configuration windows by clicking OK twice. Under the Tools menu you should now see a submenu called Lint, which in turn should contain two menu items: PEP8 and Pyflakes.

Now open a Python file, go to Tools > Lint > PEP8 and you should get output like the following in your Run (4) window:

D:\Python26\Scripts\pep8.exe D:\pprojects\babel\babel\tests\__init__.py
D:\pprojects\babel\babel\tests\__init__.py:16:1: E302 expected 2 blank lines, found 1

Process finished with exit code 1
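You may want to post-process that output outside PyCharm. A minimal Python sketch splits such a line into its parts, assuming pep8 always reports problems as path:line:column: CODE message. The regex is my own and is based only on the sample line above. Pyflakes output is not covered.

```python
import re

# Assumed pep8 layout: "path:line:column: CODE message".
# The non-greedy path lets Windows drive letters like "D:" through.
PEP8_LINE = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+):(?P<col>\d+): (?P<code>[EW]\d+) (?P<message>.*)$"
)


def parse_pep8_line(text):
    """Return (path, line, column, code, message), or None if no match."""
    match = PEP8_LINE.match(text)
    if match is None:
        return None
    return (match["path"], int(match["line"]), int(match["col"]),
            match["code"], match["message"])


sample = r"D:\pprojects\babel\babel\tests\__init__.py:16:1: E302 expected 2 blank lines, found 1"
print(parse_pep8_line(sample))
```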

On the topic of sensible date and temperature defaults in applications and websites

Something that always frustrates me a bit is the choice of defaults in applications.

Dates: Belize, Canada, the Federated States of Micronesia, Palau, the Philippines, and the United States are the only countries using a date format where the month comes first, followed by the day and lastly the year (mm/dd/yyyy). In numbers, that is about 436 million people who use this format versus 6.35 billion who do not (a ratio of about 14:1). Of those 6.35 billion, about 3.8 billion use a format where the day comes first, followed by the month and lastly the year (dd/mm/yyyy; a ratio of about 9:1 to the month-first users). About 1.81 billion use a format where the year comes first, followed by the month and lastly the day (yyyy/mm/dd, roughly equivalent to ISO 8601; a ratio of about 4:1 to the month-first users). (Note: these 1.81 billion overlap slightly with the 3.8 billion. Some countries use two date formats, or have two or more scripts with different date formatting styles.) So a month-first format is simply confusing for the majority of the world's population. If you need a default date format, use ISO 8601. Not only is it less ambiguous, it also sorts chronologically even when sorted as plain text.
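To illustrate the sorting and ambiguity points, here is a minimal Python sketch. It writes the same three dates as ISO 8601 strings and as mm/dd/yyyy strings, then sorts them as plain text. The sample dates are arbitrary, and the plain-text ordering only holds for four-digit years.

```python
from datetime import date

# Three arbitrary sample dates, deliberately not in chronological order.
dates = [date(2011, 3, 4), date(2010, 12, 25), date(2011, 1, 15)]

iso = [d.isoformat() for d in dates]          # yyyy-mm-dd
us = [d.strftime("%m/%d/%Y") for d in dates]  # mm/dd/yyyy

# Plain string sorting keeps the ISO 8601 dates in chronological order...
print(sorted(iso))  # ['2010-12-25', '2011-01-15', '2011-03-04']
# ...but not the month-first ones: the 2010 date ends up last.
print(sorted(us))   # ['01/15/2011', '03/04/2011', '12/25/2010']

# Month-first is also ambiguous: a day-first reader takes 03/04/2011
# to mean 3 April, and would write 4 March 2011 like this instead.
print(date(2011, 3, 4).strftime("%d/%m/%Y"))  # 04/03/2011
```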

Temperature: aside from Belize and the United States (as far as I have managed to find), the worldwide standard for temperature is Celsius, not Fahrenheit. If you default to Fahrenheit, you are putting 6.48 billion people at a disadvantage just to please something like 313 million people. That's a ratio of about 21:1, meaning you put 21 people at a disadvantage for every one person you are trying to please.
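One common way to act on this is to store temperatures in a single unit and convert only for display, with Celsius as the default. A minimal sketch follows. The function names and the "C"/"F" unit codes are hypothetical, and the conversion is the standard F = C × 9/5 + 32.

```python
def celsius_to_fahrenheit(celsius):
    """Standard conversion: F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32


def format_temperature(celsius, unit="C"):
    """Format a temperature stored in Celsius; Celsius is the default unit."""
    if unit == "C":
        return f"{celsius:.1f} °C"
    if unit == "F":
        return f"{celsius_to_fahrenheit(celsius):.1f} °F"
    raise ValueError(f"unknown unit: {unit!r}")


print(format_temperature(20))       # 20.0 °C
print(format_temperature(20, "F"))  # 68.0 °F
```

Only users who explicitly ask for Fahrenheit pass unit="F". Everyone else gets the Celsius default.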

Disclaimer: this only makes sense if you are appealing to an international audience. If you are targeting a specific country, you will of course default to whatever that country uses. That said, properly making your code i18n-ready is the way to go anyway.

Predefined macros

With the GNU compiler you can use the preprocessor to get a list of the predefined macros:

$ cpp -dM /dev/null

or if you prefer to invoke the preprocessor via gcc itself:

$ gcc -dM -E - < /dev/null

This should give you a list similar to:

#define __DBL_MIN_EXP__ (-1021)
#define __FLT_MIN__ 1.17549435e-38F
#define __DEC64_DEN__ 0.000000000000001E-383DD
#define __CHAR_BIT__ 8
#define __WCHAR_MAX__ 2147483647

For Microsoft's Visual C++ compiler I have only found pages like:

For Intel's C++ compiler I found the following page with predefined macros.

I also found this interesting page that lists many different compilers, along with the predefined macros that identify them and, where available, their versions.

Edit: I also found out how to do this with Clang:

$ clang -dD -E - < /dev/null
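To use that list from a script, a minimal Python sketch can run the compiler the same way as the gcc command above and turn the #define lines into a dictionary. The helper names are hypothetical. Lines that are not #define directives are skipped, so the parser should also cope with the extra lines the -dD variant emits. Passing "clang" to predefined_macros relies on Clang accepting the same -dM flag as GCC.

```python
import subprocess


def parse_macros(text):
    """Turn '#define NAME VALUE' lines into a {NAME: VALUE} dict.

    Other lines are ignored. This is a simple split on the first space,
    so function-like macros are not handled specially.
    """
    macros = {}
    for line in text.splitlines():
        if not line.startswith("#define "):
            continue
        name, _, value = line[len("#define "):].partition(" ")
        macros[name] = value
    return macros


def predefined_macros(compiler="gcc"):
    """Run '<compiler> -dM -E -' on empty input and parse its output."""
    result = subprocess.run(
        [compiler, "-dM", "-E", "-"],
        input="", capture_output=True, text=True, check=True,
    )
    return parse_macros(result.stdout)


sample = """#define __DBL_MIN_EXP__ (-1021)
#define __CHAR_BIT__ 8
#define __WCHAR_MAX__ 2147483647"""
print(parse_macros(sample)["__CHAR_BIT__"])  # 8
```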

On design

Maybe this will reach some people and spare others some frustration:

  • No, Microsoft Word is not the right kind of program to design your logo in.
  • When your designer asks for a high-resolution copy of your logo, he or she does not mean something 150 x 300 pixels in size. He or she means a logo professionally designed with a vector drawing program such as Adobe Illustrator.
  • However creative your designer is, he or she needs input and ideas about what you want to accomplish in order to give you results in return.
  • Designing a logo that you feel comfortable with can take as little as an hour or as long as a few days or even weeks, depending on the number of people who have to approve it. Obviously, you have to pay for such effort.
  • The primary colours are red, green, and blue (unless we are talking about print, in which case they are cyan, magenta, and yellow).
  • No, blue is not a warm colour. Likewise, red is not a cool colour.
  • After you have approved all designs, changing your mind means you will incur additional costs.
  • You cannot just take photos or other images/designs from the Internet and reuse them without properly clearing the copyright.

Clustering and relevant algorithms

Disclaimer: I'm mainland European. We tend to use the comma (,) as the decimal separator, i.e. to separate the fractional digits from the whole number.

Clustering is quite a common approach to aggregating coordinates that are relatively close together. The problem lies in choosing which algorithm to use, and that choice depends heavily on the space in which the coordinates are laid out. Quite often you can just use basic Euclidean distance. In a two-dimensional space, that is the square root of the sum of the squared differences between the respective coordinates of the two points. So for a point p with coordinates (33, 52) and a point q with coordinates (82, 19), the distance between p and q would be:

>>> import math
>>> math.sqrt(pow(33 - 82, 2) + pow(52 - 19, 2))
59.076221950967721

Based on that distance, you can start to cluster together points that are all roughly the same distance from a certain point, say 59,1. The fun part is that this distance is the radius of a circle: if you were to plot every possible coordinate at that distance, you would see a circle emerge.
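Here is the same calculation as a Python 3 sketch, plus a naive way to gather the points that lie within a given radius of a chosen point. This is only radius-based grouping, and the helper names and extra sample points are hypothetical. It is not a full clustering algorithm such as k-means or DBSCAN.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def within_radius(center, points, radius):
    """Return the points whose Euclidean distance to center is <= radius."""
    return [pt for pt in points if euclidean(center, pt) <= radius]


p, q = (33, 52), (82, 19)
print(round(euclidean(p, q), 4))  # 59.0762
# 59.1 is "59,1" in the notation used above.
print(within_radius(p, [q, (40, 50), (100, 100)], 59.1))  # [(82, 19), (40, 50)]
```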

While looking at clustering algorithms I also encountered something called Manhattan distance. This measure only really makes sense if you are working on a grid where neighbouring coordinates are roughly equidistant. Normally the shortest distance from A to B is a straight line, which is what the Euclidean distance measures. However, movement from coordinate to coordinate may be restricted to horizontal and vertical lines, as in the grid layout of many North American cities. In that case Euclidean distance does not apply. A taxi faces the same problem when trying to find the shortest route from A to B, so this measure is also known as the taxicab distance (or taxicab geometry). It takes the sum of the absolute differences between the respective coordinates of the two points. So for points p and q again, the distance would in this case be:

>>> abs(33 - 82) + abs(52 - 19)
82

Now, if you were to plot all possible coordinates at that distance, you would again see a "circle" emerge. Keep in mind, however, that a circle is nothing more than the set of points at a fixed distance (the radius) from a centre. Our geometry here uses a differently defined distance, so if you plot this on a finer and finer grid, the circle that emerges is a square rotated 45° so that it stands on one of its corners.
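The rotated square is already visible on a small integer grid. A minimal sketch marks every grid point at a taxicab distance of exactly 3 from the origin. The radius and the helper names are arbitrary choices for illustration.

```python
def manhattan(p, q):
    """Taxicab distance: sum of absolute coordinate differences."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def taxicab_circle(radius):
    """Integer grid points at taxicab distance exactly radius from (0, 0)."""
    return [(x, y)
            for y in range(-radius, radius + 1)
            for x in range(-radius, radius + 1)
            if manhattan((0, 0), (x, y)) == radius]


def render(radius):
    """Draw the points as '#' on a '.' grid, top row = largest y."""
    points = set(taxicab_circle(radius))
    rows = []
    for y in range(radius, -radius - 1, -1):
        rows.append("".join("#" if (x, y) in points else "."
                            for x in range(-radius, radius + 1)))
    return "\n".join(rows)


print(manhattan((33, 52), (82, 19)))  # 82
print(render(3))
# prints:
# ...#...
# ..#.#..
# .#...#.
# #.....#
# .#...#.
# ..#.#..
# ...#...
```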