Small touches that inspire

It's the littlest of things that can really brighten my mood when I notice them. In this case I was watching the trailer for Honest Hearts, a DLC for Fallout: New Vegas. In the trailer you see the player with a pistol, and on at least one side of the pistol is written:

καὶ ἡ σκοτία αὐτὸ οὐ κατέλαβεν

This is Greek, the second half of John 1:5 in the New Testament of the Bible, which in English reads: "and the darkness did not comprehend it". In my opinion, a great way to bring enlightenment by the bullet.

Character encoding in mailcap for mutt and w3m

I use mutt on my FreeBSD system to read my mail. To read HTML mail I simply use a .mailcap file with an entry such as

text/html; w3m -dump %s; nametemplate=%s.html; copiousoutput

This in effect uses w3m to dump the HTML as plain text so that it can be displayed safely. The problem I had is that some of the emails I receive come from a Japanese translators' list and are encoded in Shift_JIS. When dumping them, w3m doesn't properly detect the Shift_JIS encoding, so the resulting output is garbled.

When I looked at the attachments in the mail with mutt's 'v' command I saw that mutt at least knows the encoding of the attachment, so I figured there should be a way to use this information in my mailcap. There is indeed: the charset variable. The mailcap format is specified in an RFC of its own, RFC 1524 to be exact. Mutt furthermore uses the Content-Type header to pull any parameters into mailcap variables. So a Content-Type: text/html; charset=shift_jis means that %{charset} in the mailcap file will be expanded to shift_jis. We can use this with w3m's -I flag to set the proper input encoding prior to dumping.

text/html; w3m -I %{charset} -dump %s; nametemplate=%s.html; copiousoutput

As such you can be relatively sure that the text will be decoded with the appropriate encoding before it is dumped. Of course this depends on a properly set Content-Type header, but if you cannot depend on that one you have bigger problems and need to dig out the recovery tools anyway.
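
To make the substitution concrete, here is a minimal Python sketch of how a %{charset} expansion could work. It is not mutt's actual code: the function name expand_mailcap_command is made up for this illustration, it does no shell quoting, and replacing a missing parameter with an empty string is an assumption of the sketch.

```python
import re
from email.message import Message


def expand_mailcap_command(template, content_type, filename):
    """Expand %s and %{param} in a mailcap command template.

    Illustration only: no shell quoting is done, and a parameter that is
    missing from the Content-Type header becomes an empty string
    (an assumption made for this sketch).
    """
    # Let the standard library parse the Content-Type header for us.
    msg = Message()
    msg["Content-Type"] = content_type
    # get_params() returns [(type, ''), (name, value), ...]; skip the type.
    params = {name.lower(): value for name, value in msg.get_params()[1:]}

    def substitute(match):
        if match.group(0) == "%s":
            return filename
        return params.get(match.group(1).lower(), "")

    # A single pass, so that substituted values are never expanded again.
    return re.sub(r"%s|%\{([^}]+)\}", substitute, template)


if __name__ == "__main__":
    print(expand_mailcap_command(
        "w3m -I %{charset} -dump %s",
        "text/html; charset=shift_jis",
        "/tmp/mail.html",
    ))
```

Any other parameter on the Content-Type header would be picked up the same way, under its own name.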

Why using 'lorem ipsum' is bad for web site testing

The typesetting and web design industry has apparently been using the 'lorem ipsum' text for a while as dummy text to test print and layout.

Aside from the fact that the text is a cut-off section of Cicero's De finibus bonorum et malorum, it also fails in one huge aspect, namely globalisation.

The text is Latin, and the basic Latin alphabet is the simplest set of characters we have available to us on the world-wide web. If your website is English-only then, yes, you are quite done. A lot of us, however, also have to support languages other than English, the easiest of which use Latin-derived scripts.

Latin, and subsequently English, is written left-to-right. Hebrew and Arabic, to take two prime examples, are written right-to-left (leaving numerals aside for the moment). Of course, this is very important to test as well, since it means your layout needs a lot of changes.

Especially when testing a design for sites that need to display multiple languages on the same page, it is important to test with multilingual text. One of the things that should quickly become clear is whether or not the chosen character encoding is sufficient.
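
As a rough illustration of both points, the Python sketch below checks which sample texts a candidate encoding cannot represent, and which of them contain right-to-left characters. The sample strings and the function names unsupported_languages and contains_rtl are made up for this example; use text your own site has to display.

```python
import unicodedata

# Hypothetical sample strings; replace them with text your site really shows.
SAMPLES = {
    "English": "The quick brown fox",
    "Greek": "καὶ ἡ σκοτία αὐτὸ οὐ κατέλαβεν",
    "Hebrew": "שלום עולם",
    "Arabic": "مرحبا بالعالم",
    "Japanese": "日本語のテキスト",
}


def unsupported_languages(encoding):
    """Return the sample languages whose text cannot be encoded in `encoding`."""
    failed = []
    for language, text in SAMPLES.items():
        try:
            text.encode(encoding)
        except UnicodeEncodeError:
            failed.append(language)
    return failed


def contains_rtl(text):
    """True if any character has a right-to-left bidirectional class."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


if __name__ == "__main__":
    for encoding in ("iso-8859-1", "utf-8"):
        print(encoding, "cannot represent:", unsupported_languages(encoding))
    for language, text in SAMPLES.items():
        print(language, "contains right-to-left text:", contains_rtl(text))
```

A 'lorem ipsum' string would pass both checks under any encoding, which is exactly why it tells you so little.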

The Elephant (象)

I will bear criticism like an elephant in battle bears an arrow from a bow. Most people are badly behaved. (<span lang="ja">戦場の象が、射られた矢にあたっても堪え忍ぶように、われらはひとのそしりを忍ぼう。多くの人は実に性質(たち)が悪いからである。</span>)

One can take a trained elephant even into a crowd. The king himself will ride a trained elephant. He who is disciplined is the best of men, since he can bear criticism. (<span lang="ja">馴らされた象は、戦場にも連れて行かれ、王の乗りものとなる。世のそしりを忍び、自らをおさめた者は、人々の中にあっても最上の者である。</span>)

Trained mules are excellent, and so are thoroughbred horses from the Sindh, and so are great battle elephants, but more excellent than them all is a disciplined man. (<span lang="ja">馴らされた騾馬は良い。インダス河のほとりの血統よき馬も良い。クンジャラという名の大きな象も良い。しかし自己をととのえた人はそれらよりもすぐれている。</span>)

There is no reaching the unattainable with mounts like these, but with himself well under control a disciplined man can get there. (<span lang="ja">何となれば、これらの乗物によっては未到の地(ニルヴァーナ)に行くことはできない。そこへは、慎しみある人が、おのれ自らをよくととのえておもむく。</span>)

Dhammapalo, the elephant, is hard to control in rut. Even when tied up, he refuses his food. The great tusker is thinking of the elephant forest. (<span lang="ja">「財を守る者」という名の象は、発情期にこめかみから液汁をしたたらせて強暴になっているときは、いかんとも制し難い。捕らえられると、一口の食物も食べない。象は象の林を慕っている。</span>)

When a man is a lie-abed and over-eats, a lazy person who wallows in sleep like a great over-fed hog, a fool like that will be reborn time after time. (<span lang="ja">大食いをして、眠りをこのみ、ころげまわって寝て、まどろんでいる愚鈍な人は、大きな豚のように糧を食べて肥り、くりかえし母胎に入って(迷いの生存をつづける)。</span>)

My mind used formerly to go off wandering wherever it felt like, following its own inclination, but today I shall control it carefully, like a mahout does a rutting elephant. (<span lang="ja">この心は、以前には、望むがままに、欲するがままに、快きがままに、さすらっていた。今やわたくしはその心をすっかり抑制しよう、――象使いが鉤をもって、発情期に狂う象を全くおさえつけるように。</span>)

Take pleasure in being careful. Guard your mind well. Extricate yourself from the mire, like a great tusker sunk in the mud. (<span lang="ja">つとめはげむのを楽しめ。おのれの心を護れ。自己を難処から救い出せ。――泥沼に落ち込んだ象のように。</span>)

If you find an intelligent companion, a wise and well-behaved person going the same way as yourself, then go along with him, overcoming all dangers, pleased at heart and mindful. (<span lang="ja">もしも思慮深く聡明でまじめな生活をしている人を伴侶として共に歩むことができるならば、あらゆる危険困難に打ち克って、こころ喜び、念いをおちつけて、ともに歩め。</span>)

But if you do not find an intelligent companion, a wise and well-behaved person going the same way as yourself, then go on your way alone, like a king abandoning a conquered kingdom, or like a great elephant in the deep forest. (<span lang="ja">しかし、もしも思慮深く聡明でまじめな生活をしている人を伴侶として共に歩むことができないならば、国を捨てた国王のように、また林の中の象のように、ひとり歩め。</span>)

It is better to travel alone. There is no companionship with a fool. Go on your way alone and commit no evil, without cares like a great elephant in the deep forest. (<span lang="ja">愚かな者を道伴れとするな。独りで行くほうがよい。孤独(ひとり)で歩め。悪いことをするな。求めるところは少なくあれ。――林の中にいる象のように。</span>)

It is good to have companions when occasion arises, and it is good to be contented with whatever comes. Merit is good at the close of life, and the elimination of all suffering is good. (<span lang="ja">事がおこったときに、友だちのあるのは楽しい。(大きかろうとも、小さかろうとも)、どんなことにでも満足するのは楽しい。善いことをしておけば、命の終るときに楽しい。(悪いことをしなかったので)、あらゆる苦しみ(の報い)を除くことは楽しい。</span>)

Good is filial devotion to one's mother in the world, and devotion to one's father is good. It is good to be a sanyasi in the world and to be a brahmin too. (<span lang="ja">世に母を敬うことは楽しい。また父を敬うことは楽しい。世に修行者を敬うことは楽しい。世にバラモンを敬うことは楽しい。</span>)

Good is good behaviour up to old age, good is firmly established faith, good is the acquisition of understanding, and abstention from evil is good. (<span lang="ja">老いた日に至るまで戒しめをたもつことは楽しい。信仰が確立していることは楽しい。明らかな知慧を体得することは楽しい。もろもろの悪事をなさないことは楽しい。</span>)

English translation by John Richards. Japanese translation by <span lang="ja">中村元</span> (NAKAMURA Hajime)

Office 2003, Visual Basic editor and AppLocale

So I was working with a Japanese .xla (Excel add-in) file. I needed to look at something in the source, so I fired up the Visual Basic editor within Excel. Upon investigating the form and its various captions, it turned out that the Visual Basic editor displayed them only as gibberish (typical decoding issues) or as question marks (substituted for characters it could not represent). So it seems the Visual Basic editor is either not multi-byte capable (typing a Japanese string directly into a caption yielded question marks) or is bound to the locale of the system.

I then remembered AppLocale and fired up Excel through it, setting it to think it is on a Japanese system. Then within Excel I proceeded to start the Visual Basic editor and, sure enough, the text was showing me the Japanese I needed.
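
Both symptoms are what you would expect from text being pushed through a single system code page. The Python sketch below reproduces them under assumptions of my own: that the captions are stored in code page 932 (Microsoft's Shift_JIS variant) and that the non-Japanese system uses code page 1252. I have not verified that this is what the Visual Basic editor does internally.

```python
caption = "日本語"

# Assumption: a Japanese system stores the caption as code page 932 bytes.
stored = caption.encode("cp932")

# Reading those bytes with a Western European code page gives gibberish.
gibberish = stored.decode("cp1252", errors="replace")

# Typing Japanese into a field limited to code page 1252: every character
# that the code page lacks is replaced by a question mark.
question_marks = caption.encode("cp1252", errors="replace").decode("cp1252")

# With the matching code page, which is roughly what AppLocale arranges,
# the original text comes back intact.
restored = stored.decode("cp932")

if __name__ == "__main__":
    print("wrong code page:", gibberish)
    print("typed directly: ", question_marks)
    print("right code page:", restored)
```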

I am not sure if I should find this lame or understandable.

Wah Nam Hong (華南行) in Rotterdam

Here in Rotterdam we have a Chinese supermarket whose name is written, in Dutch-style phonetic Cantonese, as 'Wah Nam Hong'. In Jyutping that is waa4 naam4 hong4, and it corresponds to the hanzi <span lang="zh">華南行</span>. Literally translated, <span lang="zh">華南</span> means South China, which matches the obvious Cantonese heritage. The <span lang="zh">行</span> stands for a profession or line of business.

What is interesting to me is that in Japanese (<span lang="ja">日本語</span>) you read <span lang="ja">華南</span> as <span lang="ja">かなん</span> and it means South China as well. However, <span lang="ja">行</span> would be <span lang="ja">こう</span> or <span lang="ja">ぎょう</span> and has not retained the profession/business line meaning at all.