The data submission phase for CLDR 1.8 should be closed by now (although the Survey Tool still says it’s accepting submissions). For Dutch (nl_NL), I’ve been going over quite a few items together with the Apple contributor and someone else, so expect quite a few improvements in that area. The release is currently aimed for March 2010.
Google has released an input method editor (IME) for Japanese in a style similar to their Chinese IME. It can be found on their IME page and looks to be available for Mac OS X, Windows XP SP2, Vista SP1, and Windows 7.
It looks like the Chinese font list for Office 2010 is the following (fonts from Changzhou SinoType, Founder, Microsoft, and Stone):
FZShuTi
FZYaoTi
LiSu
Microsoft YaHei
Microsoft YaHei Bold
STCaiyun
STFangsong
STHupo
STKaiti
STLiti
STSong
STXihei
STXingkai
STXinwei
STZhongsong
YouYuan
From the language pack make sure to select 国际字体 (international fonts) and 校对工具 (proofing tools). Under 国际字体 we have 典型字体 (typical fonts) and under 校对工具 we have 简体中文校对工具 (Simplified Chinese proofing tools) and 英语校对工具 (English proofing tools).
Microsoft has released a beta of Office 2010 that you can use up to and including October 2010 (the final release is scheduled for June 2010). You can download it as either 32-bit or 64-bit, although the 64-bit download seems a bit hidden, since many of the download buttons lead to the default 32-bit version. If you follow the ‘Get It Now’ link on the Professional Plus site, you should be presented with links to both versions. At the moment Microsoft supports Chinese (Simplified), English, French, German, Japanese, Russian, and Spanish. If, like me, you use the application in English, you will miss the proofing tools for, say, Japanese.
You can download language packs from the Microsoft Download Center. If you change the language to, say, Japanese, you are presented with two download links at the bottom for the Japanese language pack. This language pack includes a Japanese user interface as well as proofing tools, OCR support, and fonts.
Once the pack is downloaded, just run it and customize what you want to install. Since I am not interested in the UI part of the pack, I selected the top-level entry and set everything to not be installed. Then, for the entries 国際フォント (international fonts) and 文章校正ツール (proofing tools), I made sure to install everything. 文章校正ツール includes both 日本語用校正ツール (Japanese proofing tools) and 英語用校正ツール (English proofing tools), and you can most likely skip 英語用校正ツール since it is already installed. 国際フォント includes 標準フォント (standard fonts), which I am guessing is related to the JIS X standards for font encodings.
A basic Windows 7 install has 134 fonts. A basic English Office 2010 install increases this to 198 fonts, and installing the proofing tools and fonts from the Japanese language pack brings it to 228 fonts.
If you press the expansion arrow at the bottom right of the Font group on the Home tab of the ribbon (or press Ctrl+D), you get the Font dialog. On its Advanced tab you can turn on features such as OpenType ligatures. With ligatures enabled, letter combinations such as ‘fl’ or ‘ffi’ are drawn with parts of the letters connected instead of with white space between them. This is the same technique used in printed media such as books.
Update: Michael Hendry was kind enough to point out that I was mistaking 標準 (standard/default) for 基準 (standards, as in JIS/ISO).
A friend of mine pointed me to this article about James Cameron and his latest film “Avatar”. I am personally much inspired by such things, and I hope I share at least some small part of this kind of zeal for perfectionist accomplishments. I love how he hired experts from different fields to work on the language, the flora, and other parts of his fantasy world, all to make that world more consistent. This is the bread and butter of a fully immersive experience. Sure, it might be wasted on the part of the audience that just goes to watch the movie, but people like me appreciate it. I am not sure how many people experience this, but whenever I play a game and notice that some design has been reused, or watch or read something where the consistency is off, I feel somewhat let down. I find it hard to understand why other people would not go the extra mile to avoid such problems.
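Ангелы и демоны кружили надо мной (Angels and demons circled above me)
Разбивали тернии и звёздные пути (Shattering thorns and starry paths)
Не знает счастья только тот, (Only the one does not know happiness,)
Кто его зова понять не смог… (Who could not understand its call…)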
Mana du vortis, Mana du vortis
Aeria gloris, Aeria gloris
Mana du vortis, Mana du vortis
Aeria gloris, aeria gloris
I am Calling Calling now, Spirits rise and falling Собой остаться дольше… (To remain myself a little longer…)
Calling Calling, in the depth of longing Собой остаться дольше… (To remain myself a little longer…)
Mana du vortis, Mana du vortis
Aeria gloris, Aeria gloris
Stand alone… Where was life when it had a meaning…
Stand alone… Nothing’s real anymore and…
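…Бесконечный бег… (…An endless run…)
Пока жива я могу стараться на лету не упасть, (As long as I’m alive, I can try not to fall in mid-flight,)
Не разучиться мечтать…любить… (Not to forget how to dream… to love…)
…Бесконечный бег… (…An endless run…)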
…Бесконечный бег…
Calling Calling, For the place of knowing
There’s more than what can be linked
Calling Calling, Never will I look away
For what life has left for me
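Yearning Yearning, for what’s left of loving Собой остаться дольше… (To remain myself a little longer…)
Calling Calling now, Spirits rise and falling… Собой остаться дольше… (To remain myself a little longer…)
Calling Calling, in the depth of longing… Собой остаться дольше… (To remain myself a little longer…)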
Mana du vortis, Mana du vortis
Aeria gloris, Aeria gloris
Mana du vortis, Mana du vortis
Aeria gloris, aeria gloris
This in effect uses w3m to dump the HTML to a text file so it can be displayed safely. The problem I had is that some of the emails I receive come from a Japanese translators’ mailing list and are encoded in Shift_JIS. When dumping them, w3m does not detect the Shift_JIS encoding properly, so the resulting output becomes garbled.
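The garbling is easy to reproduce in a few lines of Python: the same Shift_JIS bytes only read correctly when the decoder is told the right encoding. Latin-1 is used here merely as a stand-in for a wrong guess; it is an assumption for illustration, not necessarily the encoding w3m falls back to.

```python
# Japanese text as it might arrive in a Shift_JIS-encoded mail body.
original = "翻訳者"  # "translator"
raw = original.encode("shift_jis")

# Decoding with the declared charset round-trips cleanly.
correct = raw.decode("shift_jis")

# Decoding with a wrong guess (Latin-1 accepts any byte value) yields mojibake.
garbled = raw.decode("latin-1")

print(correct)
print(repr(garbled))
```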
When I looked at the attachments in the mail with mutt’s ‘v’ command, I saw that mutt at least knows the encoding of the attachment, so I figured there should be a way to use this information in my mailcap. There is indeed: the charset variable. The mailcap format turns out to be specified in a full RFC, RFC 1524 to be exact. Mutt furthermore maps the parameters of the Content-Type header onto mailcap variables, so Content-Type: text/html; charset=shift_jis means that %{charset} in the mailcap file will be expanded to shift_jis. We can combine this with w3m’s -I flag to set the proper input encoding before dumping.
As such, you can be relatively sure that the dumped text will be in the appropriate encoding. Of course this depends on a properly set Content-Type header, but if you cannot rely on that header, you will have to dig out the recovery tools anyway.
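To illustrate how the %{charset} substitution ends up feeding w3m’s -I flag, here is a minimal Python sketch. The view command in `MAILCAP_VIEW` and the helper `expand_mailcap` are hypothetical; in practice mutt performs this expansion itself, and this sketch only handles `%{name}` and `%s`, not the other RFC 1524 escapes such as `%t` or `%%`.

```python
import shlex
from email.message import Message

# Hypothetical mailcap view command for text/html. %{charset} is replaced by
# the Content-Type charset parameter and %s by the temporary file name.
MAILCAP_VIEW = "w3m -I %{charset} -T text/html -dump %s"


def expand_mailcap(template: str, content_type: str, filename: str) -> str:
    """Expand %{param} and %s placeholders in a mailcap-style command.

    Only these two placeholder forms are handled in this sketch.
    """
    header = Message()
    header["Content-Type"] = content_type
    out = []
    i = 0
    while i < len(template):
        if template.startswith("%{", i):
            end = template.index("}", i)
            name = template[i + 2:end]
            # get_param() looks up a parameter of the Content-Type header;
            # a missing parameter becomes an empty (quoted) string.
            value = header.get_param(name) or ""
            out.append(shlex.quote(str(value)))
            i = end + 1
        elif template.startswith("%s", i):
            out.append(shlex.quote(filename))
            i += 2
        else:
            out.append(template[i])
            i += 1
    return "".join(out)


# prints: w3m -I shift_jis -T text/html -dump /tmp/mutt.html
print(expand_mailcap(MAILCAP_VIEW, "text/html; charset=shift_jis", "/tmp/mutt.html"))
```

Quoting each substituted value with `shlex.quote` keeps an odd header value from being interpreted by the shell; whether mutt quotes expansions in the same way is not something this sketch claims.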
New Zealand starts its haka, Ka Mate, but then gets interrupted by Tonga with their Sipi Tau. Every time I watch this I get goosebumps. It is quite powerful to see them perform against each other like this.
So I was updating my input method editors (IMEs) from the default in Windows x64 (IME 2002) to the ones provided by Office 2007’s language packs. As explained in a previous post of mine, you can install the proofing tools and input method editors by running the MSI directly with LAUNCHEDBYSETUPEXE=1 passed on the command line. On my Windows x64 I installed the IME by running IME64.MSI with this property. The weird thing was that some applications worked flawlessly, while others showed the wrong number of icons or no icons at all! It turns out that these are 32-bit applications, which need the 32-bit IME installed as well. So besides installing IME64.MSI for the language you want, you also have to install IME32.MSI. Only after doing this will all applications work as you want them to.
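A small Python sketch of the two installs, assuming the language pack’s IME64.MSI and IME32.MSI have been extracted to a folder of your choice. The folder path and the helper names `ime_install_commands` and `install_imes` are hypothetical. The sketch builds the `msiexec /i <package> LAUNCHEDBYSETUPEXE=1` command lines; actually running them requires Windows and administrator rights.

```python
import subprocess
from pathlib import PureWindowsPath

# On Windows x64 both are needed: 64-bit applications use the 64-bit IME,
# 32-bit applications need the 32-bit IME.
IME_PACKAGES = ("IME64.MSI", "IME32.MSI")


def ime_install_commands(msi_dir: str) -> list[list[str]]:
    """Build one msiexec command per IME package found in msi_dir (hypothetical)."""
    folder = PureWindowsPath(msi_dir)
    return [
        ["msiexec", "/i", str(folder / name), "LAUNCHEDBYSETUPEXE=1"]
        for name in IME_PACKAGES
    ]


def install_imes(msi_dir: str) -> None:
    """Run the installs in order; only meaningful on Windows."""
    for cmd in ime_install_commands(msi_dir):
        subprocess.run(cmd, check=True)


for cmd in ime_install_commands(r"D:\OfficeLP\Japanese"):
    print(" ".join(cmd))
```

Keeping both package names in one tuple makes it hard to install only the 64-bit IME and forget the 32-bit one, which is exactly the mistake described above.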
Thinking back on it, it makes perfect sense, but while you are in the middle of working with it you keep wondering: “why?”