Character encoding in mailcap for mutt and w3m

I use mutt on my FreeBSD system to read my mail. To read HTML mail I simply use a .mailcap file with an entry such as

text/html;      w3m -dump %s; nametemplate=%s.html; copiousoutput

This in effect dumps the HTML using w3m to a text file in order to safely display it. The problem that I had is that, because some emails that I receive are from a Japanese translators list, they are in Shift_JIS. When dumped w3m doesn’t properly detect the Shift_JIS encoding and as such the resulting output becomes garbled.

When I looked at the attachments in the mail with mutt’s ‘v’ command I saw that mutt at least knows the encoding of the attachment, so I figured that there should be a way of using this information with my mailcap. Turns out that there is indeed a way to do so, namely the charset variable. It turns out the mailcap format is a full RFC. RFC 1524 to be exact. Mutt furthermore uses the Content-Type headers to pull any specific settings into mailcap variables. So a Content-Type: text/html; charset=shift_jis means that %{charset} in the mailcap file will be expanded to shift_jis. We can use this with w3m’s -I flag to set a proper encoding prior to dumping.

text/html;      w3m -I %{charset} -dump %s; nametemplate=%s.html; copiousoutput

As such you can be relatively sure that the dumped text will be in the appropriate encoding. Of course it depends on a properly set Content-Type header, but if you cannot depend on that one you need to dig out the recovery tools already.

4 thoughts on “Character encoding in mailcap for mutt and w3m

  1. Thanks for the nice post! It solved the same problem for me too! I use lynx:

    text/html; lynx -display-charset=UTF-8 -dump %s -assume_charset=%{charset}; nametemplate=%s.html; copiousoutput



  2. This is for elinks with iconv. Terminal charset is set to Windows-1251

    text/html; iconv -f %{charset} -t cp1251 < %s | elinks -dump ; nametemplate=%s.html; copiousoutput

Leave a Reply

Your email address will not be published. Required fields are marked *