I use mutt on my FreeBSD system to read my mail. To read HTML mail I simply use a .mailcap file with an entry such as
text/html; w3m -dump %s; nametemplate=%s.html; copiousoutput
This in effect renders the HTML to plain text with w3m so that it can be displayed safely. The problem I had is that some of the emails I receive come from a Japanese translators’ list and are encoded in Shift_JIS. When dumping, w3m doesn’t properly detect the Shift_JIS encoding, so the resulting output is garbled.
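To make the garbling concrete, here is a small Python sketch of what happens when Shift_JIS bytes are read with the wrong encoding. The sample text and the Latin-1 guess are my own assumptions for illustration; I don't know which encoding w3m actually falls back to when detection fails.

```python
# Sample text is made up; any Japanese string shows the same effect.
text = "日本語のメール"

# This is what arrives in a Shift_JIS encoded mail body.
raw = text.encode("shift_jis")

# Decoding with the declared charset round-trips cleanly.
as_declared = raw.decode("shift_jis")

# Decoding with a wrong guess (Latin-1 here, purely as an assumption)
# never raises, it just silently produces garbage characters.
misread = raw.decode("latin-1")

print(as_declared == text)  # the declared charset recovers the text
print(misread == text)      # the wrong guess does not
```

The point is that nothing fails loudly: the bytes are fine, only the interpretation is wrong, which is why passing the right charset along fixes it.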
When I looked at the attachments in the mail with mutt’s ‘v’ command I saw that mutt at least knows the encoding of the attachment, so I figured that there should be a way of using this information in my mailcap. It turns out that there is indeed a way to do so, namely the charset variable. The mailcap format is described in an RFC of its own, RFC 1524 to be exact. Mutt furthermore pulls the parameters of the Content-Type header into mailcap variables. So a Content-Type: text/html; charset=shift_jis means that %{charset} in the mailcap file will be expanded to shift_jis. We can use this with w3m’s -I flag to set the proper encoding prior to dumping.
text/html; w3m -I %{charset} -dump %s; nametemplate=%s.html; copiousoutput
This way you can be relatively sure that the dumped text will be in the appropriate encoding. Of course it depends on a properly set Content-Type header, but if you cannot depend on that one you need to dig out the recovery tools already.
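For the curious, the expansion itself can be sketched in a few lines of Python. This is a simplified illustration of the %s and %{parameter} substitution described in RFC 1524, not mutt's actual code: the function name expand_mailcap_command and the /tmp/mutt.html path are made up for the example, and a real mail client also quotes the values for the shell, which this sketch skips.

```python
import re
from email.message import Message


def expand_mailcap_command(template, content_type, filename):
    """Expand %s and %{param} in a mailcap command (simplified sketch).

    Hypothetical helper for illustration only; it does no shell quoting.
    """
    msg = Message()
    msg["Content-Type"] = content_type

    def substitute(match):
        if match.group(0) == "%s":
            return filename
        # %{name}: look up the named Content-Type parameter.
        value = msg.get_param(match.group(1), failobj="")
        # RFC 2231 encoded parameters come back as a 3-tuple.
        if isinstance(value, tuple):
            value = value[2]
        return value

    return re.sub(r"%s|%\{([A-Za-z0-9_-]+)\}", substitute, template)


command = expand_mailcap_command(
    "w3m -I %{charset} -dump %s",
    "text/html; charset=shift_jis",
    "/tmp/mutt.html",  # stand-in for the temporary file the mailer creates
)
print(command)  # prints: w3m -I shift_jis -dump /tmp/mutt.html
```

In this sketch a missing charset parameter expands to an empty string, which would leave w3m's -I flag without a value. How mutt itself handles a missing charset I have not checked, so treat that part as an assumption.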
Thanks for the nice post! It solved the same problem for me too! I use lynx:
text/html; lynx -display-charset=UTF-8 -dump %s -assume_charset=%{charset}; nametemplate=%s.html; copiousoutput
Bye
AD
This is for elinks with iconv. Terminal charset is set to Windows-1251
text/html; iconv -f %{charset} -t cp1251 < %s | elinks -dump ; nametemplate=%s.html; copiousoutput
Thank you SO MUCH!