Thursday 9 August 2012

Wednesday 1 August 2012

Broken encoding in Japanese ZIP files.

After a long and painful road, I found a way to properly list the file names in such an archive using 7z:

> LANG=C 7z l ファイル | iconv -c -f SJIS -t utf8
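The iconv half of that pipeline can be tried on its own, without an archive. This is only a minimal sketch, assuming an iconv that knows the SJIS alias (glibc's does). The helper name sjis_to_utf8 is made up for illustration, and the bytes fed to it are the Shift-JIS encoding of ファイル written as octal escapes.

```shell
# Hypothetical helper: decode Shift-JIS on stdin to UTF-8 on stdout.
# -c drops any bytes that cannot be converted instead of aborting.
sjis_to_utf8() {
    iconv -c -f SJIS -t UTF-8
}

# Shift-JIS bytes 83 74 83 40 83 43 83 8B (the name ファイル), in octal.
printf '\203\164\203\100\203\103\203\213\n' | sjis_to_utf8
```

With the helper defined, the listing command becomes `LANG=C 7z l ファイル | sjis_to_utf8`.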

A little more googling afterwards turned up a blog post with a complete and satisfactory solution, which I modified to be more automatic. Extract the archive in an empty directory, because the rename step touches every file under the current directory:

> LANG=C 7z x ファイル ; find -exec convmv --notest -f shift-jis -t utf8 {} \;
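The "empty directory" precaution can be folded into a small wrapper. This is a rough sketch, not a tested tool. The function name extract_sjis_zip and the extracted.XXXXXX directory template are my own inventions, and it assumes 7z, convmv and mktemp are installed. It also adds -depth to find, on the assumption that renaming a directory before find descends into it would otherwise make find lose track of its contents.

```shell
# Hypothetical wrapper: extract a Shift-JIS ZIP into a fresh directory
# and convert the extracted file names to UTF-8.
extract_sjis_zip() {
    archive=$1
    # Make the archive path absolute, since we change directory below.
    case $archive in
        /*) ;;
        *) archive=$PWD/$archive ;;
    esac
    # A new, empty directory, so no unrelated files get renamed.
    dest=$(mktemp -d ./extracted.XXXXXX) || return 1
    # Run in a subshell so the caller's working directory is untouched;
    # tool chatter goes to stderr so stdout carries only the result path.
    (
        cd "$dest" || exit 1
        LANG=C 7z x "$archive" || exit 1
        # -depth (an assumption, see above) handles children before parents.
        find . -depth -exec convmv --notest -f shift-jis -t utf8 {} \;
    ) >&2 || return 1
    printf '%s\n' "$dest"
}
```

Usage would be `extract_sjis_zip ファイル`, which prints the name of the directory holding the extracted files.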

Finally, there is a patch called unzip-iconv which is supposed to add an option to unzip for specifying the encoding the archive uses. Too bad it's not in Debian Squeeze.