[erlang-questions] Character encoding and the Windows command line
Simon MacMullen
simon@REDACTED
Thu Feb 9 19:37:13 CET 2012
Here is a simple test program:
-module(test).
-export([test/0]).
test() -> io:format("Args were ~w~n", [init:get_plain_arguments()]).
If I invoke this on Ubuntu, with the command line:
$ erl -run test test -- 新
I get:
Args were [[230,150,176]]
which is the UTF-8 encoding of the character I gave (\u65b0, "new,
recent, fresh, modern"). Great!
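To confirm that reading of the bytes, here is a minimal sketch of decoding one argument. The module name utf8_args and the function decode/1 are made up for illustration. It assumes the arguments arrive as lists of raw bytes, as in the output above.

```erlang
-module(utf8_args).
-export([decode/1]).

%% Treat one argument from init:get_plain_arguments/0 as a list of UTF-8
%% bytes and turn it into a list of Unicode code points.
%% Assumption: every element is a byte (0..255), as in the output above.
decode(Bytes) when is_list(Bytes) ->
    case unicode:characters_to_list(list_to_binary(Bytes), utf8) of
        CodePoints when is_list(CodePoints) -> {ok, CodePoints};
        {error, _Decoded, _Rest} -> {error, invalid_utf8};
        {incomplete, _Decoded, _Rest} -> {error, invalid_utf8}
    end.
```

Calling utf8_args:decode([230,150,176]) gives {ok,[26032]}, and 26032 is 16#65B0. Bytes that are not valid UTF-8 come back as {error,invalid_utf8} instead.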
Then I try it on my Chinese language copy of Windows XP:
C:\>"c:\Program Files\erl5.9\bin\erl" -run test test -- 新
Args were [[208,194]]
Huh. That's (it turns out) encoded in GB18030. Which would appear to be
the encoding used by Windows. Better yet, if I try the same thing on a
UK copy of Windows it tries to use an encoding which can't handle
Chinese characters.
So this sucks. I would like my application internally to use all UTF-8,
all the time. It would appear that I can use the Windows command "chcp"
to detect which encoding Windows will impose on me, then try to use
erlang-iconv or something similar to convert to UTF-8, but:
a) This seems hideous; does anyone have a better way? (Sorry, "don't
support Windows" is not a better way).
b) It gets worse: suppose my application is installed to a path which
includes non-ASCII characters, and we use "-pa" to tell erl.exe where
that is...
In that case we *still* pass through the path in GB18030 or whatever,
and erl.exe fails to even find it as it expects UTF-8. Is there any
solution to this *at all*?
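For reference, the "chcp" detection step mentioned above might look roughly like this. It is only a sketch: the module name codepage and its functions are hypothetical, and it assumes chcp prints the code page number somewhere in its (localised) output. The conversion to UTF-8 itself is not shown.

```erlang
-module(codepage).
-export([active/0, parse/1]).

%% Run "chcp" (Windows only) and return the active code page number.
active() ->
    parse(os:cmd("chcp")).

%% Pull the first run of digits out of chcp's output. The text around the
%% number is localised, so only the digits are relied on.
parse(Output) ->
    case re:run(Output, "([0-9]+)", [unicode, {capture, all_but_first, list}]) of
        {match, [Digits]} -> {ok, list_to_integer(Digits)};
        nomatch -> {error, no_codepage}
    end.
```

With this, codepage:parse("Active code page: 936\r\n") gives {ok,936}. It does nothing for problem (b), since erl.exe has to find the path before any Erlang code runs.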
Cheers, Simon
--
Simon MacMullen
RabbitMQ, VMware