[erlang-questions] Character encoding and the Windows command line

Thu Feb 9 19:37:13 CET 2012

Here is a simple test program:

   -module(test).
   -export([test/0]).
   test() -> io:format("Args were ~w~n", [init:get_plain_arguments()]).

If I invoke this on Ubuntu, with the command line:

   $ erl -run test test -- 新

I get:

   Args were [[230,150,176]]

which is the UTF-8 encoding of the character I gave (\u65b0, "new, 
recent, fresh, modern"). Great!
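
As a quick sanity check, the byte list can be decoded back into code 
points with the standard unicode module. This is only a minimal sketch; 
the module and function names (arg_decode, utf8_bytes_to_codepoints) are 
made up for illustration.

```erlang
-module(arg_decode).
-export([utf8_bytes_to_codepoints/1]).

%% Hypothetical helper: treat a list of byte values as UTF-8 and
%% return the list of Unicode code points it encodes.
%% If the bytes are not valid UTF-8, the {error, ...} or
%% {incomplete, ...} tuple from the unicode module is returned as-is.
utf8_bytes_to_codepoints(Bytes) ->
    unicode:characters_to_list(list_to_binary(Bytes), utf8).
```

Calling arg_decode:utf8_bytes_to_codepoints([230,150,176]) gives 
[26032], which is 16#65B0. Feeding it [208,194] does not give back a 
plain list, because those two bytes are not valid UTF-8.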

Then I try it on my Chinese language copy of Windows XP:

   C:\>"c:\Program Files\erl5.9\bin\erl" -run test test -- 新
   Args were [[208,194]]

Huh. It turns out that's encoded in GB18030, which would appear to be 
the encoding used by Windows here. Better yet, if I try the same thing 
on a UK copy of Windows, it tries to use an encoding which can't handle 
Chinese characters at all.

So this sucks. I would like my application internally to use all UTF-8, 
all the time. It would appear that I can use the Windows command "chcp" 
to detect which encoding Windows will impose on me, then try to use 
erlang-iconv or something to sort myself out but:

a) This seems hideous; does anyone have a better way? (Sorry, "don't 
support Windows" is not a better way).

b) It gets worse: suppose my application is installed to a path which 
includes non-ASCII characters, and we use "-pa" to tell erl.exe where 
that is...

In that case we *still* pass through the path in GB18030 or whatever, 
and erl.exe fails to even find it as it expects UTF-8. Is there any 
solution to this *at all*?
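
For concreteness, the "chcp" detection step might look roughly like the 
sketch below. It only finds the code page number and maps it to an 
encoding name that a converter such as erlang-iconv could be given; the 
conversion itself is left out. The module name (codepage), its function 
names and the small mapping table are assumptions for illustration, not 
an existing API. Note also that chcp reports the console code page, 
which on some systems (a UK Windows install, for example) differs from 
the ANSI code page, so this may not always match the encoding of the 
arguments.

```erlang
-module(codepage).
-export([active/0, parse_chcp/1, iconv_name/1]).

%% Run "chcp" (Windows only) and return the active code page number,
%% or 'unknown' if the output contains no digits.
active() ->
    parse_chcp(os:cmd("chcp")).

%% Extract the first run of digits from chcp output such as
%% "Active code page: 850". The label text is localized, so only
%% the digits are relied upon.
parse_chcp(Output) ->
    case re:run(Output, "([0-9]+)",
                [unicode, {capture, all_but_first, list}]) of
        {match, [Digits]} -> list_to_integer(Digits);
        nomatch           -> unknown
    end.

%% Illustrative and deliberately incomplete mapping from a Windows
%% code page number to an iconv-style encoding name.
iconv_name(936)   -> "GBK";
iconv_name(1252)  -> "CP1252";
iconv_name(850)   -> "CP850";
iconv_name(437)   -> "CP437";
iconv_name(65001) -> "UTF-8";
iconv_name(_)     -> unknown.
```

On a Windows machine, codepage:iconv_name(codepage:active()) would then 
give the name to convert from, or unknown for a code page not in the 
table.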

Cheers, Simon

-- 
Simon MacMullen
RabbitMQ, VMware