[erlang-questions] escript argument encoding

Thu Mar 20 19:38:50 CET 2014

I’m running into an issue where OSX and Linux seem to treat the command-line arguments to an escript differently. Using the following escript:

#!/usr/bin/env escript
%% -*- erlang -*-
%%! +pc unicode

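%% Expects exactly one command-line argument.
main([Args]) ->
    %% Emit Unicode characters on standard output.
    io:setopts([{encoding, utf8}]),
    %% Print the raw list of integers the runtime passed in.
    io:format("~w~n", [Args]),
    %% Print the same list interpreted as Unicode text.
    io:format("~ts~n", [Args]).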

Both my OSX and Linux (Ubuntu 13.10) boxes have their LANG set to en_US.UTF-8. I’m running the escript like so:

./sample.escript سلام

On OSX, the escript seems to treat `Args` as a list of Unicode code points:

./sample.escript سلام
[1587,1604,1575,1605]
سلام

On Linux, it seems to treat the input as a list of UTF-8 bytes, with each byte turned into an integer. The Erlang Unicode usage guide calls this representation a "list of UTF-8 bytes" [1].

./sample.escript سلام
[216,179,217,132,216,167,217,133]
Ø³Ù„Ø§Ù

How do I get both OSX and Linux to treat the input of the escript as a list of code points?
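One stopgap, as a minimal sketch only, is to decode the argument inside the escript itself. The helper `normalize_arg/1` below is hypothetical (it is not part of OTP). It assumes that an argument made up entirely of integers below 256 is a list of UTF-8 bytes, tries `unicode:characters_to_list/2` on it, and keeps the original list if decoding fails. The heuristic is ambiguous: a genuine code-point list such as [195,169] ("Ã©") would also be decoded, to [233].

```erlang
#!/usr/bin/env escript
%% -*- erlang -*-
%%! +pc unicode

%% Hypothetical helper: if every element fits in a byte, assume the list
%% is UTF-8 bytes and try to decode it; otherwise (or if decoding fails)
%% return the argument unchanged.
normalize_arg(Arg) ->
    case lists:all(fun(C) -> C >= 0 andalso C < 256 end, Arg) of
        true ->
            case unicode:characters_to_list(list_to_binary(Arg), utf8) of
                Decoded when is_list(Decoded) -> Decoded;
                _ErrorOrIncomplete -> Arg
            end;
        false ->
            %% Already contains code points above 255 (the OSX case).
            Arg
    end.

main([Arg0]) ->
    io:setopts([{encoding, utf8}]),
    Arg = normalize_arg(Arg0),
    io:format("~w~n", [Arg]),
    io:format("~ts~n", [Arg]).
```

With this change the Linux run should print [1587,1604,1575,1605] followed by سلام, matching the OSX output, while on OSX the argument already contains values above 255 and passes through unchanged.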

Thanks,
Reid

[1] http://www.erlang.org/doc/apps/stdlib/unicode_usage.html
