[erlang-questions] UTF-8 and escript problem

Mon Mar 10 05:24:54 CET 2014

I ran into a problem with io:format() when invoked remotely, and with Scott Fritchie’s assistance I’ve narrowed it down (I think) to escript.

Here’s my script:

#!/usr/bin/env escript
%%
%%! -sname foo

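main([]) ->
   Other = 'bar@REDACTED',
   %% The word as an explicit list of Unicode code points.
   Farsi = [1601,1575,1585,1587,1740],
   %% The same word pasted into the source file as a string literal.
   Farsi2 = "فارسی",
   %% Each fun is spawned on the remote node Other.
   spawn(Other, fun() -> io:format("Bytes: ~ts~n", [Farsi]) end),
   spawn(Other, fun() -> io:format("Bytes: ~w~n", [Farsi]) end),
   spawn(Other, fun() -> io:format("Paste: ~ts~n", [Farsi2]) end),
   spawn(Other, fun() -> io:format("Paste: ~w~n", [Farsi2]) end),
   %% Give the spawned processes time to print before the escript exits.
   timer:sleep(1000).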

When I run the same calls directly from another node’s console (spawned to run on ‘bar’), everything displays as expected:

Bytes: فارسی
Bytes: [1601,1575,1585,1587,1740]
Paste: فارسی
Paste: [1601,1575,1585,1587,1740]

When invoked via the above escript, I consistently get the wrong output:

Bytes: \x{641}\x{627}\x{631}\x{633}\x{6CC}
Bytes: [1601,1575,1585,1587,1740]
Paste: فارسی
Paste: [217,129,216,167,216,177,216,179,219,140]
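For reference, the list in the last "Paste" line is the UTF-8 byte sequence for the five code points in Farsi. The following is a minimal sketch that checks this relationship and prints the I/O options the escript is running with, which may help narrow things down. The helper names utf8_bytes/1 and code_points/1 are hypothetical and not part of the original script.

```erlang
#!/usr/bin/env escript
%% Hypothetical diagnostic script; not part of the original report.

%% Encode a list of Unicode code points as a list of UTF-8 bytes.
utf8_bytes(CodePoints) ->
    binary_to_list(unicode:characters_to_binary(CodePoints, unicode, utf8)).

%% Decode a list of UTF-8 bytes back into a list of Unicode code points.
code_points(Bytes) ->
    unicode:characters_to_list(list_to_binary(Bytes), utf8).

main([]) ->
    Farsi = [1601,1575,1585,1587,1740],
    Bytes = utf8_bytes(Farsi),
    %% Prints [217,129,216,167,216,177,216,179,219,140]
    io:format("UTF-8 bytes: ~w~n", [Bytes]),
    %% Prints [1601,1575,1585,1587,1740]
    io:format("Decoded:     ~w~n", [code_points(Bytes)]),
    %% Shows the options (including encoding) of the escript's standard I/O.
    io:format("io options:  ~p~n", [io:getopts()]).
```

So in the escript the pasted literal appears to reach io:format as raw UTF-8 bytes rather than as code points; what the "io options" line reports will depend on the release and environment.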

This is repeatable on R15B01 and R16B03-1, both installed via kerl on OS X Mavericks. My LANG environment variable is set to en_US.UTF-8, and all other UTF-8 behavior seems correct.

I also just spun up an older Amazon Linux AMI instance running R15B01 and saw the same behavior.

My unicode experience consists of years of banging my head against various toolchain problems with only a marginal clue as to what’s supposed to happen, so my ability to troubleshoot this further is limited.

-John
