[erlang-questions] UTF-8 and escript problem

Mon Mar 10 14:54:36 CET 2014

Hi - this is very probably related to the problem (and cause) described in
this thread:
  http://erlang.org/pipermail/erlang-bugs/2012-January/002747.html
A simpler demo is:

  $ erl -eval 'io:format("[~ts]\n", [[300]]), timer:sleep(2000), init:stop().'
  Erlang R15B01 (erts-5.9.1) [source] [64-bit] [smp:8:8] [async-threads:0] [hipe] [kernel-poll:false] [systemtap]

  Eshell V5.9.1  (abort with ^G)
  1> [Ĭ]

versus

  $ erl -noinput -eval 'io:format("[~ts]\n", [[300]]), timer:sleep(2000), init:stop().'
  [\x{12C}]

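and the cause is, briefly put, that the code handling output is different
in the two cases: with -noinput (as in an escript), standard_io defaults to
latin1 encoding, so characters above 255 are printed as \x{...} escapes.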
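One way to see the difference is to ask standard_io for its encoding in each mode. The module below is a minimal sketch; the name enc_check is hypothetical. It uses io:getopts/0, which returns the option list of the calling process's group leader, and picks out the {encoding, _} entry. Going by the two demos above, one would presumably see unicode from the interactive shell and latin1 under -noinput, but that expectation is untested.

```erlang
-module(enc_check).
-export([current_encoding/0]).

%% Sketch (hypothetical helper): report the encoding of the calling
%% process's standard_io, i.e. its group leader's I/O server.
%% io:getopts/0 returns that server's option list as a proplist.
current_encoding() ->
    proplists:get_value(encoding, io:getopts()).
```

Calling enc_check:current_encoding() once from the shell and once from an -eval under -noinput should make the mismatch visible.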

The solution, equally briefly, is to set the encoding manually in the
escript:

  $ erl -noinput -eval 'ok = io:setopts([{encoding, unicode}]), io:format("[~ts]\n", [[300]]), timer:sleep(2000), init:stop().'
  [Ĭ]

(I haven't actually tested it with escript, but I'd be surprised if it
didn't work :-))
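Applied to John's script quoted below, the fix would look roughly like this. It is a sketch, not a tested version of his escript: the module name farsi_demo is hypothetical, and it is written as a module so it can be compiled on its own. The key point is that a process spawned on another node inherits the caller's group leader, so its io:format output comes back through the escript node's own standard_io. That is why the encoding is set locally, before spawning. Because the spawned fun is defined in this module, the module would also have to be loaded on Node (an assumption that does not apply to the interpreted funs in an escript).

```erlang
-module(farsi_demo).
-export([run/1]).

%% Sketch of John's script with the suggested fix applied.
%% Node is where the printing process runs ('bar@...' in his case).
run(Node) ->
    %% Switch our own standard_io (the group leader) to Unicode first:
    %% the process spawned on Node inherits this group leader, so its
    %% output is written through it.
    ok = io:setopts([{encoding, unicode}]),
    Farsi = [1601,1575,1585,1587,1740],
    Parent = self(),
    Pid = spawn(Node, fun() ->
                          %% io:format/2 returns only after the group
                          %% leader has handled the request.
                          io:format("Bytes: ~ts~n", [Farsi]),
                          Parent ! {self(), printed}
                      end),
    %% Wait for the remote process instead of sleeping a fixed time.
    receive
        {Pid, printed} -> ok
    after 5000 ->
        timeout
    end.
```

In the escript itself, the equivalent change is a single ok = io:setopts([{encoding, unicode}]) line at the top of main/1, before the spawn calls.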

/Erik

2014-03-10 6:13 GMT+01:00 John Daily <jd@REDACTED>:

> I retract my retraction. Setting aside the unworkable Farsi2 variable,
> Farsi (expressed in bytes) really should work, but displays
> as \x{641}\x{627}\x{631}\x{633}\x{6CC}. Why?
>
> -John
>
> On Mar 10, 2014, at 12:34 AM, John Daily <jd@REDACTED> wrote:
>
> And I think I’ve found the solution in an older thread. Still no support
> for unicode strings directly embedded in the source. Thanks, sorry for the
> noise.
>
> -John
>
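About the pasted Farsi2 literal: the ~w output in the original message further down shows it as [217,129,216,167,...], which is the UTF-8 byte sequence of فارسی. That is what you get when a source file is read as Latin-1. It only looks right under ~ts because the latin1 device passes those bytes straight through to a UTF-8 terminal. Once standard_io is switched to unicode, it would presumably come out garbled instead. A minimal sketch of converting such a byte list back to code points follows; the module name src_bytes is hypothetical.

```erlang
-module(src_bytes).
-export([decode/1]).

%% Sketch (hypothetical helper): a string literal read from a Latin-1
%% source file holds the UTF-8 *bytes* of the text, one element per byte.
%% Packing them into a binary and decoding as UTF-8 recovers code points.
decode(Utf8Bytes) ->
    case unicode:characters_to_list(list_to_binary(Utf8Bytes), utf8) of
        Chars when is_list(Chars) -> Chars;
        %% {error, ...} or {incomplete, ...}: not valid UTF-8
        Other -> erlang:error({bad_utf8, Other})
    end.
```

For the bytes shown in the script's output, this yields [1601,1575,1585,1587,1740], the same list as the Farsi variable.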
> On Mar 10, 2014, at 12:24 AM, John Daily <jd@REDACTED> wrote:
>
> I ran into a problem with io:format() when invoked remotely, and with
> Scott Fritchie’s assistance I’ve narrowed it down (I think) to escript.
>
> Here’s my script:
>
> #!/usr/bin/env escript
> %%
> %%! -sname foo
>
> main([]) ->
>    Other = 'bar@REDACTED',
>    Farsi = [1601,1575,1585,1587,1740],
>    Farsi2 = "فارسی",
>    spawn(Other, fun() -> io:format("Bytes: ~ts~n", [Farsi]) end),
>    spawn(Other, fun() -> io:format("Bytes: ~w~n", [Farsi]) end),
>    spawn(Other, fun() -> io:format("Paste: ~ts~n", [Farsi2]) end),
>    spawn(Other, fun() -> io:format("Paste: ~w~n", [Farsi2]) end),
>    timer:sleep(1000).
>
> When attempting this directly from a different node’s console (spawned to
> run on ‘bar’), everything displays as expected:
>
> Bytes: فارسی
> Bytes: [1601,1575,1585,1587,1740]
> Paste: فارسی
> Paste: [1601,1575,1585,1587,1740]
>
> When invoked via the above escript, I consistently get the wrong output:
>
> Bytes: \x{641}\x{627}\x{631}\x{633}\x{6CC}
> Bytes: [1601,1575,1585,1587,1740]
> Paste: فارسی
> Paste: [217,129,216,167,216,177,216,179,219,140]
>
> This is repeatable on R15B01 and R16B03-1, both under OS X Mavericks
> installed via kerl. My LANG environment variable is set to en_US.UTF-8, and
> all other UTF-8 behavior seems correct.
>
> Just spun up an older Amazon Linux AMI instance running R15B01. Same
> behavior.
>
> My unicode experience consists of years of banging my head against various
> toolchain problems with only a marginal clue as to what’s supposed to
> happen, so my ability to troubleshoot this further is limited.
>
> -John