[erlang-questions] UTF-8 and escript problem
John Daily
jd@REDACTED
Mon Mar 10 15:53:59 CET 2014
Thank you, that solved the problem.
-John
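
For the archive, a minimal sketch of how the fix from the quoted reply might look inside an escript. It assumes that calling io:setopts/1 at the top of main/1 is enough, and it prints locally instead of spawning on another node. The variable Farsi is taken from the original script; everything else is illustrative.

```erlang
#!/usr/bin/env escript

main(_Args) ->
    %% Assumption: switching standard_io to Unicode before any output
    %% is all that is needed (this is the io:setopts/1 call from the reply).
    ok = io:setopts([{encoding, unicode}]),
    %% Code points for the Farsi word used in the original script.
    Farsi = [1601,1575,1585,1587,1740],
    io:format("Bytes: ~ts~n", [Farsi]).
```

Run on a UTF-8 terminal, this should print the Farsi word instead of \x{...} escapes. Output from funs spawned on another node appears to be routed back through the escript's own I/O (a spawned process inherits the caller's group leader), which would explain why the encoding has to be set in the escript.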
On Mar 10, 2014, at 9:54 AM, Erik Søe Sørensen <eriksoe@REDACTED> wrote:
> Hi - this is very probably related to the problem (and cause) described in this thread:
> http://erlang.org/pipermail/erlang-bugs/2012-January/002747.html
> A simpler demo is:
>
> $ erl -eval 'io:format("[~ts]\n", [[300]]), timer:sleep(2000), init:stop().'
> Erlang R15B01 (erts-5.9.1) [source] [64-bit] [smp:8:8] [async-threads:0] [hipe] [kernel-poll:false] [systemtap]
>
> Eshell V5.9.1 (abort with ^G)
> 1> [Ĭ]
>
> versus
>
> $ erl -noinput -eval 'io:format("[~ts]\n", [[300]]), timer:sleep(2000), init:stop().'
> [\x{12C}]
>
> and the cause is, briefly put, that the code handling output is different in the two cases.
>
> The solution, equally briefly, is to set the encoding manually in the escript:
>
> $ erl -noinput -eval 'ok = io:setopts([{encoding, unicode}]), io:format("[~ts]\n", [[300]]), timer:sleep(2000), init:stop().'
> [Ĭ]
>
>
> (That is, I've not tested it with escript, but I'd be surprised if it didn't work :-))
>
> /Erik
>
>
> 2014-03-10 6:13 GMT+01:00 John Daily <jd@REDACTED>:
> I retract my retraction. Setting aside the unworkable Farsi2 variable, Farsi (expressed in bytes) really should work, but displays as \x{641}\x{627}\x{631}\x{633}\x{6CC}. Why?
>
> -John
>
> On Mar 10, 2014, at 12:34 AM, John Daily <jd@REDACTED> wrote:
>
>> And I think I’ve found the solution in an older thread. Still no support for unicode strings directly embedded in the source. Thanks, sorry for the noise.
>>
>> -John
>>
>> On Mar 10, 2014, at 12:24 AM, John Daily <jd@REDACTED> wrote:
>>
>>> I ran into a problem with io:format() when invoked remotely, and with Scott Fritchie’s assistance I’ve narrowed it down (I think) to escript.
>>>
>>> Here’s my script:
>>>
>>> #!/usr/bin/env escript
>>> %%
>>> %%! -sname foo
>>>
>>> main([]) ->
>>>     Other = 'bar@REDACTED',
>>>     Farsi = [1601,1575,1585,1587,1740],
>>>     Farsi2 = "فارسی",
>>>     spawn(Other, fun() -> io:format("Bytes: ~ts~n", [Farsi]) end),
>>>     spawn(Other, fun() -> io:format("Bytes: ~w~n", [Farsi]) end),
>>>     spawn(Other, fun() -> io:format("Paste: ~ts~n", [Farsi2]) end),
>>>     spawn(Other, fun() -> io:format("Paste: ~w~n", [Farsi2]) end),
>>>     timer:sleep(1000).
>>>
>>> When attempting this directly from a different node’s console (spawned to run on ‘bar’), everything displays as expected:
>>>
>>> Bytes: فارسی
>>> Bytes: [1601,1575,1585,1587,1740]
>>> Paste: فارسی
>>> Paste: [1601,1575,1585,1587,1740]
>>>
>>> When invoked via the above escript, I consistently get the wrong output:
>>>
>>> Bytes: \x{641}\x{627}\x{631}\x{633}\x{6CC}
>>> Bytes: [1601,1575,1585,1587,1740]
>>> Paste: فارسی
>>> Paste: [217,129,216,167,216,177,216,179,219,140]
>>>
>>> This is repeatable on R15B01 and R16B03-1, both installed via kerl on OS X Mavericks. My LANG environment variable is set to en_US.UTF-8, and all other UTF-8 behavior seems correct.
>>>
>>> Just spun up an older Amazon Linux AMI instance running R15B01. Same behavior.
>>>
>>> My Unicode experience consists of years of banging my head against various toolchain problems with only a marginal clue as to what’s supposed to happen, so my ability to troubleshoot this further is limited.
>>>
>>> -John
>>>
>>
>
>
>
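
The "Paste" lines in the original report are consistent with the pasted literal reaching the escript as UTF-8 bytes, not as code points. A small sketch, assuming only the standard unicode module, that converts the byte list from the failing run back into the code points seen on the console. The helper name bytes_to_codepoints/1 is hypothetical.

```erlang
#!/usr/bin/env escript

%% Hypothetical helper: reinterpret a list of UTF-8 bytes
%% as a list of Unicode code points.
bytes_to_codepoints(Bytes) ->
    unicode:characters_to_list(list_to_binary(Bytes), utf8).

main(_Args) ->
    %% Byte list copied from the "Paste: ~w" line of the escript run.
    Bytes = [217,129,216,167,216,177,216,179,219,140],
    %% Prints [1601,1575,1585,1587,1740]
    io:format("~w~n", [bytes_to_codepoints(Bytes)]).
```

The result matches the Farsi list from the script, so both variables hold the same text and differ only in representation: ten bytes versus five code points.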