[erlang-questions] UTF-8 and escript problem

John Daily
Mon Mar 10 15:53:59 CET 2014


Thank you, that solved the problem.
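
A minimal sketch of an escript with that change applied, assuming a UTF-8 terminal. It prints locally rather than spawning on a remote node, and the Utf8 binary is only there to show where the "Paste" byte list in the original output comes from, so it is an illustration and not the exact script from the report:

```erlang
#!/usr/bin/env escript

main([]) ->
    %% Switch standard_io to unicode before any output, as suggested
    %% in the reply quoted below.
    ok = io:setopts([{encoding, unicode}]),
    Farsi = [1601,1575,1585,1587,1740],
    %% The "Paste" byte list from the original output is the UTF-8
    %% encoding of the same code points; this match fails otherwise.
    Utf8 = <<217,129,216,167,216,177,216,179,219,140>>,
    Farsi = unicode:characters_to_list(Utf8, utf8),
    io:format("Bytes: ~ts~n", [Farsi]),
    io:format("Bytes: ~w~n", [Farsi]).
```

On the affected releases, removing the io:setopts/1 line should bring back the \x{...} escapes on the first line of output.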

-John

On Mar 10, 2014, at 9:54 AM, Erik Søe Sørensen wrote:

> Hi - this is very probably related to the problem (and cause) described in this thread:
>   http://erlang.org/pipermail/erlang-bugs/2012-January/002747.html
> A simpler demo is:
> 
>   $ erl -eval 'io:format("[~ts]\n", [[300]]), timer:sleep(2000), init:stop().' 
>   Erlang R15B01 (erts-5.9.1) [source] [64-bit] [smp:8:8] [async-threads:0] [hipe] [kernel-poll:false] [systemtap]
>   
>   Eshell V5.9.1  (abort with ^G)
>   1> [Ĭ]
> 
> versus
> 
>   $ erl -noinput -eval 'io:format("[~ts]\n", [[300]]), timer:sleep(2000), init:stop().' 
>   [\x{12C}]
> 
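> and the cause is, briefly put, that the code handling output is different in the two cases: without the shell, the output device is not set to unicode encoding, so ~ts prints characters outside Latin-1 as \x{...} escapes.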
> 
> The solution, equally briefly, is to set the encoding manually in the escript:
> 
>   $ erl -noinput -eval 'ok = io:setopts([{encoding, unicode}]), io:format("[~ts]\n", [[300]]), timer:sleep(2000), init:stop().' 
>   [Ĭ]
> 
> 
> (That is, I've not tested it with escript, but I'd be surprised if it didn't work :-))
> 
> /Erik
> 
> 
> 2014-03-10 6:13 GMT+01:00 John Daily:
> I retract my retraction. Setting aside the unworkable Farsi2 variable, Farsi (expressed as a list of code points) really should work, but displays as \x{641}\x{627}\x{631}\x{633}\x{6CC}. Why?
> 
> -John
> 
> On Mar 10, 2014, at 12:34 AM, John Daily wrote:
> 
>> And I think I’ve found the solution in an older thread: there is still no support for Unicode strings embedded directly in the source. Thanks, and sorry for the noise.
>> 
>> -John
>> 
>> On Mar 10, 2014, at 12:24 AM, John Daily wrote:
>> 
>>> I ran into a problem with io:format() when invoked remotely, and with Scott Fritchie’s assistance I’ve narrowed it down (I think) to escript.
>>> 
>>> Here’s my script:
>>> 
>>> #!/usr/bin/env escript
>>> %%
>>> %%! -sname foo
>>> 
>>> main([]) ->
>>>    Other = '',
>>>    Farsi = [1601,1575,1585,1587,1740],
>>>    Farsi2 = "فارسی",
>>>    spawn(Other, fun() -> io:format("Bytes: ~ts~n", [Farsi]) end),
>>>    spawn(Other, fun() -> io:format("Bytes: ~w~n", [Farsi]) end),
>>>    spawn(Other, fun() -> io:format("Paste: ~ts~n", [Farsi2]) end),
>>>    spawn(Other, fun() -> io:format("Paste: ~w~n", [Farsi2]) end),
>>>    timer:sleep(1000).
>>> 
>>> When attempting this directly from a different node’s console (spawned to run on ‘bar’), everything displays as expected:
>>> 
>>> Bytes: فارسی
>>> Bytes: [1601,1575,1585,1587,1740]
>>> Paste: فارسی
>>> Paste: [1601,1575,1585,1587,1740]
>>> 
>>> When invoked via the above escript, I consistently get the wrong output:
>>> 
>>> Bytes: \x{641}\x{627}\x{631}\x{633}\x{6CC}
>>> Bytes: [1601,1575,1585,1587,1740]
>>> Paste: فارسی
>>> Paste: [217,129,216,167,216,177,216,179,219,140]
>>> 
>>> This is repeatable on R15B01 and R16B03-1, both under OS X Mavericks installed via kerl. My LANG environment variable is set to en_US.UTF-8, and all other UTF-8 behavior seems correct.
>>> 
>>> Just spun up an older Amazon Linux AMI instance running R15B01. Same behavior.
>>> 
>>> My Unicode experience consists of years of banging my head against various toolchain problems with only a marginal clue as to what’s supposed to happen, so my ability to troubleshoot this further is limited.
>>> 
>>> -John
>>> 
>> 
> 
> 
