[erlang-bugs] Strings handled differently in the shell and compiled modules
Raimo Niskanen
raimo+erlang-bugs@REDACTED
Fri Feb 19 14:00:30 CET 2010
On Thu, Feb 18, 2010 at 02:27:32PM -0800, Geoff Cant wrote:
>
> I've just been working on some code and came across a surprising result
> and wonder if it's a bug.
>
> If I create a module with a unicode string:
>
> %%%%%
> -module(unitest).
> -export([test/0]).
>
> test() ->
> "©|®|???|[\\-\\.!,]".
> %%%%%
>
> Then the following is true in the shell:
> unitest:test() =/= "©|®|???|[\\-\\.!,]".
>
> That is, the string literal in the module is a list of utf-8 bytes and
> the shell string literal is a list of unicode codepoints; string
> literals have a different value depending on their context.
>
> Have I simply missed something in the documentation that says this is
> the expected behaviour? If not, then it'd be nice if shell code and
> module code behaved as similarly as possible.
It might be a terminal and locale problem.
What does this produce?
1> io:format("~w~n", ["©|®|???|[\\-\\.!,]"]).
2> io:format("~w~n", [unitest:test()]).
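How to read the ~w output: a UTF-8 encoded © shows up as the two integers 194,169, while the codepoint form is the single integer 169. The module below is a minimal sketch for comparing the two forms. The module name `strcmp` and its functions are hypothetical helpers, not part of OTP. It assumes the module's string is a list of UTF-8 bytes (integers 0..255) and decodes it with the standard `unicode:characters_to_list/2`.

```erlang
-module(strcmp).
-export([bytes_to_codepoints/1, same_text/2]).

%% Hypothetical helper: convert a list of UTF-8 encoded bytes (integers
%% 0..255) into a list of Unicode codepoints. list_to_binary/1 packs the
%% bytes into a binary, which unicode:characters_to_list/2 decodes as UTF-8.
%% Note: list_to_binary/1 fails with badarg if any element is > 255.
bytes_to_codepoints(Bytes) ->
    case unicode:characters_to_list(list_to_binary(Bytes), utf8) of
        List when is_list(List)    -> {ok, List};
        {error, _Good, _Rest}      -> {error, not_utf8};
        {incomplete, _Good, _Rest} -> {error, truncated_utf8}
    end.

%% Hypothetical helper: true if Bytes (UTF-8 bytes, as the module literal
%% appears to be) and Codepoints (as the shell literal appears to be)
%% denote the same text.
same_text(Bytes, Codepoints) ->
    bytes_to_codepoints(Bytes) =:= {ok, Codepoints}.
```

For example, `strcmp:same_text([194,169], [169])` returns `true`, while `strcmp:bytes_to_codepoints([169])` returns `{error, not_utf8}`, because a lone 169 is not valid UTF-8.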
And, at the Unix shell prompt:
$ locale
$ env | grep '^LC_'
$ echo $LANG
$ cat >test.txt
©|®|???|[\\-\\.!,]
^D
$ hexdump -C test.txt
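The same check on test.txt can also be done from the Erlang shell. The sketch below uses a hypothetical module `enccheck` (not part of OTP). It reads the file with `file:read_file/1` and reports whether its bytes decode cleanly as UTF-8. It is a heuristic only: pure ASCII text counts as valid UTF-8, and some Latin-1 byte sequences can happen to be valid UTF-8 as well.

```erlang
-module(enccheck).
-export([check_file/1, classify/1]).

%% Hypothetical helper: read a file and classify its contents, roughly
%% what inspecting the `hexdump -C` output by hand tells you.
check_file(Path) ->
    {ok, Bin} = file:read_file(Path),
    classify(Bin).

%% Returns {utf8, Codepoints} if the binary decodes cleanly as UTF-8,
%% otherwise {not_utf8, Bytes}, e.g. Latin-1 text where © is the single
%% byte 169 instead of the UTF-8 pair 194,169.
classify(Bin) when is_binary(Bin) ->
    case unicode:characters_to_list(Bin, utf8) of
        List when is_list(List) -> {utf8, List};
        _ErrorOrIncomplete      -> {not_utf8, binary_to_list(Bin)}
    end.
```

With a UTF-8 terminal, typing © followed by Enter into test.txt should give `enccheck:check_file("test.txt")` a result of `{utf8, [169,10]}`. A Latin-1 terminal would instead give `{not_utf8, [169,10]}`.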
>
> Cheers,
> --
> Geoff Cant
>
--
/ Raimo Niskanen, Erlang/OTP, Ericsson AB