[erlang-bugs] Strings handled differently in the shell and compiled modules

Raimo Niskanen raimo+erlang-bugs@REDACTED
Fri Feb 19 14:00:30 CET 2010


On Thu, Feb 18, 2010 at 02:27:32PM -0800, Geoff Cant wrote:
> 
> I've just been working on some code and came across a surprising result;
> I wonder whether it's a bug.
> 
> If I create a module with a Unicode string:
> 
> %%%%%
> -module(unitest).
> -export([test/0]).
> 
> test() ->
>     "©|®|???|[\\-\\.!,]".
> %%%%%
> 
> Then the following is true in the shell:
> unitest:test() =/= "©|®|???|[\\-\\.!,]".
> 
> That is, the string literal in the module is a list of UTF-8 bytes and
> the shell string literal is a list of Unicode code points; string
> literals have a different value depending on their context.
> 
> Have I simply missed something in the documentation that says this is
> the expected behaviour? If not, then it'd be nice if shell code and
> module code behaved as similarly as possible.

It might be a terminal and locale problem.

What does this produce?
  1> io:format("~w~n", ["©|®|???|[\\-\\.!,]"]).
  2> io:format("~w~n", [unitest:test()]).
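If the two ~w outputs differ the way Geoff describes (one list of UTF-8
bytes, one list of code points), the two forms can be converted into each
other with the unicode module. The following is only a minimal sketch: the
module name uni_compare and its function names are made up for illustration,
and it assumes the byte list really is valid UTF-8.

```erlang
-module(uni_compare).
-export([bytes_to_codepoints/1, codepoints_to_bytes/1]).

%% Hypothetical helper module, not part of OTP.

%% Decode a list of UTF-8 bytes into a list of Unicode code points.
%% Assumes valid UTF-8; on bad input unicode:characters_to_list/2
%% returns an {error, ...} or {incomplete, ...} tuple instead of a list.
bytes_to_codepoints(Bytes) ->
    unicode:characters_to_list(list_to_binary(Bytes), utf8).

%% Encode a list of Unicode code points as a list of UTF-8 bytes.
codepoints_to_bytes(CodePoints) ->
    binary_to_list(unicode:characters_to_binary(CodePoints, unicode, utf8)).
```

For example, uni_compare:bytes_to_codepoints([194,169]) gives [169], the code
point of "©", and uni_compare:codepoints_to_bytes([169,$|,174]) gives
[194,169,124,194,174]. Printing with ~w, as above, avoids any dependence on
how the terminal renders the characters.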

And, at the Unix shell prompt (not the Erlang shell):
  $ locale
  $ env | grep '^LC_'
  $ echo $LANG
  $ cat >test.txt
©|®|???|[\\-\\.!,]
^D

  $ hexdump -C test.txt

> 
> Cheers,
> -- 
> Geoff Cant
> 

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB

