[erlang-bugs] Strings handled differently in the shell and compiled modules

Igor Ribeiro Sucupira igorrs@REDACTED
Fri Feb 19 05:48:45 CET 2010


Hi, Geoff.

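An Erlang "string" is actually just a list of integers. In a compiled
module the source file is read as ISO-8859-1, one character per byte, so
a UTF-8 encoded literal ends up as a list of its bytes.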

1> [202, 204].
"ÊÌ"
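
To make that concrete, here is a small sketch (the module name strdemo is
made up for illustration, and I have used integer lists instead of
non-ASCII literals so the result does not depend on how the file is
encoded). It only assumes the standard unicode module:

```erlang
-module(strdemo).
-export([demo/0]).

demo() ->
    %% A string literal is the same term as the list of its character codes.
    true = ("AB" =:= [65, 66]),
    %% Code point 169 is the copyright sign. unicode:characters_to_binary/1
    %% encodes a list of code points as a UTF-8 binary: two bytes here.
    <<194, 169>> = unicode:characters_to_binary([169]),
    %% A literal read one byte per character from a UTF-8 file is therefore
    %% the two-element list, not the one-element list of code points.
    false = ([194, 169] =:= [169]),
    ok.
```

Calling strdemo:demo() should return ok if every match succeeds.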

If you need to work with Unicode characters, this module might help:
http://www.erlang.org/doc/man/unicode.html
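
For example, something along these lines might get you from the bytes in
the module to the code points you see in the shell. This is only a
sketch: the module and function names (utf8fix, to_codepoints) are
hypothetical, and it assumes the list really does contain valid UTF-8
bytes:

```erlang
-module(utf8fix).
-export([to_codepoints/1, demo/0]).

%% Pack the byte list into a binary, then decode that binary as UTF-8.
%% For invalid or truncated input unicode:characters_to_list/2 returns
%% an {error, ...} or {incomplete, ...} tuple instead of a list.
to_codepoints(Bytes) when is_list(Bytes) ->
    unicode:characters_to_list(list_to_binary(Bytes), utf8).

demo() ->
    %% UTF-8 bytes for the copyright, registered and trademark signs,
    %% separated by "|", written as integers to avoid encoding issues.
    Bytes = [194,169, $|, 194,174, $|, 226,132,162],
    %% The same text as code points: 169, 174 and 8482 (16#2122).
    [169, $|, 174, $|, 8482] = to_codepoints(Bytes),
    ok.
```

With that, utf8fix:to_codepoints(unitest:test()) should compare equal to
the literal typed in the shell, provided the shell hands you code points
as you describe.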

Good luck.
Igor.

On Thu, Feb 18, 2010 at 8:27 PM, Geoff Cant <nem@REDACTED> wrote:
>
> I've just been working on some code and came across a surprising result
> and wonder if it's a bug.
>
> If I create a module with a Unicode string:
>
> %%%%%
> -module(unitest).
> -export([test/0]).
>
> test() ->
>    "©|®|™|[\\-\\.!,]".
> %%%%%
>
> Then the following is true in the shell:
> unitest:test() =/= "©|®|™|[\\-\\.!,]".
>
> That is, the string literal in the module is a list of UTF-8 bytes and
> the shell string literal is a list of Unicode code points; string
> literals have a different value depending on their context.
>
> Have I simply missed something in the documentation that says this is
> the expected behaviour? If not, then it'd be nice if shell code and
> module code behaved as similarly as possible.
>
> Cheers,
> --
> Geoff Cant



-- 
"The secret of joy in work is contained in one word - excellence. To
know how to do something well is to enjoy it." - Pearl S. Buck.

