[erlang-questions] puzzled with this charset/encoding -related behaviour
Dan Gudmundsson
dangud@REDACTED
Sat Oct 14 10:24:41 CEST 2017
Add the unicode option:
re:run("йцу.asd", xmerl_regexp:sh_to_awk("*.*"), [{capture, none}, unicode]).
The binary version matches because, without the unicode option, re works
on bytes rather than on UTF-8 characters.
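
To make the byte vs. character difference concrete, here is a small sketch.
The module name unicode_re_demo and the "^.{7}$" pattern are made up for
illustration; the first pattern is the one shown in the error message quoted
below, and the file name is written as code points so the example does not
depend on the source file encoding.

```erlang
-module(unicode_re_demo).
-export([demo/0]).

%% Hypothetical demo module, not part of OTP.
demo() ->
    %% "йцу.asd" as a list of Unicode code points
    Name = [1081,1094,1091,46,97,115,100],
    %% The pattern that xmerl_regexp:sh_to_awk("*.*") produced in the thread
    Pattern = "^(.*\\..*)$",
    %% A list with code points > 255 and no unicode option raises badarg
    ListBytes = try re:run(Name, Pattern, [{capture, none}])
                catch error:badarg -> badarg
                end,
    %% With the unicode option the same list is accepted
    ListUnicode = re:run(Name, Pattern, [{capture, none}, unicode]),
    %% UTF-8 encoded binary: 3 two-byte characters + 4 ASCII = 10 bytes
    Bin = unicode:characters_to_binary(Name),
    %% Without unicode, the pattern is applied to the raw bytes
    BinBytes = re:run(Bin, Pattern, [{capture, none}]),
    %% "Exactly 7 characters" only holds when re decodes UTF-8
    SevenBytes = re:run(Bin, "^.{7}$", [{capture, none}]),
    SevenChars = re:run(Bin, "^.{7}$", [{capture, none}, unicode]),
    {ListBytes, ListUnicode, BinBytes, SevenBytes, SevenChars}.
```

Calling unicode_re_demo:demo() should return
{badarg,match,match,nomatch,match}: the byte-wise match on the binary only
works here because the pattern does not care how many characters there are.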
Also, the Erlang shell cannot know whether a list of integers is meant as a
list of integers or as a string, since both are represented by the same
list of integers. So it guesses: by default, a list containing integers
larger than 255 is treated as a list of integers, not a string. You can
change that with:
(w)erl +pc unicode
1> "йцу.asd".
"йцу.asd"
/Dan
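
A minimal sketch of the guess described above, assuming the shell's printing
relies on the same kind of check as the io_lib printable-list functions. The
module name printable_demo is hypothetical; the io_lib and io calls are from
stdlib.

```erlang
-module(printable_demo).
-export([check/0, show/0]).

%% Hypothetical demo module, not part of OTP.

%% "абв" as code points, as printed by the shell in the original question
chars() -> [1072,1073,1074].

check() ->
    %% Latin-1 check: code points above 255 are not printable
    Latin1 = io_lib:printable_latin1_list(chars()),
    %% Unicode check: the same list counts as printable text
    Unicode = io_lib:printable_unicode_list(chars()),
    {Latin1, Unicode}.

show() ->
    %% ~ts asks io:format to treat the list as Unicode text,
    %% independent of how the shell would print the bare term
    io:format("~ts~n", [chars()]).
```

printable_demo:check() should return {false,true}. io_lib:printable_list/1
follows whichever of the two the +pc flag selects, which is why starting erl
with +pc unicode changes what the shell shows.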
On Sat, Oct 14, 2017 at 10:12 AM Attila Rajmund Nohl <attila.r.nohl@REDACTED> wrote:
> 2017-10-14 4:21 GMT+02:00 Alexandre Karpov <alexakarpov@REDACTED>:
> > TL;DR: how do I run erl which understands Unicode?
> >
> > Or, in more detail:
> >
> > (Disclaimer: this official documentation got me really humbled:
> > http://www1.erlang.org/doc/apps/stdlib/unicode_usage.html
> > , and just a little bit scared =) )
> >
> > Judging by my S/O question, which got 3 upvotes and no answers, I'm not
> > the only one wondering:
> >
> > https://stackoverflow.com/questions/46735539/erlang-regexp-matching-on-chinese-characters
> >
> > Here's the gist of the problem:
> >
> > 57> "абв".
> >
> > [1072,1073,1074]
> >
> > The codes are correct Unicode for the [Cyrillic] characters - which means
> > my Terminal didn't fail to understand my keyboard's input =) but Erlang
> > shell didn't recognize Terminal's input as printable characters. And it is
> > my understanding that this is exactly why this call fails:
> >
> > 25> re:run("йцу.asd", xmerl_regexp:sh_to_awk("*.*"), [{capture, none}]).
> > ** exception error: bad argument in function re:run/3 called as
> > re:run([1081,1094,1091,46,97,115,100], "^(.*\\..*)$", [{capture,none}])
>
> Try
>
> re:run(<<"йцу.asd"/utf8>>, xmerl_regexp:sh_to_awk("*.*"), [{capture, none}]).