[erlang-questions] puzzled with this charset/encoding -related behaviour

zxq9 zxq9@REDACTED
Sat Oct 14 10:19:08 CEST 2017


On Saturday, 14 October 2017 10:12:19 Attila Rajmund Nohl wrote:
> 2017-10-14 4:21 GMT+02:00 Alexandre Karpov <alexakarpov@REDACTED>:
> > TL;DR: how do I run erl which understands Unicode?
> >
> > Or, in more detail:
> >
> > (Disclaimer: this official documentation got me really humbled:
> > http://www1.erlang.org/doc/apps/stdlib/unicode_usage.html
> > , and just a little bit scared =) )
> >
> > Judging by my S/O question, which got 3 upvotes and no answers, I'm not the
> > only one wondering:
> > https://stackoverflow.com/questions/46735539/erlang-regexp-matching-on-chinese-characters
> >
> > Here's the gist of the problem:
> >
> > 57> "абв".
> >
> > [1072,1073,1074]
> >
> > The codes are correct Unicode for the [Cyrillic] characters - which means my
> > Terminal didn't fail to understand my keyboard's input =) but Erlang shell
> > didn't recognize Terminal's input as printable characters. And it is my
> > understanding that this is exactly why this call fails:
> >
> > 25> re:run("йцу.asd", xmerl_regexp:sh_to_awk("*.*"), [{capture, none}]).
> > ** exception error: bad argument
> >      in function  re:run/3
> >         called as re:run([1081,1094,1091,46,97,115,100], "^(.*\\..*)$", [{capture,none}])
> 
> Try
> 
> re:run(<<"йцу.asd"/utf8>>, xmerl_regexp:sh_to_awk("*.*"), [{capture, none}]).

FYI: the SO question has an answer now.

The regex needs to be run in unicode mode, which means passing the unicode option to re:run/3:

re:run(<<"йцу.asd"/utf8>>, xmerl_regexp:sh_to_awk("*.*"), [unicode, {capture, none}]).
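To make the difference concrete, here is a minimal sketch of the same match with and without the unicode option. The module and function names (unicode_re_demo, matches/1, without_unicode/1) are made up for illustration; the pattern is the one xmerl_regexp:sh_to_awk("*.*") produced in the error message above, and the subject is written as the code point list from that same message so the example does not depend on the source file encoding.

```erlang
-module(unicode_re_demo).
-export([matches/1, without_unicode/1, demo/0]).

%% Pattern taken from the error message in the thread: it is what
%% xmerl_regexp:sh_to_awk("*.*") returned there.
-define(PATTERN, "^(.*\\..*)$").

%% With the unicode option, re:run/3 accepts either a list of code
%% points or a UTF-8 encoded binary as the subject.
matches(Subject) ->
    re:run(Subject, ?PATTERN, [unicode, {capture, none}]) =:= match.

%% Without the option, a list containing code points above 255 is
%% rejected with badarg, as in the original report.
without_unicode(Subject) ->
    try re:run(Subject, ?PATTERN, [{capture, none}])
    catch error:badarg -> badarg
    end.

demo() ->
    Name = [1081,1094,1091,46,97,115,100],           %% "йцу.asd"
    true = matches(Name),                            %% code point list
    true = matches(unicode:characters_to_binary(Name)), %% UTF-8 binary
    false = matches([1081,1094,1091]),               %% "йцу", no dot
    badarg = without_unicode(Name),
    ok.
```

Calling unicode_re_demo:demo() should return ok. As for the shell printing "абв" as a list of integers: that is only a display heuristic, and as far as I know starting the shell with erl +pc unicode, or printing with io:format("~ts~n", [Str]), shows the characters instead.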

-Craig


