[erlang-questions] puzzled with this charset/encoding -related behaviour

Alexandre Karpov alexakarpov@REDACTED
Sat Oct 14 04:21:41 CEST 2017


TL;DR: how do I run erl so that it understands Unicode?

Or, in more detail:

(Disclaimer: the official documentation left me really humbled, and just a
little bit scared =)
http://www1.erlang.org/doc/apps/stdlib/unicode_usage.html )

Judging by my S/O question, which got 3 upvotes and no answers, I'm not the
only one wondering:
https://stackoverflow.com/questions/46735539/erlang-regexp-matching-on-chinese-characters

Here's the gist of the problem:

57> "абв".

[1072,1073,1074]
The codes are the correct Unicode code points for the Cyrillic characters,
which means my terminal didn't fail to understand my keyboard's input =) but
the Erlang shell didn't recognize the terminal's input as printable
characters. And it is my understanding that this is exactly why this call
fails:

25> re:run("йцу.asd", xmerl_regexp:sh_to_awk("*.*"), [{capture, none}]).
** exception error: bad argument
     in function  re:run/3
        called as re:run([1081,1094,1091,46,97,115,100], "^(.*\\..*)$", [{capture,none}])
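
For what it's worth, here is a minimal sketch of the same match with the
`unicode` option added to re:run/3. It assumes the bad argument comes from re
treating a list subject as Latin-1 iodata (so code points above 255 are
rejected) rather than from how the shell prints the list; I have not confirmed
that this is the whole story. The module name unicode_re_demo and the helper
matches/2 are made up for illustration.

```erlang
-module(unicode_re_demo).
-export([matches/2]).

%% Hypothetical helper: returns true if Name matches the shell-style Glob.
%% The `unicode` option asks re to treat both the pattern and the subject
%% as Unicode code points instead of Latin-1 bytes (assumption: this is
%% what the failing call was missing).
matches(Name, Glob) ->
    Pattern = xmerl_regexp:sh_to_awk(Glob),
    re:run(Name, Pattern, [unicode, {capture, none}]) =:= match.
```

With this compiled, unicode_re_demo:matches("йцу.asd", "*.*") should return
true instead of raising the exception above.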

This came up while I was trying the example from "Programming Erlang", where
Joe gives you a lib_find module and demonstrates reading MP3 tags from files.
Because I was looking for mp3 files on a path that had Chinese characters in
some filenames, this problem arose.

I've tried finding a way to run erl with a different charset (hoping for erl
--charset=UTF8 or something), but only found references to file names,
which in my case doesn't sound very related.
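
On the display side, a small sketch that may help separate "the data is wrong"
from "the shell just prints it as a list". It assumes the list really holds
Unicode code points, as the output above suggests. The module name
unicode_print_demo and both helpers are hypothetical. As far as I can tell,
starting the shell as erl +pc unicode is the documented way to widen the range
of characters the shell considers printable, but I have not verified that it
changes anything for re:run.

```erlang
-module(unicode_print_demo).
-export([to_utf8/1, print/1]).

%% Hypothetical helper: encode a list of Unicode code points as a
%% UTF-8 binary, e.g. to inspect the bytes that would be written out.
to_utf8(Chars) ->
    unicode:characters_to_binary(Chars).

%% Hypothetical helper: print the string using the ~ts directive,
%% which formats the argument as Unicode text rather than Latin-1.
print(Chars) ->
    io:format("~ts~n", [Chars]).
```

For example, unicode_print_demo:to_utf8([1072,1073,1074]) gives
<<208,176,208,177,208,178>>, the UTF-8 bytes for "абв", so the code points
themselves look fine.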

Thanks!