[erlang-questions] unicode chardata and re:run

Johannes Weißl <>
Wed May 2 15:09:37 CEST 2012


Hello List!

I have a question regarding the regular expression module:
The man page [1] says that re:run/3 accepts a unicode:charlist() [2] as
Subject/RE when the "unicode" option is supplied. However, calling the
function with a UTF-8 binary also works:

  match = re:run(<<"foo">>, <<"f.o">>, [{capture, none}, unicode]).
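
A minimal sketch for comparing the two forms side by side. The module
name re_chardata and the function match_both/2 are made up for
illustration. The second call first converts Subject and RE to charlists
with unicode:characters_to_list/1, so it stays within what the man page
documents. The sketch assumes both inputs are valid chardata
(well-formed UTF-8 if binaries):

```erlang
-module(re_chardata).
-export([match_both/2]).

%% Hypothetical helper: run the same unicode regex once on the inputs
%% exactly as given (binary or charlist), and once after converting both
%% to plain charlists, then return the two results as a tuple.
match_both(Subject, RE) ->
    Opts = [{capture, none}, unicode],
    %% Relies on re:run/3 accepting whatever form was passed in.
    AsGiven = re:run(Subject, RE, Opts),
    %% unicode:characters_to_list/1 turns unicode:chardata() into a list
    %% of code points; assumes valid input (no {error, ...} result).
    AsList = re:run(unicode:characters_to_list(Subject),
                    unicode:characters_to_list(RE), Opts),
    {AsGiven, AsList}.
```

With the inputs above, re_chardata:match_both(<<"foo">>, <<"f.o">>)
should give {match, match}. Whether the first element can be relied on
for binaries is exactly the open question.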

I even found a test case which relies on re:run/3 accepting
unicode:chardata() (charlist + unicode_binary) in re_SUITE.erl [3].

Does this mean I can rely on re:run/3 accepting binaries (in which case
the documentation should be changed), or does re:run/3 only accept
charlists (in which case the test case needs to be changed)?

I found a post from 2010 [4] in which the first option is suggested.

[1] http://www.erlang.org/doc/man/re.html#run-3
[2] http://www.erlang.org/doc/man/unicode.html#type-charlist
[3] https://github.com/erlang/otp/blob/master/lib/stdlib/test/re_SUITE.erl#L295
[4] http://erlang.org/pipermail/erlang-patches/2010-January/000697.html


Greetings,

Johannes


