[erlang-questions] Word boundary assertion matching for unicode strings in re module

Victor Antonovich v.antonovich@REDACTED
Wed Nov 21 10:22:45 CET 2012


It looks like the Erlang re module can't match the word boundary assertion (\b)
for non-Latin characters in Unicode strings:

$ erl
Erlang R15B02 (erts-5.9.2) [source] [64-bit] [smp:8:8] [async-threads:0]

Eshell V5.9.2  (abort with ^G)
1> {ok, R} = re:compile("\\b\\p{L}+\\b", [unicode, caseless]).
2> re:run("abc 123 def", R, [global]).
3> re:run("abc 123 абв", R, [global]).
4> "abc 123 абв".
5> {ok, R1} = re:compile("\\p{L}+", [unicode, caseless]).
6> re:run("abc 123 def", R1, [global]).
7> re:run("abc 123 абв", R1, [global]).

Is this intended behaviour, or have I missed something?
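A possible explanation: in PCRE, \b matches where the character on one side matches \w and the character on the other side does not. Without the `ucp` compile option, the re module resolves \w using ISO Latin-1 properties only. Cyrillic letters therefore count as non-word characters, so no boundary is found around "абв" even though \p{L} matches those letters. The sketch below assumes an OTP release whose re module accepts `ucp` (current releases document it; whether R15B02 already does is an assumption not checked here). The module and function names (`word_boundary_demo`, `words_ucp/1`, `words_lookaround/1`) are hypothetical. It shows two variants: the original pattern compiled with `ucp`, and a pattern that avoids \b entirely by using lookarounds on \p{L}.

```erlang
%% Hypothetical demo module; requires a re module that supports `ucp`.
-module(word_boundary_demo).
-export([words_ucp/1, words_lookaround/1]).

%% Variant 1: the original pattern plus `ucp`, so that \b (which is
%% defined via \w and \W) uses Unicode properties instead of Latin-1 ones.
words_ucp(Str) ->
    {ok, Re} = re:compile("\\b\\p{L}+\\b", [unicode, caseless, ucp]),
    matches(re:run(Str, Re, [global, {capture, all, binary}])).

%% Variant 2: no \b at all. A run of letters is accepted when it is not
%% preceded by a letter (negative lookbehind) and not followed by one
%% (negative lookahead). \p{L} is already Unicode-aware with `unicode`.
words_lookaround(Str) ->
    {ok, Re} = re:compile("(?<!\\p{L})\\p{L}+(?!\\p{L})",
                          [unicode, caseless]),
    matches(re:run(Str, Re, [global, {capture, all, binary}])).

%% With `global` and {capture, all, binary}, each match is a one-element
%% list holding the whole match as a UTF-8 binary; flatten those out.
matches({match, Captures}) -> [Whole || [Whole] <- Captures];
matches(nomatch) -> [].
```

After `c(word_boundary_demo).` in the shell, both `words_ucp("abc 123 \x{430}\x{431}\x{432}")` and `words_lookaround("abc 123 \x{430}\x{431}\x{432}")` should return `[<<"abc">>, <<"абв"/utf8>>]` (the \x escapes spell "абв" and avoid depending on the source encoding). The two variants are not equivalent: the \b version still treats digits and underscores as word characters, so for "abc1" `words_ucp/1` returns `[]`, while `words_lookaround/1` returns `[<<"abc">>]`.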

