[erlang-questions] Regexp Matching on Unicode

José Valim <>
Tue Dec 13 11:40:04 CET 2016


Make sure to escape the backslash of the property escape (inside an Erlang
string it has to be written as "\\p{L}") and to pass the [unicode] option
when compiling, and it should be good to go:

28> {ok, Reg} = re:compile("\\p{L}{5}", []).
{ok,{re_pattern,0,0,0,
                <<69,82,67,80,77,0,0,0,0,0,0,0,1,0,0,0,255,255,255,255,
                  255,255,...>>}}
29> re:run(<<"こんにちは"/utf8>>, Reg).
nomatch

30> {ok, RegUni} = re:compile("\\p{L}{5}", [unicode]).
{ok,{re_pattern,0,1,0,
                <<69,82,67,80,77,0,0,0,0,8,0,0,1,0,0,0,255,255,255,255,
                  255,255,...>>}}
31> re:run(<<"こんにちは"/utf8>>, RegUni).
{match,[{0,15}]}
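
Note that {0,15} is a byte offset and length: the subject holds five
characters of three UTF-8 bytes each.

For the "match a letter, but not a digit" part of the question, here is a
minimal sketch built on the same idea. The module and function names are made
up for illustration, and it only covers plain re calls, not leex itself. It
relies on \p{L} covering Unicode letters only, so digits (and the underscore
that \w would accept) are rejected:

```erlang
-module(unicode_letters).
-export([is_letters/1, first_letters/1]).

%% Hypothetical helper: true when Subject (a UTF-8 binary or a charlist)
%% is non-empty and consists only of Unicode letters.
%% \A and \z anchor the pattern to the whole subject.
is_letters(Subject) ->
    re:run(Subject, "\\A\\p{L}+\\z", [unicode, {capture, none}]) =:= match.

%% Hypothetical helper: returns the first run of letters as a UTF-8 binary,
%% or nomatch when the subject contains no letter at all.
first_letters(Subject) ->
    case re:run(Subject, "\\p{L}+", [unicode, {capture, first, binary}]) of
        {match, [Letters]} -> {ok, Letters};
        nomatch -> nomatch
    end.
```

With this, unicode_letters:is_letters(<<"こんにちは"/utf8>>) gives true,
unicode_letters:is_letters(<<"abc1">>) gives false, and
unicode_letters:first_letters(<<"42abc">>) gives {ok,<<"abc">>}.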




*José Valim*
www.plataformatec.com.br
Skype: jv.ptec
Founder and Director of R&D

On Tue, Dec 13, 2016 at 11:32 AM, Zachary Kessin <> wrote:

> Hi All
>
> I am hitting a bit of a wall here. I am building a lexer with leex and I
> really want to match on Unicode chars. There is a regex class \p{Letter},
> but that does not seem to work in Erlang. What I really want is a way to say
> "Match a letter, but not a digit", so \w would not work. Any ideas?
>
> --
> Zach Kessin
> SquareTarget <http://squaretarget.rocks?utm_source=email-sig>
> Twitter: @zkessin <https://twitter.com/zkessin>
> Skype: zachkessin
>