[erlang-bugs] different behaviour of re:replace for directly-specified and precompiled regular expressions

Robin Haberkorn
Mon Sep 12 16:15:31 CEST 2011


I think I may have discovered a bug in the stdlib 're' module.

For some Erlang strings, re:replace behaves differently
depending on whether the regular expression was precompiled
with re:compile and the 'unicode' option, or passed
uncompiled to re:replace with 'unicode' in its options list.

I've minimized the test case using PropEr.
Have a look at the following erl session:

Erlang R14B03 (erts-5.8.4) [source] [64-bit] [smp:2:2] [rq:2] [async-threads:0] [kernel-poll:false]

Eshell V5.8.4  (abort with ^G)
1> RegExp = ".".
2> {ok, RegExpC} = re:compile(RegExp, [unicode]).
3> re:replace([133], RegExp, " ", [unicode, global]).
[<<" ">>]
4> re:replace([133], RegExpC, " ", [global]).
5> unicode:characters_to_binary(re:replace([133], RegExp, " ", [unicode, global])).
<<" ">>
6> unicode:characters_to_binary(re:replace([133], RegExpC, " ", [global])).         

That is, in (4) the replacement simply isn't performed,
although [133] should be a valid Unicode charlist and 133 a
valid Unicode code point.
I discovered this by running re:replace on io_lib:format
return values. If I'm not totally confused by Erlang's
Unicode handling, io_lib:format without the Unicode
translation modifier returns a (deep) list of byte()s.
Since these are lists of integers rather than binaries, the
UTF-8 binary encoding should not matter, and every integer
returned is a valid Unicode code point
(unicode:characters_to_binary does not seem to complain
about any of the lists that cause these problems with
re:replace).

Moreover consider the following difference:

10> re:replace([256], RegExpC, " ", [global]).                               
** exception error: bad argument
     in function  re:replace/4
        called as re:replace([256],
                             " ",
11> re:replace([256], RegExp, " ", [unicode,global]).
[<<" ">>]

It is almost as if re:replace expected only byte()s in (10).
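
One way to probe the byte()s hypothesis is to compare the two
standard list-to-binary conversions directly. The sketch below
assumes (the session above does not confirm this) that re:replace
converts its subject with iolist_to_binary/1 when 'unicode' is
missing from its own option list. The module name subject_conv
and its function names are made up for illustration.

```erlang
-module(subject_conv).
-export([as_bytes/1, as_unicode/1]).

%% Byte-oriented conversion: every integer in the list must be
%% a byte (0..255). Anything larger raises badarg, which is
%% caught here and returned as an atom so both cases can be
%% compared side by side.
as_bytes(Subject) ->
    try iolist_to_binary(Subject)
    catch error:badarg -> badarg
    end.

%% Unicode-aware conversion: integers are treated as code points
%% and encoded as UTF-8.
as_unicode(Subject) ->
    unicode:characters_to_binary(Subject).

%% subject_conv:as_bytes([133]).    %% <<133>>      (not valid UTF-8 on its own)
%% subject_conv:as_unicode([133]).  %% <<194,133>>  (UTF-8 for code point 133)
%% subject_conv:as_bytes([256]).    %% badarg
%% subject_conv:as_unicode([256]).  %% <<196,128>>  (UTF-8 for code point 256)
```

If the assumption holds, both observations would follow: in (4)
a pattern compiled for UTF-8 is run against the lone byte 133,
and in (10) the list cannot be turned into a binary at all.
Passing 'unicode' to re:replace together with the precompiled
pattern would then be a natural thing to try.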

Is this desired behaviour, perhaps even documented?

Best Regards,

