[erlang-questions] Binary pattern matching inconsistencies with R12B

Fri Feb 29 18:43:44 CET 2008

Hello

I'm writing a scanner for a query language and I'm encountering
intermittent segmentation faults and other odd errors. The 
code I'm working on appears to work fine on 11.b.2-4 
(linux/amd64), but misbehaves on R12B-0 (linux/i386) and 
R12B-1 (linux/amd64). I didn't add any fancy options when I 
compiled R12B, just a --prefix. 

I'm an Erlang newbie, so it's highly likely I've written
something stupid. I just hope it's obvious, whatever it is!

The scanner is quite large so I've reduced it down to two
smaller programs which show similar symptoms. The first one
just throws exceptions from time to time. The second program
ends up dying as a result of a segmentation fault sooner 
or later. 

The following code won't make real sense. The full scanner 
makes sense but this is only a mutated 10% of that. Sorry
the code is so unintelligible - but on the bright side 
it fails more frequently and predictably than the full 
scanner does.

%% START OF CODE: weird.erl %%

-module(weird).
-compile(export_all).

%% For testing - runs scanner N number of times with same input
run(N) ->
        lists:foreach(fun(_) ->
                             scan(<<"region:whatever">>, [])
                      end, lists:seq(1, N)).

scan(<<>>, TokAcc) ->
        lists:reverse(['$thats_all_folks$' | TokAcc]);

scan(<<D, $\s, Rest/binary>>, TokAcc) when
                        (D =:= $D) or (D =:= $d) ->
        scan(Rest, ['AND' | TokAcc]);

scan(<<D>>, TokAcc) when
                        (D =:= $D) or (D =:= $d) ->
        scan(<<>>, ['AND' | TokAcc]);

scan(<<N, Z, Rest/binary>>, TokAcc) when
                        (N =:= $N) or (N =:= $n),
                        (Z =:= $\s)  ->
        scan(<<Z, Rest/binary>>, ['NOT' | TokAcc]);

scan(<<C, Rest/binary>>, TokAcc) when
                                (C >= $A) and (C =< $Z);
                                (C >= $a) and (C =< $z);
                                (C >= $0) and (C =< $9) ->
        case Rest of
                <<$:, R/binary>> ->
                        scan(R, [{'FIELD', C} | TokAcc]);
                _ ->
                        scan(Rest, [{'KEYWORD', C} | TokAcc])
        end.

%% END OF CODE %%

Here's what I see from the shell on an i386 machine:

1> c(weird).
{ok,weird}
2> weird:run(1000).
ok
3> weird:run(1000).
ok
4> weird:run(1000).
ok
5> weird:run(1000).
** exception error: no function clause 
                      matching weird:scan(<<"whatever">>,
                                            [{'FIELD',110},
                                             {'KEYWORD',111},
                                             {'KEYWORD',105},
                                             {'KEYWORD',103},
                                             {'KEYWORD',101},
                                             {'KEYWORD',114}])
     in function  lists:foreach/2
6> weird:run(1000).
** exception error: no function clause 
                      matching weird:scan(<<"whatever">>,
                                            [{'FIELD',110},
                                             {'KEYWORD',111},
                                             {'KEYWORD',105},
                                             {'KEYWORD',103},
                                             {'KEYWORD',101},
                                             {'KEYWORD',114}])
     in function  lists:foreach/2
7> 

It will then keep throwing exceptions from this point on. On an
amd64 machine I'm getting similar output, but it usually has
the sequence ok, error, ok, error... And if I bump it from 
1,000 up to 10,000 iterations the errors usually stop (on amd64).
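
What puzzles me is that the failing call looks like it ought to
match the last clause: the binary starts with $w (119), which
lies between $a (97) and $z (122). As a sanity check, here is a
minimal sketch with that guard and pattern copied into a
standalone module. The module name guard_check and both function
names are made up for this example; they are not part of the
scanner.

```erlang
-module(guard_check).
-export([is_word_char/1, first_byte_ok/1]).

%% Same guard as the last scan/2 clause, lifted into its own
%% function (hypothetical helper, for illustration only).
is_word_char(C) when
                (C >= $A) and (C =< $Z);
                (C >= $a) and (C =< $z);
                (C >= $0) and (C =< $9) ->
        true;
is_word_char(_) ->
        false.

%% Uses the same <<C, Rest/binary>> pattern as the last clause.
first_byte_ok(<<C, _Rest/binary>>) ->
        is_word_char(C);
first_byte_ok(<<>>) ->
        false.
```

As far as I can tell, guard_check:first_byte_ok(<<"whatever">>)
should evaluate to true, so I would not expect a "no function
clause" error for that input.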

The second block of code is:

%% START OF CODE: scanner.erl %%

-module(scanner).
-compile(export_all).

%% For testing - runs scanner N number of times with same input
run(N) ->
        lists:foreach(fun(_) ->
                             scan(<<"region:whatever">>, [])
                      end, lists:seq(1, N)).

scan(<<>>, TokAcc) ->
        lists:reverse(['$thats_all_folks$' | TokAcc]);

scan(<<D, Z, Rest/binary>>, TokAcc) when
                        (D =:= $D orelse D =:= $d) and
                        ((Z =:= $\s) or (Z =:= $() or (Z =:= $))) ->
        scan(<<Z, Rest/binary>>, ['AND' | TokAcc]);

scan(<<D>>, TokAcc) when
                        (D =:= $D) or (D =:= $d) ->
        scan(<<>>, ['AND' | TokAcc]);

scan(<<N, Z, Rest/binary>>, TokAcc) when
                        (N =:= $N orelse N =:= $n) and
                        ((Z =:= $\s) or (Z =:= $() or (Z =:= $))) ->
        scan(<<Z, Rest/binary>>, ['NOT' | TokAcc]);

scan(<<C, Rest/binary>>, TokAcc) when
                                (C >= $A) and (C =< $Z);
                                (C >= $a) and (C =< $z);
                                (C >= $0) and (C =< $9) ->
        case Rest of
                <<$:, R/binary>> ->
                        scan(R, [{'FIELD', C} | TokAcc]);
                _ ->
                        scan(Rest, [{'KEYWORD', C} | TokAcc])
        end.

%% END OF CODE %%

When I run this code in the shell (on i386) it usually works okay
for a small number of iterations, but once I get into the 
hundreds it dies fast:

1> c(scanner).
{ok,scanner}
2> scanner:run(10).     % Start with 10
ok
3> scanner:run(10).
ok
4> scanner:run(100).    % Bumped up to 100
** exception error: no function clause
                      matching weird:scan(<<"whatever">>,
                                            [{'FIELD',110},
                                             {'KEYWORD',111},
                                             {'KEYWORD',105},
                                             {'KEYWORD',103},
                                             {'KEYWORD',101},
                                             {'KEYWORD',114}])
     in function  lists:foreach/2
5> scanner:run(100).
Segmentation fault
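
In case it helps anyone trying to reproduce this, here is a rough
sketch of a harness that reports which iteration raises first,
instead of letting lists:foreach/2 crash. The module name repro
and the function first_failure/2 are my own invention for this
example. It can only catch Erlang exceptions; it obviously cannot
catch the segmentation fault itself.

```erlang
-module(repro).
-export([first_failure/2]).

%% Calls Fun() up to N times. Returns ok if every call returns
%% normally, or {failed_at, I, Class, Reason} for the first call
%% (1-based) that raises an exception.
first_failure(Fun, N) when is_function(Fun, 0),
                           is_integer(N), N >= 0 ->
        loop(Fun, 1, N).

loop(_Fun, I, N) when I > N ->
        ok;
loop(Fun, I, N) ->
        %% The 'of' branch is outside the protected part of the
        %% try, so the recursive call is still a tail call.
        try Fun() of
                _ -> loop(Fun, I + 1, N)
        catch
                Class:Reason ->
                        {failed_at, I, Class, Reason}
        end.
```

It could be called from the shell like this:

    repro:first_failure(fun() ->
        scanner:scan(<<"region:whatever">>, []) end, 100).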

Anyone got any ideas? 

Cheers,

Rory