[erlang-questions] [eeps] EEP 9

Fredrik Svahn fredrik.svahn@REDACTED
Tue Mar 4 22:02:04 CET 2008


Thanks for the positive response both here an on the EEPs mailing list!

My suggestion is that we continue the discussion regarding EEP-9 here
on the erlang-questions mailing list since it has a broader audience
than eeps@REDACTED I think it would make it a lot easier if we can
keep it in this thread.

For those who do not subscribe to the eeps mailing list, I would like
to once again stress that this EEP does not cover improvements which
requires changes to the compiler. It has been suggested to me that the
smaller the EEP the faster it gets implemented so let us please keep
the scope as narrow as possible. There is always room for new EEPs.

A shortcut to EEP-9: http://www.erlang.org/eeps/eep-0009.html

Now, to answer your questions:

> 1) Can the reference implementation be made available publicly as a
> patch to R12B-1?  (Or actually in any fashion would be great.)

The reference implementation in erlang is attached, the c part will
follow in the next message as there is a maximum message size on
erlang-questions. It matches the described functions minus what was
added after a first round of comments from the OTP developers (the
regexps for instance are probably going to have many more features
after what I hear!). I may also have added one or two relatively
simple functions, e.g. atom_to_binary, after the implementation was
submitted.

> 2) Which algorithm was choosen for the binary:match()?  For multiple
> keyword, Aho-Corasick would be great, especially if the interface was
> something like this:

In the reference implementation Aho-Corasick was indeed used for
multiple keywords. Boyer-Moore was a logical choice for single
keywords and there was even a brute force approach for short
keywords/searchstrings.

>    MatchContext = binary:match_compile( [<<"the">>, <<"big">>,
>                                          <<"frog">>] ),
>    Value = <<"when we had a frog, he was big">>,
>    [{3, 14}, {2, 27}] = binary:match( MatchContext, Value )

Thanks for the comment. I think it makes a lot of sense to have a
separation of the building of the trie and the actual searching as you
suggest, especially if you are searching many binaries for the same
keyword(s). I am not sure how easy it would be to implement it,
though. I guess the returned match context would have to be some kind
of reference or packed in a binary. Maybe someone from the OTP team
could comment on that. I actually wanted to have the same thing for
regular expressions.

I note that you also want a function which returns all the matches not
just the first one. Given that binary:match/3 takes indexes of the
start and end position of the search it is easy to also add a function
"binary:matches/2" in binary.erl which returns all the matches by
repeatedly calling match/3. Having both "match/2" and "matches/2"
would be similar to how it is done in the regexp module.

BR /Fredrik
-------------- next part --------------
A non-text attachment was scrubbed...
Name: binary.erl
Type: application/octet-stream
Size: 4156 bytes
Desc: not available
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20080304/baf992ca/attachment.obj>


More information about the erlang-questions mailing list