[eeps] optimized search in binaries
Thu Jan 31 15:22:41 CET 2008
On Sun, Jan 27, 2008 at 10:20:07PM +0300, Vlad Balin wrote:
> I would like to propose to create two optimized BIF for search in binaries.
> These functions can be used to increase effectiveness of text parsing. These
> functions can be used for search of the new line, to accellerate and to
> parse XML documents. Something like these:
> find_1_byte( Char, Bytes ) -> Pos | not_found
> Pos = Char = integer()
> Bytes = bytes()
> Find 1-byte character in binary string. This function can be implemented to
> access the memory with 32-bit words. I can remember I seen functions
> implemented like this in Intel libraries.
> find_4_bytes( FourChars, Bytes ) -> Pos | not_found
> FourChars = ??? << Symbols:32 >> | [ A, B, C, D ]
> A = B = C = D = integer()
> The same idea, but here you have the possibility to compare 32-bit values at
> once. This function can be used to accelerate search for substring.
> It would probably be preferable to implement general search for substring in
> BIF. Interface when we taking the list of 4 integers looks not good.
> Both functions might be generalized to work on iolist types.
> I've done some tests with reading files and splitting it for new lines with
> erlang - quite similar to line_server.erl module. This task is CPU bound,
> and most of the time is spent in find_1_byte. The same problems
> will occur when we try to parse text-based protocol, such as exchange quote
> data or XML. Erlang can do much better on these tasks.
> What do you thing?
These are probably nice ideas, but the form of the suggestion
is not valid as an EEP. See http://www.erlang.org/eeps/eep-0001.html,
and take a discussion on first,
then submit a formal EEP, or perhaps a patch suggestion to
There are possibly many ideas at erlang-questions about how to speed up
parsing of binaries, or XML in any form.
> eeps mailing list
/ Raimo Niskanen, Erlang/OTP, Ericsson AB
More information about the eeps