Hello,<div><br class="webkit-block-placeholder"></div><div>I would like to propose two optimized BIFs for searching in binaries. They could make text parsing more efficient, for example when searching for newlines or when parsing XML documents. Something like this:</div>
<div><br class="webkit-block-placeholder"></div><div>find_1_byte( Char, Bytes ) -> Pos | not_found</div><div>Pos = Char = integer()</div><div>Bytes = bytes()</div><div><br class="webkit-block-placeholder"></div><div>Finds a 1-byte character in a binary string. This function could be implemented to read memory in 32-bit words. I recall seeing functions implemented this way in Intel libraries.</div>
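<div><br class="webkit-block-placeholder"></div><div>To pin down the intended semantics, here is a minimal pure-Erlang sketch of find_1_byte. It is only a reference for what the BIF would return, not the optimized implementation. The module name is hypothetical, and I am assuming Pos is a zero-based byte offset, which the signature above does not specify.</div>
<pre>
```erlang
-module(find_byte_sketch).
-export([find_1_byte/2]).

%% Pure-Erlang reference for the proposed BIF (not the BIF itself).
%% Assumption: Pos is a zero-based byte offset into Bytes.
-spec find_1_byte(byte(), binary()) -> non_neg_integer() | not_found.
find_1_byte(Char, Bytes) when is_integer(Char), is_binary(Bytes) ->
    find_1_byte(Char, Bytes, 0).

%% First clause: the leading byte equals Char, so we are done.
find_1_byte(Char, << Char, _/binary >>, Pos) -> Pos;
%% Otherwise skip one byte and keep scanning.
find_1_byte(Char, << _, Rest/binary >>, Pos) -> find_1_byte(Char, Rest, Pos + 1);
%% Ran out of input without a match.
find_1_byte(_Char, << >>, _Pos) -> not_found.
```
</pre>
<div>A loop like this inspects one byte per iteration, which is exactly the cost a BIF reading whole words could reduce.</div>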
<div><br class="webkit-block-placeholder"></div><div>find_4_bytes( FourChars, Bytes ) -> Pos | not_found</div><div>FourChars = ??? << Symbols:32 >> | [ A, B, C, D ]</div><div>A = B = C = D = integer()</div>
<div><br> </div><div>The same idea, but here 32-bit values can be compared at once. This function could be used to accelerate substring search.</div><div><br class="webkit-block-placeholder"></div><div>
It would probably be preferable to implement a general substring search as a BIF; an interface that takes a list of 4 integers does not look good.</div><div><br class="webkit-block-placeholder"></div><div>Both functions could be generalized to work on iolists.</div>
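<div><br class="webkit-block-placeholder"></div><div>As a rough illustration of the find_4_bytes semantics, the sketch below accepts both candidate forms of FourChars (a 4-byte binary or a list of four integers) and compares a 32-bit value at each offset. It is plain Erlang, not the proposed BIF; the module name and the helper scan/3 are hypothetical, and Pos is again assumed to be a zero-based byte offset.</div>
<pre>
```erlang
-module(find_4_bytes_sketch).
-export([find_4_bytes/2]).

%% Pure-Erlang reference for the proposed BIF (not the BIF itself).
%% Assumption: Pos is a zero-based byte offset into Bytes.
%% List form [A, B, C, D]: pack the four integers into a 4-byte binary.
find_4_bytes([A, B, C, D], Bytes) ->
    find_4_bytes(<< A, B, C, D >>, Bytes);
%% Binary form: read the pattern as one 32-bit integer.
find_4_bytes(<< Word:32 >>, Bytes) when is_binary(Bytes) ->
    scan(Word, Bytes, 0).

%% The next four bytes, read as a 32-bit integer, equal Word.
scan(Word, << Word:32, _/binary >>, Pos) -> Pos;
%% Otherwise advance by one byte and try again.
scan(Word, << _, Rest/binary >>, Pos) -> scan(Word, Rest, Pos + 1);
%% Input exhausted without a match.
scan(_Word, << >>, _Pos) -> not_found.
```
</pre>
<div>For example, find_4_bytes( << "cdef" >>, << "abcdefg" >> ) returns 2 under the zero-based assumption.</div>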
<div><br class="webkit-block-placeholder"></div><div>I've done some tests reading files and splitting them into lines in Erlang, quite similar to the line_server.erl module. The task is CPU-bound, and most of the time is spent in find_1_byte. The same problem will occur when parsing text-based protocols, such as exchange quote data or XML. Erlang could do much better on these tasks.</div>
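<div><br class="webkit-block-placeholder"></div><div>For illustration only, a line splitter of the kind described above might look like the sketch below. It is not the actual test code and not line_server.erl; the module and function names are hypothetical. The newline search is inlined as a byte-by-byte scan, which is where the CPU time goes.</div>
<pre>
```erlang
-module(split_lines_sketch).
-export([split_lines/1]).

%% Split a binary into lines on $\n, byte by byte.
%% A trailing fragment without a newline is returned as the last line.
split_lines(Bin) when is_binary(Bin) ->
    split_lines(Bin, 0, []).

split_lines(Bin, Pos, Acc) ->
    case Bin of
        %% Byte at offset Pos is a newline: emit the line, restart on the rest.
        << Line:Pos/binary, $\n, Rest/binary >> ->
            split_lines(Rest, 0, [Line | Acc]);
        %% Some other byte at offset Pos: move one byte forward.
        << _:Pos/binary, _, _/binary >> ->
            split_lines(Bin, Pos + 1, Acc);
        %% End of input with nothing left over.
        _ when Bin =:= << >> ->
            lists:reverse(Acc);
        %% End of input with an unterminated last line.
        _ ->
            lists:reverse([Bin | Acc])
    end.
```
</pre>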
<div><br class="webkit-block-placeholder"></div><div>What do you think?</div><div><br class="webkit-block-placeholder"></div><div>Thanks,</div><div>Vlad</div>