[erlang-questions] Searching binaries with matching?

Fredrik Svahn fredrik.svahn@REDACTED
Sun Oct 14 01:18:38 CEST 2007


By the way, although I think I understand why I get the following badmatch,
I find it slightly counter-intuitive:

49> B="hello".
"hello"
50> << "hello", D/binary>> = <<"hello world!">>.
<<"hello world!">>
51> << B, E/binary>> = <<"hello world!">>.
** exited: {{badmatch,<<"hello world!">>},[{erl_eval,expr,3}]} **
=ERROR REPORT==== 14-Oct-2007::00:50:51 ===
Error in process <0.141.0> with exit value: {{badmatch,<<12 bytes>>},[{erl_eval,expr,3}]}
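
As far as I understand it, the badmatch comes from the default segment type: a segment written without a type specifier is an 8-bit integer, so the bound list B is compared against a single byte and can never match, whereas the literal "hello" is expanded into five integer segments at compile time. A minimal sketch of one way around it, assuming the prefix is kept as a binary rather than a list and that byte_size/1 is available (the module name prefix_match and the function strip_prefix/2 are made up for illustration):

```erlang
-module(prefix_match).
-export([strip_prefix/2]).

%% Hypothetical helper: {ok, Rest} if Bin starts with Prefix, else nomatch.
%% Prefix and Size are already bound when the pattern is evaluated, so the
%% first segment is compared against Prefix instead of binding a new value.
strip_prefix(Prefix, Bin) when is_binary(Prefix), is_binary(Bin) ->
    Size = byte_size(Prefix),
    case Bin of
        <<Prefix:Size/binary, Rest/binary>> -> {ok, Rest};
        _ -> nomatch
    end.
```

With this, strip_prefix(<<"hello">>, <<"hello world!">>) gives {ok, <<" world!">>}.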

BR /Fredrik


On 10/13/07, Fredrik Svahn <fredrik.svahn@REDACTED> wrote:
>
> Hi,
>
> I would find it extremely useful to have a binary match that searches
> for the first occurrence of a supplied pattern in a binary. For example,
> to extract the body from an HTML file I could write something like:
>
> << _/binary, "<body>", Body/binary, "</body>", _/binary >> = MyHTMLFile.
>
> or to split e.g. a SIP message into lines:
>
> get_sipheaders(<<>>, Acc) -> Acc;
> get_sipheaders(<<Line/binary, "\r\n", Rest/binary>>, Acc) ->
>   get_sipheaders(Rest, [Line | Acc]).
>
> get_sipheaders(<<"line1\r\nline2\r\nline3\r\n">>,[]) would result in
> [<<"line3">>, <<"line2">>, <<"line1">>]
>
> or a simple grep:
>
> grep(SearchPattern, File) ->
>   {ok, FileBin} = file:read_file(File),
>   try <<_/binary, SearchPattern/binary, _/binary>> = FileBin of
>      _Match -> match
>   catch
>     error:{badmatch, _} -> nomatch
>   end.
>
> I know that it would be possible to rewrite it and recursively search
> through a binary step by step by specifying the size of the first binary,
> but IIRC that is not very efficient compared to most modern search
> algorithms (for comparison, have a look at e.g.
> http://ridiculousfish.com/blog/archives/2006/05/30/old-age-and-treachery/).
> I even believe someone stated that it was usually more efficient to convert
> the binary to a list before searching it.
>
> My questions to the mailing list would be:
> 1. Is it possible to do something like this today (or in the near future)
> efficiently, e.g. through the regexp library?
> 2. Is there anything in the syntax above which would make it improper or
> even impossible to add a "searching match" to binary matches (given enough
> time and resources)?
> 3. If so, would it be possible to achieve the same thing with a slightly
> improved syntax?
>
> BR /Fredrik
>
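
For comparison, the three examples quoted above can be approximated without any new syntax, assuming an OTP release that ships the binary module (binary:match/2,3, binary:split/2 and binary:part/3). This is only a sketch of the searching behaviour, not a statement about how a "searching match" would be implemented; the module name binsearch and the function names extract_body/1 and grep_bin/2 are made up for illustration.

```erlang
-module(binsearch).
-export([extract_body/1, get_sipheaders/2, grep/2, grep_bin/2]).

%% Return the bytes between the first "<body>" and the next "</body>".
extract_body(Html) when is_binary(Html) ->
    case binary:match(Html, <<"<body>">>) of
        nomatch ->
            nomatch;
        {Start, Len} ->
            From = Start + Len,
            %% Only search the part of the binary after the opening tag.
            Scope = {From, byte_size(Html) - From},
            case binary:match(Html, <<"</body>">>, [{scope, Scope}]) of
                nomatch -> nomatch;
                {End, _} -> {ok, binary:part(Html, From, End - From)}
            end
    end.

%% Split on "\r\n", accumulating lines in reverse order as in the
%% quoted example. A trailing fragment without "\r\n" is kept as a line.
get_sipheaders(<<>>, Acc) ->
    Acc;
get_sipheaders(Bin, Acc) ->
    case binary:split(Bin, <<"\r\n">>) of
        [Line, Rest] -> get_sipheaders(Rest, [Line | Acc]);
        [Last] -> [Last | Acc]
    end.

%% Simple grep on a file; SearchPattern must be a non-empty binary.
grep(SearchPattern, File) ->
    {ok, FileBin} = file:read_file(File),
    grep_bin(SearchPattern, FileBin).

%% Same check on an in-memory binary.
grep_bin(SearchPattern, Bin) ->
    case binary:match(Bin, SearchPattern) of
        nomatch -> nomatch;
        {_Pos, _Len} -> match
    end.
```

The positions returned by binary:match/3 are relative to the whole binary even when a scope is given, which is why End - From is the length of the body.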