Streaming Input
Joe Armstrong (AL/EAB)
joe.armstrong@REDACTED
Tue Mar 1 09:29:07 CET 2005
This is slightly more difficult (not a lot):
you're now talking about re-entrant parsers.
The general form of these is as follows:
Assume you have some parsing function F which parses
a binary B. Calling F(B) returns either {done, P, F'}, where P is a parse
and F' is a new parsing function,
or {more, F'}, where F' is a new parsing function.
You could call it like this:
Socket = ...
F = init(),
loop(Socket, F).
where
loop(Socket, F) ->
    receive
        {tcp,Socket,Bin} ->
            case F(Bin) of
                {done,Parse, F1} ->
                    ... do something with Parse ...
                    loop(Socket, F1);
                {more, F1} ->
                    loop(Socket, F1)
            end;
        ...
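To try the loop without a socket, the same protocol can be driven from a list of binaries. This is only a sketch: the module name feed_demo, the function feed/2 and the toy parser line_parser/0 (a parse is everything up to a newline) are made up for illustration and are not part of the message.

```erlang
-module(feed_demo).
-export([feed/2, line_parser/0]).

%% Feed each chunk to the current parsing function, the way loop/2 does
%% with {tcp,Socket,Bin} messages. Returns the parses in arrival order
%% together with the parsing function to use for the next chunk.
feed(Chunks, F) -> feed(Chunks, F, []).

feed([], F, Acc) -> {lists:reverse(Acc), F};
feed([Bin | Rest], F, Acc) ->
    case F(Bin) of
        {done, Parse, F1} -> feed(Rest, F1, [Parse | Acc]);
        {more, F1}        -> feed(Rest, F1, Acc)
    end.

%% Hypothetical toy parser: one parse is the text before a newline.
line_parser() -> fun(Bin) -> line(binary_to_list(Bin), []) end.

line([$\n | T], Acc) ->
    %% Leftover input T is carried inside the next parsing function.
    {done, lists:reverse(Acc),
     fun(Bin) -> line(T ++ binary_to_list(Bin), []) end};
line([H | T], Acc) -> line(T, [H | Acc]);
line([], Acc) ->
    %% Ran out of input: return a function that continues with Acc.
    {more, fun(Bin) -> line(binary_to_list(Bin), Acc) end}.
```

Calling feed_demo:feed([<<"ab">>, <<"c\nd">>, <<"e\n">>], feed_demo:line_parser()) shows a parse that spans chunk boundaries. Note that, as in loop/2, input left over after a done is only looked at when the next chunk arrives.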
Now you have to define F
Let's give an example. Suppose the input
has "begin" ... "end" symbols or #N (a '#', then a one-byte count N, followed by N bytes)
init() -> fun(B) -> top(binary_to_list(B)) end.
top("begin" ++ T) -> collect_body(T, []);
top([$#,N|T]) -> collect_bytes(T, N, []);
top(X) -> fun(B) -> top(X ++ binary_to_list(B)) end.
collect_body("end" ++ T, L) -> {done, reverse(L), list_to_binary(T)};
collect_body([H|T], L) -> collect_body(T, [H|L]);
collect_body([], L) -> fun(B) -> collect_body(binary_to_list(B), L) end.
etc.
This is pretty simple code. The point to note is the last clause of every group:
collect_body([], L) is called when we "run out of stuff to parse". What do we want to do
then?
Answer: wait for "More" data and then call collect_body(More, L). That's just what the
last clause says:
collect_body([], L) ->
    fun(B) -> collect_body(binary_to_list(B), L) end.
This kind of code is very easy if you just "follow the pattern" and don't think (TM)
BTW you have to get the code right first time - debugging this is not easy if you
make a silly mistake :-)
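The clauses above return bare funs and, in the "end" case, the leftover input as a binary, while loop/2 expects {more, F'} and {done, P, F'}. A sketch that fills in those shapes might look like the following. The module name reentrant, the helper resume/1 and the body of collect_bytes/3 are assumptions (the message leaves collect_bytes under "etc."), and the extra clause for a chunk ending in "e" or "en" is an addition so that an "end" split across two chunks is still recognised.

```erlang
-module(reentrant).
-export([init/0]).

init() -> resume([]).

%% Hypothetical helper: a parsing function that restarts at top/1 with
%% the unparsed input Rest placed in front of the next chunk.
resume(Rest) -> fun(B) -> top(Rest ++ binary_to_list(B)) end.

top("begin" ++ T) -> collect_body(T, []);
top([$#, N | T]) -> collect_bytes(T, N, []);
%% Last clause: nothing recognisable yet, keep X and wait for more.
top(X) -> {more, resume(X)}.

collect_body("end" ++ T, L) -> {done, lists:reverse(L), resume(T)};
%% Out of input, or holding what may be the start of "end": wait.
collect_body(Tail, L) when Tail =:= ""; Tail =:= "e"; Tail =:= "en" ->
    {more, fun(B) -> collect_body(Tail ++ binary_to_list(B), L) end};
collect_body([H | T], L) -> collect_body(T, [H | L]).

%% Assumed definition: collect exactly N bytes.
collect_bytes(T, 0, L) -> {done, lists:reverse(L), resume(T)};
collect_bytes([H | T], N, L) -> collect_bytes(T, N - 1, [H | L]);
collect_bytes([], N, L) ->
    {more, fun(B) -> collect_bytes(binary_to_list(B), N, L) end}.
```

Every group still ends the same way: when the input runs out, return a fun that picks up exactly where the clause stopped. Anything that is neither "begin" nor '#' is simply buffered by top/1, so a real protocol would want an error clause there.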
Cheers
/Joe
> -----Original Message-----
> From: orbitz [mailto:orbitz@REDACTED]
> Sent: den 28 februari 2005 22:55
> To: Joe Armstrong (AL/EAB)
> Cc: erlang-questions@REDACTED
> Subject: Re: Streaming Input
>
>
> I'm not sure that'll work in my situation necessarily. In this
> protocol, only some objects have a size specification, and
> others don't. The ones that don't can be variable size. It uses a
> prefix/suffix to say when decoding should start and end. Also I don't
> know how much I need until I've identified what type it is and started
> extracting it, and since some types are variable in size and the
> protocol uses a suffix to tell me when to stop decoding that type, I
> can't figure out how much I need. Perhaps my original idea of figuring
> out what type it is then sending to a special extract function for that
> type is no good? It
> seems simpler that way since I don't need to keep track of state, but
> more prone to issues since I need to go back to this waiting function
> every time I run out of data but haven't finished decoding my object.
>
> Thanks
>
> Joe Armstrong (AL/EAB) wrote:
>
> > use binaries - that's what they are for
> >
> > First write something like this:
> >
> > extract(BinIn, Need, BinAcc) ->
> >     Got = size(BinIn),
> >     if
> >         Got > Need ->
> >             {Before, After} = split_binary(BinIn, Need),
> >             Result = concat_binary([BinAcc, Before]),
> >             {done, Result, After};
> >         Got == Need ->
> >             Result = concat_binary([BinAcc,BinIn]),
> >             {done, Result, <<>>};
> >         Got < Need ->
> >             BinAcc1 = concat_binary([BinAcc, BinIn]),
> >             {more, Need - Got, BinAcc1}
> >     end.
> >
> > <aside>
> > Organising the code like this should make it pretty clear
> > what's going on, i.e. write the "if" clearly with three branches
> >
> > if
> >     Got > Need ->
> >         %% too much data - have to split it
> >         ...
> >     Got == Need ->
> >         %% exactly right no need to split
> >         ...
> >     Got < Need ->
> >         %% not enough. no need to split
> >         ...
> > end
> > </aside>
> >
> > here
> >
> > in extract(BinIn, Need, BinAcc)
> > BinIn and BinAcc are binaries
> > Need is the required block length
> >
> > if size(BinIn) >= Need we take the first Need bytes
> > and return {done, Bin, After} where Bin is the data you need
> > otherwise {more, Need-Got, BinAcc1}
> >
> > BinAcc is a binary accumulator containing all the data
> > received so far.
> >
> > Then just arrange some code to call this
> >
> > Cheers
> >
> > /Joe
> >
> >
> >
> >
> >
> >
> >
> >>-----Original Message-----
> >>From: owner-erlang-questions@REDACTED
> >>[mailto:owner-erlang-questions@REDACTED]On Behalf Of orbitz
> >>Sent: den 27 februari 2005 07:49
> >>To: erlang-questions@REDACTED
> >>Subject: Streaming Input
> >>
> >>
> >>I am working with a protocol where the size of the following block is
> >>told to me so I can just convert the next N bytes to, say, a string.
> >>The problem is though, I'm trying to write this so it handles a stream
> >>properly, so in the binary I have there could be all N bytes that I
> >>need, or something less than N. So at first I tried:
> >>
> >>extract_string(Tail, 0, Res) ->
> >> {ok, {string, Res}, Tail};
> >>extract_string(<<H, Tail/binary>>, Length, Res) ->
> >> extract_string(Tail, Length - 1, lists:append(Res, [H]));
> >>extract_string(<<>>, Length, Res) ->
> >> case dispatch_message() of
> >> {decode, _, Data} ->
> >> extract_string(Data, Length, Res)
> >> end.
> >>
> >>When the binary is empty but I still need more data it waits for more.
> >>I don't know if this is the proper idiom (it seems gross to me but I am
> >>unsure of how to do it otherwise). This is incredibly slow though.
> >>With a long string that I need to extract it takes a lot of CPU and far
> >>too long. So I decided to do:
> >>
> >>extract_string(Data, Length, _) ->
> >> <<String:Length/binary, Tail/binary>> = Data,
> >> {ok, {string, binary_to_list(String)}, Tail}.
> >>
> >>In terms of CPU and time this is much much better, but if I don't have
> >>all N bytes it won't work. Any suggestions?
> >>
> >>
> >>
> >
> >
> >
> >
> >
>
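The extract/3 function quoted above relies on concat_binary/1, an old BIF that, as far as I can tell, is no longer available in current Erlang/OTP releases. A sketch of the same idea using binary pattern matching follows. The module name extract_demo and the driver collect/2 are made up for illustration, and the "too much" and "exactly right" branches are folded into one pattern, since After is simply <<>> when the sizes match.

```erlang
-module(extract_demo).
-export([extract/3, collect/2]).

%% BinIn is the new chunk, Need the number of bytes still missing,
%% BinAcc the data accumulated so far.
extract(BinIn, Need, BinAcc) ->
    case BinIn of
        <<Before:Need/binary, After/binary>> ->
            %% Enough data (possibly too much): After may be <<>>.
            {done, <<BinAcc/binary, Before/binary>>, After};
        _ ->
            %% Not enough: keep everything and report what is missing.
            {more, Need - byte_size(BinIn), <<BinAcc/binary, BinIn/binary>>}
    end.

%% Hypothetical driver: pull Need bytes out of a list of chunks.
%% Returns the block plus the chunks that have not been consumed.
collect(Chunks, Need) -> collect(Chunks, Need, <<>>).

collect([Bin | Rest], Need, Acc) ->
    case extract(Bin, Need, Acc) of
        {done, Result, After} -> {ok, Result, [After | Rest]};
        {more, Need1, Acc1}   -> collect(Rest, Need1, Acc1)
    end;
collect([], Need, Acc) -> {need_more, Need, Acc}.
```

For example, extract_demo:collect([<<"he">>, <<"llo world">>], 5) assembles a 5-byte block from two chunks and hands back the unused tail.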