Streaming Input

Håkan Stenholm
Mon Feb 28 00:05:11 CET 2005


orbitz wrote:

> I am working with a protocol where the size of the following block is 
> told to me so I can just convert the next N bytes to, say, a string.  
> The problem is though, I'm trying to write this so it handles a stream 
> properly, so in the binary I have could be all N bytes that I need, or 
> something less than N. So at first I tried:
>
> extract_string(Tail, 0, Res) ->
>  {ok, {string, Res}, Tail};
> extract_string(<<H, Tail/binary>>, Length, Res) ->
>  extract_string(Tail, Length - 1, lists:append(Res, [H]));
> extract_string(<<>>, Length, Res) ->
>  case dispatch_message() of
>    {decode, _, Data} ->
>      extract_string(Data, Length, Res)
>  end.


extract_string(Tail, 0, Res) ->
    %% when done, reverse the list back to the intended order
    {ok, {string, lists:reverse(Res)}, Tail};
extract_string(<<H, Tail/binary>>, Length, Res) ->
    %% prepending turns the O(N) operation into an O(1) operation
    extract_string(Tail, Length - 1, [H | Res]);
extract_string(<<>>, Length, Res) ->
    case dispatch_message() of
        {decode, _, Data} ->
            extract_string(Data, Length, Res)
    end.

This version will be much faster than the original, because appending
an element to the end of a list is an O(N) operation, and doing that N
times costs O(N^2) overall. Instead, prepend to the front of the list
(an O(1) operation) and reverse the list once you're done accumulating
Res (O(N)).
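
If you want to see the difference for yourself, here is a rough sketch
that builds the same list both ways and times each with timer:tc/1.
The module name acc_demo and its function names are made up for this
example, and the timings will vary from machine to machine:

```erlang
-module(acc_demo).
-export([append_build/1, prepend_build/1, compare/1]).

%% Builds [1..N] by appending to the end of the accumulator.
%% Each ++ copies the whole accumulator, so the total cost is O(N^2).
append_build(N) ->
    lists:foldl(fun(X, Acc) -> Acc ++ [X] end, [], lists:seq(1, N)).

%% Builds the same list by prepending (O(1) per element) and
%% reversing once at the end (O(N)).
prepend_build(N) ->
    lists:reverse(
      lists:foldl(fun(X, Acc) -> [X | Acc] end, [], lists:seq(1, N))).

%% Returns {AppendMicros, PrependMicros} for a list of length N.
%% Matching L in both results also checks that the two lists are equal.
compare(N) ->
    {T1, L} = timer:tc(fun() -> append_build(N) end),
    {T2, L} = timer:tc(fun() -> prepend_build(N) end),
    {T1, T2}.
```

Calling acc_demo:compare(20000) from the shell should make the gap
between the two approaches easy to see.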

>
> When the binary is empty but I still need more data it waits for 
> more.  I don't know if this is the proper idiom (it seems gross to me 
> but I am unsure of how to do it otherwise).  This is incredibly slow 
> though.  With a long string that I need to extract it takes a lot of 
> CPU and far too long.  So I decided to do:
>
> extract_string(Data, Length, _) ->
>  <<String:Length/binary, Tail/binary>> = Data,
>  {ok, {string, binary_to_list(String)}, Tail}.


You probably want something like this:

extract_string(Data, Length, _) ->
    DataLength = size(Data),                 %% get length of Data
    L = case DataLength >= Length of
            true -> Length;
            false -> DataLength
        end,
    <<String:L/binary, Tail/binary>> = Data,
    {ok, {string, binary_to_list(String)}, Tail}.

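This should extract as much data as possible in a single binary
access, which should be slightly faster than my previous
extract_string/3 update above. Note that when Data holds fewer than
Length bytes it returns only what is available, so the remaining bytes
still have to be fetched and appended.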
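
One possible way to combine the two ideas (take whole chunks with a
single binary match, and wait for more data when the chunk is short) is
sketched below. This is only a sketch under some assumptions: the
module name stream_extract is made up, and More is a hypothetical
zero-argument fun that blocks until the next chunk arrives and returns
it as a binary, standing in for your dispatch_message() call. It uses
byte_size/1 to measure the binary.

```erlang
-module(stream_extract).
-export([extract_string/3]).

%% extract_string(Data, Length, More) -> {ok, {string, String}, Tail}
%% More is a hypothetical fun() -> binary() that fetches the next chunk.
extract_string(Data, Length, More) ->
    extract_string(Data, Length, More, []).

%% Acc holds the chunks collected so far, newest first.
extract_string(Data, Length, _More, Acc) when byte_size(Data) >= Length ->
    %% Enough bytes available: split once and convert once.
    <<Chunk:Length/binary, Tail/binary>> = Data,
    Bin = iolist_to_binary(lists:reverse([Chunk | Acc])),
    {ok, {string, binary_to_list(Bin)}, Tail};
extract_string(Data, Length, More, Acc) ->
    %% Not enough bytes yet: keep all of Data and ask for more.
    extract_string(More(), Length - byte_size(Data), More, [Data | Acc]).
```

For example, with More = fun() -> <<"lo, world">> end, the call
stream_extract:extract_string(<<"hel">>, 5, More) returns
{ok, {string, "hello"}, <<", world">>}. The chunks are collected as
whole binaries and converted to a list only once, so the per-byte list
building from the first version is avoided.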

>
> In terms of CPU and time this is much much better, but if I don't have 
> all N bytes it won't work.  Any suggestions?
>




