Parsing infinite streams style

Thu Mar 4 21:42:06 CET 2004

I would suggest that you use anonymous functions to create continuations
together with a state machine if possible. So:

1. If you run out of input data, then return:
       {more, fun(MoreInput) -> state_N(MoreInput, X, Y, ....) end}

2. If you reach an accept state, then return:
      {ok, Result, fun(MoreInput) -> state_N(Rest ++ MoreInput, X, Y, ....) end}
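
As a minimal sketch of the two return shapes above, assuming binary input
and a made-up wire format (a 1-byte length N followed by N payload bytes):
the module name stream_parse, the states state_len/state_body, and the
feed_all/1 driver are all hypothetical names chosen for illustration, not
part of any library.

```erlang
-module(stream_parse).
-export([parse/1, feed_all/1]).

%% Hypothetical wire format, assumed for illustration only:
%% each frame is a 1-byte length N followed by N payload bytes.

%% Entry point: start in the "expect a length byte" state.
parse(Input) when is_binary(Input) ->
    state_len(Input).

%% State 1: waiting for the length byte.
state_len(<<Len, Rest/binary>>) ->
    state_body(Rest, Len, []);
state_len(<<>>) ->
    %% Out of input: return a continuation that resumes in this state.
    {more, fun(MoreInput) -> state_len(MoreInput) end}.

%% State 2: collecting Len more payload bytes. Acc holds the payload
%% chunks seen so far (newest first), so nothing is parsed twice.
state_body(Input, Len, Acc) when byte_size(Input) >= Len ->
    <<Tail:Len/binary, Rest/binary>> = Input,
    Payload = iolist_to_binary(lists:reverse([Tail | Acc])),
    %% Accept state: hand back the result plus a continuation that
    %% carries the unparsed remainder.
    {ok, Payload,
     fun(MoreInput) -> state_len(<<Rest/binary, MoreInput/binary>>) end};
state_body(Input, Len, Acc) ->
    %% Not enough bytes yet: remember what we have and ask for more.
    {more,
     fun(MoreInput) ->
             state_body(MoreInput, Len - byte_size(Input), [Input | Acc])
     end}.

%% Driver: feed a list of binary chunks, return the frames in order.
feed_all(Chunks) ->
    feed_all(Chunks, fun parse/1, []).

feed_all([], _Cont, Frames) ->
    lists:reverse(Frames);
feed_all([Chunk | Chunks], Cont, Frames) ->
    {NextCont, Frames1} = drain(Cont(Chunk), Frames),
    feed_all(Chunks, NextCont, Frames1).

%% After an accept, the remainder may already hold further complete
%% frames, so keep resuming with empty input until more is requested.
drain({more, Cont}, Frames) ->
    {Cont, Frames};
drain({ok, Frame, Cont}, Frames) ->
    drain(Cont(<<>>), [Frame | Frames]).
```

With a port, the receive loop would presumably play the role of feed_all/1:
keep the current continuation in the loop state and call it with each
binary that arrives in a {Port, {data, Bin}} message.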

Cheers, Tobbe

Edmund Dengler wrote:

>Hi all!
>
>A bit of continuation on my last question concerning style. I have an
>external executable that will be feeding binary data to my Erlang
>processes via a port. While I am parsing the data, I may need more bytes
>to continue (as the source is an "infinite" stream of bytes). What is the
>accepted methodology for doing this:
>
>(1) Parse what I have, if I don't have enough, cause an error to occur,
>and return to the place I started from (or at least, return as much as I
>could parse, and the remainder). Basically, I am parsing the next "chunk"
>of data, and returning to the start point. I.e.:
>
>  loop
>    receive more bytes (& append to ones we currently have)
>    while parse next chunk is successful
>      call processing function
>
>If I run out of bytes during parsing, I keep adding onto the list I have.
>_But_, I spend a lot of work reparsing what I have already done. To do
>this efficiently, I would need some kind of continuation mechanism to say
>"here are more bytes, continue where you left off", which I don't believe
>Erlang has.
>
>(2) Have some kind of lazy semantics of "get more bytes, I need it now".
>
>If I do (2), I obviously need to pass along functions to call as I match
>each grouping (basically, "process the stuff we have matched so far, for
>each chunk, call the processing function") along with some mechanism to
>fetch more bytes. It also means that at every stage of the parse, I need
>to check to see if I have enough bytes, and if I don't, call the "get more
>bytes, and try again", complicating my code (rather than the simple
>"don't have enough, fail" model of (1)).
>
>I guess I could start to build a framework ala the FSM or ASN stuff, that
>would do this wrapping for me, though it seems it would be a lot of work
>and in the end would be the correct way (and obviously would allow me to
>specify a DSL that makes specifying the patterns better; definitely the
>approach I would take if using Scheme or Lisp).
>
>Is there a current style/methodology I should be looking at to do the
>above? What is the "Erlang way"?
>
>Thanks!
>Ed