Parsing infinite streams style

Thu Mar 4 20:41:03 CET 2004

Hi all!

A follow-up to my last question concerning style. I have an
external executable that will be feeding binary data to my Erlang
processes via a port. While I am parsing the data, I may need more bytes
to continue (as the source is an "infinite" stream of bytes). What is the
accepted methodology for doing this:

(1) Parse what I have; if I don't have enough, signal an error and
return to the place I started from (or at least return as much as I
could parse, plus the remainder). Basically, I parse the next "chunk"
of data and, on failure, go back to the start point. I.e.:

  loop
    receive more bytes (& append to ones we currently have)
    while parse next chunk is successful
      call processing function

If I run out of bytes during parsing, I keep appending to the list I have.
_But_ I then spend a lot of work reparsing what I have already parsed. To
do this efficiently, I would need some kind of continuation mechanism to
say "here are more bytes, continue where you left off", which I don't
believe Erlang has.
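
A minimal sketch of the loop in (1), for illustration only. The framing
(a 16-bit big-endian length followed by that many payload bytes), the
module name chunk_loop, and the Handler/State arguments are assumptions
made for the example, not something stated in this post. It also assumes
the port was opened with the binary and eof options.

```erlang
-module(chunk_loop).
-export([loop/4, parse_all/3]).

%% Assumed framing (hypothetical): <<Len:16, Payload:Len/binary>>.
%% Returns {ok, Payload, Rest} for a complete chunk, or 'more' if the
%% buffer does not yet hold a whole chunk.
parse_chunk(<<Len:16, Payload:Len/binary, Rest/binary>>) ->
    {ok, Payload, Rest};
parse_chunk(_Incomplete) ->
    more.

%% Parse as many complete chunks as the buffer holds, calling
%% Handler(Chunk, State) for each one. Returns the final state and
%% the unparsed remainder.
parse_all(Buffer, Handler, State) ->
    case parse_chunk(Buffer) of
        {ok, Chunk, Rest} ->
            parse_all(Rest, Handler, Handler(Chunk, State));
        more ->
            {State, Buffer}
    end.

%% Receive more bytes from the port, append them to the leftover
%% buffer, and parse again. Stops when the port reports eof.
loop(Port, Buffer, Handler, State) ->
    receive
        {Port, {data, Bytes}} ->
            {State1, Rest} =
                parse_all(<<Buffer/binary, Bytes/binary>>, Handler, State),
            loop(Port, Rest, Handler, State1);
        {Port, eof} ->
            {State, Buffer}
    end.
```

With this framing, complete chunks are dropped from the buffer as soon as
they are handled, so only the trailing incomplete chunk is examined again
when more bytes arrive. Whether that is cheap enough depends on how large
and how deeply nested a single chunk can be.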

(2) Have some kind of lazy semantics of "get more bytes, I need them now".

If I do (2), I obviously need to pass along functions to call as I match
each grouping (basically, "process the stuff we have matched so far; for
each chunk, call the processing function") along with some mechanism to
fetch more bytes. It also means that at every stage of the parse I need
to check whether I have enough bytes and, if I don't, invoke the "get more
bytes and try again" step, which complicates my code (compared with the
simple "don't have enough, fail" model of (1)).

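One possible shape for the "continue where you left off" idea is to have
each parse stage return either a finished chunk or a fun that resumes that
stage once more bytes arrive. The sketch below is only an illustration
under assumptions: the same hypothetical 16-bit length-prefixed framing,
and made-up names (chunk_cont, start/0, feed/3, Handler). It is not an
existing library API.

```erlang
-module(chunk_cont).
-export([start/0, feed/3]).

%% Each stage returns {done, Chunk, Rest} or {more, Cont}, where Cont
%% is a fun that resumes the same stage when given additional bytes.

%% Initial continuation: start at the header stage.
start() ->
    fun(Bytes) -> header(Bytes) end.

%% Stage 1: wait for the 2-byte length (assumed framing).
header(<<Len:16, Rest/binary>>) ->
    body(Len, Rest);
header(Partial) ->
    {more, fun(More) -> header(<<Partial/binary, More/binary>>) end}.

%% Stage 2: wait for Len payload bytes. The header is not reparsed,
%% because Len is captured in the closure.
body(Len, Bin) when byte_size(Bin) >= Len ->
    <<Payload:Len/binary, Rest/binary>> = Bin,
    {done, Payload, Rest};
body(Len, Partial) ->
    {more, fun(More) -> body(Len, <<Partial/binary, More/binary>>) end}.

%% Feed new bytes to a continuation, calling Handler(Chunk) for every
%% chunk completed along the way. Returns the continuation to use
%% with the next batch of bytes.
feed(Cont, Bytes, Handler) ->
    case Cont(Bytes) of
        {done, Chunk, Rest} ->
            Handler(Chunk),
            feed(start(), Rest, Handler);
        {more, Cont1} ->
            Cont1
    end.
```

A receive loop would keep the returned continuation as its state and call
chunk_cont:feed(Cont, Bytes, Handler) for each {Port, {data, Bytes}}
message. The "do I have enough bytes?" check still appears once per stage,
but it is confined to one fall-through clause per stage rather than spread
through the processing code.
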
I guess I could start building a framework a la the FSM or ASN.1 stuff
that would do this wrapping for me. It seems like a lot of work, though in
the end it would be the correct way (and it would obviously allow me to
define a DSL that makes specifying the patterns easier; it is definitely
the approach I would take if I were using Scheme or Lisp).

Is there a current style/methodology I should be looking at to do the
above? What is the "Erlang way"?

Thanks!
Ed