Parsing infinite streams style (leex)

Robert Virding <>
Fri Mar 5 01:48:29 CET 2004

The leex-generated tokenisers work just like that internally. With the
right interface functions you can make use of it:

token/?/? gets one token, incrementally if necessary, returning what's left.
tokens/?/? gets all the tokens up to and including one defined as an end_token,
e.g. ". " in Erlang, also incrementally if necessary.
string/? works on what you give it.
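
As a rough sketch of the incremental style described above, the example below uses erl_scan:tokens/3 from the standard library rather than a leex-generated module, because it runs without generating a scanner first. It is a stand-in: the assumption is that a leex-generated tokens function is driven the same way, starting from an empty continuation and looping on {more, Cont}. The module name scan_feed and the helper feed/2 are made up for illustration.

```erlang
-module(scan_feed).
-export([feed/2, demo/0]).

%% Feed one chunk of characters to the scanner.
%% Cont is [] on the first call, afterwards whatever the last call returned.
feed(Cont, Chunk) ->
    case erl_scan:tokens(Cont, Chunk, 1) of
        {more, Cont1} ->
            %% No end token (". ") seen yet: keep the continuation.
            {more, Cont1};
        {done, {ok, Tokens, _EndLocation}, Rest} ->
            %% A complete form was scanned; Rest is the unconsumed input.
            {form, Tokens, Rest};
        {done, Other, Rest} ->
            %% eof or a scan error.
            {stop, Other, Rest}
    end.

%% The form arrives split over two chunks; nothing is rescanned.
demo() ->
    {more, Cont} = feed([], "foo(X) -> "),
    {form, Tokens, _Rest} = feed(Cont, "X + Y. bar"),
    Tokens.
```

The caller only has to hold on to the continuation between chunks. Whatever follows the end token comes back as the rest and becomes the start of the input for the next form.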

Of course this doesn't parse anything. Funnily enough, yecc-generated parsers
COULD do this, but the interface doesn't support it. I just could not explain
properly to Calle what I needed; we weren't on the same wavelength. Look at
how the Erlang dot handling is done. I also think it is possible to set it
up so you could have multiple interfaces into the parser to parse different
parts of one syntax definition. In Erlang terms you could have a form
interface and an expression interface to the same parser. Again, I could not
explain to Calle what I meant. Pity.

But continuations are definitely the way to go.
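
A minimal sketch of the continuation style suggested in the quoted message, where each state returns either {more, Fun} or {ok, Result, Fun}. The wire format (one length byte followed by that many payload bytes) and all the names here (frame_parser, header/1, payload/2) are assumptions made up for the illustration; they are not from the original thread.

```erlang
-module(frame_parser).
-export([start/0, demo/0]).

%% Hypothetical format: 1 length byte, then that many payload bytes.

%% start() returns the initial continuation; call it with each new chunk.
start() ->
    fun(Input) -> header(Input) end.

%% State 1: waiting for the length byte.
header(<<Len, Rest/binary>>) ->
    payload(Len, Rest);
header(<<>>) ->
    {more, fun(More) -> header(More) end}.

%% State 2: waiting for Len bytes of payload.
payload(Len, Bin) when byte_size(Bin) >= Len ->
    <<Frame:Len/binary, Rest/binary>> = Bin,
    %% Accept state: return the result plus a continuation that
    %% carries the unconsumed bytes forward.
    {ok, Frame, fun(More) -> header(<<Rest/binary, More/binary>>) end};
payload(Len, Bin) ->
    %% Out of input: remember Len so the header is not parsed again.
    {more, fun(More) -> payload(Len, <<Bin/binary, More/binary>>) end}.

%% Two frames arriving split across three chunks.
demo() ->
    K0 = start(),
    {more, K1} = K0(<<3, "ab">>),
    {ok, <<"abc">>, K2} = K1(<<"c", 2, "x">>),
    {ok, <<"xy">>, K3} = K2(<<"y">>),
    {more, _} = K3(<<>>),
    ok.
```

In a receive loop the process keeps the current continuation in its state and applies it to each chunk delivered by the port. After an {ok, ...} result it can call the new continuation with an empty binary to drain any further complete frames already buffered.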


P.S. It is left as an exercise to the reader to work out why you could have
a term interface.

----- Original Message ----- 
From: "Torbjorn Tornkvist" <>
To: "Edmund Dengler" <>
Cc: <>
Sent: Thursday, March 04, 2004 9:42 PM
Subject: Re: Parsing infinite streams style

> I would suggest that you use anonymous functions to create continuations
> together with a state machine if possible. So:
> 1. If you run out of input data, then return:
>        {more, fun(MoreInput) -> state_N(MoreInput, X, Y, ....) end}
> 2. If you reach an accept state, then return:
>       {ok, Result, fun(MoreInput) -> state_N(Rest ++ MoreInput, X, Y, ....) end}
> Cheers, Tobbe
> Edmund Dengler wrote:
> >Hi all!
> >
> >A bit of continuation on my last question concerning style. I have an
> >external executable that will be feeding binary data to my Erlang
> >processes via a port. While I am parsing the data, I may need more bytes
> >to continue (as the source is an "infinite" stream of bytes). What is the
> >accepted methodology for doing this:
> >
> >(1) Parse what I have, if I don't have enough, cause an error to occur,
> >and return to the place I started from (or at least, return as much as I
> >could parse, and the remainder). Basically, I am parsing the next "chunk"
> >of data, and returning to the start point. Ie:
> >
> >  loop
> >    receive more bytes (& append to ones we currently have)
> >    while parse next chunk is successful
> >      call processing function
> >
> >If I run out of bytes during parsing, I keep adding onto the list I have.
> >_But_, I spend a lot of work reparsing what I have already done. To do
> >this efficiently, I would need some kind of continuation mechanism to
> >"here are more bytes, continue where you left off", which I don't believe
> >Erlang has.
> >
> >(2) Have some kind of lazy semantics of "get more bytes, I need it now".
> >
> >If I do (2), I obviously need to pass along functions to call as I match
> >each grouping (basically, "process the stuff we have matched so far, for
> >each chunk, call the processing function") along with some mechanism to
> >fetch more bytes. It also means that at every stage of the parse, I need
> >to check to see if I have enough bytes, and if I don't, call the "get
> >bytes, and try again", complicating my code (rather than the simple
> >"don't have enough, fail" model of (1)).
> >
> >I guess I could start to build a framework a la the FSM or ASN stuff, that
> >would do this wrapping for me, though it seems it would be a lot of work
> >and in the end would be the correct way (and obviously would allow me to
> >specify a DSL that makes specifying the patterns better; definitely the
> >approach I would take if using Scheme or Lisp).
> >
> >Is there a current style/methodology I should be looking at to do the
> >above? What is the "Erlang way"?
> >
> >Thanks!
> >Ed
> >
> >
