Parsing infinite streams style

Joe Armstrong joe@REDACTED
Tue Mar 9 11:16:27 CET 2004

```> I think I understand why it works, but need a few more iterations to
> actually think that way - it's like the difference between listening
> to an In-Flight Swedish CD on the plane to Stockholm vs. being able
> converse in the language on arrival.

It's easy to understand it you think about the principles involved
and don't stare blindly at the code.

Assume f is a function which parses X - to evoke the parser we call f(X)

now there are two cases:

- we got enough data to parse. We do the parse the result is R
and some data left over X1. So in this case

f(X) => {ok, R, X1}

- we didn't get enough data. So we return

f(X) => {more, F1}

and the *next* time we get data X we call F1(X)

Restarting the parse at the "top" is just equivalent to writing the parse
function like this:

f(X) ->
case is_there_enough_data_to_do_the_parse(X) of
true ->
do_the_parse(X);
false ->
{more, fun(Z) -> X ++ Z end}
end.

<<aside1 I actually returned {done,Result,NewParserFunction} not
{ok, R, X1} - but the idea is the same>>

<<aside2: It's not a good idea to try and think about how mutually
recursive definitions work (it just makes you head hurt) - think
base case, recursive case. Then prove that the recursive case
make the problem smaller.

I learnt this in school, but it was called something different.

I learnt this:

sin(2A) = 2sin(A)cos(A)
cos(2A) = 1 - 2sin^(X)

sin(0) = 0
cos(0) = 1

So to compute sin(X) we have to compute sin(X/2) and cos(X/2)
and to compute cos(X) we have to compute sin(X/2)

this is the recursive case where the argument gets smaller.

The base  case is sin(0) =  0 or cos(0)  = 1 (actually it's  a small
angle not zero)

So this recursive case/base case and reducing argument pattern
is very well known :-) - it's they didn't tell me in school that
this pattern of equations was a computer program :-)

>>

/Joe

>
> > but this can be broken into arbitrary chucks - we can write a
> > reentrant parser like this:
> >
> > get_begin("begin " ++ T) -> get_int(T, 0);
> > get_begin(S)             -> {more, fun(Data) -> get_begin(S ++ D) end}.
> >
> > get_int(" " ++ T, N) -> get_end(T, N);
> > get_int([H|T], N)    -> get_int(T, N*10-H-\$0);
> > get_int([], N)       -> {more, fun(S) -> get_int(S, N)}.
> >
> > get_end("end" ++ T, N) -> {done, N, fun(I) -> get_begin(I)};
> > get_end(T, N)          -> {more, fun(S) -> get_end(T++S, N) end}.
> >
> > This way avoids having to restart the parsing at the beginning
> > every time.
> >
> >   Having said  all of this I'd  be rather careful  - the "continuation
> > passing with funs" method is elegant BUT a bugger to debug if you get
> > it wrong - continuation passing with explicit data structures is also a
> > bit of a pain.
> ...
>

```