Parsing infinite streams style
Joe Armstrong
<>
Tue Mar 9 11:16:27 CET 2004
> I think I understand why it works, but need a few more iterations to
> actually think that way - it's like the difference between listening
> to an In-Flight Swedish CD on the plane to Stockholm vs. being able
> converse in the language on arrival.
It's easy to understand it you think about the principles involved
and don't stare blindly at the code.
Assume f is a function which parses X - to evoke the parser we call f(X)
now there are two cases:
- we got enough data to parse. We do the parse the result is R
and some data left over X1. So in this case
f(X) => {ok, R, X1}
- we didn't get enough data. So we return
f(X) => {more, F1}
and the *next* time we get data X we call F1(X)
Restarting the parse at the "top" is just equivalent to writing the parse
function like this:
f(X) ->
case is_there_enough_data_to_do_the_parse(X) of
true ->
do_the_parse(X);
false ->
{more, fun(Z) -> X ++ Z end}
end.
<<aside1 I actually returned {done,Result,NewParserFunction} not
{ok, R, X1} - but the idea is the same>>
<<aside2: It's not a good idea to try and think about how mutually
recursive definitions work (it just makes you head hurt) - think
base case, recursive case. Then prove that the recursive case
make the problem smaller.
I learnt this in school, but it was called something different.
I learnt this:
sin(2A) = 2sin(A)cos(A)
cos(2A) = 1 - 2sin^(X)
sin(0) = 0
cos(0) = 1
So to compute sin(X) we have to compute sin(X/2) and cos(X/2)
and to compute cos(X) we have to compute sin(X/2)
this is the recursive case where the argument gets smaller.
The base case is sin(0) = 0 or cos(0) = 1 (actually it's a small
angle not zero)
So this recursive case/base case and reducing argument pattern
is very well known :-) - it's they didn't tell me in school that
this pattern of equations was a computer program :-)
>>
/Joe
>
> > but this can be broken into arbitrary chucks - we can write a
> > reentrant parser like this:
> >
> > get_begin("begin " ++ T) -> get_int(T, 0);
> > get_begin(S) -> {more, fun(Data) -> get_begin(S ++ D) end}.
> >
> > get_int(" " ++ T, N) -> get_end(T, N);
> > get_int([H|T], N) -> get_int(T, N*10-H-$0);
> > get_int([], N) -> {more, fun(S) -> get_int(S, N)}.
> >
> > get_end("end" ++ T, N) -> {done, N, fun(I) -> get_begin(I)};
> > get_end(T, N) -> {more, fun(S) -> get_end(T++S, N) end}.
> >
> > This way avoids having to restart the parsing at the beginning
> > every time.
> >
> > Having said all of this I'd be rather careful - the "continuation
> > passing with funs" method is elegant BUT a bugger to debug if you get
> > it wrong - continuation passing with explicit data structures is also a
> > bit of a pain.
> ...
>
More information about the erlang-questions
mailing list