Fix for A=<<1>>
Robert Virding
robert.virding@REDACTED
Mon May 26 22:34:53 CEST 2003
The old scanner worked in two passes: the pre_XXX functions just collect characters until the end of the form is reached, then the scan_XXX functions do the actual tokenising, which is much easier when you know you have all there is. This was done to make the reentrant handling simpler.
As an aside, leex generates a one-pass scanner that does the reentrant collecting and the tokenising together, which is easy in generated code.
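
A minimal sketch of the two-pass idea, for illustration only. This is not the actual erl_scan.erl code: the module name two_pass_sketch, the functions collect/2, split_form/2 and scan/1, the simplified token shapes, and the assumption that a form ends with a dot followed by a newline are all hypothetical. It shows why matching several characters in a function head is safe in the second pass: that pass only ever sees a complete form, so "=<<" cannot be cut in half by a chunk boundary.

```erlang
-module(two_pass_sketch).
-export([collect/2, scan/1]).

%% Pass 1 (reentrant): append the new chunk to what was buffered so far.
%% If a complete form is available, tokenise it; otherwise ask for more.
%% Assumption: a form ends with ".\n" (the real rule is more involved).
collect(Buf, Chunk) ->
    All = Buf ++ Chunk,
    case split_form(All, []) of
        {form, Form, Rest} -> {done, scan(Form), Rest};
        none               -> {more, All}
    end.

%% The whole buffer is rescanned on every call, so a ".\n" that was split
%% across two chunks is still found once the second chunk arrives.
split_form([$., $\n | Rest], Acc) -> {form, lists:reverse(Acc), Rest};
split_form([C | Cs], Acc)         -> split_form(Cs, [C | Acc]);
split_form([], _Acc)              -> none.

%% Pass 2 (not reentrant): the complete form is in hand, so multi-character
%% operators can be matched directly in the function heads.
scan(Form) -> scan(Form, []).

scan([$=, $<, $< | Cs], Toks) -> scan(Cs, ['<<', '=' | Toks]);
scan([$=, $< | Cs], Toks)     -> scan(Cs, ['=<' | Toks]);
scan([$<, $< | Cs], Toks)     -> scan(Cs, ['<<' | Toks]);
scan([$>, $> | Cs], Toks)     -> scan(Cs, ['>>' | Toks]);
scan([$= | Cs], Toks)         -> scan(Cs, ['=' | Toks]);
scan([C | Cs], Toks) when C =:= $\s; C =:= $\n -> scan(Cs, Toks);
scan([C | Cs], Toks)          -> scan(Cs, [{char, C} | Toks]);
scan([], Toks)                -> lists:reverse(Toks).
```

With this sketch, feeding the input in two chunks that split right inside "=<<" still tokenises as intended: collect([], "A=<") returns {more, "A=<"}, and passing that buffer on with the chunk "<1>>.\n" returns {done, [{char,$A},'=','<<',{char,$1},'>>'], []}.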
Robert
----- Original Message -----
From: "Ulf Wiger" <etxuwig@REDACTED>
To: "James Hague" <james@REDACTED>
Cc: <erlang-questions@REDACTED>
Sent: Friday, May 02, 2003 1:47 PM
Subject: Re: Fix for A=<<1>>
>
> I quickly glanced at erl_scan.erl to see if my instinctive
> objection to your patch was correct, and it was... but
> leading to a further question/complaint:
>
> erl_scan.erl is designed to be reentrant. Thus, your code
> may not always work, since it might happen that the split
> into chunks will occur right inside "=<<".
>
> What I observed in erl_scan.erl is that this kind of
> cheating is already done when matching "<<", ">>", ">=",
> "->", etc.
>
> pre_escape/2 does things the hard (reentrant) way, but e.g.
> scan_escape/2 cheats.
>
>
> Or am I overlooking some magic code snippet that guarantees
> that there are always enough bytes in the scan buffer to
> ensure that the right function clause matches?
>
> (BTW, xmerl_scan.erl, which I wrote, suffers from the same
> problem; matching multiples in the function head is great
> for readability, but not if you want your scanner to be
> reentrant.)
>
> /Uffe
>
> On Thu, 1 May 2003, James Hague wrote:
>
> >That the start of "A=<<1>>" is incorrectly tokenized into
> >A, =<, < has always bothered me, so here's a patch for
> >erl_scan.erl that fixes it. Special casing this in the
> >scanner is a bit grotty, but it's better than having a
> >special case in the documentation.
> >
> >I'm posting this here instead of erlang-patches to see if
> >anyone can come up with a reason why this is a bad idea
> >(besides being an odd special case).
> >
> >(Apologies for the manual patch, BTW.)
> >
> >James
> >
> >
> >After:
> >
> >%% Punctuation characters and operators, first recognise multiples.
> >
> >insert:
> >
> >%% The first clause looks for "=<<" and splits it into "=","<<" so
> >%% matches like "=<<1>>" aren't tokenized as "=<","<".
> >scan1([$=,$<,$<|Cs], Toks, Pos) ->
> > scan1(Cs, [{'<<',Pos},{'=',Pos}|Toks], Pos);
> >
> >
>
> --
> Ulf Wiger, Senior Specialist,
> / / / Architecture & Design of Carrier-Class Software
> / / / Strategic Product & System Management
> / / / Ericsson AB, Connectivity and Control Nodes
>
>