Parsing binaries (was Re: Concatenating atoms)
Joe Armstrong (AL/EAB)
joe.armstrong@REDACTED
Mon Feb 7 11:26:44 CET 2005
>
> Hi Matthias, hi Joe,
>
> If integers are desirable, why not use indexes into the binary which
> contains the original XML data?
Absolutely - this is a good idea - it just makes the parsing a wee bit
more complicated. (Actually you can use strings - but don't tell anybody I said so -
if you can guarantee that they are "pointer identical", i.e. you build the strings
in a linear manner using a "pure dictionary type" library.)
> So, to use Joe's example to illustrate, the variable Abc on line 23 of
> the program could be represented as {var,Offset,Size} where Offset is
> the position of the first byte of "Abc" in XML_Binary, and Size in this
> case would be 3 (obtaining line numbers would be done by scanning the
> binary for newline chars, for error reports). I think I'm correct in
> saying that the following:
>
> <<_:Offset/binary,Chunk:Size/binary,_/binary>> = XML_Binary,
>
> doesn't create a new binary Chunk, but instead creates a reference into the
> existing XML_Binary.
>
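A minimal sketch of the {var,Offset,Size} idea described above, assuming the token shape from the message. The module name token_ref and the functions text/2 and line/2 are made up for illustration, and binary:matches/2 comes from an OTP release newer than this thread. text/2 uses the same match as the quoted line; line/2 counts newlines before the offset, as suggested for error reports.

```erlang
-module(token_ref).
-export([text/2, line/2]).

%% Token is {Tag, Offset, Size}; Bin is the original source binary.
%% Returns the token's bytes using the match quoted in the message.
text({_Tag, Offset, Size}, Bin) ->
    <<_:Offset/binary, Chunk:Size/binary, _/binary>> = Bin,
    Chunk.

%% 1-based line number of a token: count the newlines that occur
%% before its offset. Only needed when reporting an error.
line({_Tag, Offset, _Size}, Bin) ->
    <<Prefix:Offset/binary, _/binary>> = Bin,
    1 + length(binary:matches(Prefix, <<"\n">>)).
```

With Bin = <<"X = 1,\nAbc = 2.\n">>, the token {var,7,3} yields <<"Abc">> from text/2 and 2 from line/2.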
> A lexicon of tokens could be based on a list of {Offset,Size} tuples,
> and all matching tokens in a parse tree can refer to the first occurrence
> of the token in XML_Binary.
>
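One way the lexicon described above might look, as a sketch only: each {Offset,Size} span is replaced by the span of the first occurrence with the same bytes, so two equal tokens end up as equal tuples of two integers. The module name token_lexicon and the function intern/2 are hypothetical, and the maps module and binary:part/3 are assumptions from later OTP releases (a dict or ets table would do the same job).

```erlang
-module(token_lexicon).
-export([intern/2]).

%% Spans is a list of {Offset, Size} pairs in scan order; Bin is the
%% original source binary. Each span is mapped to the first span seen
%% with identical bytes, so equal tokens compare equal as tuples.
intern(Spans, Bin) ->
    {Out, _Seen} = lists:mapfoldl(
        fun({Offset, Size} = Span, Seen) ->
            Key = binary:part(Bin, Offset, Size),
            case maps:find(Key, Seen) of
                {ok, First} -> {First, Seen};
                error -> {Span, maps:put(Key, Span, Seen)}
            end
        end,
        maps:new(),
        Spans),
    Out.
```

For <<"Abc Xy Abc">> and the spans [{0,3},{4,2},{7,3}], the second "Abc" is rewritten to {0,3}.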
> I'm currently using this technique to parse SIP and SDP. There's at
> least a 3x speed advantage when compiling with HiPE too! (Although Joe
> seems less concerned about that sort of detail :-)
>
UUuuuuuuuuuuummmmmmmmmmm :-)
/Joe
> Pete.
>
> On Thu, 03 Feb 2005 22:07:39 +0100
> Matthias Kretschmer <mccratch@REDACTED> wrote:
>
> >
> > Well from a practical view it might be very unimportant, because
> > comparing two or three machine words would be sufficient. But if one
> > wants to use it for tokens in a compiler, this might not be the case. On
> > the other hand looking at some of my code I hardly find many atoms which
> > are sharing a common long prefix and have the same length (though don't
> > know how atoms are compared, but I could think that first type and
> > length is tested and then from left to right). I am just using Erlang
> > for small unimportant private projects, so my experience is very limited
> > (and programming was nothing more for me besides university). Maybe
> > someone with experience (of bigger projects) may enlighten me?
>
>
> --
> "The Tao of Programming
> flows far away
> and returns
> on the wind of morning."
>
>