Parsing binaries (was Re: Concatenating atoms)

Peter-Henry Mander erlang@REDACTED
Fri Feb 4 08:55:58 CET 2005

Hi Matthias, hi Joe,

If integers are desirable, why not use indexes into the binary which
contains the original XML data?

So, to use Joe's example to illustrate, the variable Abc on line 23 of
the program could be represented as {var,Offset,Size}, where Offset is
the position of the first byte of "Abc" in XML_Binary and Size in this
case would be 3. (Line numbers for error reports could be obtained by
scanning the binary for newline chars.) I think I'm correct in
saying that the following:

<<_:Offset/binary,Chunk:Size/binary,_/binary>> = XML_Binary,

doesn't create a new binary Chunk, but instead creates a reference into
the existing XML_Binary.
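
To make that concrete, here is a rough sketch of how the text and the
line number of such a token might be recovered on demand. The module
name tok_sketch and the functions text/2 and line_of/2 are made up for
illustration, and I'm assuming 0-based byte offsets and 1-based line
numbers:

```erlang
-module(tok_sketch).
-export([text/2, line_of/2]).

%% Return the bytes that a {Tag,Offset,Size} token refers to.
%% The match yields a sub-binary of Bin rather than a fresh copy.
text({_Tag, Offset, Size}, Bin) ->
    <<_:Offset/binary, Chunk:Size/binary, _/binary>> = Bin,
    Chunk.

%% 1-based line number of the byte at Offset, found by counting the
%% newlines in the bytes before it. Only needed for error reports,
%% so the linear scan should rarely matter.
line_of(Offset, Bin) ->
    <<Prefix:Offset/binary, _/binary>> = Bin,
    count_newlines(Prefix, 1).

count_newlines(<<$\n, Rest/binary>>, N) -> count_newlines(Rest, N + 1);
count_newlines(<<_, Rest/binary>>, N)   -> count_newlines(Rest, N);
count_newlines(<<>>, N)                 -> N.
```

With XML_Binary = <<"X = 1,\nAbc = 2.\n">>, the token {var,7,3} gives
back <<"Abc">> from text/2, and line_of(7, XML_Binary) gives 2.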

A lexicon of tokens could be based on a list of {Offset,Size} tuples,
and all matching tokens in a parse tree can refer to the first occurrence
of the token in XML_Binary.
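
One possible way to build such a lexicon is sketched below. It is not
the code I actually use; the module name lexicon_sketch and the function
intern/2 are invented for the example, and it assumes the scanner has
already produced a list of {Offset,Size} spans in document order:

```erlang
-module(lexicon_sketch).
-export([intern/2]).

%% Rewrite each {Offset,Size} span so that spans covering identical
%% bytes all point at the first occurrence in Bin. Afterwards, two
%% tokens are the same name exactly when their tuples are equal.
intern(Spans, Bin) ->
    {Out, _Seen} = lists:mapfoldl(
        fun({Offset, Size} = Span, Seen) ->
            %% The key is a sub-binary of Bin, so no token text is copied.
            Key = binary:part(Bin, Offset, Size),
            case maps:find(Key, Seen) of
                {ok, First} -> {First, Seen};
                error       -> {Span, maps:put(Key, Span, Seen)}
            end
        end, #{}, Spans),
    Out.
```

For <<"Abc Def Abc">> and the spans [{0,3},{4,3},{8,3}] this returns
[{0,3},{4,3},{0,3}], so comparing two tokens becomes a comparison of two
small integer tuples rather than of the token text.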

I'm currently using this technique to parse SIP and SDP. There's at
least a 3x speed advantage when compiling with HiPE too! (Although Joe
seems less concerned about that sort of detail :-)


On Thu, 03 Feb 2005 22:07:39 +0100
Matthias Kretschmer <mccratch@REDACTED> wrote:

> Well from a practical view it might be very unimportant, because 
> comparing two or three machine words would be sufficient. But if one 
> wants to use it for tokens in a compiler, this might not be the case. On 
> the other hand looking at some of my code I hardly find many atoms which 
> are sharing a common long prefix and have the same length (though don't 
> know how atoms are compared, but I could think that first type and 
> length is tested and then from left to right). I am just using Erlang 
> for small unimportant private projects, so my experience is very limited 
> (and programming was nothing more for me besides university). Maybe 
> someone with experience (of bigger projects) may enlighten me?

"The Tao of Programming
 flows far away 
 and returns 
 on the wind of morning."

More information about the erlang-questions mailing list