Parsing binaries (was Re: Concatenating atoms)

Joe Armstrong (AL/EAB) joe.armstrong@REDACTED
Mon Feb 7 11:26:44 CET 2005


> 
> Hi Matthias, hi Joe,
> 
> If integers are desirable, why not use indexes into the binary which
> contains the original XML data?

Absolutely - this is a good idea - it just makes the parsing a wee bit
more complicated. (Actually you can use strings - but don't tell anybody I said so -
if you can guarantee that they are "pointer identical", i.e. you build the strings
in a linear manner using a "pure dictionary type" library.)

 
> So, to use Joe's example to illustrate, the variable Abc on line 23 of
> the program could be represented as {var,Offset,Size}, where Offset is
> the position of the first byte of "Abc" in XML_Binary and Size in this
> case would be 3 (line numbers, needed only for error reports, could be
> obtained by scanning the binary for newline chars). I think I'm correct
> in saying that the following:
> 
> <<_:Offset/binary,Chunk:Size/binary,_/binary>> = XML_Binary,
> 
> doesn't create a new binary Chunk, but instead creates a reference into
> the existing XML_Binary.
> 
> A lexicon of tokens could be based on a list of {Offset,Size} tuples,
> and all matching tokens in a parse tree can refer to the first
> occurrence of the token in XML_Binary.
> 
> I'm currently using this technique to parse SIP and SDP. There's at
> least a 3x speed advantage when compiling with HiPE too! (Although Joe
> seems less concerned about that sort of detail :-)
> 
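The scheme Pete describes above can be sketched in Erlang as follows. This is a minimal sketch, not code from the thread. The module name `bin_tokens`, the toy scanner `words/1` and all helper names are hypothetical. The scanner treats every run of ASCII letters as a token, which is an assumption standing in for a real XML lexer.

Each token is an {Offset,Size} pair. `token_text/2` fetches a token's bytes with the same binary match quoted above. `line_of/2` recovers a line number only when needed, by counting newlines before the offset. `canonicalize/2` builds the lexicon as a map from token bytes to the first {Offset,Size} seen, so equal tokens become equal integer tuples. Maps need OTP 17 or later, so this is a modern rendering rather than 2005-era code.

```erlang
%% Hypothetical module: tokens as {Offset, Size} indexes into one binary.
-module(bin_tokens).
-export([words/1, token_text/2, line_of/2, canonicalize/2]).

%% Toy scanner (an assumption for illustration): every maximal run of
%% ASCII letters becomes a token {Offset, Size}.
words(Bin) ->
    words(Bin, 0, byte_size(Bin), []).

words(_Bin, Pos, Len, Acc) when Pos >= Len ->
    lists:reverse(Acc);
words(Bin, Pos, Len, Acc) ->
    case is_letter(binary:at(Bin, Pos)) of
        true ->
            End = skip_letters(Bin, Pos, Len),
            words(Bin, End, Len, [{Pos, End - Pos} | Acc]);
        false ->
            words(Bin, Pos + 1, Len, Acc)
    end.

%% Return the offset of the first non-letter at or after Pos.
skip_letters(Bin, Pos, Len) when Pos < Len ->
    case is_letter(binary:at(Bin, Pos)) of
        true -> skip_letters(Bin, Pos + 1, Len);
        false -> Pos
    end;
skip_letters(_Bin, Pos, _Len) ->
    Pos.

is_letter(C) ->
    (C >= $a andalso C =< $z) orelse (C >= $A andalso C =< $Z).

%% Fetch a token's bytes with the match quoted in the thread. Whether
%% Chunk shares memory with Bin is decided by the runtime.
token_text(Bin, {Offset, Size}) ->
    <<_:Offset/binary, Chunk:Size/binary, _/binary>> = Bin,
    Chunk.

%% 1-based line number of Offset, computed on demand (e.g. for an error
%% report) by counting newlines in the bytes before it.
line_of(Bin, Offset) ->
    <<Prefix:Offset/binary, _/binary>> = Bin,
    1 + length(binary:matches(Prefix, <<"\n">>)).

%% Lexicon: map each token's bytes to the first {Offset, Size} seen and
%% replace every later duplicate with that first occurrence.
canonicalize(Bin, Tokens) ->
    {Canon, _Lexicon} =
        lists:mapfoldl(
          fun(Tok, Lex) ->
                  Text = token_text(Bin, Tok),
                  case maps:find(Text, Lex) of
                      {ok, First} -> {First, Lex};
                      error -> {Tok, maps:put(Text, Tok, Lex)}
                  end
          end, #{}, Tokens),
    Canon.
```

Take `Bin = <<"<a>Abc</a>\n<b>Abc</b>">>` as an example. `words(Bin)` gives `[{1,1},{3,3},{8,1},{12,1},{14,3},{19,1}]`. `canonicalize/2` maps the second "Abc" at {14,3} to the first at {3,3}, and `line_of(Bin, 14)` is 2. After canonicalization, comparing two tokens means comparing two small integer tuples rather than their text.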

UUuuuuuuuuuuummmmmmmmmmm :-)

/Joe


> Pete.
> 
> On Thu, 03 Feb 2005 22:07:39 +0100
> Matthias Kretschmer <mccratch@REDACTED> wrote:
> 
> > 
> > Well, from a practical view it might be very unimportant, because
> > comparing two or three machine words would be sufficient. But if one
> > wants to use it for tokens in a compiler, this might not be the case.
> > On the other hand, looking at some of my code, I hardly find many atoms
> > which share a common long prefix and have the same length (though I
> > don't know how atoms are compared; I could imagine that first the type
> > and length are tested and then the characters from left to right). I am
> > just using Erlang for small unimportant private projects, so my
> > experience is very limited (and programming was nothing more for me
> > besides university). Maybe someone with experience (of bigger projects)
> > can enlighten me?
> 
> 
> -- 
> "The Tao of Programming
>  flows far away 
>  and returns 
>  on the wind of morning."
> 
> 



More information about the erlang-questions mailing list