Network benchmark - header parsing

Peter-Henry Mander erlang@REDACTED
Mon Dec 15 09:06:40 CET 2003


Did someone mention SIP parsing using the bit syntax?

I'm currently writing a SIP message parser using pattern matching on lists. I was wondering if anyone would be kind enough to share their thoughts and experience about using binaries instead. How would I handle syntax such as:

Timestamp = "Timestamp" HCOLON 1*( DIGIT ) ["." *( DIGIT )] [LWS delay]
delay = *( DIGIT ) ["." *( DIGIT )]
LWS = [*WSP CRLF] 1*WSP

Doing this in list pattern matching is easy. I'm concerned that if I pattern-matched a binary, I would end up making many copies of the binary.

Or am I imagining problems that don't exist?
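
To make the question concrete, here is a rough sketch of what I imagine the binary version would look like. Treat it as an illustration rather than a real parser: the module and function names (sip_timestamp, parse/1 and the helpers) are made up, HCOLON and SWS follow my reading of RFC 3261, the header name is matched case-sensitively for brevity, and it relies on my understanding that matching out a binary segment yields a sub-binary referring to the original data rather than a copy.

```erlang
-module(sip_timestamp).
-export([parse/1]).

%% Hypothetical sketch: parse a Timestamp header from a binary.
%% Returns {ok, {Time, Delay}, Rest} or error. Time and Delay are
%% slices of the input; Delay is <<>> when no delay is present.
parse(<<"Timestamp", Rest0/binary>>) ->
    case hcolon(Rest0) of
        {ok, Rest1} ->
            case number(Rest1, 1) of
                {ok, Time, Rest2} -> delay(Time, Rest2);
                error -> error
            end;
        error ->
            error
    end;
parse(_) ->
    error.

%% [LWS delay], where delay itself may be empty.
delay(Time, Bin) ->
    case lws(Bin) of
        {ok, Rest0} ->
            {ok, Delay, Rest1} = number(Rest0, 0),
            {ok, {Time, Delay}, Rest1};
        error ->
            {ok, {Time, <<>>}, Bin}
    end.

%% HCOLON = *( SP / HTAB ) ":" SWS   (assumed from RFC 3261)
hcolon(Bin) ->
    case skip_wsp(Bin) of
        <<$:, Rest/binary>> -> {ok, sws(Rest)};
        _ -> error
    end.

%% SWS = [LWS]
sws(Bin) ->
    case lws(Bin) of
        {ok, Rest} -> Rest;
        error -> Bin
    end.

%% LWS = [*WSP CRLF] 1*WSP
lws(Bin) ->
    case skip_wsp(Bin) of
        <<"\r\n", C, Rest/binary>> when C =:= $\s; C =:= $\t ->
            {ok, skip_wsp(Rest)};
        Rest when byte_size(Rest) < byte_size(Bin) ->
            {ok, Rest};               % plain 1*WSP, no line folding
        _ ->
            error
    end.

skip_wsp(<<C, Rest/binary>>) when C =:= $\s; C =:= $\t -> skip_wsp(Rest);
skip_wsp(Bin) -> Bin.

%% Min*( DIGIT ) ["." *( DIGIT )], sliced out of Bin in one match.
number(Bin, Min) ->
    case digits(Bin, 0) of
        N when N < Min -> error;
        N ->
            Len = fraction(Bin, N),
            <<Num:Len/binary, Rest/binary>> = Bin,
            {ok, Num, Rest}
    end.

%% Offset just past an optional "." and the digits following it.
fraction(Bin, N) ->
    case Bin of
        <<_:N/binary, $., _/binary>> -> digits(Bin, N + 1);
        _ -> N
    end.

%% Offset of the first non-digit at or after offset N.
digits(Bin, N) ->
    case Bin of
        <<_:N/binary, C, _/binary>> when C >= $0, C =< $9 -> digits(Bin, N + 1);
        _ -> N
    end.
```

If my understanding about sub-binaries is right, Time and Delay here share storage with the incoming packet, and converting them to numbers can be left to whoever actually needs the values.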

I notice that the Megaco stack uses a linked-in C driver that is generated using lex. Would this be the most effective way of parsing SIP, instead of parsing a list?

Pete.





On Sun, 14 Dec 2003 18:17:46 +0100 (CET)
Miguel Barreiro <enano@REDACTED> wrote:

> 
> > We are certainly interested in parsing headers efficiently. SIP
> > headers should permit the same sort of optimization that's done with
> > {packet,http} - in fact it doesn't look too hard to make something
> > general in which the header names could be loaded after startup.
> >
> > What sort of header munging is the slow part for your inbound
> > processing?
> 
> MPEG PES headers are binary and clearly intended to be processed in
> hardware. As a short introduction, a PS ("program stream" - what you store
> on a DVD, for instance) is a sequence of "packs". Each "pack" may contain
> an almost arbitrary number of PES ("packetized elementary streams")
> packets. The first packet in any pack may be a system header. Several
> important values (system clock references) are stored on pack headers. The
> catch is that pack headers contain no length information: you have to
> read each packet (of variable length), parse it, and check whether you
> have reached the end of the pack. To make things worse, clocks are stored
> in a fashion such as "most significant part in multiples of 90 kHz split
> into three parts: 3 bits here, 15 bits there, 15 bits there... least
> significant part in multiples of 27 MHz, up to 22 bits, over there... and
> finally a variable amount of stuffing before actual video/audio payload".
> Smoke at ISO meetings must be really funny.
> 
> It's not the same as parsing HTTP or SIP headers.
> 
> > I am favorably impressed with bit syntax performance, to put it
> > mildly. But, reflecting on the recent mention of regexes on this list,
> > would a set of regexes-for-binaries ops help speed things up?
> 
> In this case, it would save a fair number of matchings. It might be
> worthwhile or not depending on its efficiency relative to "normal"
> bit-syntax matching.
> 
> 
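
For what it's worth, the clock layout Miguel describes seems to map fairly directly onto a single bit-syntax pattern. The sketch below assumes the MPEG-2 program stream pack header as I recall it from ISO/IEC 13818-1 (a 33-bit base in 90 kHz units split 3/15/15 with marker bits, a 9-bit extension in 27 MHz units, a 22-bit mux rate and a 3-bit stuffing length). The module and function names are hypothetical, and the field widths should be checked against the standard before anyone relies on them.

```erlang
-module(ps_pack).
-export([scr/1]).

%% Hypothetical sketch: pull the system clock reference out of a pack
%% header. Returns {ok, Scr27MHz, MuxRate, Rest} or error, where Rest
%% is whatever follows the header and its stuffing bytes.
scr(<<16#000001BA:32,              % pack start code
      2#01:2,                      % MPEG-2 marker bits (assumed)
      Hi:3,   1:1,                 % base bits 32..30, marker
      Mid:15, 1:1,                 % base bits 29..15, marker
      Lo:15,  1:1,                 % base bits 14..0,  marker
      Ext:9,  1:1,                 % extension, marker
      MuxRate:22, 2#11:2,          % program mux rate, markers
      _Reserved:5, Stuffing:3,     % stuffing length in bytes
      _:Stuffing/binary,           % skip the stuffing bytes
      Rest/binary>>) ->
    Base = (Hi bsl 30) bor (Mid bsl 15) bor Lo,   % 90 kHz ticks
    {ok, Base * 300 + Ext, MuxRate, Rest};        % 27 MHz ticks
scr(_) ->
    error.
```

The length of the stuffing is bound and used inside the same pattern, so the variable-length tail costs no extra matching step. It does not help with the missing pack length, though: the packets after the header still have to be walked one by one.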


-- 
"The Tao of Programming
 flows far away 
 and returns 
 on the wind of morning."



