Network benchmark - header parsing
Peter-Henry Mander
erlang@REDACTED
Mon Dec 15 09:06:40 CET 2003
Did someone mention SIP parsing using the bit syntax?
I'm currently writing a SIP message parser using pattern matching on lists. I was wondering if anyone would be kind enough to share their thoughts and experience about using binaries instead. How would I handle syntax such as:
Timestamp = "Timestamp" HCOLON 1*( DIGIT ) ["." *( DIGIT )] [LWS delay]
delay = *( DIGIT ) ["." *( DIGIT )]
LWS = [*WSP CRLF] 1*WSP
Doing this in list pattern matching is easy. I'm concerned that if I pattern-matched a binary, I would end up making many copies of the binary.
Or am I imagining problems that don't exist?
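
For what it's worth, here is a minimal sketch of how the Timestamp rule above could be matched directly on a binary. The module name sip_timestamp and every function in it are hypothetical, invented for illustration and not taken from any existing SIP stack. It assumes the caller has already consumed "Timestamp" HCOLON. As far as I understand the runtime, matching a tail with Rest/binary yields a sub-binary that refers to the original data rather than a copy of it, so the sketch counts digits first and splits once instead of building up an accumulator.

```erlang
-module(sip_timestamp).
-export([parse_value/1]).

%% Hypothetical helper (illustration only).
%% Parses the part of a Timestamp header that follows HCOLON:
%%   1*(DIGIT) ["." *(DIGIT)] [LWS delay]
%% Returns {ok, {Timestamp, Delay}, Rest} or error. Each number is an
%% {IntDigits, FracDigits} pair of sub-binaries of the input; Delay is
%% the atom none when no LWS follows the timestamp.
parse_value(Bin) ->
    case number(Bin) of
        {{<<>>, _}, _} ->
            error;                          % 1*(DIGIT) needs one digit
        {Timestamp, Rest0} ->
            case lws(Rest0) of
                {ok, Rest1} ->
                    {Delay, Rest2} = number(Rest1),
                    {ok, {Timestamp, Delay}, Rest2};
                none ->
                    {ok, {Timestamp, none}, Rest0}
            end
    end.

%% *(DIGIT) ["." *(DIGIT)]
number(Bin) ->
    {Int, Rest0} = digits(Bin, 0),
    case Rest0 of
        <<$., Rest1/binary>> ->
            {Frac, Rest2} = digits(Rest1, 0),
            {{Int, Frac}, Rest2};
        _ ->
            {{Int, <<>>}, Rest0}
    end.

%% Count the leading digits, then split the binary once at that offset.
digits(Bin, N) ->
    case Bin of
        <<_:N/binary, C, _/binary>> when C >= $0, C =< $9 ->
            digits(Bin, N + 1);
        <<Digits:N/binary, Rest/binary>> ->
            {Digits, Rest}
    end.

%% LWS = [*WSP CRLF] 1*WSP
lws(Bin) ->
    AfterWsp = skip_wsp(Bin),
    case AfterWsp of
        <<"\r\n", C, Rest/binary>> when C =:= $\s; C =:= $\t ->
            {ok, skip_wsp(Rest)};           % folded line
        _ when byte_size(AfterWsp) < byte_size(Bin) ->
            {ok, AfterWsp};                 % plain 1*WSP
        _ ->
            none
    end.

skip_wsp(<<C, Rest/binary>>) when C =:= $\s; C =:= $\t -> skip_wsp(Rest);
skip_wsp(Bin) -> Bin.
```

With this sketch, sip_timestamp:parse_value(<<"54 0.5\r\n">>) returns {ok, {{<<"54">>, <<>>}, {<<"0">>, <<"5">>}}, <<"\r\n">>}. The digit strings are left as binaries; converting them to numbers is a separate step left to the caller.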
I notice that the Megaco stack uses a linked-in C driver that is generated using lex. Would this be the most effective way of parsing SIP, instead of parsing a list?
Pete.
On Sun, 14 Dec 2003 18:17:46 +0100 (CET)
Miguel Barreiro <enano@REDACTED> wrote:
>
> > We are certainly interested in parsing headers efficiently. SIP
> > headers should permit the same sort of optimization that's done with
> > {packet,http} - in fact it doesn't look too hard to make something
> > general in which the header names could be loaded after startup.
> >
> > What sort of header munging is the slow part for your inbound
> > processing?
>
> MPEG PES headers are binary and clearly intended to be processed in
> hardware. As a short introduction, a PS ("program stream" - what you store
> on a DVD, for instance) is a sequence of "packs". Each "pack" may contain
> an almost arbitrary number of PES ("packetized elementary streams")
> packets. The first packet in any pack may be a system header. Several
> important values (system clock references) are stored on pack headers. The
> catch is that pack headers contain no length information: you have to
> read each packet (of variable length), parse it, and check whether you
> have reached the end of the pack. To make things worse, clocks are stored
> in a fashion such as "most significant part in multiples of 90 kHz split into
> three parts: 3 bits here, 15 bits there, 15 bits there... least
> significant part in multiples of 27MHz, up to 22 bits, over there... and
> finally a variable amount of stuffing before actual video/audio payload".
> Smoke at ISO meetings must be really funny.
>
> It's not the same as parsing HTTP or SIP headers.
>
> > I am favorably impressed with bit syntax performance, to put it
> > mildly. But, reflecting on recent mention on this list about regexes,
> > would a set of regexes-for-binaries ops help speed things up?
>
> In this case, it would save a fair number of matchings. It might be
> worthwhile or not depending on relative efficiency to "normal"
> bitsyntax matching.
>
>
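
As an illustration of the split clock field described in the quoted message, here is a rough sketch of how the bit syntax can pick the pieces apart in a single pattern. The module name ps_pack and the function scr/1 are hypothetical. The field widths (a 4-byte start code, a 2-bit prefix, 3 + 15 + 15 bits of clock base separated by marker bits, then a 9-bit extension) are an assumption based on my reading of the MPEG-2 program stream pack header in ISO/IEC 13818-1, not something stated in this thread, so check them against the standard before relying on them.

```erlang
-module(ps_pack).
-export([scr/1]).

%% Hypothetical helper (illustration only).
%% Decodes the system clock reference at the start of a pack header.
%% Assumed layout after the 16#000001BA start code:
%%   '01', base[32..30], marker, base[29..15], marker,
%%   base[14..0], marker, extension (9 bits), marker
%% Returns {ok, Ticks, Rest}, where Ticks counts 27 MHz clock ticks
%% (base * 300 + extension), or error if the pattern does not match.
scr(<<16#000001BA:32,
      2#01:2, Hi:3, 1:1, Mid:15, 1:1, Lo:15, 1:1, Ext:9, 1:1,
      Rest/binary>>) ->
    Base = (Hi bsl 30) bor (Mid bsl 15) bor Lo,
    {ok, Base * 300 + Ext, Rest};
scr(_) ->
    error.
```

For example, ps_pack:scr(<<16#000001BA:32, 1:2, 0:3, 1:1, 0:15, 1:1, 1:15, 1:1, 5:9, 1:1, "rest">>) returns {ok, 305, <<"rest">>}. The marker bits are matched as literal 1s, so a header with a cleared marker falls through to the error clause. The rest of the pack header (mux rate and stuffing) is left in Rest.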
--
"The Tao of Programming
flows far away
and returns
on the wind of morning."