"Fast" text parsing

Mon Dec 10 09:23:19 CET 2001

On Mon, 10 Dec 2001, Shawn Pearce wrote:

Shawn> Is the trick to use binaries?  Or is there no trick, just that Erlang
Shawn> text processing in general is slower than what can be constructed in
Shawn> lower level languages such as C++?
Shawn> 
Shawn> I guess I'm really interested in Erlang for two reasons:  one, I can
Shawn> quickly make the tool distributed and take advantage of many spare
Shawn> CPU cycles on other nodes to perform parsing, two it has really nice
Shawn> pattern matching on function heads, especially with records, which may
Shawn> be helpful for working with the abstract syntax trees I need to deal with.
Shawn> 
Shawn> Anyone have experience with building "fast" parsers???  I'd love to hear
Shawn> some suggestions...

In the Megaco application I started with a handwritten scanner in
Erlang, using a long list of integers (an "Erlang String") as input.
The output was a list of tokens suitable for the yecc-generated parser.

By rewriting the scanner in C using the scanner generator flex,
and linking it into Erlang as a driver, I got up to 10 times better
scanner performance. The new scanner takes binaries as input
and allocates the list of tokens directly on the caller's heap.
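
To make the difference between the two input representations concrete,
here is a minimal sketch in plain Erlang. It is not the Megaco scanner
and it does not use a driver. The module name scan_sketch, the token
shapes ({word, Line, Chars}, {integer, Line, N}, {punct, Line, Char})
and the tiny token set are assumptions made up for illustration. They
only follow the general yecc convention that a token is a tuple whose
second element is the line number.

```erlang
-module(scan_sketch).
-export([string/1, binary/1]).

%% Hypothetical toy scanner, for illustration only.
%% string/1 scans an Erlang string (a list of character codes);
%% binary/1 scans the same text held in a binary.
%% Both return [{word, Line, Chars} | {integer, Line, N} | {punct, Line, Char}].

string(Chars) when is_list(Chars) -> scan_list(Chars, 1, []).

scan_list([], _Line, Acc) ->
    lists:reverse(Acc);
scan_list([$\n | T], Line, Acc) ->
    scan_list(T, Line + 1, Acc);
scan_list([C | T], Line, Acc) when C =:= $\s; C =:= $\t; C =:= $\r ->
    scan_list(T, Line, Acc);
scan_list([C | _] = Cs, Line, Acc) when C >= $0, C =< $9 ->
    {Digits, Rest} = lists:splitwith(fun is_digit/1, Cs),
    scan_list(Rest, Line, [{integer, Line, list_to_integer(Digits)} | Acc]);
scan_list([C | _] = Cs, Line, Acc) when C >= $a, C =< $z ->
    {Word, Rest} = lists:splitwith(fun is_lower/1, Cs),
    scan_list(Rest, Line, [{word, Line, Word} | Acc]);
scan_list([C | T], Line, Acc) ->
    scan_list(T, Line, [{punct, Line, C} | Acc]).

binary(Bin) when is_binary(Bin) -> scan_bin(Bin, 1, []).

scan_bin(<<>>, _Line, Acc) ->
    lists:reverse(Acc);
scan_bin(<<$\n, T/binary>>, Line, Acc) ->
    scan_bin(T, Line + 1, Acc);
scan_bin(<<C, T/binary>>, Line, Acc) when C =:= $\s; C =:= $\t; C =:= $\r ->
    scan_bin(T, Line, Acc);
scan_bin(<<C, _/binary>> = Bin, Line, Acc) when C >= $0, C =< $9 ->
    N = span(Bin, fun is_digit/1, 0),
    <<Digits:N/binary, Rest/binary>> = Bin,
    scan_bin(Rest, Line, [{integer, Line, binary_to_integer(Digits)} | Acc]);
scan_bin(<<C, _/binary>> = Bin, Line, Acc) when C >= $a, C =< $z ->
    N = span(Bin, fun is_lower/1, 0),
    <<Word:N/binary, Rest/binary>> = Bin,
    scan_bin(Rest, Line, [{word, Line, binary_to_list(Word)} | Acc]);
scan_bin(<<C, T/binary>>, Line, Acc) ->
    scan_bin(T, Line, [{punct, Line, C} | Acc]).

%% Count how many leading bytes of Bin, starting at offset N, satisfy Pred.
span(Bin, Pred, N) ->
    case Bin of
        <<_:N/binary, C, _/binary>> ->
            case Pred(C) of
                true  -> span(Bin, Pred, N + 1);
                false -> N
            end;
        _ ->
            N
    end.

is_digit(C) -> C >= $0 andalso C =< $9.
is_lower(C) -> C >= $a andalso C =< $z.
```

Both entry points should produce the same token list for the same text,
for example scan_sketch:string("x = 42\ny") and
scan_sketch:binary(<<"x = 42\ny">>). The binary variant only avoids
building the input as a list of integers. It still runs in Erlang, so it
says nothing about the speedup reported for the flex-based driver.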

The two scanners can be found at:

    megaco/src/text/megaco_text_scanner.erl

and

    megaco/src/flex/megaco_flex_scanner.erl
    megaco/src/flex/megaco_flex_scanner_drv.flex

/Håkan

---
Håkan Mattsson
Ericsson
Computer Science Laboratory
http://www.ericsson.com/cslab/~hakan/