[erlang-questions] XML parser that works on binaries

Willem de Jong w.a.de.jong@REDACTED
Fri Nov 23 19:41:08 CET 2007

The new version of erlsom (released a couple of days ago on SourceForge) can
parse its input one chunk at a time. That solves the memory footprint
problem: files of arbitrary size, or a stream of data, can be parsed.

The approach is similar to the approach that xmerl uses, with a 'fetch'
hook, as Ulf calls it.

An example of how to do this for big files is included with the erlsom
distribution. It should be fairly straightforward (even for UTF-8 or UTF-16
encoded data).
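
To illustrate the general idea of chunk-wise SAX parsing (this is not the
erlsom example and not erlsom's API), here is a minimal sketch that uses
Python's standard xml.sax incremental parser. The names ElementCounter and
count_elements and the chunk size are made up for the illustration. The
read call in the loop plays the role of the 'fetch' hook: the parser only
ever sees the current chunk, so the whole document is never held in memory.

```python
import io
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """SAX handler that counts start tags; it builds no tree in memory."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def count_elements(stream, chunk_size=4096):
    """Parse a binary stream chunk by chunk; return the number of elements."""
    handler = ElementCounter()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    while True:
        chunk = stream.read(chunk_size)  # the "fetch" step: get the next chunk
        if not chunk:
            break
        parser.feed(chunk)  # chunks may end mid-tag or mid-character
    parser.close()  # end of input; raises if the document is incomplete
    return handler.count


if __name__ == "__main__":
    data = "<a><b>\u00e9</b><b/></a>".encode("utf-8")
    # A chunk size of 1 forces splits inside tags and inside the 2-byte character.
    print(count_elements(io.BytesIO(data), chunk_size=1))
```

The same loop works for a socket or any other source that hands out bytes
piece by piece; only the read call changes.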


On Nov 23, 2007 6:09 PM, Joel Reymont <joelr1@REDACTED> wrote:

> On Nov 23, 2007, at 4:44 PM, Willem de Jong wrote:
> > Why do you want a parser that works on binaries?
> To minimize the memory footprint of an application, for example.
> ejabberd was a huge memory hog a year and a half ago since XML
> binaries received from the socket were converted to lists for
> processing. It was an even bigger memory hog on an x64 system for
> obvious reasons. I noticed that even then they were using an expat
> driver that could operate on binaries so they may have completed that
> conversion now.
> --
> http://wagerlabs.com
