[erlang-questions] current xml parsers

Willem de Jong w.a.de.jong@REDACTED
Sat Mar 24 08:22:05 CET 2012


Hi Roberto,

The main reason why I wrote erlsom was not memory footprint or speed, but
the fact that I didn't understand the documentation of xmerl and I didn't
like the output format. I thought (and still think) that it is a waste to
translate a generic structure like xml to another generic structure (like a
DOM tree or something similar). You must then extract your information from
that new structure, while you could have done it in one pass. Both the SAX
parser and the "data binder" mode that erlsom offers support this, in a
way.

There was a period when the erlsom SAX parser was a lot faster than
xmerl, but nowadays xmerl also offers a SAX parser, and I think the
difference in speed is small. Also, both erlsom and xmerl allow you to
parse in a kind of streaming way, so neither forces you to load the whole
XML document into memory, and both support working directly on binaries.

Note that when you look at the performance of a parser, you should
also consider the effort required to get the information that you
actually want out of the parser's output. If you use the erlsom "data
binder" mode, you will get nice records that are easy to access. If you use
the SAX mode of erlsom or xmerl, you can take the information that you want
out of the stream as you go along - actually a very nice model, I think.
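
As a rough illustration of taking information out of the stream as you go,
here is a minimal sketch using the SAX parser that ships with OTP
(xmerl_sax_parser). The module name sax_titles, the function titles/1 and
the <title> element are made up for the example; only
xmerl_sax_parser:stream/2 and its event_fun and event_state options come
from the library. The erlsom SAX mode follows the same idea, with a
slightly different callback shape.

```erlang
-module(sax_titles).
-export([titles/1]).

%% Collect the text found inside every <title> element in a single pass,
%% without building a tree for the whole document.
%% The element name "title" is only an example.
titles(Xml) when is_binary(Xml) ->
    Opts = [{event_fun, fun handle_event/3},
            {event_state, {false, []}}],
    {ok, {_InTitle, Acc}, _Rest} = xmerl_sax_parser:stream(Xml, Opts),
    lists:reverse(Acc).

%% The state is {InTitle, Acc}: InTitle says whether we are currently
%% inside a <title> element, Acc holds the text collected so far.
handle_event({startElement, _Uri, "title", _QName, _Attrs}, _Loc, {_, Acc}) ->
    {true, Acc};
handle_event({characters, Text}, _Loc, {true, Acc}) ->
    %% Long text may arrive as several characters events; each chunk
    %% becomes its own list entry in this simple sketch.
    {true, [Text | Acc]};
handle_event({endElement, _Uri, "title", _QName}, _Loc, {_, Acc}) ->
    {false, Acc};
handle_event(_Event, _Loc, State) ->
    %% Everything else is ignored, so nothing else is kept in memory.
    State.
```

Calling sax_titles:titles(Binary) on a document returns the collected
strings in document order. The callback only keeps what it needs, which is
the point made above: no intermediate generic structure has to be built
and then searched.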

By the way, please note that the best place to get erlsom nowadays is
GitHub, not SourceForge. The latest version is at
https://github.com/willemdj/erlsom

Regards,
Willem


On Fri, Mar 23, 2012 at 4:22 PM, Roberto Ostinelli <roberto@REDACTED> wrote:

> Dear list,
>
> does someone have recent considerations on xml parsers in terms of memory
> footprint, parsing speed and stability?
>
> The ones I'm aware of are xmerl, erlsom [1] and the driver used in
> ejabberd (which unfortunately is GPL).
>
> I don't care about DTD validation.
>
> Thank you,
>
> r.
>
> [1] http://erlsom.sourceforge.net/erlsom.htm
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>
>