[erlang-questions] XML parser that works on binaries

Kenneth Lundin
Sat Nov 24 00:07:54 CET 2007


Hi Joel,

It depends on what you mean by binaries and what you want to achieve.
It is already possible, and easy, to read the input as binary chunks,
convert each chunk to a list and then parse that list with XMERL.
This reduces the memory footprint compared with reading the complete
input, converting it to a list and then parsing it.
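
As a rough sketch of the chunked approach (not code from xmerl or OTP), the hypothetical module below keeps the document in a binary and uses the continuation_fun option of xmerl_scan to hand the scanner one chunk at a time, so only the current chunk ever exists as a list. The module and function names, the chunk size, and the choice to cut chunks only right after a '>' are assumptions of this sketch. It also assumes ASCII input and does not deal with character encodings.

```erlang
%% chunked_xmerl.erl: hypothetical sketch of feeding XML held in a binary
%% to xmerl in chunks, converting only one chunk at a time to a list.
%% Assumptions: ASCII input; chunks are cut only right after a '>'.
-module(chunked_xmerl).
-export([parse_binary/2, root_name/1]).

-include_lib("xmerl/include/xmerl.hrl").

%% Parse Bin, handing xmerl_scan at least ChunkSize bytes at a time.
parse_binary(Bin, ChunkSize)
  when is_binary(Bin), is_integer(ChunkSize), ChunkSize > 0 ->
    Opts = [{continuation_fun, fun cont/3, {Bin, ChunkSize}}],
    %% Start with an empty string; the scanner asks cont/3 for input.
    {Doc, _Rest} = xmerl_scan:string([], Opts),
    Doc.

root_name(#xmlElement{name = Name}) -> Name.

%% Continuation fun: Continue(MoreChars, State) resumes scanning,
%% Exception(State) tells the scanner that there is no more input.
cont(Continue, Exception, GlobalState) ->
    case xmerl_scan:cont_state(GlobalState) of
        {<<>>, _Size} ->
            Exception(GlobalState);
        {Bin, Size} ->
            {Chunk, Rest} = next_chunk(Bin, Size),
            GS1 = xmerl_scan:cont_state({Rest, Size}, GlobalState),
            %% Only the current chunk is converted to a list.
            Continue(binary_to_list(Chunk), GS1)
    end.

%% Take at least Size bytes, extended up to and including the next '>',
%% so chunk boundaries normally fall between markup, not inside a tag.
next_chunk(Bin, Size) when byte_size(Bin) =< Size ->
    {Bin, <<>>};
next_chunk(Bin, Size) ->
    Start = Size - 1,
    case binary:match(Bin, <<">">>, [{scope, {Start, byte_size(Bin) - Start}}]) of
        {Pos, 1} -> split_binary(Bin, Pos + 1);
        nomatch  -> {Bin, <<>>}
    end.
```

In a real application the continuation state would more likely hold a file descriptor and read the next chunk with file:read/2; the xmerl_eventp module in xmerl streams files through the same continuation mechanism.
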
Another matter is whether you want the parsed text content to be kept
as binaries or as lists. Keeping the text as binaries would require
quite a lot of changes to xmerl, and the reason for wanting it would
again be the memory footprint.
What use case do you have where the XML data is so big that the memory
footprint differs significantly between binaries and lists?

Of course I am aware that a byte in a binary occupies two words when
converted to a list, that is, 8 bytes in a 32-bit Erlang VM and 16 bytes
in a 64-bit Erlang VM, but if parsing is done in chunks the memory
footprint can be controlled anyway.
Note also that there is little or no reason to run a 64-bit Erlang VM
even if the OS runs in 64-bit mode. You only need a 64-bit VM if the
Erlang VM has to address more than 2 GB of RAM.
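
To see the difference on a given VM, the hypothetical helper module below (not part of OTP) compares the payload of a binary with the heap size of the same data as a list. It assumes the figure from the Erlang efficiency guide of two words per list element, and it uses erts_debug:flat_size/1, an undocumented debugging function that returns a term's size in heap words, so treat it as a measurement sketch only.

```erlang
%% footprint.erl: hypothetical helper comparing a binary's payload with
%% the heap size of the same data converted to a list.
%% Uses erts_debug:flat_size/1 (undocumented; returns size in words).
-module(footprint).
-export([binary_bytes/1, list_bytes/1, bytes_per_char/0]).

%% One byte of payload per byte of data; the small fixed overhead of
%% the binary itself is ignored here.
binary_bytes(Bin) when is_binary(Bin) ->
    byte_size(Bin).

%% Heap bytes used by binary_to_list(Bin): one two-word cons cell per
%% element, with the small integer stored inside the cell.
list_bytes(Bin) when is_binary(Bin) ->
    erts_debug:flat_size(binary_to_list(Bin)) * erlang:system_info(wordsize).

%% Bytes per character in a list on this VM: 8 on 32-bit, 16 on 64-bit.
bytes_per_char() ->
    2 * erlang:system_info(wordsize).
```

On a 64-bit VM a 10 MB binary therefore becomes about 160 MB of list if it is converted in one go, which is why converting one chunk at a time keeps the footprint bounded.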

I think we need to provide a faster XML parser in the standard
distribution, as well as more compact output formats than the XMERL
default. This will very probably happen during 2008, quite possibly as
improvements and extensions to the XMERL application.

/Kenneth, Erlang/OTP team at Ericsson



On 11/23/07, Joel Reymont wrote:
> Has anyone extracted the expat driver code from ejabberd?
>
> Is there another XML parser that works on binaries?
>
> Would hacking XMERL to work on binaries be a good idea?
>
>        Thanks, Joel
>
> --
> http://wagerlabs.com