[erlang-questions] Reading large (1GB+) XML files.

Fri Aug 17 08:31:24 CEST 2007

On 8/16/07, Joe Armstrong <erlang@REDACTED> wrote:
> Another question I have is:
>
>    What do you want to do with an infinite document?
>
>    (here infinite means "too big to keep the parse tree in memory in
> an efficient manner")

If I understand you correctly, you are asking about real-world
scenarios where one would want to parse massive XML files in a
streaming fashion? If so, one example use case is a payments system
that I'm working on that accepts instructions in an ISO 20022 XML file
which can contain in excess of 100000 items in a single file (but
there is no hard limit). Each item might be on average 4k, but could
be up to 10k (this is the size of the uncompressed ASCII XML
encoding). If you then have to process multiple input files
concurrently, you can't materialize the whole thing in memory. So
the approach is to parse and materialize in a streaming fashion and
send the resulting data objects off to a downstream processing system.
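
As a rough illustration of the streaming approach described above, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree.iterparse`. It is not the actual payments system: the element names (`Payments`, `Item`, `Id`, `Amount`) are hypothetical stand-ins rather than the real ISO 20022 schema, and it assumes each item is a direct child of the root element. For scale, 100000 items at 4k each is roughly 400 MB per file, and about 1 GB at 10k each.

```python
import io
import xml.etree.ElementTree as ET


def stream_items(source, item_tag):
    """Yield one dict per completed <item_tag> element.

    Assumes items are direct children of the root element, so clearing
    the root after each item keeps memory use roughly constant.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == item_tag:
            # Materialize only this one item as a plain data object.
            yield {child.tag: child.text for child in elem}
            # Drop the already-processed items from the partial tree.
            root.clear()


# Hypothetical sample input; a real file would be opened from disk instead.
SAMPLE = (
    b"<Payments>"
    b"<Item><Id>1</Id><Amount>10.00</Amount></Item>"
    b"<Item><Id>2</Id><Amount>25.50</Amount></Item>"
    b"</Payments>"
)

if __name__ == "__main__":
    for item in stream_items(io.BytesIO(SAMPLE), "Item"):
        # In a real system this is where the item would be handed
        # to the downstream processing system.
        print(item)
```

Because `stream_items` is a generator, each item is handed to the consumer as soon as its closing tag is seen, so several input files can be processed concurrently without any of them being held in memory in full.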

Anyway, apologies if I misunderstood you and this comment is
irrelevant to this conversation.

HTH,

Ben