[erlang-questions] Reading large (1GB+) XML files.

Wed Aug 15 20:23:04 CEST 2007

I've been trying to learn Erlang for a while, and I recently found
what I thought would be an easy starter project. I currently have a
simple application that reads data from a couple of XML files using
SAX, and inserts it using an RPC over HTTP.

I'm not sure about the terminology here (I've been stuck in OO land
for so long that everything looks like an object), but here's what I'm
thinking: one thread reads the XML files and pieces together the data,
then hands off each record to a pool of workers that issue the HTTP
requests. Or maybe the XML-reading part could just spawn a new thread
for each record it reads, and ensure that at most X are running at any
one time?
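
A minimal sketch of the second idea (one process per record, with at
most X running at once), using only core Erlang. The module name
bounded_workers, the function run/3 and the list of records are
assumptions made for illustration; in the real program the records
would come from the SAX callbacks rather than a list, and Fun would
issue the HTTP request. It also assumes the calling process has no
other monitors active, since it consumes every 'DOWN' message it
receives.

```erlang
-module(bounded_workers).
-export([run/3]).

%% run(Records, Fun, Max): call Fun(Record) for every record, each call
%% in its own process, with at most Max worker processes alive at once.
%% Returns ok once every worker has exited (normally or not).
run(Records, Fun, Max)
  when is_function(Fun, 1), is_integer(Max), Max > 0 ->
    loop(Records, Fun, Max, 0).

%% No records left and no workers running: done.
loop([], _Fun, _Max, 0) ->
    ok;
%% Below the limit: start a monitored worker for the next record.
loop([Record | Rest], Fun, Max, Running) when Running < Max ->
    {_Pid, _Ref} = spawn_monitor(fun() -> Fun(Record) end),
    loop(Rest, Fun, Max, Running + 1);
%% Limit reached, or input exhausted with workers still running:
%% block until one worker exits, then continue.
loop(Records, Fun, Max, Running) ->
    receive
        {'DOWN', _Ref, process, _Pid, _Reason} ->
            loop(Records, Fun, Max, Running - 1)
    end.
```

For example, bounded_workers:run(Records, fun send_record/1, 10) would
keep up to ten requests in flight, where send_record/1 is a
hypothetical function wrapping the HTTP call. A fixed pool of
long-lived workers is the other option; spawning per record is simpler
because Erlang processes are cheap to create.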

The HTTP request was easy enough to get working, but I'm having
trouble reading the XML. I used xmerl_scan:file to parse the file,
but that loads the whole file into memory before it starts processing.

I took a look at Erlsom and its SAX reader examples, but those read
the entire file into a binary before passing it off to the XML reader.

Thanks,

Patrik