[erlang-questions] Reading large (1GB+) XML files.
Dustin Sallings
dustin@REDACTED
Wed Aug 15 22:15:39 CEST 2007
On Aug 15, 2007, at 11:23, Patrik Husfloen wrote:
> I'm not sure about the terminology here, I've been stuck in OO land
> for so long that everything looks like an object, but here's what I'm
> thinking: One thread reading the xmls and piecing together the data,
> and then handing off each record to a pool of workers that issue the
> http requests, or, maybe the xml-reading part could just spawn a new
> thread for each record it reads, and ensure that only X are running at
> the most?
This sounds very similar to the design of my load replay tool. I've
got a tool that reads a pcap file and writes out a binary file that I
suppose is conceptually similar to XML. The playback tool reads that
file and issues HTTP requests with the same types of payload (some
contents rewritten for validity on playback) with the same timings
(to whatever scale is desirable) and logs the results. It works like
this:
1) There's an overseer process that starts all of the other
   processes and facilitates communication among them.
2) One process is responsible for reading the file, sleeping as
   appropriate, and sending records up to the overseer.
3) Another process is responsible for performing HTTP requests. It
   receives messages from the overseer, issues an async HTTP request
   through inets, and stores the pending request in a dict along with
   a timer. When a response comes back from inets, it looks up the
   request and sends the timing, request, and results back up.
4) The logging process figures out what the request meant and on
   behalf of which user it was sent, along with some other details,
   and logs it.
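
A minimal sketch of the requestor described in step 3, not the tool's actual code. The module name, the message shapes ({request, Url} and {result, ...}), and the use of erlang:monotonic_time/1 as the "timer" are assumptions made for illustration; only the async inets request and the dict of pending requests come from the description above.

```erlang
-module(requestor_sketch).
-export([start/1, loop/2]).

%% Start a requestor that reports back to Overseer (hypothetical API).
start(Overseer) ->
    {ok, _} = application:ensure_all_started(inets),
    spawn(?MODULE, loop, [Overseer, dict:new()]).

%% Pending maps an httpc request id to {Url, StartTimeMs}.
loop(Overseer, Pending) ->
    receive
        {request, Url} ->
            %% {sync, false} makes httpc return at once; the response
            %% arrives later as a message to this process.
            {ok, ReqId} = httpc:request(get, {Url, []}, [], [{sync, false}]),
            Started = erlang:monotonic_time(millisecond),
            loop(Overseer, dict:store(ReqId, {Url, Started}, Pending));
        {http, {ReqId, Result}} ->
            case dict:find(ReqId, Pending) of
                {ok, {Url, Started}} ->
                    Elapsed = erlang:monotonic_time(millisecond) - Started,
                    Overseer ! {result, Url, Elapsed, Result},
                    loop(Overseer, dict:erase(ReqId, Pending));
                error ->
                    %% Response for a request we are not tracking; ignore.
                    loop(Overseer, Pending)
            end
    end.
```

Result is whatever httpc delivers, either {StatusLine, Headers, Body} or {error, Reason}, so the logging process can decide what to do with failures.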
On startup, I find all available nodes and run one of the requestor
processes (#3) on each node. The overseer has a queue of these
processes and pops the next available requestor off the front, sends
it a request, and adds it to the back of the queue again.
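
The rotation described above could be sketched roughly as follows. This is an illustration only: the module and function names are made up, and requestor_loop/0 is a stand-in that just echoes each record back instead of doing an HTTP request.

```erlang
-module(overseer_sketch).
-export([start_requestors/0, dispatch/2, requestor_loop/0]).

%% Start one requestor on every known node (this node included)
%% and return them as a queue.
start_requestors() ->
    Nodes = [node() | nodes()],
    queue:from_list([spawn(N, ?MODULE, requestor_loop, []) || N <- Nodes]).

%% Pop the next requestor off the front, send it the record, and
%% put it on the back of the queue again. Crashes with badmatch if
%% the queue is empty.
dispatch(Record, Queue0) ->
    {{value, Pid}, Queue1} = queue:out(Queue0),
    Pid ! {request, self(), Record},
    queue:in(Pid, Queue1).

%% Placeholder requestor: replies immediately with the record.
requestor_loop() ->
    receive
        {request, From, Record} ->
            From ! {done, self(), Record},
            requestor_loop()
    end.
```

The overseer would keep the returned queue in its loop state and call dispatch/2 for each record it receives from the reader. Spawning on a remote node this way assumes the module is loaded there too.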
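If you want to control how many concurrent requests you're
executing, you can issue the requests synchronously and use a process
queue like the one I've got there. Each requestor then handles one
request at a time, so the number of requestors caps the concurrency.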
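
One way to sketch that kind of cap, under assumptions of my own: a fixed set of workers each runs a blocking Work fun (in real use this might wrap a synchronous httpc:request/1 call), and a record is only handed to a worker that is idle. The module name, run/3, and the message shapes are hypothetical, and this variant tracks idle workers rather than rotating through them.

```erlang
-module(bounded_pool).
-export([run/3]).

%% Apply Work to every record with at most MaxWorkers calls in
%% flight. Results come back in completion order, not input order.
run(Work, Records, MaxWorkers) when MaxWorkers > 0 ->
    Parent = self(),
    Workers = [spawn_link(fun() -> worker(Parent, Work) end)
               || _ <- lists:seq(1, MaxWorkers)],
    Results = loop(Records, Workers, 0, []),
    lists:foreach(fun(W) -> W ! stop end, Workers),
    Results.

%% loop(PendingRecords, IdleWorkers, InFlight, Acc)
loop([], _Idle, 0, Acc) ->
    lists:reverse(Acc);
loop([Rec | Rest], [W | Idle], InFlight, Acc) ->
    %% A record is waiting and a worker is idle: hand it over.
    W ! {work, Rec},
    loop(Rest, Idle, InFlight + 1, Acc);
loop(Pending, Idle, InFlight, Acc) ->
    %% Nothing can be dispatched right now: wait for a worker.
    receive
        {done, W, Result} ->
            loop(Pending, [W | Idle], InFlight - 1, [Result | Acc])
    end.

%% Each worker handles one record at a time, synchronously.
worker(Parent, Work) ->
    receive
        {work, Rec} ->
            Parent ! {done, self(), Work(Rec)},
            worker(Parent, Work);
        stop ->
            ok
    end.
```

For example, bounded_pool:run(fun(X) -> X * 2 end, [1, 2, 3, 4, 5], 2) never has more than two calls running at once. Because the workers are linked, a crash inside Work takes the caller down with it; a real tool would probably want to catch and log that instead.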
--
Dustin Sallings