[erlang-questions] Reading a big text file from a web

Ingela Anderton Andin ingela@REDACTED
Thu May 3 10:19:31 CEST 2007

Chandru wrote:
> On 02/05/07, Ingela Anderton Andin <ingela@REDACTED> wrote:
>> erlang-questions-request@REDACTED wrote:
>> [...]
>> >> Right now my (very naive) implementation reads the result of
>> >> http:request("http://example.com/file.txt") and splits it on newlines.
>> >> The file is several megabytes long, so memory consumption increases
>> >> considerably.
>> >>
>> >> What do you recommend to implement this functionality?
>> >>
>> >
>> > Check to see if the inets http client has an option to save the
>> > response to a file.
>> It does have an option to save it to a file.
>> http:request(get, {URL, []}, [], [{stream, FilePath}]).
>> > Once the file is downloaded, you can then parse it in chunks.
>> >
>> Better yet, the inets HTTP client lets you stream the result directly
>> to a process, so you do not have to wait for the whole response to be
>> saved to a file before reading it in chunks.
> I wouldn't recommend doing this for large files. Depending on the size
> of the file and the complexity of the parsing, you could end up
> overloading the process that receives these messages. And as the
> message queue of a process grows, the process becomes slower, since
> I think the runtime system penalises processes with large message
> queues.
The process that gets penalized is not the one with the long queue but
the process sending to it, in order to, if possible, prevent the queue
from growing further.
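A minimal sketch of the stream-to-a-process approach, assuming a current OTP release where the client module is `httpc` (the `http` module used in this thread is its older name) and that lines end in `\n`. The module name `stream_lines`, the functions `fetch/2` and `split_lines/2`, and the 60-second idle timeout are hypothetical; the `{http, {RequestId, ...}}` messages follow the documented `{stream, self}` behaviour. Each chunk is split into lines as it arrives and only the unfinished tail is carried over, so the consumer never builds the whole body as one binary. How far the mailbox can grow still depends on how fast `LineFun` keeps up, which is exactly the concern raised above.

```erlang
-module(stream_lines).
-export([fetch/2, split_lines/2]).

%% Hypothetical helper: stream the body of URL to the calling process and
%% call LineFun once per complete line. Returns ok or {error, Reason}.
fetch(URL, LineFun) ->
    {ok, _} = application:ensure_all_started(inets),
    {ok, RequestId} =
        httpc:request(get, {URL, []}, [], [{sync, false}, {stream, self}]),
    loop(RequestId, LineFun, <<>>).

loop(RequestId, LineFun, Buffer) ->
    receive
        {http, {RequestId, stream_start, _Headers}} ->
            loop(RequestId, LineFun, Buffer);
        {http, {RequestId, stream, BodyPart}} ->
            %% Hand every complete line to LineFun; keep the unfinished tail.
            {Lines, Rest} = split_lines(Buffer, BodyPart),
            lists:foreach(LineFun, Lines),
            loop(RequestId, LineFun, Rest);
        {http, {RequestId, stream_end, _Headers}} ->
            %% The last line may lack a trailing newline.
            case Buffer of
                <<>> -> ok;
                _ -> LineFun(Buffer)
            end,
            ok;
        {http, {RequestId, {error, Reason}}} ->
            {error, Reason};
        {http, {RequestId, Result}} ->
            %% Only 200/206 bodies are streamed; other responses arrive whole.
            {error, {not_streamed, Result}}
    after 60000 ->
        %% Hypothetical idle timeout between messages.
        httpc:cancel_request(RequestId),
        {error, timeout}
    end.

%% Prepend the carried-over Buffer to Chunk and split on "\n".
%% Returns {CompleteLines, UnfinishedTail}.
split_lines(Buffer, Chunk) ->
    Parts = binary:split(<<Buffer/binary, Chunk/binary>>, <<"\n">>, [global]),
    {Lines, [Rest]} = lists:split(length(Parts) - 1, Parts),
    {Lines, Rest}.
```

Keeping `split_lines/2` pure makes the chunk-boundary handling easy to check without a network connection.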
> Message passing is a double-edged sword. It is very nice, but it can
> kill your system if you don't use it wisely.
I think streaming to a process is the most intuitive solution, but there
might be a trade-off here that would make streaming to a file and then
parsing the file a good alternative. I think the best thing is to test
both and then decide.
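A minimal sketch of the file-based alternative: save the body with the `{stream, FilePath}` option shown earlier, then fold over the file one line at a time with `file:read_line/1`, so the parser only ever handles a single line. It again assumes the `httpc` module name and `\n` line endings; `file_lines`, `download/2` and `fold_lines/3` are hypothetical names.

```erlang
-module(file_lines).
-export([download/2, fold_lines/3]).

%% Hypothetical helper: save the response body of URL straight to Path
%% instead of holding it in memory (only 200/206 bodies are saved).
download(URL, Path) ->
    {ok, _} = application:ensure_all_started(inets),
    httpc:request(get, {URL, []}, [], [{stream, Path}]).

%% Call Fun(Line, Acc) for every line of Path, without the trailing "\n".
fold_lines(Path, Fun, Acc0) ->
    %% raw + read_ahead keeps file:read_line/1 cheap for large files.
    {ok, Fd} = file:open(Path, [read, binary, raw, read_ahead]),
    try
        loop(Fd, Fun, Acc0)
    after
        ok = file:close(Fd)
    end.

loop(Fd, Fun, Acc) ->
    case file:read_line(Fd) of
        {ok, Line} -> loop(Fd, Fun, Fun(strip_newline(Line), Acc));
        eof -> Acc;
        {error, Reason} -> erlang:error({read_failed, Reason})
    end.

strip_newline(Line) ->
    case byte_size(Line) of
        0 -> Line;
        N ->
            case binary:at(Line, N - 1) of
                $\n -> binary:part(Line, 0, N - 1);
                _ -> Line
            end
    end.
```

For example, `file_lines:fold_lines("/tmp/file.txt", fun(_Line, N) -> N + 1 end, 0)` would count the lines of a downloaded file (the path is illustrative). To test both approaches as suggested, each can be wrapped in `timer:tc/1`, which returns `{Microseconds, Result}`.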

Regards - Ingela OTP team
