[erlang-questions] Reading a big text file from a web

Ingela Anderton Andin ingela@REDACTED
Thu May 3 10:19:31 CEST 2007


Chandru wrote:
> On 02/05/07, Ingela Anderton Andin <ingela@REDACTED> wrote:
>>
>> erlang-questions-request@REDACTED wrote:
>> [...]
>> >> Right now my, very naive, implementation is reading the result of
>> >> http:request("http://example.com/file.txt") and split it by new lines.
>> >> The file is several megabytes long so the memory consumption increases
>> >> considerably.
>> >>
>> >> What do you recommend to implement this functionality?
>> >>
>> >
>> > Check to see if the inets http client has an option to save the
>> > response to a file.
>> It does have an option to save it to a file.
>>
>> http:request(get, {URL, []}, [], [{stream, FilePath}]).
>> > Once the file is downloaded, you can then parse it in chunks.
>> >
>> Better yet, the inets http client lets you stream the result directly
>> to a process, so you do not have to wait for the whole response to be
>> saved to a file before reading it in chunks.
>
> I wouldn't recommend doing this for large files. Depending on the size
> of the file and the complexity of the parsing, you could end up
> overloading the process which is receiving these messages. And as the
> size of the message queue of a process increases, it becomes slower as
> I think the runtime system penalises processes with large message
> queues. 
The process that gets penalized is not the one with the long queue but
the process sending to it, in order to prevent congestion if possible.
> Message passing is a double edged sword. It is very nice, but it can
> kill your system if you don't use it wisely.
I think streaming to a process is the most intuitive solution, but there
might be a trade-off here that would make streaming to a file and then
parsing the file a good alternative. I think the best thing is to test
both and then decide.
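
As a rough illustration of the streaming variant, here is a minimal sketch
that splits the body into lines as the chunks arrive. It assumes a current
OTP release, where the client module is called httpc rather than http, and
it uses the {stream, {self, once}} option so that the next chunk is only
requested after the previous one has been handled, which should keep the
receiver's message queue short. The module name stream_lines and the
functions fetch/2 and split_lines/1 are made up for this example.

```erlang
-module(stream_lines).
-export([fetch/2, split_lines/1]).

%% Fetch Url and call Fun(Line) for every line of the body.
%% Assumption: inets (plus ssl for https URLs) can be started on this node.
fetch(Url, Fun) ->
    {ok, _} = application:ensure_all_started(inets),
    Opts = [{sync, false}, {stream, {self, once}}],
    {ok, ReqId} = httpc:request(get, {Url, []}, [], Opts),
    receive
        {http, {ReqId, stream_start, _Headers, Pid}} ->
            ok = httpc:stream_next(Pid),
            loop(ReqId, Pid, <<>>, Fun);
        {http, {ReqId, {error, Reason}}} ->
            {error, Reason};
        {http, {ReqId, Result}} ->
            %% Responses other than 200/206 are not streamed.
            {error, {unexpected_response, Result}}
    end.

%% Acc holds the trailing partial line left over from earlier chunks.
loop(ReqId, Pid, Acc, Fun) ->
    receive
        {http, {ReqId, stream, Chunk}} ->
            {Lines, Rest} = split_lines(<<Acc/binary, Chunk/binary>>),
            lists:foreach(Fun, Lines),
            %% Ask for the next chunk only after this one is processed.
            ok = httpc:stream_next(Pid),
            loop(ReqId, Pid, Rest, Fun);
        {http, {ReqId, stream_end, _Headers}} ->
            %% Emit a final line that had no terminating newline.
            case Acc of
                <<>> -> ok;
                _ -> Fun(Acc), ok
            end;
        {http, {ReqId, {error, Reason}}} ->
            {error, Reason}
    end.

%% Split a binary into its complete lines and the unterminated remainder.
split_lines(Bin) ->
    Parts = binary:split(Bin, <<"\n">>, [global]),
    {Lines, [Rest]} = lists:split(length(Parts) - 1, Parts),
    {Lines, Rest}.
```

It could be called as stream_lines:fetch("http://example.com/file.txt",
fun(Line) -> io:format("~s~n", [Line]) end). Only the current partial line
is kept in memory, so this is one way to compare the streaming approach
against saving to a file first.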

Regards - Ingela OTP team




