[erlang-questions] Reading a big text file from a web
Ingela Anderton Andin
ingela@REDACTED
Thu May 3 10:19:31 CEST 2007
Chandru wrote:
> On 02/05/07, Ingela Anderton Andin <ingela@REDACTED> wrote:
>>
>> erlang-questions-request@REDACTED wrote:
>> [...]
>> >> Right now my very naive implementation reads the result of
>> >> http:request("http://example.com/file.txt") and splits it on
>> >> newlines. The file is several megabytes long, so memory consumption
>> >> increases considerably.
>> >>
>> >> What do you recommend to implement this functionality?
>> >>
>> >
>> > Check to see if the inets http client has an option to save the
>> > response to a file.
>> It does have an option to save it to a file.
>>
>> http:request(get, {URL, []}, [], [{stream, FilePath}]).
>> > Once the file is downloaded, you can then parse it in chunks.
>> >
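Here is a minimal sketch of the save-then-parse approach. It assumes a current OTP release, where the inets client module is called `httpc` (`http:request/4` in this thread is the older name of the same client). The module name `fetch_then_parse` and its functions are hypothetical. Only `httpc:request/4`, `file:open/2`, `file:read_line/1` and the `binary` functions are real APIs.

```erlang
%% Hypothetical module: download to disk first, then fold over the lines.
-module(fetch_then_parse).
-export([run/4, fold_lines/3]).

%% Save the body of URL to FilePath, then fold Fun(Line, Acc) over its lines.
%% Assumes a 200 response; any other outcome makes the match below fail.
run(URL, FilePath, Fun, Acc0) ->
    {ok, _} = application:ensure_all_started(inets),
    {ok, saved_to_file} =
        httpc:request(get, {URL, []}, [], [{stream, FilePath}]),
    fold_lines(FilePath, Fun, Acc0).

%% Read the file one line at a time. read_ahead buffers the raw reads,
%% so only a small part of the file is in memory at any moment.
fold_lines(FilePath, Fun, Acc0) ->
    {ok, Fd} = file:open(FilePath, [read, raw, binary, {read_ahead, 64 * 1024}]),
    try
        fold_loop(Fd, Fun, Acc0)
    after
        file:close(Fd)
    end.

fold_loop(Fd, Fun, Acc) ->
    case file:read_line(Fd) of
        {ok, Line} -> fold_loop(Fd, Fun, Fun(chomp(Line), Acc));
        eof -> {ok, Acc};
        {error, Reason} -> {error, Reason}
    end.

%% Drop the trailing newline that read_line/1 leaves on each line.
chomp(<<>>) -> <<>>;
chomp(Line) ->
    case binary:last(Line) of
        $\n -> binary:part(Line, 0, byte_size(Line) - 1);
        _ -> Line
    end.
```

For example, `fetch_then_parse:run("http://example.com/file.txt", "/tmp/file.txt", fun(_Line, N) -> N + 1 end, 0)` would count the lines without holding the whole body in memory.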
>> Better still, the inets HTTP client lets you stream the result directly
>> to a process, so you do not have to wait for the whole response to be
>> saved to a file and then read it back in chunks.
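Here is a minimal sketch of streaming straight to the calling process. It assumes current OTP's `httpc` module and the message shapes documented for `{sync, false}` with `{stream, self}`. Body parts do not line up with line breaks, so the sketch keeps the unfinished tail of each chunk and prepends it to the next one. `stream_lines`, `fetch/2`, `split_lines/2` and the 60-second idle timeout are hypothetical choices.

```erlang
%% Hypothetical module: stream the body to the calling process and
%% hand each complete line to Fun as it arrives.
-module(stream_lines).
-export([fetch/2, split_lines/2]).

fetch(URL, Fun) ->
    {ok, _} = application:ensure_all_started(inets),
    {ok, RequestId} =
        httpc:request(get, {URL, []}, [], [{sync, false}, {stream, self}]),
    receive_chunks(RequestId, Fun, <<>>).

receive_chunks(RequestId, Fun, Partial) ->
    receive
        {http, {RequestId, stream_start, _Headers}} ->
            receive_chunks(RequestId, Fun, Partial);
        {http, {RequestId, stream, Chunk}} ->
            {Lines, Rest} = split_lines(Chunk, Partial),
            lists:foreach(Fun, Lines),
            receive_chunks(RequestId, Fun, Rest);
        {http, {RequestId, stream_end, _Headers}} ->
            %% Hand over a last line that had no trailing newline.
            case Partial of
                <<>> -> ok;
                _ -> Fun(Partial), ok
            end;
        {http, {RequestId, {error, Reason}}} ->
            {error, Reason};
        {http, {RequestId, {{_Version, Status, _Phrase}, _Headers, _Body}}} ->
            %% Responses other than 200/206 are not streamed; they arrive whole.
            {error, {http_status, Status}}
    after 60000 ->
        %% Assumed idle timeout between messages (hypothetical value).
        httpc:cancel_request(RequestId),
        {error, timeout}
    end.

%% Glue the leftover from the previous chunk onto this one, split on "\n",
%% and keep the last (possibly incomplete) piece for the next chunk.
%% Lines ending in "\r\n" keep their "\r".
split_lines(Chunk, Partial) ->
    Parts = binary:split(<<Partial/binary, Chunk/binary>>, <<"\n">>, [global]),
    {Lines, [Rest]} = lists:split(length(Parts) - 1, Parts),
    {Lines, Rest}.
```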
>
> I wouldn't recommend doing this for large files. Depending on the size
> of the file and the complexity of the parsing, you could end up
> overloading the process that is receiving these messages. And as a
> process's message queue grows, the process becomes slower, as I think
> the runtime system penalises processes with large message queues.
The process that gets penalized is not the process with the long queue
but the process sending to it, in order to prevent congestion if
possible.
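To check whether a consumer like this is actually falling behind, you can sample its mailbox length with `erlang:process_info/2`. This is a small sketch. The `queue_watch` module is a hypothetical name, and it only works for processes on the local node.

```erlang
%% Hypothetical helper for spotting a consumer that is falling behind.
-module(queue_watch).
-export([queue_len/1]).

%% Number of messages waiting in Pid's mailbox, or undefined if Pid is
%% no longer alive. Pid must be a process on the local node.
queue_len(Pid) ->
    case erlang:process_info(Pid, message_queue_len) of
        {message_queue_len, N} -> N;
        undefined -> undefined
    end.
```

A queue length that keeps growing while the download runs suggests the parser is slower than the network.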
> Message passing is a double edged sword. It is very nice, but it can
> kill your system if you don't use it wisely.
I think streaming to a process is the most intuitive solution, but there
might be a trade-off here that makes streaming to a file and then
parsing the file a good alternative. I think the best thing is to test
both and then decide.
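If the mailbox concern shows up in such tests, the current `httpc` documentation also describes a flow-controlled variant, `{stream, {self, once}}`. It is an option documented for current `httpc`, not one proposed in this thread. With it, the `stream_start` message carries an extra pid, and the client sends the next body part only after the receiver calls `httpc:stream_next/1` with that pid. Below is a sketch that folds a function over the raw chunks. The module name `stream_once` and the 60-second timeout are hypothetical.

```erlang
%% Hypothetical module: pull the body one chunk at a time so that at most
%% one unprocessed body part sits in the caller's mailbox.
-module(stream_once).
-export([fold/3]).

fold(URL, Fun, Acc0) ->
    {ok, _} = application:ensure_all_started(inets),
    {ok, RequestId} =
        httpc:request(get, {URL, []}, [],
                      [{sync, false}, {stream, {self, once}}]),
    wait_start(RequestId, Fun, Acc0).

%% With {self, once}, stream_start carries the pid to pass to stream_next/1.
wait_start(RequestId, Fun, Acc) ->
    receive
        {http, {RequestId, stream_start, _Headers, Pid}} ->
            httpc:stream_next(Pid),
            loop(RequestId, Pid, Fun, Acc);
        {http, {RequestId, {error, Reason}}} ->
            {error, Reason};
        {http, {RequestId, {{_Version, Status, _Phrase}, _Headers, _Body}}} ->
            {error, {http_status, Status}}
    after 60000 ->
        httpc:cancel_request(RequestId),
        {error, timeout}
    end.

loop(RequestId, Pid, Fun, Acc) ->
    receive
        {http, {RequestId, stream, Chunk}} ->
            NewAcc = Fun(Chunk, Acc),
            %% Ask for the next part only after this one has been handled.
            httpc:stream_next(Pid),
            loop(RequestId, Pid, Fun, NewAcc);
        {http, {RequestId, stream_end, _Headers}} ->
            {ok, Acc};
        {http, {RequestId, {error, Reason}}} ->
            {error, Reason}
    after 60000 ->
        httpc:cancel_request(RequestId),
        {error, timeout}
    end.
```

For example, `stream_once:fold("http://example.com/file.txt", fun(Chunk, N) -> N + byte_size(Chunk) end, 0)` would total the body size. Line splitting can be layered on by keeping the unfinished tail of each chunk in the accumulator. The cost is one extra round trip through the client process per chunk.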
Regards - Ingela OTP team