Best practice for Erlang's process design to build a website-downloader (super simple crawler)

I Gusti Ngurah Oka Prinarjaya okaprinarjaya@REDACTED
Mon Nov 11 02:26:15 CET 2019


Hi Grzegorz,

Thank you for your suggestion.

>> I would not suggest creating a new process for each URL unless you can
be sure it doesn't lead to an explosion of processes
Thanks for reminding me
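
For anyone following along, here is a rough, untested sketch of the
pool-and-queue approach Greg describes below. All names here are made up
for illustration (crawler_pool, fetch_and_extract/1 is just a placeholder),
and for brevity the workers are plain linked processes rather than a
properly supervised pool:

%% crawler_pool.erl -- rough sketch, not production code.
%% One gen_server owns the URL queue and a set of already-seen URLs;
%% N long-lived workers repeatedly ask it for the next URL.
-module(crawler_pool).
-behaviour(gen_server).

-export([start_link/1, add_url/1]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link(NumWorkers) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, NumWorkers, []).

%% Enqueue a newly discovered URL; duplicates are dropped.
add_url(Url) ->
    gen_server:cast(?MODULE, {add_url, Url}).

init(NumWorkers) ->
    %% A real crawler would put the workers under a
    %% simple_one_for_one supervisor instead of spawn_link.
    [spawn_link(fun worker_loop/0) || _ <- lists:seq(1, NumWorkers)],
    {ok, #{queue => queue:new(), seen => sets:new()}}.

handle_cast({add_url, Url}, State = #{queue := Q, seen := Seen}) ->
    case sets:is_element(Url, Seen) of
        true ->
            {noreply, State};
        false ->
            {noreply, State#{queue := queue:in(Url, Q),
                             seen  := sets:add_element(Url, Seen)}}
    end.

%% Workers call this to get the next URL to download, if any.
handle_call(next_url, _From, State = #{queue := Q}) ->
    case queue:out(Q) of
        {{value, Url}, Q1} -> {reply, {ok, Url}, State#{queue := Q1}};
        {empty, _}         -> {reply, empty, State}
    end.

worker_loop() ->
    case gen_server:call(?MODULE, next_url) of
        {ok, Url} ->
            %% fetch_and_extract/1 is a placeholder: it would do the
            %% HTTP GET (e.g. with httpc) and return the list of
            %% <a href="..."> URLs found in the page.
            Links = fetch_and_extract(Url),
            [add_url(L) || L <- Links];
        empty ->
            timer:sleep(500)
    end,
    worker_loop().

fetch_and_extract(_Url) ->
    [].

Starting it with crawler_pool:start_link(8) and seeding it with
crawler_pool:add_url(SomeUrl) keeps the number of processes fixed at N no
matter how many URLs are discovered, which is exactly the point of the
suggestion.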



On Mon, 11 Nov 2019 at 00:12, Grzegorz Junka <list1@REDACTED>
wrote:

> Hi Gusti,
>
> I would suggest creating a pool of N processes and a queue of URLs to
> process. Every time a new URL is encountered, it's added to the queue. Then
> a scheduler would pick up those URLs and distribute them across the pool of
> processes. I would not suggest creating a new process for each URL unless
> you can be sure it doesn't lead to an explosion of processes, i.e. that
> the number of URLs is limited.
>
> Greg
>
>
> On 10/11/2019 10:07, I Gusti Ngurah Oka Prinarjaya wrote:
>
> Hi,
>
> Anyone?
>
>
>
> On Sat, 9 Nov 2019 at 19:43, I Gusti Ngurah Oka Prinarjaya <
> okaprinarjaya@REDACTED> wrote:
>
>> Hi,
>>
>> I need to know the best practice for Erlang process design to build a
>> website downloader. I don't need heavy parsing of the website like a
>> scraper does. Maybe I only need to parse the URLs in <a href=".." /> tags.
>>
>> What had just come to my mind was to create N Erlang processes under a
>> supervisor, where N is the number of <a href="..." /> URLs found in a
>> website's pages. But I'm not sure that's a good design, so I need
>> recommendations from those of you who have experience with it.
>>
>> Thank you, I appreciate all of your time and attention.
>>
>>
>>
>>

