Best practice for Erlang's process design to build a website-downloader (super simple crawler)

Grzegorz Junka list1@REDACTED
Sun Nov 10 18:12:25 CET 2019


Hi Gusti,

I would suggest creating a pool of N processes and a queue of URLs to 
process. Every time a new URL is encountered it's added to the queue. 
A scheduler would then pick up those URLs and distribute them across 
the pool of processes. I would not suggest creating a new process for 
each URL unless you can be sure it doesn't lead to an explosion of 
processes, i.e. that the number of URLs is limited.
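As a rough illustration of that structure (just a sketch, not tested 
code; the module name, the use of plain spawn instead of a proper 
supervisor, and httpc for fetching are my own assumptions):

-module(url_pool).
-export([start/1, add_url/2]).

%% Start the scheduler and a fixed pool of N long-lived workers.
start(N) ->
    Scheduler = spawn(fun() -> scheduler(queue:new(), []) end),
    [spawn(fun() -> worker(Scheduler) end) || _ <- lists:seq(1, N)],
    Scheduler.

%% Every newly encountered URL is pushed to the scheduler's queue.
add_url(Scheduler, Url) ->
    Scheduler ! {add, Url}.

%% The scheduler keeps a queue of pending URLs and a list of idle
%% workers, handing out one URL per free worker.
scheduler(Queue, Idle) ->
    receive
        {add, Url} ->
            case Idle of
                [Pid | Rest] -> Pid ! {fetch, Url}, scheduler(Queue, Rest);
                []           -> scheduler(queue:in(Url, Queue), Idle)
            end;
        {ready, Pid} ->
            case queue:out(Queue) of
                {{value, Url}, Rest} -> Pid ! {fetch, Url}, scheduler(Rest, Idle);
                {empty, _}           -> scheduler(Queue, [Pid | Idle])
            end
    end.

%% Each worker reports itself as ready, fetches one URL at a time, and
%% then asks for more work. Any <a href> links found in the fetched
%% body would be sent back to the scheduler via add_url/2.
worker(Scheduler) ->
    Scheduler ! {ready, self()},
    receive
        {fetch, Url} ->
            %% httpc needs the inets application to be started first.
            _Result = httpc:request(Url),
            worker(Scheduler)
    end.

With that in place, url_pool:start(10) followed by calls to 
url_pool:add_url(Pool, "http://example.com/") would keep at most 10 
downloads in flight while the rest of the URLs wait in the queue. In a 
real application the workers and the scheduler would of course live 
under a supervisor.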

Greg


On 10/11/2019 10:07, I Gusti Ngurah Oka Prinarjaya wrote:
> Hi,
>
> Anyone?
>
>
>
> On Sat, 9 Nov 2019 at 19.43 I Gusti Ngurah Oka Prinarjaya 
> <okaprinarjaya@REDACTED> wrote:
>
>     Hi,
>
>     I need to know the best practice for Erlang process design to
>     build a website downloader. I don't need heavy parsing of the
>     website like what a scraper does; maybe I only need to parse the
>     <a href=".." /> URLs.
>
>     What had just come to my mind was to create N Erlang processes
>     under a supervisor, where N is the number of <a href="..." /> URLs
>     found in a website's pages. But I'm not sure that's a good design,
>     so I need recommendations from those of you who have experience
>     with it.
>
>     Thank you, I appreciate all of your time and attention
>
>
>
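For the link-extraction part of the question, a plain regular 
expression over the fetched body is often enough when all you need are 
the <a href> values (a hypothetical helper, not a substitute for a 
real HTML parser):

%% Extract the href="..." values from an HTML body; good enough for a
%% simple crawler, though a proper HTML parser is more robust.
extract_links(Html) ->
    Pattern = "<a[^>]*href=\"([^\"]+)\"",
    case re:run(Html, Pattern, [global, caseless,
                                {capture, all_but_first, list}]) of
        {match, Matches} -> [Url || [Url] <- Matches];
        nomatch          -> []
    end.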