Best practice for Erlang's process design to build a website-downloader (super simple crawler)
Grzegorz Junka
list1@REDACTED
Sun Nov 10 18:12:25 CET 2019
Hi Gusti,
I would suggest creating a pool of N processes and a queue of URLs to
process. Every time a new URL is encountered it's added to the queue.
A scheduler would then pick up those URLs and distribute them across the
pool of processes. I would not suggest creating a new process for each
URL unless you can be sure it doesn't lead to an explosion of
processes, i.e. that the number of URLs is limited.
Greg
On 10/11/2019 10:07, I Gusti Ngurah Oka Prinarjaya wrote:
> Hi,
>
> Anyone?
>
>
>
> On Sat, 9 Nov 2019 at 19:43, I Gusti Ngurah Oka Prinarjaya
> <okaprinarjaya@REDACTED> wrote:
>
> Hi,
>
>     I need to know the best practice for Erlang's process design to
>     build a website downloader. I don't need heavy parsing of the
>     website like what a scraper does. Maybe I only need to parse the
>     URLs in <a href=".." /> tags.
>
>     What had just come to my mind was to create N Erlang processes
>     under a supervisor, where N is the number of <a href="..." /> URLs
>     found in a website's pages. But I'm not sure that's a good design,
>     so I need recommendations from those of you who have experience with it.
>
> Thank you, I appreciate all of your time and attention
>
>
>