[erlang-questions] Parallel list processing once again

Wed Oct 8 20:21:39 CEST 2008

On Wed, Oct 8, 2008 at 10:01 AM, Clint Moore <hydo@REDACTED> wrote:
>
>        Before I try to write it, I thought I would ask and see if there
> isn't already something that does this.  What I want to do is connect
> up a bunch of nodes and throw arbitrary functions and data to them and
> get the responses back in an easily manageable way _without_
> overloading the nodes.  This module would take care of checking for
> smp support on the individual nodes to see if it can throw more than
> one block to it or not, take care of waiting for all of the responses,
> and resend any blocks that didn't come back.
>
>        Yea, I know... it's not a small thing but it seems like it would be
> incredibly useful.  The data set I'm working off of is ~28k twitter
> messages in the form of:
>
> [[ { service, "twitter" }, { user, "name" }, { update, "text"},
> { updated, date }, ... ]]  called Dx.
>
> Let's suppose I have a function that takes a list of words and checks
> for @some_word and, if it finds it, replacing it with <a
> href="http://...">@some_word</a> which makes a clickable link to that
> user for some web-based twitter interface.  We'll call this function Fx.
>
> Here's what I want to do:
>
> (node@REDACTED) 1> ResultSet = awesome_module:distributed_map( Fx, Dx ).
> ...
> (node@REDACTED) 2>
>
> I could have 100 or no other nodes connected, it wouldn't matter
> because the module would put as much load on a node as it could handle
> (A lagom number of processes, I believe?) based off of
> erlang:system_info(smp_support) or whatever else.
>
> Is there anything like this already or is my newbishness with Erlang
> making the problem sound much more than it actually is?
>

To start with, take a look at the pool module in stdlib.
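
A minimal sketch of how pool might be used for this, not a finished
solution. It assumes the pool has already been started with pool:start/1
(which reads .hosts.erlang to find hosts), and that this hypothetical
module, pool_sketch, and the module defining F are loaded on every node.
pool:pspawn/3 picks the node with the lowest expected load; the rest is
plain message passing. There is no timeout and no resend of lost work.

```erlang
-module(pool_sketch).
-export([run/2, worker/3]).

%% Hypothetical helper: apply F to every item, one pool process per item.
%% Assumes pool:start/1 was called beforehand.
run(F, Items) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                %% pool:pspawn/3 spawns on the least loaded pool node.
                pool:pspawn(?MODULE, worker, [Parent, Ref, {F, X}]),
                Ref
            end || X <- Items],
    %% Collect replies in the original order by matching on each Ref.
    [receive {Ref, Result} -> Result end || Ref <- Refs].

%% Runs on whichever node pool chose; sends the result back to Parent.
worker(Parent, Ref, {F, X}) ->
    Parent ! {Ref, F(X)}.
```

One process per item keeps the sketch short, but for cheap work per item
you would probably want to hand out larger blocks instead.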

Another thing to consider is the cost of sending the data and the results
between the nodes compared to the actual amount of work done. From
your description it sounds like the work done on each message is not very
much, so the transfer cost may well dominate.
Sometimes it is better to keep the data distributed and just send the
functions doing the work to where the data is.
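
If the data does have to travel, sending it in blocks rather than one
message at a time at least spreads the messaging overhead over many
items. A rough sketch using rpc:async_call/4 and rpc:yield/1, under these
assumptions: the module name dmap_sketch, the default block size of 1000,
and the round-robin assignment are all made up for illustration, and the
module (plus whatever module defines F) must be loaded on every connected
node. It does no throttling and no retry; a {badrpc, Reason} reply will
simply crash the final append.

```erlang
-module(dmap_sketch).
-export([distributed_map/2, distributed_map/3, map_chunk/2]).

%% Hypothetical entry point; 1000 items per block is an arbitrary default.
distributed_map(F, List) ->
    distributed_map(F, List, 1000).

distributed_map(F, List, ChunkSize) when is_integer(ChunkSize), ChunkSize > 0 ->
    Nodes = [node() | nodes()],          % also works with no other nodes
    Chunks = chunk(List, ChunkSize),
    Keys = dispatch(F, Chunks, Nodes, Nodes),
    %% rpc:yield/1 blocks until the matching async call has replied.
    lists:append([rpc:yield(Key) || Key <- Keys]).

%% Hand out blocks to the nodes round-robin, one async call per block.
dispatch(_F, [], _Rest, _All) ->
    [];
dispatch(F, Chunks, [], All) ->
    dispatch(F, Chunks, All, All);
dispatch(F, [C | Cs], [N | Ns], All) ->
    Key = rpc:async_call(N, ?MODULE, map_chunk, [F, C]),
    [Key | dispatch(F, Cs, Ns, All)].

%% Executed on the remote node: map F over one block.
map_chunk(F, Chunk) ->
    lists:map(F, Chunk).

%% Split a list into blocks of at most N elements, preserving order.
chunk([], _N) ->
    [];
chunk(List, N) when length(List) =< N ->
    [List];
chunk(List, N) ->
    {Head, Tail} = lists:split(N, List),
    [Head | chunk(Tail, N)].
```

With something like this, the call from the original question would be
dmap_sketch:distributed_map(Fx, Dx). Whether it beats a plain lists:map/2
on one node depends on how expensive Fx is per message.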

/Anders
