[erlang-questions] Parallel list processing once again

Clint Moore hydo@REDACTED
Wed Oct 8 17:01:00 CEST 2008


	Before I try to write it, I thought I would ask whether something
already does this.  What I want to do is connect up a bunch of nodes,
throw arbitrary functions and data at them, and get the responses back
in an easily manageable way _without_ overloading the nodes.  This
module would check each node for SMP support to decide whether it can
send that node more than one block at a time, wait for all of the
responses, and resend any blocks that didn't come back.
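Roughly, a module like the one described above could look like the following. This is a minimal sketch under several assumptions, not an existing library. The module name awesome_module is borrowed from the shell example further down. The default chunk size (100), the per-chunk timeout (60 s) and the retry limit (3) are arbitrary. "As much load as a node can handle" is approximated as one chunk in flight per scheduler, with the scheduler count read through rpc:call/4 and erlang:system_info(schedulers). F must be loadable on every node, either because its module is loaded there or because it is a fun typed in the shell. Results come back in the original order.

```erlang
-module(awesome_module).
-export([distributed_map/2, distributed_map/3]).

%% Hypothetical sketch, not an existing OTP module.
%% The timeout and retry limit below are arbitrary assumptions.
-define(CALL_TIMEOUT, 60000).  % ms allowed per chunk
-define(MAX_RETRIES, 3).       % resends per chunk before giving up

distributed_map(F, List) ->
    distributed_map(F, List, 100).  % default chunk size is an assumption

distributed_map(F, List, ChunkSize) when is_integer(ChunkSize), ChunkSize > 0 ->
    Chunks = chunk(List, ChunkSize),
    Jobs = [{I, C, 0} || {I, C} <- lists:zip(lists:seq(1, length(Chunks)), Chunks)],
    %% One slot per scheduler on each reachable node caps the chunks in flight.
    Slots = case lists:append([lists:duplicate(capacity(N), N)
                               || N <- [node() | nodes()]]) of
                [] -> [node()];
                S  -> S
            end,
    Done = loop(F, make_ref(), Jobs, Slots, 0, []),
    lists:append([R || {_, R} <- lists:keysort(1, Done)]).

%% Number of schedulers on Node, or 0 if the node cannot be reached.
capacity(Node) ->
    case rpc:call(Node, erlang, system_info, [schedulers]) of
        N when is_integer(N), N > 0 -> N;
        _ -> 0
    end.

%% Queue: chunks waiting to run; Free: idle slots; Busy: chunks in flight.
loop(_F, _Tag, [], _Free, 0, Done) ->
    Done;
loop(F, Tag, [{I, C, Tries} | Queue], [Node | Free], Busy, Done) ->
    Parent = self(),
    %% A local proxy makes the blocking rpc:call/5, so this loop can wait
    %% for whichever chunk finishes first.
    spawn_link(fun() ->
        Parent ! {Tag, I, C, Tries, Node,
                  rpc:call(Node, lists, map, [F, C], ?CALL_TIMEOUT)}
    end),
    loop(F, Tag, Queue, Free, Busy + 1, Done);
loop(F, Tag, Queue, Free, Busy, Done) ->
    receive
        {Tag, I, _C, Tries, _Node, {badrpc, Reason}} when Tries >= ?MAX_RETRIES ->
            erlang:error({chunk_failed, I, Reason});
        {Tag, I, C, Tries, Node, {badrpc, _}} ->
            %% Requeue the chunk; its slot goes to the back of the free list.
            loop(F, Tag, Queue ++ [{I, C, Tries + 1}], Free ++ [Node], Busy - 1, Done);
        {Tag, I, _C, _Tries, Node, Result} ->
            loop(F, Tag, Queue, Free ++ [Node], Busy - 1, [{I, Result} | Done])
    end.

%% Split List into consecutive chunks of at most N elements.
chunk([], _N) -> [];
chunk(List, N) ->
    {H, T} = take(N, List, []),
    [H | chunk(T, N)].

take(0, Rest, Acc) -> {lists:reverse(Acc), Rest};
take(_K, [], Acc) -> {lists:reverse(Acc), []};
take(K, [X | Xs], Acc) -> take(K - 1, Xs, [X | Acc]).
```

Each chunk runs as lists:map(F, Chunk) on its node. Any {badrpc, Reason} result (node down, timeout, or F crashing) puts the chunk back on the queue. This means a deterministic crash in F is retried up to the limit before distributed_map/3 raises {chunk_failed, Index, Reason}. A quick local check: awesome_module:distributed_map(fun(X) -> X * 2 end, lists:seq(1, 10), 3).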

	Yeah, I know... it's not a small thing, but it seems like it would be
incredibly useful.  The data set I'm working with is ~28k Twitter
messages in the form of:

[[ {service, "twitter"}, {user, "name"}, {update, "text"},
{updated, date}, ... ]]

I'll call this list Dx.
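Since each message in Dx is a property list, individual fields can be read with proplists:get_value/3. The sketch below uses hypothetical module and function names. It pulls out the update text of every message and splits one update into the list of words that the Fx function described next expects. Splitting on spaces only is a simplifying assumption.

```erlang
-module(dx_fields).
-export([updates/1, words/1]).

%% Hypothetical helpers for a list of messages shaped like Dx:
%% [[{service, "twitter"}, {user, "name"}, {update, "text"}, ...], ...]

%% The update text of every message ("" when the key is missing).
updates(Dx) ->
    [proplists:get_value(update, Msg, "") || Msg <- Dx].

%% Split one update into words; splitting on spaces only is an assumption.
words(Text) ->
    string:tokens(Text, " ").
```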

Let's suppose I have a function that takes a list of words, checks
for @some_word and, if it finds it, replaces it with <a
href="http://...">@some_word</a>, which makes a clickable link to that
user for some web-based Twitter interface.  We'll call this function Fx.
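A function like Fx might be sketched as follows. The module and function names are hypothetical. BaseUrl is an assumed parameter standing in for the profile URL prefix that is elided above as "http://...", so the caller supplies it. The sketch does no HTML escaping, and trailing punctuation (as in "@bob,") becomes part of the name.

```erlang
-module(mentions).
-export([link_mentions/2]).

%% Hypothetical sketch of Fx: wrap every word that starts with "@" in an
%% HTML anchor. BaseUrl is supplied by the caller (the real prefix is elided).
link_mentions(BaseUrl, Words) ->
    [link_word(BaseUrl, W) || W <- Words].

%% A bare "@" is left alone; everything after the "@" is used as the user name.
link_word(BaseUrl, [$@ | Name] = Word) when Name =/= [] ->
    "<a href=\"" ++ BaseUrl ++ Name ++ "\">" ++ Word ++ "</a>";
link_word(_BaseUrl, Word) ->
    Word.
```

To pass it to a map that expects a one-argument fun, close over the base URL, e.g. fun(Ws) -> mentions:link_mentions(Base, Ws) end.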

Here's what I want to do:

(node@REDACTED) 1> ResultSet = awesome_module:distributed_map( Fx, Dx ).
...
(node@REDACTED) 2>

I could have 100 other nodes connected or none at all; it wouldn't
matter, because the module would put as much load on each node as it
could handle (a lagom, i.e. just-right, number of processes, I believe?)
based on erlang:system_info(smp_support) or whatever else.
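To see what each node would report, a small survey like the one below could help. The module name node_survey is hypothetical. On current Erlang/OTP releases erlang:system_info(smp_support) always returns true, so the scheduler count from erlang:system_info(schedulers) is probably the more useful load signal. Unreachable nodes show up as {badrpc, Reason} in place of a value.

```erlang
-module(node_survey).
-export([survey/0]).

%% Hypothetical helper: {Node, SmpSupport, Schedulers} for this node and
%% every connected node. Failed calls appear as {badrpc, Reason}.
survey() ->
    [{N,
      rpc:call(N, erlang, system_info, [smp_support]),
      rpc:call(N, erlang, system_info, [schedulers])}
     || N <- [node() | nodes()]].
```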

Is there anything like this already, or is my newbishness with Erlang
making the problem sound much bigger than it actually is?




