[erlang-questions] Parallel list proccessing once again
Clint Moore
hydo@REDACTED
Wed Oct 8 17:01:00 CEST 2008
Before I try to write it, I thought I would ask and see if there
isn't already something that does this. What I want to do is connect
up a bunch of nodes and throw arbitrary functions and data to them and
get the responses back in an easily manageable way _without_
overloading the nodes. The module would take care of checking for
smp support on each node (to decide whether it can be sent more than
one block at a time), waiting for all of the responses, and resending
any blocks that didn't come back.
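To make that concrete, here's a rough sketch of the kind of thing I
mean. All the names are made up, and this naive version skips the smp
check and the retries entirely; it just round-robins blocks over
whatever nodes happen to be connected:

-module(awesome_module).
-export([distributed_map/2]).

%% Naive sketch: round-robin the elements over the connected nodes,
%% tag each one with its index, and reassemble the replies in order.
%% No smp checks, no retries -- and the fun (plus any modules it
%% calls) would already have to be loaded on every node.
distributed_map(F, List) ->
    Parent = self(),
    Nodes = [node() | nodes()],
    Tagged = lists:zip(lists:seq(1, length(List)), List),
    [spawn(pick(Nodes, I), fun() -> Parent ! {I, F(X)} end)
     || {I, X} <- Tagged],
    [receive {I, Result} -> Result end || {I, _} <- Tagged].

%% Round-robin node choice for element I.
pick(Nodes, I) ->
    lists:nth((I rem length(Nodes)) + 1, Nodes).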
Yeah, I know... it's not a small thing, but it seems like it would be
incredibly useful. The data set I'm working off of is ~28k twitter
messages in the form of:
[[ { service, "twitter" }, { user, "name" }, { update, "text"},
{ updated, date }, ... ]] called Dx.
Let's suppose I have a function that takes a list of words and checks
for @some_word and, if it finds it, replaces it with <a
href="http://...">@some_word</a>, making a clickable link to that
user for some web-based twitter interface. We'll call this function Fx.
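For the record, one way Fx might look, using re:replace/4 (treating
the update text as one flat string rather than a word list, and with
the URL pattern just made up):

Fx = fun(Text) ->
         %% Turn every @name into a link; \\1 is the captured name.
         re:replace(Text, "@(\\w+)",
                    "<a href=\"http://twitter.com/\\1\">@\\1</a>",
                    [global, {return, list}])
     end.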
Here's what I want to do:
(node@REDACTED) 1> ResultSet = awesome_module:distributed_map( Fx, Dx ).
...
(node@REDACTED) 2>
I could have 100 other nodes connected or none at all; it wouldn't
matter, because the module would put as much load on each node as it
could handle (a lagom number of processes, I believe?) based on
erlang:system_info(smp_support) or whatever else.
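For deciding how much to send each node, I'm imagining something like
this over rpc (schedulers is my guess at a more useful number than
the bare smp_support boolean):

%% How many blocks a node can plausibly chew on at once.
BlocksFor = fun(Node) ->
                case rpc:call(Node, erlang, system_info, [smp_support]) of
                    true -> rpc:call(Node, erlang, system_info, [schedulers]);
                    _    -> 1
                end
            end.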
Is there anything like this already, or is my newbishness with Erlang
making the problem sound like much more than it actually is?