[erlang-questions] [ANN] Map/Reduce in Erlang and Python - Disco 0.1
Ville H Tuulos
ville.h.tuulos@REDACTED
Thu Sep 11 11:15:47 CEST 2008
Hi all,
I am happy to announce the availability of Disco (as already featured on
Reddit, Hacker News, etc.), an open-source implementation of the
Map/Reduce framework for distributed computing. Its
core is written in Erlang, but users typically write jobs in Python.
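For readers new to the Map/Reduce model, here is a minimal single-machine sketch in plain Python. It does not use the Disco API; the names map_words, reduce_counts and run_job are made up for illustration only. It shows just the shape of a job: a map function that emits key/value pairs, and a reduce function that combines the values for each key.

```python
from collections import defaultdict

# Illustrative only: plain Python, not the Disco API.
# All function names here are hypothetical.

def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def reduce_counts(pairs):
    """Reduce step: sum the values emitted for each key."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

def run_job(lines):
    """Apply the map step to every line, then reduce all emitted pairs."""
    emitted = []
    for line in lines:
        emitted.extend(map_words(line))
    return reduce_counts(emitted)
```

In a distributed setting the map calls would run in parallel on different nodes, each over its own part of the input, and the framework would route the emitted pairs to the reduce step.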
Find the project site at
http://discoproject.org
or see the source code right away at
http://github.com/tuulos/disco/tree/master
We at Nokia Research in Palo Alto have been using it successfully for
data mining, building probabilistic models, and full-text indexing of
hundreds of gigabytes of real-world data on hundreds of CPUs in
parallel. If you don't have a spare cluster available, we provide a
script that sets up a working cluster automatically on Amazon's EC2
cloud.
It has been a pleasure to use Erlang to implement the job scheduler
and other core components of the system. Disco uses SCGI to provide a web
interface through an external web server, the slave module to start
Erlang VMs on slave nodes, and normal port commands to launch Python
workers on the nodes.
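To give a rough idea of the worker side: an external program started through an Erlang port talks to the VM over its standard input and output by default. The sketch below assumes a made-up protocol of one task per line in and one result per line out; it is not Disco's actual worker protocol, and handle_task and serve are hypothetical names.

```python
# Illustrative only: a hypothetical line-based protocol,
# not the protocol Disco actually uses between Erlang and Python.

def handle_task(task):
    """Hypothetical task handler: count the words in one line of input."""
    return str(len(task.split()))

def serve(instream, outstream):
    """Read one task per line and write one result per line."""
    for line in instream:
        task = line.rstrip("\n")
        if task == "":
            continue  # skip blank lines
        outstream.write(handle_task(task) + "\n")
        # Flush after every result so the process on the other end
        # of the pipe sees it immediately instead of waiting on a buffer.
        outstream.flush()
```

A worker process would call serve(sys.stdin, sys.stdout) at startup; taking the streams as arguments keeps the loop easy to test with in-memory streams.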
Disco is released under the BSD license. The system is still young,
there are known bugs, and there is still work to be done on scalability
issues as well. You're very welcome to try out the system, give
feedback, and develop the system with us.
I'll be at the ICFP / Erlang Workshop in Victoria, so if you're
attending I'd be happy to show a demo and have a chat with you about Disco.
Ville Tuulos
Member of Research Staff
Nokia Research Center
Palo Alto