[erlang-questions] [ANN] Map/Reduce in Erlang and Python - Disco 0.1

Ville H Tuulos ville.h.tuulos@REDACTED
Thu Sep 11 11:15:47 CEST 2008


Hi all,

I am happy to announce the availability of Disco (already featured on
Reddit, Hacker News, etc.), an open-source implementation of the
Map/Reduce framework for distributed computing. Its core is written in
Erlang, but users typically write jobs in Python.
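
For readers new to the model, here is a minimal single-process sketch of a
Map/Reduce word count in plain Python. It does not use Disco's API and says
nothing about how Disco distributes work; the function names (map_words,
shuffle, reduce_counts, run_job) are hypothetical and chosen only for
illustration.

```python
from collections import defaultdict
from itertools import chain


def map_words(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce phase: collapse all values for one key into a single total."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence within a single process."""
    mapped = chain.from_iterable(map_words(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["disco runs jobs", "jobs run on disco"]))
```

In a distributed setting the map calls would run in parallel across the
input, and the grouping step would move data between nodes; this sketch
only shows the shape of the two user-supplied functions.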

Find the project site at

http://discoproject.org

or see the source code right away at

http://github.com/tuulos/disco/tree/master

We at Nokia Research in Palo Alto have been using it successfully for
data mining, building probabilistic models, and full-text indexing of
hundreds of gigabytes of real-world data on hundreds of CPUs in
parallel. If you don't have a spare cluster available, we provide a
script that sets up a working cluster automatically on Amazon's EC2
cloud.

It has been a pleasure to use Erlang to implement the job scheduler
and other core components of the system. Disco uses SCGI to provide a
web interface through an external web server, the slave module to start
Erlang VMs on slave nodes, and normal port commands to launch Python
workers on the nodes.

Disco is released under the BSD license. The system is still young:
there are known bugs, and work remains to be done on scalability.
You're very welcome to try out the system, give feedback, and develop
the system with us.

I'll be at the ICFP / Erlang Workshop in Victoria, so if you're 
attending I'd be happy to show a demo and have a chat with you about Disco.


Ville Tuulos
Member of Research Staff
Nokia Research Center
Palo Alto





