[erlang-questions] [ANN] Map/Reduce in Erlang and Python - Disco 0.1

Ville H Tuulos ville.h.tuulos@REDACTED
Thu Sep 11 20:00:33 CEST 2008


Bob Ippolito wrote:
> Why the choice of SCGI? It seems like it would be a lot simpler not to
> require an external web server. Erlang does HTTP on its own pretty
> well, in my experience :)

Thanks for the comment, Bob (: I have used Mochiweb quite happily(*) in 
another Erlang project of mine, and some people are already working on 
making Disco work with Mochiweb.

The web server is mainly used to serve large files to workers; SCGI is 
only used to forward control requests to the Erlang process. Using an 
external web server for IO-intensive jobs seemed like the safe choice to 
start with. I'd be happy to replace it with, say, Mochiweb, if it can 
handle the load.
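For anyone who hasn't met SCGI: the front-end web server passes each request on as a netstring of NUL-separated header pairs, followed by the raw body. CONTENT_LENGTH comes first, and there is also an SCGI header set to 1. Below is a minimal Python sketch of that framing, plus the parsing a back end has to do. It is illustrative only and is not Disco's code, since Disco's receiving side is Erlang. The function names are hypothetical.

```python
def encode_scgi_request(headers, body=b""):
    """Encode one request in SCGI wire format: a netstring of headers, then the body."""
    # SCGI requires CONTENT_LENGTH as the first header and an "SCGI: 1" header.
    fields = [("CONTENT_LENGTH", str(len(body))), ("SCGI", "1")]
    fields += [(k, v) for k, v in headers.items() if k not in ("CONTENT_LENGTH", "SCGI")]
    # Each header is "name\0value\0"; names and values must not contain NUL bytes.
    payload = b"".join(
        name.encode("latin-1") + b"\0" + value.encode("latin-1") + b"\0"
        for name, value in fields
    )
    # Netstring framing: "<decimal length>:<payload>," and then the body follows.
    return str(len(payload)).encode("ascii") + b":" + payload + b"," + body


def decode_scgi_request(data):
    """Parse an SCGI request back into (headers dict, body bytes)."""
    length_field, sep, rest = data.partition(b":")
    if not sep or not length_field.isdigit():
        raise ValueError("malformed netstring length")
    size = int(length_field)
    payload, terminator, body = rest[:size], rest[size:size + 1], rest[size + 1:]
    if len(payload) != size or terminator != b",":
        raise ValueError("truncated or unterminated netstring")
    items = payload.split(b"\0")
    # A well-formed header block ends with NUL, so split() leaves one trailing empty item.
    if items[-1] != b"" or len(items) % 2 != 1:
        raise ValueError("malformed header block")
    pairs = items[:-1]
    headers = {
        pairs[i].decode("latin-1"): pairs[i + 1].decode("latin-1")
        for i in range(0, len(pairs), 2)
    }
    # The body must be exactly as long as CONTENT_LENGTH says.
    if int(headers.get("CONTENT_LENGTH", "-1")) != len(body):
        raise ValueError("body length does not match CONTENT_LENGTH")
    return headers, body
```

The length prefix lets the receiver read the whole header block before touching the body. That keeps parsing of small control requests simple.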


Ville

(*) I think I ran into one problem with Mochiweb: it uses raw mode for 
its sockets, and in that mode there seems to be a 16M limit on 
gen_tcp:recv():

http://www.erlang.org/pipermail/erlang-questions/2006-September/022907.html

If I remember correctly, this caused any HTTP POST request larger than 
16M to fail. Please correct me if I'm wrong.
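One way to stay under a per-call receive limit like this is to read the body in bounded pieces and join them. The alternative, asking for the whole Content-Length in a single call, is what runs into the limit. The sketch shows the pattern in Python rather than Erlang; in Erlang it would be repeated gen_tcp:recv/2 calls with a smaller length.

It assumes a blocking socket, a known Content-Length, and that no body bytes were already buffered while reading the headers. The helper name recv_exactly and the 1 MiB default chunk size are hypothetical choices, not anything Mochiweb provides.

```python
import socket


def recv_exactly(sock: socket.socket, length: int, chunk_size: int = 1 << 20) -> bytes:
    """Read exactly `length` bytes from a blocking socket, at most `chunk_size` per recv()."""
    parts = []
    remaining = length
    while remaining > 0:
        # Never ask for more than chunk_size bytes in a single call.
        chunk = sock.recv(min(chunk_size, remaining))
        if not chunk:
            # recv() returns b"" once the peer has closed the connection.
            raise ConnectionError(f"peer closed after {length - remaining} of {length} bytes")
        parts.append(chunk)
        remaining -= len(chunk)
    return b"".join(parts)
```

A handler that has already parsed the Content-Length header would pass that value as `length`. A 50M upload would then arrive as a series of reads of at most 1 MiB each.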


> 
> On Thu, Sep 11, 2008 at 2:15 AM, Ville H Tuulos
> <ville.h.tuulos@REDACTED> wrote:
>> Hi all,
>>
>> I am happy to announce the availability of Disco (as already featured on
>> Reddit, Hacker News, etc.), an open-source implementation of the
>> Map/Reduce framework for distributed computing. Its core is written in
>> Erlang, but users typically write jobs in Python.
>>
>> Find the project site at
>>
>> http://discoproject.org
>>
>> or see the source code right away at
>>
>> http://github.com/tuulos/disco/tree/master
>>
>> We at Nokia Research in Palo Alto have been using it successfully for
>> data mining, building probabilistic models, and full-text indexing of
>> hundreds of gigabytes of real-world data on hundreds of CPUs in
>> parallel. If you don't have a spare cluster available, we provide a
>> script that automatically sets up a working cluster on Amazon's EC2
>> cloud.
>>
>> It has been a pleasure to use Erlang to implement the job scheduler
>> and other core components of the system. It uses SCGI to provide a web
>> interface through an external web server, the slave module to start
>> Erlang VMs on slave nodes, and normal port commands to launch Python
>> workers on the nodes.
>>
>> Disco is released under the BSD license. The system is still young:
>> there are known bugs, and there is still work to be done on scalability
>> as well. You're very welcome to try out the system, give feedback, and
>> develop it with us.
>>
>> I'll be at the ICFP / Erlang Workshop in Victoria, so if you're
>> attending I'd be happy to show a demo and have a chat with you about Disco.
>>
>>
>> Ville Tuulos
>> Member of Research Staff
>> Nokia Research Center
>> Palo Alto
>>
>>
>>
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://www.erlang.org/mailman/listinfo/erlang-questions
>>
> 
