[erlang-questions] how to break the problem. the erlang way?

Fri Jul 31 22:02:58 CEST 2009

On Fri, Jul 31, 2009 at 7:48 PM, Hynek Vychodil<vychodil.hynek@REDACTED> wrote:
>
>
> On Fri, Jul 31, 2009 at 4:47 PM, Ovidiu Deac <ovidiudeac@REDACTED> wrote:
>>
>> I'm doing some evaluation of Erlang, so I came up with the following problem:
>>
>> "The application has to subscribe to a multicast address and receive
>> binary packets. One packet per datagram. Each packet has a length, a
>> sequence number and a number of messages inside. Packets have to be
>> processed in their sequence number order. The messages have to be
>> extracted from the packets and written in a file."
>>
>> I come from a C++/Python background, and in an OO approach I would have
>> the following components:
>> 1. A Receiver, which connects to the multicast address and receives the packets
>> 2. An Orderer, whose responsibility is to order the packets by their
>> sequence number and detect the missing ones.
>> 3. An Unpacker, whose responsibility is to unpack the incoming packets
>> and extract the messages.
>> 4. A Decoder, which does the deserialization of the messages
>> 5. A Writer, which puts the messages in the file.
>>
>> Now if I move all this to Erlang I would map the objects to processes.
>> Instead of having objects with methods being called I have processes
>> which receive messages. So the 5 components would run as separate
>> processes. Each one does a little job and passes the result to the
>> next one.
>>
>> Is this the Erlang way? Or is it just too much message passing overhead?
>>
> The first thing I usually do is determine what is a data transformation and
> what is a process. They are different things. One process can do several data
> transformations, and one data transformation can be done by several
> processes in parallel. Of your points, only 1. and 5. look like pure
> processes to me. Point 2. can be implemented as a process, which can make
> things easier, but 3. and 4. are pure data transformations. In Erlang, for
> data transformations you just write ordinary modules with pure functions. For
> processes, OTP also gives you tools for writing most of their functionality
> in callback modules. My initial design would be this:
>
> One process as receiver (1.), which receives packets and sends each one to
> another process, spawned for the CPU-intensive work of unpacking (3.) and
> decoding (4.) the messages of that one packet. The results are sent
> to an orderer/cache (2.), which serialises the messages in the right order
> for writing to disk. The file writer is a process, as is usual in Erlang.
> Formatting for the file format I would do in the same process as unpacking
> and decoding. I can't see a reason why I should do it as the last thing ;-)
>
> Receiver (1.) -> many processes, each doing (3.), (4.), (5. formatting) for
> one incoming packet -> Order for write (2.) -> Write (5.) (But it can be just
> a "file" process or a RAW file, and then there will not be any other process.)
>
> (1.), (2.) gen_server
> (3.), (4.), (5. formatting) just plain modules with pure functions called
> inside a normal Erlang process.
> There can be a dispatcher/supervisor process for those, but it is not
> necessary and it will complicate things.
>
> The above is a design for maximal throughput. If you also expect flow
> control, reliability and so on, that will make it more complicated too.
>
> --
> --Hynek (Pichi) Vychodil
>

Thanks to everybody for the answers.

I like this suggestion very much. Now that I think of it, I can see
it's basically a map-reduce approach, i.e. break the work into small
units, spawn a process for each unit to do the processing, and then
collect the results.

...and if we consider scalability, this approach doesn't depend on a
fixed number of processing units, so it should scale very well to any
number of cores/processors available.
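
To check that I understood the design, here is a minimal sketch of the
spawn-per-packet and reorder steps. It is only an illustration under my
own assumptions: the module name, the function names and the packet
layout (16-bit length, 32-bit sequence number, 16-bit message count,
then messages each prefixed by a 16-bit length) are made up, the
"receiver" is just a list of binaries instead of a multicast socket,
and sequence numbers are assumed to run from 1 to N without gaps.

```erlang
%% Sketch only: names and packet layout are assumptions for illustration.
-module(pipeline_sketch).
-export([run/1, unpack/1]).

%% Pure data transformation (points 3 and 4): no process involved.
%% The packet length field is read but ignored in this sketch.
unpack(<<_PacketLen:16, Seq:32, Count:16, Rest/binary>>) ->
    {Seq, unpack_messages(Count, Rest)}.

unpack_messages(0, <<>>) ->
    [];
unpack_messages(N, <<Len:16, Msg:Len/binary, Rest/binary>>) when N > 0 ->
    [Msg | unpack_messages(N - 1, Rest)].

%% Stand-in for the receiver (point 1): spawns one worker per raw packet,
%% then acts as the orderer (point 2) and returns all messages in
%% sequence-number order.
run(Packets) ->
    Orderer = self(),
    Ref = make_ref(),
    lists:foreach(
      fun(Packet) ->
              spawn_link(fun() -> Orderer ! {Ref, unpack(Packet)} end)
      end,
      Packets),
    collect(Ref, 1, length(Packets)).

%% Selective receive: Ref and Next are already bound, so only the result
%% carrying the next expected sequence number matches. Results that
%% arrive early simply wait in the mailbox.
collect(_Ref, Next, Last) when Next > Last ->
    [];
collect(Ref, Next, Last) ->
    receive
        {Ref, {Next, Msgs}} ->
            Msgs ++ collect(Ref, Next + 1, Last)
    after 5000 ->
            %% Arbitrary timeout standing in for real missing-packet handling.
            erlang:error({missing_packet, Next})
    end.
```

For example, pipeline_sketch:run([<<11:16, 2:32, 1:16, 1:16, "b">>,
<<11:16, 1:32, 1:16, 1:16, "a">>]) returns [<<"a">>, <<"b">>] even though
packet 2 was handed in first. Relying on selective receive keeps the
orderer tiny, but it scans the mailbox for every packet, so a real
orderer would probably keep early results in its own state (e.g. in a
gen_server, as suggested above) rather than in the mailbox.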

Thanks again,
ovidiu