[erlang-questions] Twoorl: an open source Twitter clone

Thu Jun 5 11:52:18 CEST 2008

On Thu, Jun 5, 2008 at 12:03 AM, David Mitchell <monch1962@REDACTED> wrote:
> 2008/6/5 Scott Lystig Fritchie <fritchie@REDACTED>:
>> Steve Davis <steven.charles.davis@REDACTED> wrote:
>
>> As for "message queueing", there may be a misunderstanding over how MQ
>> systems typically work: they have producers *and* consumers, and (more
>> importantly) consumers actually "consume".  Consuming a queue item
>> usually means also deleting it from the queue.  A single Twitter user X
>> can have thousands of consumers all trying to consume the same messages,
>> but in a typical MQ system, all but the first consumer would find X's
>> queue empty.
>>
>> For one example, see the RabbitMQ FAQ, "Q. How do I archive to RDBMS?".
>
> In case anyone's losing track, I was the one who suggested keeping
> tweets in queues essentially forever, and having users retrieve them
> from queues without deleting the message from the queue.
>
> I understand how MQ works in normal environments; what I'm suggesting
> is that Twitter (and any clones) aren't "normal" once they start to
> scale up to many millions of users.
>
> The reasons I suggested storing messages in queues indefinitely are:
> - experience says that queueing systems can scale very large, and that
> it appears to be an "easier" problem to solve than scaling a database
> very large.  I'll accept it if anyone complains about "gross
> generalisation"...
> - the APIs for storing messages to queues and then retrieving them are
> designed to be very fast, and (again referencing IBM's MQ) we know
> they can scale to queues holding very large numbers of messages
>

Yes yes yes - I have for a long time thought that non-destructive
persistent queues are the perfect data structure for many
applications. I can't see why REST has GET, PUT, POST and DELETE - it
should have GET and APPEND (only).

Appending things to an input queue *and never deleting them* seems to
me a perfect way to deal with things. If you ever want to delete
things it can only be for pragmatic reasons and should be done years
later in a garbage collection sweep. If you never delete things you
can always go back later and fix errors!
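
As a rough illustration of the GET/APPEND idea, here is a minimal
in-memory sketch in Python. The class and method names are made up for
this example and are not taken from any existing library; persistence
and replication are left out on purpose.

```python
class AppendOnlyQueue:
    """Non-destructive queue: entries are appended and never removed."""

    def __init__(self):
        self._entries = []

    def append(self, item):
        """Add an item to the tail and return its position (offset)."""
        self._entries.append(item)
        return len(self._entries) - 1

    def get(self, offset=0, limit=None):
        """Return entries from `offset` onward without removing them."""
        end = None if limit is None else offset + limit
        return list(self._entries[offset:end])


q = AppendOnlyQueue()
q.append("tweet 1")
q.append("tweet 2")

# Two independent readers see the same entries; nothing is consumed.
reader_a = q.get(0)
reader_b = q.get(0)

# A correction is a new entry, not an overwrite, so the history survives.
q.append("correction to tweet 1")
latest = q.get(offset=2)
```

Each reader keeps its own offset, so any number of readers can fetch
the same entries, and an error is fixed by appending a correction
rather than by rewriting the old entry.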

The question of how to store the queue is unrelated to the abstraction
- probably disk replicated, with a RAM replica of the last N entries
in the tail of the queue. If the queue entries are fixed length (or
multiples of a fixed length) then life becomes a lot easier.
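
To show why fixed-length entries help, here is a sketch of one possible
layout: a single append-only file of fixed-size records plus an
in-memory copy of the last N entries. The record size, the value of N,
the NUL padding and all names are assumptions made for this example;
replication across disks is not shown.

```python
import os
import tempfile
from collections import deque

RECORD_SIZE = 160  # assumed fixed record size in bytes (hypothetical)
TAIL_SIZE = 3      # assumed N: entries mirrored in RAM (hypothetical)


class FixedRecordQueue:
    """Append-only queue of fixed-length records with a RAM copy of the tail."""

    def __init__(self, path):
        self._path = path
        self._file = open(path, "ab")          # append mode: writes go to the end
        self._count = os.path.getsize(path) // RECORD_SIZE
        self._tail = deque(maxlen=TAIL_SIZE)   # last N payloads, oldest first

    def append(self, payload):
        if len(payload) > RECORD_SIZE:
            raise ValueError("payload does not fit in one record")
        # Pad with NUL bytes; payloads are assumed not to end in NUL.
        self._file.write(payload.ljust(RECORD_SIZE, b"\0"))
        self._file.flush()
        self._tail.append(payload)
        self._count += 1
        return self._count - 1                 # index of the new entry

    def get(self, index):
        if not 0 <= index < self._count:
            raise IndexError(index)
        first_cached = self._count - len(self._tail)
        if index >= first_cached:
            return self._tail[index - first_cached]   # served from RAM
        with open(self._path, "rb") as f:
            f.seek(index * RECORD_SIZE)        # direct offset, no scanning
            return f.read(RECORD_SIZE).rstrip(b"\0")

    def close(self):
        self._file.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "queue.log")
    q = FixedRecordQueue(path)
    for i in range(5):
        q.append(("message %d" % i).encode())
    oldest = q.get(0)      # read from disk
    newest = q.get(4)      # read from the RAM tail
    size_on_disk = os.path.getsize(path)
    q.close()
```

Because every record has the same size, entry i lives at byte offset
i * RECORD_SIZE, so reading any entry (including the last one) is a
single seek, and the most recent entries never touch the disk at all.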

Many things can be built using this abstraction. Add fault-tolerance
and location transparency and you have a wonderfully useful mechanism
(i.e. it would be very nice to have GUIDs that identify persistent
queues - how to do this is orthogonal to the storage mechanisms
involved). To start with, queues identified by {Ip,Port,name} would be
useful :-)
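
One way to picture the GUID idea is a small directory that maps a GUID
to the current {Ip,Port,name} of a queue, so that clients hold only the
GUID. This is a sketch under my own assumptions: the class names are
hypothetical, the directory is a plain in-memory dict, and the
addresses are documentation-only examples.

```python
import uuid
from typing import Dict, NamedTuple


class Location(NamedTuple):
    """Where a queue currently lives: the {Ip,Port,name} triple."""
    ip: str
    port: int
    name: str


class QueueDirectory:
    """Maps stable GUIDs to the current location of each queue."""

    def __init__(self):
        self._locations: Dict[uuid.UUID, Location] = {}

    def register(self, location: Location) -> uuid.UUID:
        guid = uuid.uuid4()              # random GUID, independent of location
        self._locations[guid] = location
        return guid

    def move(self, guid: uuid.UUID, new_location: Location) -> None:
        if guid not in self._locations:
            raise KeyError(guid)
        self._locations[guid] = new_location

    def resolve(self, guid: uuid.UUID) -> Location:
        return self._locations[guid]


directory = QueueDirectory()
guid = directory.register(Location("192.0.2.10", 5000, "tweets"))
before = directory.resolve(guid)

# The queue moves to another host; clients keep using the same GUID.
directory.move(guid, Location("192.0.2.20", 5000, "tweets"))
after = directory.resolve(guid)
```

The GUID stays the same while the location behind it changes, which is
the location transparency part; how the directory itself is stored and
replicated is a separate question.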

Cheers

/Joe Armstrong

> Storing messages in flat files seems to have a couple of limitations to me:
> - if you're going to store 1 message per flat file, you need a
> database (or database-like thing) to track those zillions of flat
> files.  I figure that's going to put you back where you started in
> terms of scalability
> - assuming you're always appending messages to the end of flat files,
> you'd have to assume that most requests will be for the most recent
> message i.e. the last message in the file.  Do you really want to be
> seeking through to the last record of flat files all the time?  That
> doesn't seem to be a scalable approach
> - alternately, if you always add the most recent message to the
> *start* of a flat file, you'll constantly be rewriting the entire file
> (at least, that's the case in any file system I can think of; there
> might be an exception).  I suppose you could write your own file
> system to optimise that...
>
> Please speak up if you've got any thoughts - I'm treating this like a
> bunch of intellectuals throwing ideas around, rather than an argument
> about right and wrong, and it seems that everyone else is too at this
> stage.  Very happy to be convinced I'm wrong, in other words
>
> Regards
>
> David Mitchell