[erlang-questions] Twoorl: an open source Twitter clone

Thu Jun 5 00:26:17 CEST 2008

David Mitchell wrote:
> 2008/6/5 Scott Lystig Fritchie <fritchie@REDACTED>:
>> Steve Davis <steven.charles.davis@REDACTED> wrote:
> 
>> As for "message queueing", there may be a misunderstanding over how MQ
>> systems typically work: they have producers *and* consumers, and (more
>> importantly) consumers actually "consume".  Consuming a queue item
>> usually means also deleting it from the queue.  A single Twitter user X
>> can have thousands of consumers all trying to consume the same messages,
>> but in a typical MQ system, all but the first consumer would find X's
>> queue empty.
>>
>> For one example, see the RabbitMQ FAQ, "Q. How do I archive to RDBMS?".
> 
> In case anyone's losing track, I was the one who suggested keeping
> tweets in queues essentially forever, and having users retrieve them
> from queues without deleting the message from the queue.
> 
> I understand how MQ works in normal environments; what I'm suggesting
> is that Twitter (and any clones) aren't "normal" once they start to
> scale up to many millions of users.
> 
> The reasons I suggested storing messages in queues indefinitely are:
> - experience says that queueing systems can scale very large, and that
> it appears to be an "easier" problem to solve than scaling a database
> very large.  I'll accept it if anyone complains about "gross
> generalisation"...
> - the APIs for storing messages to queues and then retrieving them are
> designed to be very fast, and (again referencing IBM's MQ) we know
> they can scale to queues holding very large numbers of messages
> 
> Storing messages in flat files seems to have a couple of limitations to me:
> - if you're going to store 1 message per flat file, you need a
> database (or database-like thing) to track those zillions of flat
> files.  I figure that's going to put you back where you started in
> terms of scalability

It is better to coalesce the messages for a particular user's
consumption into a single file. That is better for filesystem inode
utilization, seek times (latency), and disk and memory fragmentation.

> - assuming you're always appending messages to the end of flat files,
> you'd have to assume that most requests will be for the most recent
> message i.e. the last message in the file.  Do you really want to be
> seeking through to the last record of flat files all the time?  That
> doesn't seem to be a scalable approach

By the way, seeking to the last message is O(1) (if not O(0),
jokingly), provided the file entries are self-delimiting and, as an
optimization, double-tagged (message size stored at both the beginning
and the end of each message).
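
To illustrate the double-tagging idea, here is a minimal sketch in
Python. The record layout (a 4-byte big-endian length before and after
each payload) and the function names are my own assumptions for the
example, not something taken from Twoorl or from this thread. It shows
that the newest message can be found with a fixed number of seeks, and
that an append-only file can be walked from newest to oldest.

```python
import os
import struct

# Hypothetical record layout (an assumption for this sketch):
#   [4-byte big-endian length][payload][4-byte big-endian length]
TAG = struct.Struct(">I")


def append_message(path, payload):
    """Append one double-tagged record to the end of the file."""
    size = TAG.pack(len(payload))
    with open(path, "ab") as f:
        f.write(size + payload + size)


def read_last(path):
    """Return the newest payload using a fixed number of seeks, or None if empty."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        end = f.tell()
        if end == 0:
            return None
        # The trailing tag gives the payload length without scanning the file.
        f.seek(end - TAG.size)
        (length,) = TAG.unpack(f.read(TAG.size))
        f.seek(end - TAG.size - length)
        return f.read(length)


def iter_newest_first(path):
    """Yield payloads from newest to oldest by walking the trailing tags backwards."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        while pos > 0:
            f.seek(pos - TAG.size)
            (length,) = TAG.unpack(f.read(TAG.size))
            start = pos - TAG.size - length
            f.seek(start)
            yield f.read(length)
            # Step over this record's leading tag to the end of the previous record.
            pos = start - TAG.size
```

The leading tag is what lets a reader scan forward from the start of
the file; the trailing tag is what lets a reader start from the end. No
rewriting of the file is involved in either direction.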

> - alternately, if you always add the most recent message to the
> *start* of a flat file, you'll constantly be rewriting the entire file
> (at least, that's the case in any file system I can think of; there
> might be an exception).  I suppose you could write your own file
> system to optimise that...

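This is not necessary: with append-only writes and a size tag at the
end of each message, the newest message can be read directly from the
end of the file.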

> Please speak up if you've got any thoughts - I'm treating this like a
> bunch of intellectuals throwing ideas around, rather than an argument
> about right and wrong, and it seems that everyone else is too at this
> stage.  Very happy to be convinced I'm wrong, in other words
> 
> Regards
> 
> David Mitchell