[erlang-questions] Twoorl: an open source Twitter clone

Sun Jun 1 00:04:56 CEST 2008

On Fri, May 30, 2008 at 7:18 AM, Joe Armstrong <erlang@REDACTED> wrote:
> Hi Yariv
>
> Well done - now a few questions:
>
>    - what is Twitter? and why is it good (if it is good) - can you
> explain in a couple of paragraphs what it does? It seemed to be some
> kind of new way for interrupting people ...
>
>      (this is from one who thinks that e-mail is intrusive and whose
> mobile phone is usually
>       turned off :-)

Twitter is a micro-blogging service. It allows asynchronous
broadcasting of casual messages/questions/observations/brain-farts to
your friends. It's not intrusive like IM because it's asynchronous.
It's more like a public mailing list where you only see the messages
from people you choose to follow.

At first I didn't get Twitter, but once I started using it I realized
it's kinda fun.

>
>  - the reaction to your work seems to be (oh but it doesn't  scale -
> so Erlang must be no good -
>    despite the fact that I believe you expressly said it was a quick
> hack and wasn't designed to
>   scale)

What can ya do? :)

>
>  - I might be nice to say "it doesn't scale YET" - and then sit back
> and let us help
>    you make it VERY scalable. I like a good challenge.
>
>    Could you try to describe exactly what Twitter is (in an as
> abstract way as possible) so that old
> fogies like me can wrap our brains around the problem of making it scale -
>
>   So what does it do? How many things must it scale to (give me
> numbers here - what are
> talking about - scaling is not some abstract quantity - it's number of
> bytes of data delivered to end-users
> per gram of CO2 - how many bytes/gm of CO2 are we aiming at?)
>
> (aside) What does scalable mean? - I suppose the answer is some
> constant cost per user.
>
> A more literal answer would be "doesn't break when we add more users"
> - but surely the cost
> verses number of users curve must be more important.
>
> In the context of Twitter what does the word "scalable mean" (how
> about, we can handle
> 100,000 users per "box" - each box costs 500$ and consumes 25Watts -
> this is constant up to
> 6x10^9 users)
>
> So if anybody who knows about this stuff can chip in with a few
> figures it would be helpful
> - what are the desired costs for running twitter for a few million
> people - how would you like
> this to scale
>
> (/aside)
>
>
> /Joe Armstrong

Here's how I userstand the main scaling challenge of a Twitter-like service:

You have M users and each user follows an average of N other users. To
render a user's timeline, you need to present the latest 20 tweets
(twoorls in the new terminology :) ) from all his friends. The naive
way to do it is to fetch N*20 messages, sort them, then take the first
20. This is expensive to do when N is big. Another problem is that
because of the frequent updates from typical users, caching this
calculation doesn't give you much because the cache gets frequently
invalidated between page loads.

The only solution I see is to keep a "latest twoorls" store for each
user. When a friend sends a new twoorl, it (or a pointer to it) is
copied to the twoorl store for all his friends. This trades space for
speed, and it limits the paging you can do to see your twoorl history,
but it's the only way you can scale this kind of service when N is
big.

Yariv