[erlang-questions] Twoorl: an open source Twitter clone

Dale Harvey harveyd@REDACTED
Sun Jun 1 17:03:02 CEST 2008


Flash sockets can open connections to external sites; since (v9.0.16.0?)
this requires the same verification as local sites, which means that when
you try to .connect, Flash sets up a temporary connection, sends a
<policy-file-request/>, and expects an XML snippet that looks something
like http://api.flickr.com/crossdomain.xml.
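
For reference, the policy document is tiny; a minimal permissive one
(roughly what flickr serves, from memory) looks like:

  <?xml version="1.0"?>
  <cross-domain-policy>
    <allow-access-from domain="*" />
  </cross-domain-policy>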

An IP lookup does sound nicer than a full HTTP proxy. As far as I know you
can't do cross-domain POSTs with JS, though; GET works fine with a JSON
return.

2008/6/1 Joe Armstrong <erlang@REDACTED>:

> It *is* an interesting discussion - not about Twitter - but about
> architectures.
>
> There seem to be some implicit assumptions here:
>
> Let's suppose that the twoorl services are accessed through a *single*
> name (www.twooerl.org or something).
>
> Let's assume we have 20 (for the sake of argument) back-end machines
> (could be hundreds or thousands though) (I guess scalable means that
> we can just add/remove back-end machines without breaking
> things :-)
>
> The first step must be to associate 20 IP addresses with this single
> name - some kind of DNS load balancing should be possible. (How do you
> do this? Is this DNS round robin? Does this scale to thousands of
> machines?)
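>
> (Roughly, round-robin DNS just means publishing several A records for
> the one name and letting the name server rotate through them - e.g. a
> BIND zone might contain, with made-up addresses:
>
>    www.twooerl.org.   60   IN   A   122.34.45.67
>    www.twooerl.org.   60   IN   A   223.56.1.23
>
> Whether that alone copes with thousands of machines is another matter.)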
>
> The user tries to connect to www.twooerl.org and looks up the address
> in DNS; the result is one of these IP addresses.
>
> The user connects to one of these IP addresses and requests data
> for "joe" - they are immediately redirected to the machine holding
> joe's data.
>
> The simplest way to find a machine would be to use the idea in Chord.
>
> For a machine with IP 123.45.12.34 we create a tuple
>
> {"123.45.12.34", Md5("123.45.12.34")} we do this for all machines and
> sort my the Md5 sum
>
>
> so we get a list of twenty machines, say:
>
>   [{"122.34.45.67", Md1}, {"223.56.1.23", Md2}, ... ]
>
> Which machine is "joe" stored on? To find this we compute md5("joe")
> and find the first machine in the list whose Md5 sum is greater than
> md5("joe").
>
> The initial machine would perform this computation and redirect to the
> machine with joe's data.
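>
> Something like this, as a minimal sketch (erlang:md5/1 applied to the
> string form of the IP or key; the function names are made up):
>
>    %% build the ring: machines sorted by the md5 of their IP address
>    ring(Ips) ->
>        lists:sort([{erlang:md5(Ip), Ip} || Ip <- Ips]).
>
>    %% first machine whose md5 is greater than md5(Key) -
>    %% wrap around to the head of the list if there is none
>    lookup(Key, Ring) ->
>        H = erlang:md5(Key),
>        case [Ip || {MachineH, Ip} <- Ring, MachineH > H] of
>            [Ip|_] -> Ip;
>            []     -> element(2, hd(Ring))
>        end.
>
> so lookup("joe", Ring) gives the machine that should hold joe's data.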
>
> The initial machine can also check the liveness of this second machine -
> and if it is unresponsive, redirect to a machine containing a replica of
> joe's data (which machine could this be? - imagine the machines arranged
> in a circle and redirect to the machine nearest to 180 degrees away from
> the original machine).
>
> A problem occurs if the original machine is dead (i.e. the one whose
> address DNS said was associated with www.twooerl.org) - if the back-end
> machines are in pairs that monitor each other then I guess a gratuitous
> ARP can fix the problem and reassign the IP address of the failing
> machine to a new machine (which must now take over the role of the
> first machine).
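>
> (A sketch of that pairwise monitoring, assuming the pair are also Erlang
> nodes - the actual gratuitous ARP / IP takeover would be some external
> script, shown here as a hypothetical take_over_ip.sh:
>
>    watch(PeerNode, PeerIp) ->
>        erlang:monitor_node(PeerNode, true),
>        receive
>            {nodedown, PeerNode} ->
>                %% reassign the peer's IP to this machine
>                os:cmd("take_over_ip.sh " ++ PeerIp)
>        end.
> )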
>
> In all of this the fundamental problem seems to be that if the server
> fails then we have to do a lot
> of messing around to keep the server on a fixed IP address.
>
> It would be a zillion times easier if the *client* did all this
> messing around. If the client knew about N server addresses it could
> automatically redirect to a different server if the primary server
> failed. If the client knew about N machines and performed the md5(Key)
> calculation to find the correct machine then the problem would be very
> easy to solve, with no messing around in the server.
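>
> (A sketch of such a client, reusing the ring/lookup idea above: hash the
> key onto the ring, then walk the candidates until one answers - the port
> number here is just an assumption for illustration:
>
>    connect(Key, Ips) ->
>        Ring = lists:sort([{erlang:md5(Ip), Ip} || Ip <- Ips]),
>        H = erlang:md5(Key),
>        {After, Before} = lists:partition(fun({MH, _}) -> MH > H end, Ring),
>        try_each([Ip || {_, Ip} <- After ++ Before]).
>
>    try_each([]) ->
>        {error, no_server};
>    try_each([Ip|Rest]) ->
>        case gen_tcp:connect(Ip, 8000, [binary, {active, false}], 2000) of
>            {ok, Sock} -> {ok, {Ip, Sock}};
>            {error, _} -> try_each(Rest)
>        end.
> )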
>
> (( The fact that DNS has multiple addresses makes writing a resolver
> really easy - if one DNS server fails, just try another ))
>
> Now if we're talking "standard browsers" then accessing multiple sites
> is painful. Javascript and flash plugins etc. are very restrictive about
> the sites that they can open sockets to (for good reasons) - I think
> only the originating site is permitted.
>
> If we could persuade users to install a small proxy on their machines
> then all these problems would go away - a standard browser could talk
> to a proxy on localhost and this could talk to the multiple back ends.
>
> What appears to be a tricky scaling problem if we have to keep the
> back-end servers on fixed addresses seems a lot easier if the clients
> have some limited intelligence.
>
> The next step is (of course) to allow all the clients to collectively
> behave as if they were a server - I think therefore that the problem is
> really one about the base level of a P2P architecture and not about a
> twitter architecture per se.
>
> If we did have a simple proxy component that allowed messaging to
> multiple sites then this and many other
> problems would be easily soluble.
>
> We might imagine a proxy that was programmable:
>
> It presents a menu like this:
>
>    {allow, ServiceName, at, Addr1, Addr2, Addr3, .....}
>
> (( example {allow, twitter, at, www.a.b, www.p.q, www.c.d}
>    - this means that the proxy can open "twitter" sessions to the
> three "trusted" machines in the list;
>
>    then the web browser could access the proxy, which could talk to the
> trusted machines - the trusted machines would just redirect to other
> machines until finally the desired machines are found. ))
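>
> (( A sketch of the check such a proxy might do, with the addresses from
> each {allow, ...} entry collected into a list:
>
>    %% Rules = [{allow, twitter, ["www.a.b", "www.p.q", "www.c.d"]}, ...]
>    allowed(Service, Host, Rules) ->
>        lists:any(fun({allow, S, Hosts}) ->
>                          S =:= Service andalso lists:member(Host, Hosts)
>                  end, Rules).
> ))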
>
>  Would the fact that you have to install a proxy server limit
> deployment of a service? - probably.
> Also it opens up a new bag of worms (security) - for good reasons
> browsers do not allow plugins
> to access multiple sites (only the originating sites).
>
> I suppose therefore, that the inner cluster that provides the service
> would have a full P2P structure
> and that the service would be accessed by DNS round robin with IP
> failover to handle the errors.
>
> I suspect that architectures like this are being used in some Erlang
> systems (the details might vary).
>
> If anybody would like to post code and go into the details of how to
> rig systems for DNS load balancing (or whatever it's called) and for IP
> monitoring and fail-over, then we could get to the interesting part of
> building the application.
>
> (( the next bit will be to look at the limits of scaling - still
> nobody has talked numbers -
>   how far can we press a two-tier system - with say 20 name servers
> in the front-end that *only* do
>   redirects - this bit should be very fast ))
>
> (( By combining twitter with IRC we might make the option of
> installing a proxy more attractive -
>    The IRC "server" for a group G should really be the client that
> first created the group G -
>    if G drops out, the second machine in the group could become the
> server, and so on.
>    Really twitter is like having one IRC group per person: many
> people can join but only the
>    owner can write to it. What's the difference (architecturally)?
> ))
>
> Cheers
>
> /Joe Armstrong
>
>
>
> On Sun, Jun 1, 2008 at 10:41 AM, David Mitchell <monch1962@REDACTED>
> wrote:
> > This is a REALLY interesting discussion, but at this point it's
> > becoming obvious that I don't know enough about Twitter...
> >
> > Are you suggesting that Twoorl should be architected as follows:
> > - when they register, every user gets assigned their own RabbitMQ
> > incoming and outgoing queues
> > - user adds a message via Web/Yaws interface (I know, this could be
> > SMS or something else later...)
> > - message goes to that user's RabbitMQ incoming queue
> > - a backend reads messages from the user's incoming queue, looks up in
> > e.g. a Mnesia table to see who should be receiving messages from that
> > user and whether they're connected or not.  If "yes" to both, RabbitMQ
> > then forwards the message to each of those users' outgoing queues
> > - either the receiving users poll their outgoing queue for the
> > forwarded message, or a COMET-type Yaws app springs to life and
> > forwards the message to their browser (again, ignoring SMS)
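> >
> > A rough sketch of that fan-out step, assuming a Mnesia table of
> > {following, User, Follower} records and the RabbitMQ Erlang client
> > (amqp_client) - the queue-naming convention is made up:
> >
> >    -include_lib("amqp_client/include/amqp_client.hrl").
> >
> >    fan_out(Channel, User, Payload) ->
> >        Followers = mnesia:dirty_select(following,
> >                        [{{following, User, '$1'}, [], ['$1']}]),
> >        [publish(Channel, out_queue(F), Payload) || F <- Followers],
> >        ok.
> >
> >    out_queue(User) ->
> >        list_to_binary(User ++ ".out").
> >
> >    publish(Channel, Queue, Payload) ->
> >        amqp_channel:cast(Channel,
> >            #'basic.publish'{routing_key = Queue},
> >            #amqp_msg{payload = Payload}).
> >
> > (publishing to the default exchange with the queue name as the routing
> > key delivers straight to that queue)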
> >
> > This seems like a reasonable approach; I'm just curious if that's what
> > you're suggesting, or whether you've got something else in mind.
> >
> > Great thread, and thanks Yariv for getting this discussion going with
> Twoorl
> >
> > Regards
> >
> > Dave M.
> >
> > 2008/6/1 Steve <steven.charles.davis@REDACTED>:
> >>
> >> On May 31, 5:04 pm, "Yariv Sadan" <yarivsa...@REDACTED> wrote:
> >>> ...but it's the only way you can scale this kind of service when N is
> >>> big.
> >>
> >> Hmmm, Yariv, aren't you still thinking about this in the way that Dave
> >> Smith pointed to as the heart of the issue? i.e.
> >> Dave said: "My understanding is that the reason they have such poor
> >> uptime is due to the fact that they modeled the problem as a web-app
> >> instead of a messaging system."
> >>
> >> I'm aware that you are likely a good way away from hitting any
> >> scalability problems, but some kind of tiering would seem to be
> >> appropriate if twoorl is to be "twitter done right". Yaws at the front
> >> end, definitely - but rather /RabbitMQ/ at the back end. I believe
> >> that you'd then have the flexibility to distribute/cluster as
> >> necessary to scale to the required level (whatever that may be).
> >>
> >> For sure, Twoorl is a great demo of what can be done with Erlang in an
> >> incredibly short time. I'm a relative noob to Erlang, and have learned
> >> a great deal from your blog/code/examples.
> >>
> >> Steve