Flash sockets can open connections to external sites, but this requires the same<br>verification that local sites now need (since v9.0.16.0?). That means when<br>you call .connect, Flash sets up a temporary connection, sends a<br>&lt;policy-file-request/&gt; and expects an XML snippet that looks something<br>

like <a href="http://api.flickr.com/crossdomain.xml">http://api.flickr.com/crossdomain.xml</a>.<br><br>An IP lookup does sound nicer than a full HTTP proxy. As far as I know you<br>can't do cross-domain POSTs with JS though; GET works fine with a JSON<br>

return.<br><br><div class="gmail_quote">2008/6/1 Joe Armstrong <<a href="mailto:erlang@gmail.com">erlang@gmail.com</a>>:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

It *is* an interesting discussion - not about Twitter - but about architectures.<br>

<br>

There seem to be some implicit assumptions here:<br>

<br>

Let's suppose that twoorl services are accessed through a *single*<br>

name (<a href="http://www.twooerl.org" target="_blank">www.twooerl.org</a>) or something.<br>

<br>

Let's assume we have 20 (for the sake of argument) back-end machines<br>

(could be hundreds or thousands though) (I guess scalable means that<br>

we can just add/remove backend machines without breaking<br>

things :-)<br>

<br>

The first step must be to associate 20 IP addresses with this single<br>

name - some kind of DNS load balancing should be possible. (how do you<br>

do this?????)(is this DNS round robin????)(does this scale to<br>

thousands of machines???)<br>

<br>

The user tries to connect to <a href="http://www.twooerl.org" target="_blank">www.twooerl.org</a> and looks up the address<br>

in DNS; the result is one of these IP addresses.<br>

<br>

The user connects to one of these IP addresses and requests data<br>

for "joe" - they are immediately redirected to the machine having<br>

joe's data.<br>

<br>

The simplest way to find a machine would be to use the idea in Chord.<br>

<br>

For a machine with IP <a href="http://123.45.12.34" target="_blank">123.45.12.34</a> we create a tuple<br>

<br>

{"<a href="http://123.45.12.34" target="_blank">123.45.12.34</a>", Md5("<a href="http://123.45.12.34" target="_blank">123.45.12.34</a>")} we do this for all machines and<br>

sort by the Md5 sum<br>

<br>

<br>

so we get a list of twenty machines, say:<br>

<br>

   [{"<a href="http://122.34.45.67" target="_blank">122.34.45.67</a>", Md1}, {"<a href="http://223.56.1.23" target="_blank">223.56.1.23</a>", Md2}, ... ]<br>

<br>

Which machine is "joe" stored on? - to find this we compute md5("joe")<br>

and find the<br>

first machine in the list whose Md5 sum is greater than md5("joe").<br>

<br>

The initial machine would perform this computation and redirect to the<br>

machine with joe's data.<br>
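The lookup described above can be sketched as follows. This is a minimal illustration, not code from the thread: the addresses are made up, the function names are hypothetical, and wrapping round to the first machine when no hash is greater than md5("joe") is an assumption, since the text does not say what happens at the end of the list.

```python
import bisect
import hashlib


def md5_int(text):
    """Return the MD5 digest of text as an integer so digests can be ordered."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


def build_ring(addresses):
    """Pair every machine with the MD5 of its IP address and sort by that sum."""
    return sorted((md5_int(addr), addr) for addr in addresses)


def machine_for(ring, key):
    """Return the first machine whose MD5 sum is greater than md5(key).

    Assumption: if no machine has a greater sum, wrap round to the first one.
    """
    hashes = [h for h, _ in ring]
    index = bisect.bisect_right(hashes, md5_int(key))
    return ring[index % len(ring)][1]


# Hypothetical back-end addresses, for illustration only.
MACHINES = ["122.34.45.67", "223.56.1.23", "123.45.12.34"]
ring = build_ring(MACHINES)
owner = machine_for(ring, "joe")
```

Any machine that holds the same list computes the same owner, which is what lets the initial machine redirect without consulting a central directory.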

<br>

The initial machine can also check the liveness of this second machine -<br>

and if it is unresponsive<br>

redirect to a machine containing a replica of joe's data (which could<br>

this be? - imagine the<br>

machines arranged in a circle and redirect to the machine nearest to<br>

180 degrees away from the<br>

original machine)<br>
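One possible reading of the "180 degrees away" rule, sketched under assumptions: the circle is taken to be the 128-bit MD5 space, the opposite point is the primary's hash plus half the ring, and the replica is the other machine whose hash lies closest to that point. The `is_alive` callback is a hypothetical stand-in for whatever liveness check is actually used.

```python
import hashlib

RING_SIZE = 2 ** 128  # MD5 digests are 128 bits wide


def md5_int(text):
    """Return the MD5 digest of text as an integer position on the ring."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


def circular_distance(a, b):
    """Shortest distance between two ring positions, going either way round."""
    d = abs(a - b) % RING_SIZE
    return min(d, RING_SIZE - d)


def replica_for(primary, addresses):
    """Pick the machine nearest to the point opposite the primary on the ring.

    Needs at least one machine other than the primary.
    """
    opposite = (md5_int(primary) + RING_SIZE // 2) % RING_SIZE
    candidates = [a for a in addresses if a != primary]
    return min(candidates, key=lambda a: circular_distance(md5_int(a), opposite))


def pick_live(primary, addresses, is_alive):
    """Redirect to the primary if it responds, otherwise to its replica."""
    if is_alive(primary):
        return primary
    return replica_for(primary, addresses)
```

Positions on the circle could just as well be taken from the index in the sorted list rather than from the raw hash value; the text leaves that choice open.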

<br>

A problem occurs if the original machine is dead - (ie the one that<br>

DNS said was the address associated<br>

with www.twoo.erl) - if the back-end machines are in pairs that<br>

monitor each other then I guess<br>

a gratuitous ARP can fix the problem and reassign the IP address<br>

of the failing machine to a<br>

new machine (which must now take over the role of the first machine)<br>

<br>

In all of this the fundamental problem seems to be that if the server<br>

fails then we have to do a lot<br>

of messing around to keep the server on a fixed IP address.<br>

<br>

It would be a zillion times easier if the *client* did all this<br>

messing around. If the client<br>

knew about N server addresses it could automatically redirect to a<br>

different server if the primary server<br>

failed. If the client knew about N machines and performed the md5(Key)<br>

calculation to find the correct machine then the problem would be very<br>

easy to solve with no messing around in the server.<br>
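A rough sketch of what such a client could look like, under stated assumptions: `connect` is a hypothetical transport function that raises `ConnectionError` when a server is unreachable, and on failure the client simply moves on to the next machine in hash order, which is only one of several possible fallback rules.

```python
import hashlib


def md5_int(text):
    """Return the MD5 digest of text as an integer so digests can be ordered."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


def candidates_for(key, servers):
    """Order the known servers, starting at the one responsible for key."""
    ring = sorted(servers, key=md5_int)
    k = md5_int(key)
    # First server whose hash is greater than md5(key); wrap to 0 if none is.
    start = next((i for i, s in enumerate(ring) if md5_int(s) > k), 0)
    return ring[start:] + ring[:start]


def fetch(key, servers, connect):
    """Try each candidate in turn and return the first successful reply."""
    last_error = None
    for server in candidates_for(key, servers):
        try:
            return connect(server, key)
        except ConnectionError as err:
            last_error = err  # remember the failure and try the next server
    raise ConnectionError("no server reachable for %r" % (key,)) from last_error
```

All of the retry logic lives in the client here, so no server needs to take over another server's IP address.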

<br>

(( The fact that DNS has multiple addresses makes writing a resolver<br>

really easy, if one DNS server<br>

fails just try another ))<br>

<br>

Now if we're talking "standard browsers" then accessing multiple sites<br>

is painful. JavaScript and Flash<br>

plugins etc. are very restrictive in the sites that they can open<br>

sockets to (for good reasons)<br>

I think only the originating site is permitted.<br>

<br>

If we could persuade the users to install a small proxy on their<br>

machines then all these problems would<br>

go away - a standard browser could talk to a proxy on localhost and<br>

this could talk to the multiple<br>

back ends.<br>

<br>

What appears to be a tricky problem in scaling things if we have to<br>

keep the back-end servers on fixed<br>

addresses seems a lot easier if the clients have some limited intelligence.<br>

<br>

The next step is (of course) to allow all the clients to collectively<br>

behave as if they were a server -<br>

I think therefore that the problem is really one about the base level<br>

of a P2P architecture and not<br>

about a twitter architecture per se.<br>

<br>

If we did have a simple proxy component that allowed messaging to<br>

multiple sites then this and many other<br>

problems would be easily soluble.<br>

<br>

We might imagine a proxy that was programmable:<br>

<br>

It presents a menu like this:<br>

<br>

    {allow, ServiceName, at, Addr1, Addr2, Addr3, .....}<br>

<br>

(( example {allow, twitter, at, www.a.b, www.p.q, www.c.d}<br>

    - this means that the proxy can open "twitter" sessions to the<br>

three "trusted" machines in the list<br>

<br>

    then the web browser could access the proxy that could talk to the<br>

trusted machines -<br>

the trusted machine should just redirect to other machines, until<br>

finally the desired machines are found. ))<br>
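To make the rule format concrete, here is a small sketch of how a local proxy might check such entries. The Erlang-style tuple is modelled as a Python tuple, and `ProxyPolicy`, `parse_rule` and `may_open` are hypothetical names chosen for illustration. Only the first connection is checked; whether redirects to unlisted machines should be followed is left open, as it is in the text.

```python
def parse_rule(rule):
    """Split ("allow", service, "at", addr1, addr2, ...) into its parts."""
    if len(rule) < 4 or rule[0] != "allow" or rule[2] != "at":
        raise ValueError("not an allow rule: %r" % (rule,))
    return rule[1], frozenset(rule[3:])


class ProxyPolicy:
    """Holds the trusted addresses for each service the proxy may talk to."""

    def __init__(self, rules):
        self._allowed = {}
        for rule in rules:
            service, addresses = parse_rule(rule)
            known = self._allowed.get(service, frozenset())
            self._allowed[service] = known | addresses

    def may_open(self, service, address):
        """True if the proxy is allowed to open a session to this address."""
        return address in self._allowed.get(service, frozenset())


# The example rule from the text, written as a Python tuple.
policy = ProxyPolicy([("allow", "twitter", "at", "www.a.b", "www.p.q", "www.c.d")])
```

With this policy, `policy.may_open("twitter", "www.p.q")` is allowed while any address outside the list, or any other service name, is refused.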

<br>

  Would the fact that you have to install a proxy server limit<br>

deployment of a service? - probably.<br>

Also it opens up a new bag of worms (security) - for good reasons<br>

browsers do not allow plugins<br>

to access multiple sites (only the originating sites).<br>

<br>

I suppose therefore, that the inner cluster that provides the service<br>

would have a full P2P structure<br>

and that the service would be accessed by DNS round robin with IP<br>

failover to handle the errors.<br>

<br>

I suspect that architectures like this are being used in some Erlang<br>

systems (the details might vary)<br>

<br>

If anybody would like to post code and go into the details of how to<br>

rig systems for DNS load balancing<br>

(or whatever it's called) and for IP monitoring and fail-over then we<br>

could get to the interesting<br>

part of building the application.<br>

<br>

(( the next bit will be to look at the limits of scaling - still<br>

nobody has talked numbers -<br>

   how far can we press a two-tier system - with say 20 name servers<br>

in the front-end that *only* do<br>

   redirects - this bit should be very fast ))<br>

<br>

(( By combining twitter with IRC we might make the option of<br>

installing a proxy more attractive -<br>

   The IRC "server" for a group G should really be the client that<br>

first created the group G -<br>

    If that client drops out, the second machine in the group could become the<br>

server and so on.<br>

    Really twitter is like having one IRC group per person. Many<br>

people can join but only the<br>

    owner can write to it. What's the difference (architecturally)?<br>

))<br>

<br>

Cheers<br>

<font color="#888888"><br>

/Joe Armstrong<br>

</font><div><div></div><div class="Wj3C7c"><br>

<br>

<br>

On Sun, Jun 1, 2008 at 10:41 AM, David Mitchell <<a href="mailto:monch1962@gmail.com">monch1962@gmail.com</a>> wrote:<br>

> This is a REALLY interesting discussion, but at this point it's<br>

> becoming obvious that I don't know enough about Twitter...<br>

><br>

> Are you suggesting that Twoorl should be architected as follows:<br>

> - when they register, every user gets assigned their own RabbitMQ<br>

> incoming and outgoing queues<br>

> - user adds a message via Web/Yaws interface (I know, this could be<br>

> SMS or something else later...)<br>

> - message goes to that user's RabbitMQ incoming queue<br>

> - a backend reads messages from the user's incoming queue, looks up in<br>

> e.g. a Mnesia table to see who should be receiving messages from that<br>

> user and whether they're connected or not.  If "yes" to both, RabbitMQ<br>

> then forwards the message to each of those users' outgoing queues<br>

> - either the receiving users poll their outgoing queue for the<br>

> forwarded message, or a COMET-type Yaws app springs to life and<br>

> forwards the message to their browser (again, ignoring SMS)<br>

><br>

> This seems like a reasonable approach; I'm just curious if that's what<br>

> you're suggesting, or whether you've got something else in mind.<br>

><br>

> Great thread, and thanks Yariv for getting this discussion going with Twoorl<br>

><br>

> Regards<br>

><br>

> Dave M.<br>

><br>

> 2008/6/1 Steve <<a href="mailto:steven.charles.davis@gmail.com">steven.charles.davis@gmail.com</a>>:<br>

>><br>

>> On May 31, 5:04 pm, "Yariv Sadan" <<a href="mailto:yarivsa...@gmail.com">yarivsa...@gmail.com</a>> wrote:<br>

>>> ...but it's the only way you can scale this kind of service when N is<br>

>>> big.<br>

>><br>

>> Hmmm, Yariv, aren't you still thinking about this in the way that Dave<br>

>> Smith pointed to as the heart of the issue? i.e.<br>

>> Dave said: "My understanding is that the reason they have such poor<br>

>> uptime is due to the fact that they modeled the problem as a web-app<br>

>> instead of a messaging system."<br>

>><br>

>> I'm aware that you are likely a good way away from hitting any<br>

>> scalability problems, but some kind of tiering would seem to be<br>

>> appropriate if twoorl is to be "twitter done right". Yaws at the front<br>

>> end, definitely - but rather /RabbitMQ/ at the back end. I believe<br>

>> that you'd then have the flexibility to distribute/cluster as<br>

>> necessary to scale to the required level (whatever that may be).<br>

>><br>

>> For sure, Twoorl is a great demo of what can be done with Erlang in an<br>

>> incredibly short time. I'm a relative noob to Erlang, and have learned<br>

>> a great deal from your blog/code/examples.<br>

>><br>

>> Steve<br>

>> _______________________________________________<br>

>> erlang-questions mailing list<br>

>> <a href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a><br>

>> <a href="http://www.erlang.org/mailman/listinfo/erlang-questions" target="_blank">http://www.erlang.org/mailman/listinfo/erlang-questions</a><br>

>><br>


</div></div></blockquote></div><br>