p2p summary (kind of)

Thu Feb 14 10:09:38 CET 2002

Hello everyone!

After turning the peer-to-peer concept inside out over the last few days, I
reached some conclusions that might (some of them at least) be of interest
to you. Just ignore the ones that read like random Greek ;-) The text is
also available on the Wiki.

- there are many protocols in the making, but none was really an "aha!"
experience, except maybe Chord, recommended by Joe.

- I don't have the time to build/implement a low-level protocol, even though
that would bring the immediate reward of being able to connect to an already
existing network.

- Erlang offers "for free" much of the basic functionality needed to create
a p2p network, through the distribution mechanism. However, Erlang's current
design probably doesn't scale up, and if we tried that it might end up just
as Gnutella did...

- What I am mostly interested in is a general framework on which p2p
applications can be built. This means that the basic services must be at
least: connectivity, routing and gatewaying, security, and search (of any
kind: for other peers, for data, for services/applications). The
applications should be just plugins that use the connection provided, each
speaking its own protocol.

- The protocols in use tend to move towards XML. In practice this just means
that they must write

<searchresult id="3">
	      <item id="4" name="file1" node="12.12.12.12:3333"/>
	      <item id="7" name="file3" node="12.132.12.13:2424"/>
</searchresult>

instead of

{searchresult, [{id, 3}],
	       [
		{item, [{id, 4},{name, "file1"},
			{node, "12.12.12.12:3333"}], []},
		{item, [{id, 7},{name, "file3"},
			{node, "12.132.12.13:2424"}], []}
	       ]
}

This is really a matter of taste. Converting between the two is 
straightforward.
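
To illustrate how little is involved, here is a minimal sketch of the
term-to-XML direction. The module name term_xml and the function to_xml/1
are made up for this example; it assumes every element is a
{Tag, Attributes, Children} tuple as above, that attribute values are
integers or strings, and it does no escaping of special characters.

```erlang
-module(term_xml).
-export([to_xml/1]).

%% Render a {Tag, Attributes, Children} tuple as a flat XML string.
%% Elements without children are written as self-closing tags.
to_xml({Tag, Attrs, []}) ->
    lists:flatten(["<", atom_to_list(Tag), attrs(Attrs), "/>"]);
to_xml({Tag, Attrs, Children}) ->
    Name = atom_to_list(Tag),
    lists:flatten(["<", Name, attrs(Attrs), ">",
                   [to_xml(C) || C <- Children],
                   "</", Name, ">"]).

%% Turn [{Key, Value}] into ' key="value"' fragments (no escaping).
attrs(Attrs) ->
    [[" ", atom_to_list(K), "=\"", value(V), "\""] || {K, V} <- Attrs].

value(V) when is_integer(V) -> integer_to_list(V);
value(V) when is_list(V)    -> V.
```

The opposite direction needs an XML parser, but the result maps onto the
same tuple shape.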

- Let us see how Erlang works for the 4 areas outlined above (I am only
guessing at some of this, so please correct me where you know I'm wrong):

-- connectivity: it is automatic using the underlying distribution
mechanism; but will it scale? I strongly doubt it. A fully connected net is
not manageable (not with thousands of nodes), so the number of connections
should be kept limited. This creates the need for relaying messages between
nodes that aren't directly connected, because it would be very elegant if
the present location transparency could be kept. That is, are Pids enough
for identifying processes on nodes that aren't connected?

-- routing and gatewaying: as far as I know, nothing exists today that will
help in this case. This is functionality that must be built in. One of the
most important things is how to bridge through firewalls, or across
different kinds of networks.

-- security: here we have a big can of worms... as Erlang works now, a node
is fully open to anyone who knows the cookie. Some studies have been made,
but since we are only talking about exchanging messages, not code, we
probably don't need SafeErlang yet. It would probably be enough to have a
node with a modified net_kernel that allows nothing more than message
passing (no remote spawns). I'm not sure if the latter can be achieved via
net_kernel alone. There is also another problem: how to get all nodes to
have the same cookie? It might be possible to get around that with a new
net_kernel (if this check isn't buried deeper) that allows nodes with
different cookies to connect, possibly with different security policies for
different cookies. This way a node can be a full node on the intranet, while
being connected to the outside world too.

-- search: this is mostly a p2p issue, so it isn't addressed in today's
Erlang. A protocol needs to be defined and implemented, and it will also
govern the routing and gatewaying behaviour.
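
To make the relaying idea from the connectivity and routing points a bit
more concrete, here is a rough sketch, not a proposal for the real protocol.
Everything in it is an assumption made for illustration: the module name
relay, the {relay, Name, Msg} envelope, and the static routing table. Each
relay knows, per logical name, either a process it can deliver to directly
or another relay that is one hop closer.

```erlang
-module(relay).
-export([start/1, send/3]).

%% Start a relay process. Table is a list of
%%   {Name, {deliver, Pid}}  for destinations reachable directly, and
%%   {Name, {via, Relay}}    for destinations behind another relay.
start(Table) ->
    spawn(fun() -> loop(Table) end).

%% Hand Msg to a relay, addressed to the logical name Name.
send(Relay, Name, Msg) ->
    Relay ! {relay, Name, Msg},
    ok.

loop(Table) ->
    receive
        {relay, Name, Msg} ->
            case lists:keyfind(Name, 1, Table) of
                {Name, {deliver, Pid}} -> Pid ! Msg;
                {Name, {via, Next}}    -> Next ! {relay, Name, Msg};
                false                  -> ok  % unknown name: silently dropped
            end,
            loop(Table)
    end.
```

Note that the sender addresses a name rather than a Pid here, which sidesteps
rather than answers the question of whether Pids are enough. A real version
would also need dynamic tables and loop protection.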

- The big problem here is that there might be security issues that won't be
noticed until it's too late. Because of that it is wiser to have separate
connection management, where we can more easily decide what's okay and
what's not. This might also ease the task of bridging to different networks
(instead of an IP socket we use another transport, or we go over HTTP). It
won't be as elegant as Pid ! Msg, but I for one can live with send(Pid, Msg)
:-)
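
A sketch of what such an explicit send could look like, with the decision
about "what's okay and what's not" made in one place. The module name
p2p_send and the extra Policy argument are assumptions for this example only;
the policy is any fun that takes the message and returns true or false.

```erlang
-module(p2p_send).
-export([send/3]).

%% Deliver Msg to Pid only if the caller-supplied Policy accepts it.
%% Policy is a fun(Msg) -> boolean().
send(Pid, Msg, Policy) when is_pid(Pid), is_function(Policy, 1) ->
    case Policy(Msg) of
        true  -> Pid ! Msg, ok;
        false -> {error, rejected}
    end.
```

For example, a policy of fun(M) -> is_tuple(M) end would let tagged tuples
through and reject everything else, including funs.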

- Of course, one can write p2p applications without any such platform
underneath. But it's kind of a waste to address the same issues for every
application, and the one issue that each of them will necessarily meet is
how to access nodes behind firewalls.

What do you people think? Am I babbling, or is there a trace of rational
thinking?

best regards,
Vlad
