Reliability of communication

Tue Feb 4 18:22:46 CET 2003

The weakest part of distributed Erlang is IMHO that it relies on TCP
and on the implementation of TCP in most (all major) operating systems.

This is no problem when communicating within a single host, but for
inter-host communication you will see severe problems as soon as you
start to pull cables.
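
To illustrate the pulled-cable case: an idle TCP connection sends
nothing, so the operating system may not notice a dead link for a long
time. One common socket-level mitigation is TCP keepalive. The sketch
below is only an illustration and is not the high availability
approach described in this post. It is written in Python, assumes
Linux-style option names, and the function name enable_keepalive and
the timer values are made up for the example.

```python
import socket


def enable_keepalive(sock, idle=10, interval=5, probes=3):
    """Ask the OS to probe an idle TCP connection so a dead peer is noticed sooner.

    The timer values are illustrative assumptions, not recommendations.
    Returns a dict of the options that were actually applied.
    """
    # SO_KEEPALIVE is the portable switch that turns keepalive probing on.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {"SO_KEEPALIVE": 1}

    # The fine-grained timers are platform specific (these names exist on
    # Linux); skip any that the running platform does not expose.
    tuning = (
        ("TCP_KEEPIDLE", idle),       # seconds of silence before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between unanswered probes
        ("TCP_KEEPCNT", probes),      # unanswered probes before the connection is dropped
    )
    for name, value in tuning:
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
            applied[name] = value
    return applied


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
applied = enable_keepalive(sock)
```

On a platform that honors all three timers, a silent peer would be
declared dead after roughly idle + interval * probes seconds (about 25
seconds with the values above). This only shortens detection time; it
does nothing to keep the connection alive across a failed link.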

In order to build a real high availability system you need to make
sure you have a high availability TCP solution, either via fault
tolerant hardware switches or via software solutions supporting high
availability on simple redundant switching hardware (i.e. what I
presented at EUC 2001). The latter is at least as good as the hardware
solutions around and is one or two orders of magnitude cheaper.

The obvious best solution would be to use SCTP for inter-host
communication. As soon as the lksctp guys have a stable release I plan
to dig into it.

/Per                                                                  

> A question related to the development of safety-critical systems:   
>                                                                     
> Is inter-process and inter-node communication in Erlang only as
> reliable as TCP/IP, or are any additional error-detecting strategies
> used (e.g. hamming, cyclic or polynomial codes)?
>                                                                     
> Dominic.                                                            
>                                                                     
=========================================================             
Per Bergqvist                                                         
Synapse Systems AB                                                    
Phone: +46 709 686 685                                                
Email: per@REDACTED