[erlang-questions] Fault-Tolerant TCP/IP Servers

David Terrell
Thu Jul 17 00:40:21 CEST 2008


Edwin Fine wrote:
> The thing is, if you are trying to create a truly fault-tolerant system, 
> especially a 24x7 type of application, you have to have redundancy 
> everywhere, not just in the back-end servers. If you put a FreeBSD 
> system as a cheap load balancer in front of your multiple Erlang 
> systems, it's a single point of failure.
> 
> How far do you want to go? Do you want 99.9% availability, or 99.99999%? 
> There's a HUGE difference between the two. How much unplanned downtime 
> can you afford?
> 
> In a large-scale industrial-strength (e.g. telecomms) solution, each 
> server in the farm would have dual power supplies, dual hard disks for 
> the OS, multiple network cards, and an on-board processor for remote 
> hardware-level control; the network would be a SAN where the application 
> software and data sit, redundant LAN/switched fabric segments with 
> dual-pathed redundant fiber-channel switches; everything would be on 
> UPS, and the UPS would be on a backup generator; and so on.
> 
> It all depends how much you have to spend, and how much it would cost 
> financially (or in human terms) if the system went down. If you are 
> betting the company's existence on it, you need as solid a solution as 
> you can afford. If you can't afford a solid solution, you take your best 
> shot and pray. It's a tradeoff.
> 
> Is a dedicated hardware server load balancer with built-in redundancy 
> out of the question? Some sort of Cisco appliance, for example. That 
> might be the cheapest and most effective solution for the bucks.
> 
> On Wed, Jul 16, 2008 at 1:43 PM, David Terrell wrote:
> 
>     If you control (or can test) the client application, see if you can
>     simply point your DNS to multiple addresses (sometimes called round
>     robin DNS).  This puts the retry all server logic in the client app, but
>     it doesn't require extra mechanics like firewall/sync.
> 
>     If it doesn't properly try all addresses in the DNS, then I second the
>     recommendation to use OpenBSD CARP+PFSync.
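
The "retry all servers" logic that round-robin DNS pushes into the client
can be sketched roughly as below. This is only an illustration in Python,
not code from any real client: the function name connect_any, the timeout
value, and the injectable resolve/connect parameters are all assumptions
made for the example.

```python
import socket


def connect_any(host, port, timeout=3.0,
                resolve=socket.getaddrinfo,
                connect=socket.create_connection):
    """Try every address `host` resolves to; return the first socket that connects.

    `resolve` and `connect` are hypothetical injection points so the loop can
    be exercised without a network; they default to the standard library calls.
    """
    last_error = None
    # A round-robin DNS name resolves to several A/AAAA records.
    for _family, _type, _proto, _canon, sockaddr in resolve(
            host, port, type=socket.SOCK_STREAM):
        try:
            # sockaddr[:2] is (ip, port) for both IPv4 and IPv6 results.
            return connect(sockaddr[:2], timeout)
        except OSError as exc:
            # This server is down or unreachable: remember why, try the next.
            last_error = exc
    raise ConnectionError(
        f"no address for {host!r} accepted a connection") from last_error
```

A client that stops after the first failed address is exactly the case
where round-robin DNS does not help. (Python's own
socket.create_connection already loops over the resolved addresses when
given a host name; the explicit loop is only there to show the logic.)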


OpenBSD CARP + pfsync _is_ a redundant firewall, so it is not a single
point of failure.

If you make your firewall redundant and redirect to multiple internal
servers, making each individual machine fault tolerant as well is just
duplicated effort.  In any case, the original poster was only looking
for a solution at the network layer.
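
As for the 99.9% versus 99.99999% question quoted above, the gap is easy
to put numbers on. The sketch below is plain arithmetic; the function name
and the 365-day year are assumptions made for the example.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000; leap years ignored


def downtime_seconds_per_year(availability_percent):
    """Unplanned downtime per year allowed by a given availability figure."""
    return (1 - availability_percent / 100) * SECONDS_PER_YEAR


for pct in (99.9, 99.99999):
    secs = downtime_seconds_per_year(pct)
    print(f"{pct}% -> {secs:.2f} s/year ({secs / 3600:.2f} h/year)")
```

That works out to about 8.76 hours of downtime a year at 99.9%, against
roughly 3 seconds a year at 99.99999%.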


