[erlang-questions] Erlang newbie questions

Tue Oct 18 09:47:45 CEST 2011

On 17 Oct 2011, at 23:14, Gerry Weaver wrote:

> It has been my experience that network servers most often fail due to OS or hardware limits rather than
> bugs in the code (at least for thoroughly tested production ready code). When you factor in the amount of baggage that comes with Erlang, or any other VM based language, it's difficult to justify.

Gerry,

You have been given other helpful answers from the good people of this list. I thought I'd add something from my own experience writing commercial code in Erlang.

The thing I've noticed about Erlang-based systems is that they tend to mature very well.

I attribute this to a number of things:

== Functional programming style ==

The parts of your code that rely on pure functions (no side effects) have the nice property that once you have weeded out the bugs, the code *stays correct* until requirements change. I've seen this in industrial Erlang-based systems: they have been robust from the start, and the quality just keeps going up.

== Error handling + programming for the correct case ==

The combination of pattern matching, a purely functional style, and process supervision further supports the notion of writing functions that are correct by design: they do what they are supposed to for expected input, and simply crash on unexpected or invalid input. This style of programming actually works very well as a *default* in Erlang. For extra robustness, one usually needs to trap and handle errors explicitly in a few specific places, but that kind of hardening is fairly straightforward and can be done as incremental improvements.
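
To make the point above concrete, here is a minimal sketch of what "programming for the correct case" can look like. The module name shape_demo, the functions area/1 and safe_area/1, and the shape tuples are all made up for illustration; they are not taken from any real system. The first function has clauses only for the input it expects, and the second shows one possible way of hardening a single boundary afterwards.

```erlang
-module(shape_demo).
-export([area/1, safe_area/1]).

%% Hypothetical example. Written for the correct case only: there is
%% no clause for unexpected input, so a bad argument makes the call
%% fail with a function_clause error instead of returning garbage.
area({square, Side}) when is_number(Side), Side >= 0 ->
    Side * Side;
area({circle, Radius}) when is_number(Radius), Radius >= 0 ->
    math:pi() * Radius * Radius.

%% Incremental hardening at one chosen boundary: trap the crash and
%% turn it into a tagged return value that the caller can act on.
safe_area(Shape) ->
    try area(Shape) of
        Area -> {ok, Area}
    catch
        error:function_clause -> {error, {bad_shape, Shape}}
    end.
```

Calling shape_demo:area({triangle, 1}) crashes the calling process, which is normally what you want when that process runs under a supervisor. Calling shape_demo:safe_area({triangle, 1}) returns an error tuple instead, and area/1 itself did not have to change to get there.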

== Ability to limit the state-event space ==

It's been mentioned that Erlang shines especially for control and coordination scenarios. An important reason is that Erlang allows you to minimise the interference between different components and events. In many design approaches, it is very hard to introduce new protocols or events, as the effects cut across the code in a very bad way. For many systems, this has the effect that they don't mature well. Changes in traffic conditions, hardware and network topologies, and requirements all have a tendency to trigger a new flurry of bugs, due to things happening in a different order than before, or events combining in unexpected ways. I have compared this to the evils of GOTO programming [1].

Given the old rule of thumb that 80% of the lifecycle cost of a system lies in evolution and maintenance, this makes for a pretty formidable advantage.

In one system I worked with, our first release was 4x better than the company norm on faults/KLOC. After a few releases - with significant feature growth - we were 12x better. Code that was corrected for bugs tended to stay correct, and bug fixes normally didn't introduce any new bugs.

A colleague of mine, Mats Cronqvist, once wrote in a workshop paper [2], after having analysed some 150 trouble reports from function- and system testing:

"Most of the errors were not coding errors, but simply a working implementation of the wrong thing." (page 2)

Another way of putting it would be that there are comparatively few "accidental errors" in Erlang-based systems - that is, errors that are a consequence of the implementation technique rather than a misunderstanding of the requirements.

BR,
Ulf W

[1] http://www.infoq.com/presentations/Death-by-Accidental-Complexity
[2] "Troubleshooting a Large Erlang System", Mats Cronqvist, 2004 ACM/SIGPLAN Erlang Workshop
  http://dl.acm.org/citation.cfm?id=1022474
  http://www.erlang.se/workshop/2004/cronqvist.pdf

Ulf Wiger, CTO, Erlang Solutions, Ltd.
http://erlang-solutions.com
