[erlang-questions] Robustness problems when using records [WAS: Re: Question/Alternative on Frames Proposal [Warning: Long]]

Sun May 20 10:07:36 CEST 2012

On 19 May 2012, at 22:29, Tom Parker wrote:

> Therefore, I continue to assert my statement below is correct as written.  Records are the issue, and it is the destruction of the name information before it reaches the VM that is the issue.

Actually, I agree with Kostis: the biggest problem is the reliance on
.hrl files and the preprocessor for sharing code between modules.

You can put a local function definition in a .hrl file as well, and it
has been done too. Changes to a record usually trigger badmatch
or function_clause exceptions if not all modules are recompiled 
and replaced in a coordinated way. This is a problem, for sure,
but if you make subtle semantic changes to a function in a .hrl
file, or to a macro (which can expand to pretty much anything),
then *anything* can happen - including a large set of extremely
subtle errors that are hard to achieve with records.

So putting a function definition in a .hrl file can cause some
really subtle problems. This obviously doesn't mean that functions
are fundamentally flawed, but it is yet another example of why
relying on the preprocessor is a dangerous thing.

The worst thing you can do with records - and it's indeed a bad
thing - is to rename attributes and use the same positions for 
entirely different data (no, worst would be the *same* data type
but with a different interpretation). Luckily any half-decent 
programmer will understand enough not to do this.
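To make the position-reuse hazard concrete, here is a minimal sketch. It assumes a hypothetical person tuple whose second field is renamed from 'age' to 'email' without moving it. Both "modules" are simulated inside one module, and all names are made up for illustration:

```erlang
-module(stale_record_demo).
-export([demo/0]).

%% Layout that an old, not-yet-recompiled module was built against
%% (hypothetical): {person, Name, Age, Address}.
old_age({person, _Name, Age, _Address}) -> Age.

%% Layout after the edit: the second field now holds an e-mail
%% address but kept its position: {person, Name, Email, Address}.
new_person(Name, Email, Address) -> {person, Name, Email, Address}.

demo() ->
    P = new_person("Ada", "ada@example.org", "London"),
    %% No badmatch and no function_clause: the stale accessor
    %% silently returns the e-mail where an age was expected.
    old_age(P).
```

Because the tuple keeps the same tag and size, nothing crashes; the wrong value simply flows onward.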

Richard O'Keefe has many times stated that a key goal is to
remove the reliance on the pre-processor.

If you think about it, Erlang has no facility for sharing structured
data types between modules, except for code - as Kostis mentioned
- and pattern matching on the primitive elements of the structure.

You can provide accessor functions that are exported from the 
same place where the data structure is managed, or you can 
use data structures that are as shallow as to be easy to inspect
through pattern-matching. If this is not enough, you can pass along
property lists, which are simple, yet flexible enough to convey just
about anything, but which cannot be quickly inspected with
pattern-matching. So people trade safety for speed.

…or rely on convention, which actually works very well.
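The three options above can be sketched side by side. This is only an illustration: the module name person_api, its functions and the sample data are assumptions, not code from any real system.

```erlang
-module(person_api).
-export([new/3, name/1, age/1, demo/0]).

%% Option 1: accessor functions. The tuple layout is private to
%% this module; callers only use new/3, name/1 and age/1.
new(Name, Age, Address) -> {person, Name, Age, Address}.

name({person, Name, _, _}) -> Name.
age({person, _, Age, _}) -> Age.

demo() ->
    P = new("Ada", 36, "London"),
    %% Option 2: a shallow structure, cheap to inspect with a pattern.
    {point, X, _Y} = {point, 1, 2},
    %% Option 3: a property list, flexible but searched linearly.
    Props = [{name, "Ada"}, {age, 36}],
    {name(P), age(P), X, proplists:get_value(age, Props)}.
```

Only the first option keeps other modules independent of the layout, at the price of a remote call per access.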

Remember that records are little more than syntactic sugar.
They do not introduce a data type that would offer anything more
than the above. You could get a similar effect by introducing a 
series of home-cooked macros, expanding to operations and
patterns on a tuple structure, and then let multiple modules 
rely on them through a .hrl file.
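A small sketch of what "syntactic sugar" means here, assuming a hypothetical person record defined locally rather than in a .hrl file:

```erlang
-module(record_sugar).
-export([demo/0]).

-record(person, {name, age, address}).

demo() ->
    P = #person{name = "Ada", age = 36},
    %% At run time the record is just a tagged tuple.
    true = (P =:= {person, "Ada", 36, undefined}),
    %% #person.age is the field's tuple index, fixed at compile time.
    3 = #person.age,
    36 = element(#person.age, P),
    %% A plain tuple of the right shape is accepted as a record.
    #person{address = Addr} = {person, "Bob", 41, "Paris"},
    Addr.
```

The field names exist only in the compiler; the VM sees nothing but the tuple.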

I even have some old code lying around from the time before 
records. It had a lot of code like this:

%% Getters: person(Field, Person) -> Value
person(name,    {person,X,_,_}) -> X;
person(age,     {person,_,X,_}) -> X;
person(address, {person,_,_,X}) -> X.

%% Setters: person(Field, Value, Person) -> NewPerson
person(name,    X, {person,_,B,C}) -> {person,X,B,C};
person(age,     X, {person,A,_,C}) -> {person,A,X,C};
person(address, X, {person,A,B,_}) -> {person,A,B,X}.

A bit fiddly to create, but easy to verify through visual inspection.
Of course, you could put that in a .hrl file and use it as a 'shared
data type'. It would suffer from roughly the same problems as 
records.

The obvious limitations with records were the subject of animated
discussion even before they were added to the language. You
wouldn't recall this unless you were a member of a pretty small 
group of people debating this on a mailing list inside Ericsson, long
before the Open Source release.

Even then it was agreed that the proper way to do it was to introduce
a new data type. If my memory serves, the problem with that was 
that JAM didn't have any more type tags to use. A redesign of the 
memory management structure in the VM would be required in order
to add new data types. In light of this, records were seen as a
cost-efficient way to solve a common problem of copy-paste 
programming.

Later on, I believe it was the HiPE team that proposed a new tagging
scheme and wrote some working code to demonstrate. By the time
it was integrated into Erlang, records had become ubiquitous, and 
could not be easily replaced.

If my recollection is faulty, the old-timers can correct me. I recall this
playing out right after I joined Ericsson in 1996.

(The new tagging scheme was introduced in R7B in 2000, and is described
in Mikael Pettersson's technical report: http://www.it.uu.se/research/publications/reports/2000-029/)

One can argue that it's taken a long time to correct this issue, and 
add a structured data type with all the goodness of records and 
none of their drawbacks. The reasons:

- Erlang, even taking this into account, is a darn good language for
  writing complex, robust and scalable systems.
- Convention actually works pretty well in practice.
- The OTP team wanted to get it right the second time.
- Backwards compatibility is considered extremely important in the
  Erlang world (exactly because we don't want to break the 
  complex, robust and scalable systems already written in Erlang).

BR,
Ulf W

Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc.
http://feuerlabs.com
