Nothingness

Thu Oct 25 12:18:10 CEST 2001

I was in the midst of running tests using various schemes of representing 
null and using tagged values when I saw Hakan's message, which explained 
that Erlang stores these values via term_to_binary.

That pretty much made the tests moot. Still, the results are perhaps 
interesting. I used a test table of 1000 rows, 95 elements, mostly nulls.

type    value     mnesia file size   %      binary size (bytes)
---------------------------------------------------------------
atom    nil       599K               100    7
atom    ''        389K                64    4
list    []        250K                42    2
tuple   {}        319K                53    3

Using a combination of [] for null and {v, Value} for a real value:
                  398K                66

How about [] for null and {Value} for a value?
                  299K                50

Using [] to represent nulls is the most efficient, space-wise.
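
The "binary size" column can be reproduced with a small sketch like the 
one below. The module and function names (null_size, encoded_size, report) 
are made up for illustration. The sizes for [] and {} should be 2 and 3 
bytes; the atom sizes depend on how a given Erlang release encodes atoms, 
so they may not match the table exactly.

```erlang
-module(null_size).
-export([encoded_size/1, report/0]).

%% Size in bytes of a term once encoded with term_to_binary/1.
encoded_size(Term) ->
    byte_size(term_to_binary(Term)).

%% Print the encoded size of each null candidate from the table above.
report() ->
    Candidates = [nil, '', [], {}],
    lists:foreach(
      fun(T) -> io:format("~p: ~p bytes~n", [T, encoded_size(T)]) end,
      Candidates).
```

Calling null_size:report() in the shell prints one line per candidate.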

As to the argument that [] can be mixed up with an empty string (or any 
empty list for that matter) -- that was my first prejudice as well. For 
myself, I attribute it to DBMS and Java-think. Many (most) database systems 
distinguish between the empty string and the null value, as do programming 
languages -- and of course there is a big difference in Java, C, etc. 
between a null value and an empty string (or whatever).

However, it could be argued that there is no real-life distinction 
between the empty list (also called, confusingly, nil in Erlang although I 
haven't found a context in which [] can be referred to as nil) and the 
absence of information. After all, an empty list means "no things", which 
is nothing!

To put it another way, if there are no herrings in your net, do you have an 
empty net of herrings, an empty net, or nothing at all? For the purpose of 
the activity, I don't think the difference matters at all. No fish is no 
fish -- it is also no crabs, no seaweed, no old boots. (In some sense, the 
empty net is something -- it represents the capability of collecting fish 
-- or crabs or boots.)

Back to Erlang --

If we use [] to represent missing data, then if a value of [] is found for 
an element of a record, we can deduce that there are no instances of 
whatever that element would otherwise hold. To do this we have to say that 
[] is not a valid value of any Erlang data type (unless we invent one, say 
called nil, which has one value, []):

for all non-list values this is safe since [] is disjoint, type-wise, from 
them

for lists, we just say that [] does not represent an empty _list_, just 
that it represents nothingness.  E.g., there is no such thing as an empty 
string -- this is just a fancy way of saying the string does not exist.

Of course, in the context of working with lists, an algorithm is free to 
treat [] as an empty list.

Hmmm, but what about the empty tuple or empty record?

Well, in my case I'm just talking about dealing with mnesia tables and how 
to put missing data into record elements which are then inserted into the 
table, so I think this is adequate.
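
As a rough sketch of that convention: the record and helper names below 
(row, is_null, value_or) are hypothetical, not taken from any real schema. 
Every field defaults to [], and a field holding [] is read as "no data".

```erlang
-module(null_field).
-export([is_null/1, value_or/2, demo/0]).

%% Hypothetical record; every field defaults to [], meaning "no data".
-record(row, {name = [], phone = [], notes = []}).

%% [] is the only value treated as null.
is_null([]) -> true;
is_null(_)  -> false.

%% Return the stored value, or Default when the field is null.
value_or([], Default) -> Default;
value_or(Value, _Default) -> Value.

demo() ->
    R = #row{name = "Erik"},
    {is_null(R#row.name), is_null(R#row.phone),
     value_or(R#row.phone, "unknown")}.
```

Here demo() returns {false, true, "unknown"}. Note that is_null("") is 
also true, since "" and [] are the same term: under this convention the 
empty string and the missing value are deliberately not distinguished.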

I'd better get to sleep before I keep babbling!

Erik.

--On Thursday, October 25, 2001 10:24 AM +0200 Ulf Wiger 
<etxuwig@REDACTED> wrote:

> On Wed, 24 Oct 2001, Erik Pearson wrote:
>
>> Hi,
>>
>> I'm now hip-deep into my first Erlang project. It has been exciting, and
>> very productive.
>>
>> I have encountered issues, of course. The application is very
>> data-centric -- analyzing tab delimited and fixed-column-width tables,
>> importing them into mnesia tables, and subsequently using these tables
>> for lots of stuff.
>
> You might want to look at the RDBMS contrib. It has, for one
> thing, an import facility for a sort of tab-delimited text files.
> It also has its own weird definition of a null value: '#.[].#'
> It's sort of an Erlang tradition to invent an atom that no sane
> user would want to input.
>
> Atoms are stored very efficiently in RAM, but on disk, you should
> expect them to expand to strings.
>
>
>> It seems to me that the one solution would be a new disjoint
>> datatype which represents the null value, and which has only
>> one value "null".  The null value could be assigned by a
>> primitive null(), values could be tested for nullness with
>> null(Value), and could be stored efficiently.
>
> I believe it is generally accepted that any proper null handling
> must be done in the underlying runtime architecture -- not in the
> application. Erlang should have a proper NULL representation, and
> probably also infinity and negative infinity.
>
> In some applications, I've stored data internally wrapped inside
> a tuple, e.g. {X}. The null value could then safely be
> represented as e.g. {} or null, as it cannot be confused with a
> legal value. The interesting twist to this is that you can also
> use the Erlang ordering rules to represent infinity and negative
> infinity:
>
> If {} ::= NULL, [] ::= infinity, '-infinity' ::= -(infinity),
> then:
>
> '-infinity' < {X} < [] for all X, and
> {} <> {X} (of course, and also {} < {X}, but strictly speaking,
> this is irrelevant, as NULL has no useful ordering.)
>
> /Uffe
> --
> Ulf Wiger                                    tfn: +46  8 719 81 95
> Senior System Architect                      mob: +46 70 519 81 95
> Strategic Product & System Management    ATM Multiservice Networks
> Data Backbone & Optical Services Division      Ericsson Telecom AB
>
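
The ordering trick quoted above can be checked with a short sketch. The 
module and function names (null_order, wrap, sorted) are assumptions for 
illustration; the code relies only on the standard Erlang term order, in 
which atoms sort before tuples and tuples sort before lists.

```erlang
-module(null_order).
-export([wrap/1, sorted/0]).

%% Wrap a real value in a 1-tuple, as in the quoted scheme.
wrap(X) -> {X}.

%% Sort a mix of wrapped values and the two infinity markers:
%% '-infinity' for negative infinity and [] for infinity.
sorted() ->
    lists:sort([[], wrap(42), '-infinity', wrap("abc"), wrap(-7)]).
```

sorted() returns ['-infinity', {-7}, {42}, {"abc"}, []], so the two 
markers end up at the extremes no matter what the wrapped values are.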

Erik Pearson
@ Adaptations
email     : erik@REDACTED
voice/fax : +1 510 527 5437
text page : page.erik@REDACTED