Nothingness
Erik Pearson
erik@REDACTED
Thu Oct 25 12:18:10 CEST 2001
I was in the midst of running tests using various schemes of representing
null and using tagged values when I saw Hakan's message which explained
that Erlang stores these values via term_to_binary.
That pretty much made the tests moot. Still, the results are perhaps
interesting. I used a test table of 1000 rows, 95 elements, mostly nulls.
type    value   mnesia file size    %     binary size (bytes)
--------------------------------------------------------------
atom    nil     599K                100   7
atom    ''      389K                 64   4
list    []      250K                 42   2
tuple   {}      319K                 53   3

Using a combination of [] for null and {v, Value} for a real value:

                398K                 66

How about [] for null and {Value} for a value?

                299K                 50
Using [] to represent nulls is the most efficient, space-wise.
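
The "binary size" column can be re-checked with a few lines of Erlang. This
is only a sketch: the module name null_size and the function names are made
up for illustration, and it measures the term_to_binary encoding only, not
the mnesia file size.

```erlang
-module(null_size).
-export([size_of/1, sizes/0]).

%% Size in bytes of the external (term_to_binary) encoding of a term.
size_of(Term) ->
    byte_size(term_to_binary(Term)).

%% {Candidate, Bytes} for each null representation tried in the table.
sizes() ->
    [{C, size_of(C)} || C <- [nil, '', [], {}]].
```

Calling null_size:sizes() gives one pair per candidate. The figures for the
two atoms may differ from the table on other Erlang releases, since the
external encoding of atoms is not the same in every release. [] and {} should
still come out smallest.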
As to the argument that [] can be mixed up with an empty string (or any
empty list for that matter) -- that was my first prejudice as well. For
myself, I attribute it to DBMS and Java-think. Many (most) database systems
distinguish between the empty string and the null value, as do programming
languages -- and of course there is a big difference in Java, C, etc.
between a null value and an empty string (or whatever).
However it could be argued that there really is no real-life distinction
between the empty list (also called, confusingly, nil in Erlang although I
haven't found a context in which [] can be referred to as nil) and the
absence of information. After all, an empty list means "no things", which
is nothing!
To put it another way, if there are no herrings in your net, do you have an
empty net of herrings, an empty net, or nothing at all? For the purpose of
the activity, I don't think the difference matters at all. No fish is no
fish -- it is also no crabs, no seaweed, no old boots. (In some sense, the
empty net is something -- it represents the capability of collecting fish
-- or crabs or boots.)
Back to Erlang --
If we use [] to represent missing data, then if a value of [] is found for
an element of a record, we can deduce that there are no instances of
whatever that element would otherwise hold. To do this we have to say that
[] is not a valid value of any Erlang data type (unless we invent one, say
called nil, which has one value, []):
  - For all non-list values this is safe, since [] is disjoint, type-wise,
    from them.
  - For lists, we just say that [] does not represent an empty _list_, just
    that it represents nothingness. E.g., there is no such thing as an empty
    string -- this is just a fancy way of saying the string does not exist.
Of course, in the context of working with lists, an algorithm is free to
treat [] as an empty list.
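
A minimal sketch of this convention, assuming a made-up record (person) and
made-up helper names (is_null/1, get_or/2). None of this comes from mnesia
itself; it only shows [] used as the default for every field:

```erlang
-module(null_field).
-export([is_null/1, get_or/2, demo/0]).

%% Hypothetical record: every field defaults to [], meaning "no data".
-record(person, {name = [], phone = [], age = []}).

%% [] is the only null value; anything else is real data.
is_null([]) -> true;
is_null(_)  -> false.

%% Return the stored value, or Default when the field is null.
get_or([], Default)     -> Default;
get_or(Value, _Default) -> Value.

demo() ->
    P = #person{name = "Erik"},
    {is_null(P#person.phone),
     get_or(P#person.age, unknown),
     is_null(P#person.name)}.
```

null_field:demo() returns {true, unknown, false}. Note that is_null("") is
also true, because "" and [] are the same term in Erlang. That is exactly the
"no such thing as an empty string" rule above.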
Hmmm, but what about the empty tuple or empty record?
Well, in my case I'm just talking about dealing with mnesia tables and how
to put missing data into record elements which are then inserted into the
table, so I think this is adequate.
I'd better get to sleep before I keep babbling!
Erik.
--On Thursday, October 25, 2001 10:24 AM +0200 Ulf Wiger
<etxuwig@REDACTED> wrote:
> On Wed, 24 Oct 2001, Erik Pearson wrote:
>
>> Hi,
>>
>> I'm now hip-deep into my first Erlang project. It has been exciting, and
>> very productive.
>>
>> I have encountered issues, of course. The application is very
>> data-centric -- analyzing tab delimited and fixed-column-width tables,
>> importing them into mnesia tables, and subsequently using these tables
>> for lots of stuff.
>
> You might want to look at the RDBMS contrib. It has, for one
> thing, an import facility for a sort of tab-delimited text files.
> It also has its own weird definition of a null value: '#.[].#'
> It's sort of an Erlang tradition to invent an atom that no sane
> user would want to input.
>
> Atoms are stored very efficiently in RAM, but on disk, you should
> expect them to expand to strings.
>
>
>> It seems to me that the one solution would be a new disjoint
>> datatype which represents the null value, and which has only
>> one value "null". The null value could be assigned by a
>> primitive null(), values could be tested for nullness with
>> null(Value), and could be stored efficiently.
>
> I believe it is generally accepted that any proper null handling
> must be done in the underlying runtime architecture -- not in the
> application. Erlang should have a proper NULL representation, and
> probably also infinity and negative infinity.
>
> In some applications, I've stored data internally wrapped inside
> a tuple, e.g. {X}. The null value could then safely be
> represented as e.g. {} or null, as it cannot be confused with a
> legal value. The interesting twist to this is that you can also
> use the Erlang ordering rules to represent infinity and negative
> infinity:
>
> If {} ::= NULL, [] ::= infinity, '-infinity' ::= -(infinity),
> then:
>
> '-infinity' < {X} < [] for all X, and
> {} <> {X} (of course, and also {} < {X}, but strictly speaking,
> this is irrelevant, as NULL has no useful ordering.)
>
> /Uffe
> --
> Ulf Wiger tfn: +46 8 719 81 95
> Senior System Architect mob: +46 70 519 81 95
> Strategic Product & System Management ATM Multiservice Networks
> Data Backbone & Optical Services Division Ericsson Telecom AB
>
Erik Pearson
@ Adaptations
email : erik@REDACTED
voice/fax : +1 510 527 5437
text page : page.erik@REDACTED