Nothingness

Thu Oct 25 01:35:14 CEST 2001

Hi,

I'm now hip-deep in my first Erlang project. It has been exciting and 
very productive.

I have encountered issues, of course. The application is very 
data-centric -- parsing tab-delimited and fixed-column-width tables, 
importing them into Mnesia tables, and subsequently using these tables 
for further processing. One of the first issues I ran up against was how 
to represent missing data -- that which would be represented in an SQL 
database as NULL.

The data flows like this:
Text File -> List of field values -> Tuple of field values -> Record -> 
Mnesia table
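
To make the flow concrete, here is a minimal sketch of the first steps 
for a single tab-delimited line. The module name, the #row{} record and 
its three fields are made up for illustration; the real tables have 
different columns. Empty fields become the placeholder atom 'nil'.

```erlang
-module(import_sketch).
-export([line_to_record/1]).

%% Hypothetical three-column table; the real field names will differ.
-record(row, {id, name, score}).

%% Text line -> list of field values.
%% A hand-rolled splitter is used because string:tokens/2 treats adjacent
%% separators as one and would silently drop empty (missing) fields.
split_tabs(Line) -> split_tabs(Line, [], []).

split_tabs([], Field, Acc) ->
    lists:reverse([lists:reverse(Field) | Acc]);
split_tabs([$\t | Rest], Field, Acc) ->
    split_tabs(Rest, [], [lists:reverse(Field) | Acc]);
split_tabs([C | Rest], Field, Acc) ->
    split_tabs(Rest, [C | Field], Acc).

%% Replace an empty field with the placeholder atom for missing data.
mark_missing("") -> nil;
mark_missing(Field) -> Field.

%% List -> tuple -> record. A record is a tuple tagged with its name, so
%% prepending the record name and converting yields a #row{}. The match
%% against #row{} crashes early if the line has the wrong number of fields.
line_to_record(Line) ->
    Fields = [mark_missing(F) || F <- split_tabs(Line)],
    #row{} = list_to_tuple([row | Fields]).
```

For example, import_sketch:line_to_record("1\t\t42") gives a #row{} 
whose middle field is 'nil'. The resulting record could then be handed 
to mnesia:write/1 inside a transaction.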

When missing data is detected in the external data, I'd like to 
explicitly represent it as some value in Erlang. I'm currently using an 
atom 'nil', since it happens to be distinct from any data that might be 
encountered. However, I'd like to use atoms to represent external data 
in some cases, and am uncomfortable with the method I've chosen for NULL 
representation. It also seems inefficient. The dataset is large and all 
those 'nil' atoms (which I could reduce to '') will add up.

A very "Erlangish" approach, it would seem, is to tag all the data, 
like this:

    na                for missing data
    {value, MyValue}  for actual data

This lets one easily separate values from non-values without any worry 
about 'nil' stepping on one's toes.
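
A rough sketch of how the tagged form might look in practice. The module 
and function names are hypothetical, and treating the empty string as 
"missing" is an assumption about the input format.

```erlang
-module(tagged).
-export([from_field/1, to_string/1, present/1]).

%% Tag a raw field. Assumption: an empty string means "missing".
from_field("") -> na;
from_field(Raw) -> {value, Raw}.

%% Pattern matching separates values from non-values.
to_string(na) -> "NULL";
to_string({value, V}) -> V.

%% Keep only the values that are actually present; the generator
%% pattern skips every element that is not a {value, _} tuple.
present(Fields) ->
    [V || {value, V} <- Fields].
```

With this scheme, data that happens to be the atom 'nil' is stored as 
{value, nil} and stays distinct from the missing-data marker na.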

However, at first blush this seems inefficient and cumbersome. Is this 
true? Perhaps atoms are stored very efficiently? (Of course, the tag 
atoms could be made shorter as well.)
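
One way to find out would be to measure. The sketch below assumes 
erts_debug:size/1, an undocumented debugging helper which, as far as I 
can tell, reports the number of heap words a term occupies. The module 
name and the sample terms are just for illustration.

```erlang
-module(size_check).
-export([words/0]).

%% Returns [{Label, HeapWords}] for a few sample terms.
%% erts_debug:size/1 is undocumented, so treat the numbers as a guide only.
words() ->
    [{Label, erts_debug:size(Term)}
     || {Label, Term} <- [{short_atom, nil},
                          {empty_atom, ''},
                          {long_atom, a_rather_long_atom_name},
                          {bare_int, 42},
                          {tagged_int, {value, 42}}]].
```

Comparing short_atom with long_atom would show whether the length of an 
atom's name matters per occurrence, and comparing bare_int with 
tagged_int would show the overhead of the {value, ...} wrapper.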

It seems to me that one solution would be a new disjoint datatype 
that represents the null value and has only a single value, "null". 
The null value could be produced by a primitive null(), values could be 
tested for nullness with null(Value), and it could be stored efficiently.

Thoughts?

Thanks in advance,

Erik Pearson