[erlang-questions] learner's questions -- tuples, CEAN & jungerl

Tue Apr 26 00:36:43 CEST 2011

On Mon, Apr 25, 2011 at 15:27, Icarus Alive <icarus.alive@REDACTED> wrote:
>
> In this particular context, do we mean that...
>  { person, { name, joe }, { age, 42 } }  -- is boxed representation, and
>  { person, joe, 42 }                     -- is unboxed representation ?
> or
>  | $j,$o,$e | 42.0000 |  (as 32 bytes in memory)  -- is unboxed (internal)
> representation ?
> I.e. what exactly are we calling the boxed / unboxed representation in this
> context.

When I write "boxed" I mean that we have a pointer to a value cell
somewhere in the heap. Unboxed values are those we have direct access
to. Take {3, 7} as an example. There are two obvious representations.
One is a tuple {P1, P2} where P1 is a pointer to a cell containing 3
and P2 is a pointer to a cell containing 7. But when the values are
small, as they are here, we could store them directly in the tuple
slots where P1 and P2 would be. There are some important decisions to
make, for instance we probably need tag bits to tell a direct value
from a pointer, and arithmetic that respects those tags. But in this
representation 3 and 7 are not in boxes.

In short: A box is when we have a pointer to a cell containing X
rather than having X directly.
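
A rough way to see the difference on a running VM is to compare heap
sizes. The sketch below assumes erts_debug:flat_size/1, an undocumented
debugging helper that reports the size of a term in machine words. The
module and function names are made up for illustration, and the exact
numbers depend on the word size of the VM, so only the comparison
matters.

```erlang
-module(boxed_demo).
-export([words/2]).

%% Heap size, in machine words, of the tuple {X, Y}.
%% The tuple itself costs one header word plus one slot per element.
%% A small integer fits in its slot (unboxed). A float does not fit,
%% so the slot holds a pointer to a separate heap cell (boxed), and
%% that cell is counted as well.
words(X, Y) ->
    erts_debug:flat_size({X, Y}).
```

Calling boxed_demo:words(3, 7) and boxed_demo:words(3.0, 7.0) should
show the float tuple taking more words than the integer tuple, because
each float sits in its own box.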

Personally, I would not worry too much about memory usage at the
beginning, and only look at it later if it becomes a problem. There
are so many ways to limit the memory usage for such data that you can
often work around it later. What I would do, though, is make sure my
representation lives in a separate module, so it is easy to swap out
later if it becomes a problem.
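
As a minimal sketch of that idea, using the person example from the
question: the module name, the function names and the types here are
assumptions for illustration, not anything prescribed. Callers only go
through new/2, name/1 and age/1, so the tuple layout can change later
without touching them.

```erlang
-module(person).
-export([new/2, name/1, age/1]).
-export_type([person/0]).

%% The representation is private to this module. Today it is a flat
%% tagged tuple; it could become a record, a binary or something else.
-opaque person() :: {person, atom(), non_neg_integer()}.

-spec new(atom(), non_neg_integer()) -> person().
new(Name, Age) ->
    {person, Name, Age}.

-spec name(person()) -> atom().
name({person, Name, _Age}) ->
    Name.

-spec age(person()) -> non_neg_integer().
age({person, _Name, Age}) ->
    Age.
```

The rest of the program would call person:new(joe, 42) and
person:age(P), and never match on the tuple directly.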

As for the ETS table storing everything: It depends. If you "get"
something from an ETS table, it is copied into the heap of the
requesting process. So ETS is not an obvious fit for large data,
unless you intend to do a lot of work on each term you fetch. On the
other hand, recent Erlangs optimize concurrent reads to such tables if
you set the right option on the table, so you can get some decent
speeds from it.
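
A small sketch of such a table follows. It assumes the option meant
above is read_concurrency, which is the ETS option for tuning
concurrent reads. The module name, table name and data are made up for
illustration.

```erlang
-module(ets_demo).
-export([run/0]).

%% Creates a table tuned for concurrent reads, stores one entry and
%% reads it back. The term returned by ets:lookup/2 is a copy that
%% lives on the heap of the calling process.
run() ->
    Tab = ets:new(people, [set, public, {read_concurrency, true}]),
    true = ets:insert(Tab, {joe, 42}),
    [{joe, Age}] = ets:lookup(Tab, joe),
    true = ets:delete(Tab),
    Age.
```

Since every lookup copies the stored term, keeping the stored terms
small or doing real work per lookup helps to pay for the copy.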

In other situations, where no sharing of the data is needed, I
usually just keep it in the process that needs it.
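
One way this can look, as a sketch only: a process carries the data as
the argument of its receive loop and answers small queries about it,
so the full data set is never copied out. The module name, the message
formats and the 1000 ms timeout are assumptions for illustration.

```erlang
-module(people_owner).
-export([start/1, age_of/2, stop/1]).

%% Starts a process that owns People, a list of {Name, Age} tuples.
start(People) ->
    spawn(fun() -> loop(People) end).

%% Asks the owner for one age. Only the small reply crosses the
%% process boundary.
age_of(Pid, Name) ->
    Ref = make_ref(),
    Pid ! {age_of, self(), Ref, Name},
    receive
        {Ref, Reply} -> Reply
    after 1000 ->
        timeout
    end.

stop(Pid) ->
    Pid ! stop,
    ok.

%% The data lives in the loop argument, on this process's own heap.
loop(People) ->
    receive
        {age_of, From, Ref, Name} ->
            Reply = case lists:keyfind(Name, 1, People) of
                        {Name, Age} -> {ok, Age};
                        false -> not_found
                    end,
            From ! {Ref, Reply},
            loop(People);
        stop ->
            ok
    end.
```

In a real system a gen_server would be the usual way to write this,
but the plain loop shows where the data actually lives.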

If all else fails, you can always write a NIF-interface to a
C-representation of the data - where you utilize domain-knowledge to
minimize memory usage. But in my opinion that is a card you want to
play when it becomes a problem in practice, not before.

-- 
J.