[erlang-questions] refactoring a very large record

Jesper Louis Andersen jesper.louis.andersen@REDACTED
Thu Oct 20 17:19:36 CEST 2011


On Thu, Oct 20, 2011 at 16:21, Joel Reymont <joelr1@REDACTED> wrote:
>
> On Oct 20, 2011, at 3:14 PM, Jesper Louis Andersen wrote:
>
>> I always hesitate when I hear about large records of this size. If
>> they are only read, or mostly read, they tend to be fast. But they
>> don't support updates very well as it requires you to write a new
>> record object of size 80.
>
> Are you sure?
>
> Doesn't the record tuple keep pointers to each element, so that an update only changes the modified pointers?

Right, a record is a tuple. A tuple is an array of pointers to its
elements (or tagged immediates such as small integers). When you
update an element, a new tuple gets written: it copies all the
pointers from the old tuple and replaces the pointers for the newly
written elements. This is necessary because the data is persistent
(immutable). The number of pointer-words copied is therefore
proportional to the tuple size. You can avoid a lot of that copying by
making the tuple more tree-structured, since fewer words then have to
be copied on each update. The trick is to find out which parts of the
tuple belong together and push those out into auxiliary sub-tuples.
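
A rough sketch of that idea, with made-up module and field names (the
real grouping would depend on which of your 80 fields change
together):

-module(big_state0).
-export([new/0, bump_hits/1]).

-record(limits,   {min = 0, max = 100}).
-record(counters, {hits = 0, misses = 0}).
-record(state,    {limits, counters}).   %% outer tuple stays small

new() ->
    #state{limits = #limits{}, counters = #counters{}}.

bump_hits(#state{counters = C} = S) ->
    %% Copies the small counters tuple plus the outer state tuple,
    %% instead of rewriting all 80 words of a flat record.
    S#state{counters = C#counters{hits = C#counters.hits + 1}}.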

>> I rarely access the record
>> directly, but I export "views" of the data in the record which i can
>> pattern match on outside.
>
> What does this mean?
>
> I thought that by making the record opaque you lose the ability to pattern-match on it.

You do. But you can write a function inside the module that owns the
record, and that function can access the tuple. Its return value,
however, can be pattern matched on outside the module. That is, the
function presents a "view" of the tuple in another shape which is ripe
for pattern matching.
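
Something like this, with hypothetical names, is what I mean by a view
(the record stays opaque, the return value does not):

-module(player).
-export([new/0, position/1]).

-record(player, {name, x = 0, y = 0, hp = 100}).   %% internal only

new() -> #player{}.

%% The view: callers match on {position, X, Y} without ever
%% knowing the layout of #player{}.
position(#player{x = X, y = Y}) -> {position, X, Y}.

Callers then write {position, X, Y} = player:position(P) and are
insulated from any reshuffling of the record's fields.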

>> Another viable option is to make the 80-record tuple into a process.
>> Then one can move some of the work to the tuple itself rather than
>> querying for it and then acting upon it locally in processes.
>
> Message passing was 10x slower than function calls last time I checked ;-).

So, if your process can do work such that you only need to query it
1/10th as often as you normally would, the option is viable. If it can
cut the queries to 1/100th, it could be 10 times faster.
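
A minimal gen_server sketch of that trade-off, again with invented
names: the process owns the big state and answers with a computed
result, so one call replaces many field reads plus local computation.

-module(big_state_srv).
-behaviour(gen_server).
-export([start_link/0, summary/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-record(s, {hits = 0, misses = 0}).   %% stand-in for the real 80-field record

start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% One call returning a derived answer, instead of several calls
%% fetching raw fields to compute the same thing on the caller's side.
summary(Pid) ->
    gen_server:call(Pid, summary).

init([]) ->
    {ok, #s{}}.

handle_call(summary, _From, #s{hits = H, misses = M} = State) ->
    {reply, {summary, H + M}, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(_Msg, State) ->
    {noreply, State}.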

-- 
J.


