[erlang-questions] beginner: Updating Data Structures

Mon Oct 29 20:24:49 CET 2007

Hi,

Perhaps you could consider a two stage upgrade. First upgrade all  
software to a version that understands the new record as well as the  
old (dynamically dispatching/converting on record size). Then once  
that is done invoke the command to tell nodes to start using the new  
record (perhaps also doing a few table transforms along the way).
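
To make stage one a bit more concrete, here is a minimal sketch of the
kind of conversion I mean. The module name item_compat and the function
names are made up for illustration, and it assumes the old record was
-record(item, {name}) and the new one adds a value field.

```erlang
-module(item_compat).
-export([upgrade/1, name/1, value/1]).

%% New-style record. The old definition had only `name`.
-record(item, {name, value}).

%% Accept either layout and always hand back the new one.
%% An old-style record is just the two-element tuple {item, Name}.
upgrade({item, Name}) ->
    #item{name = Name};            % value defaults to undefined
upgrade(#item{} = Item) ->
    Item.

%% Accessors that work on both layouts by converting first.
name(Item) ->
    (upgrade(Item))#item.name.

value(Item) ->
    (upgrade(Item))#item.value.
```

Code written this way only reads both layouts; it is up to you to keep
writing the old layout until every node has this version loaded. A
function shaped like upgrade/1 is also roughly what you would pass to
mnesia:transform_table/3 when the time comes to convert the table.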

It can sometimes help to use the fact that records are also tagged  
tuples. Kind of ugly in the code, but could be isolated to a small  
number of places.
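
As a rough illustration (the module name item_raw is hypothetical, and
it assumes name stays in the first field and value is added as the
second), the tuple view lets code read the fields it knows about
without caring how long the record has become:

```erlang
-module(item_raw).
-export([name/1, value/1]).

%% A record is the tuple {Tag, Field1, Field2, ...}, so element 1 is the
%% tag and name is element 2 in both the old and the new layout.
name(Item) when is_tuple(Item), element(1, Item) =:= item ->
    element(2, Item).

%% value is element 3, which only exists from the second layout onwards.
value(Item) when is_tuple(Item), element(1, Item) =:= item ->
    case tuple_size(Item) of
        N when N >= 3 -> element(3, Item);
        _             -> undefined
    end.
```

This only works as long as new fields are appended at the end and
existing fields never move, which is exactly why it is best kept in one
small module.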

Another option we have used with some success is to have a single
extension field in the record that holds a tagged tuple list. It is
extraordinary how much such a structure can be abused ;-)
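
A minimal sketch of the idea, assuming a record with one fixed field
plus the extension field. The module name item_ext, the field name ext
and the accessor names are all invented for this example:

```erlang
-module(item_ext).
-export([new/1, get_ext/2, get_ext/3, set_ext/3]).

%% The record shape never changes; new properties go into `ext`.
-record(item, {name, ext = []}).

new(Name) ->
    #item{name = Name}.

%% Look up an extension property, returning undefined if it is absent.
get_ext(Key, Item) ->
    get_ext(Key, Item, undefined).

get_ext(Key, #item{ext = Ext}, Default) ->
    case lists:keyfind(Key, 1, Ext) of
        {Key, Value} -> Value;
        false        -> Default
    end.

%% Add or replace an extension property.
set_ext(Key, Value, #item{ext = Ext} = Item) ->
    Item#item{ext = lists:keystore(Key, 1, Ext, {Key, Value})}.
```

Because the tuple size stays fixed, the table definition does not need
to change when a property is added, and older code simply ignores keys
it does not know about. The price is that you cannot index or match
directly on anything stored in the list.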

Sean

On 29 Oct 2007, at 17:26, David Mercer wrote:

> While an Erlang system has the ability to update its program on the  
> fly, updating data structures on the fly seems a bit more  
> difficult.  Unless you can upgrade all nodes simultaneously, some  
> nodes will be expecting the old data structure while others expect the
> new.  My question, therefore, is how to structure my data?  Is there
> an approach that I am missing that is both upgrade-friendly and ETS/ 
> Mnesia-compatible?  Please see the following paragraphs for my  
> analysis so far.
>
> Suppose we are writing an inventory control application.  We decide  
> to create a record to contain our information about items in our  
> inventory.  Not much to say about items, really, so we’re just going  
> to hold the item’s name in a record.  If something else ever needs  
> to be tracked regarding these items, we can always upgrade our data,  
> right?
>
> -record(item,
>        { name
>        }).
>
> So we roll out our new inventory system to 3,000 nodes in our 25  
> warehouses in 6 different countries, and everything works  
> swimmingly.  For a while.
>
> However, some time later, our accounting department decides we need  
> a way to value our inventory, and each item should have a value  
> associated with it.  That way, we can calculate inventory value  
> simply by multiplying value by quantity at each location.   
> Unfortunately, we cannot now use our record structure.  What to do?
>
> Well, naïvely, we decide to just modify our item record.
>
> -record(item,
>        { name
>        , value
>        }).
>
> This new record structure is incompatible with the old item record  
> structure, so we will also write some code that upgrades our items  
> in the system to the new structure when we upgrade the system.   
> Unfortunately, unless our entire worldwide operation is upgraded all  
> at once, any process using the old structure will crash when it  
> encounters a new-style item, and vice versa.  Simultaneously upgrading
> all 3,000 nodes is impractical, so we’ll have to rethink our
> original decision.
>
> We could have created the original record structure with expansion  
> slots available for future use.
>
> -record(item,
>        { name
>        , 'X1'
>        , 'X2'
>        , 'X3'
>        }).
>
> Now when Accounting wants us to add the value of the item to the  
> item record, we simply redefine one of the expansion slots.
>
> -record(item,
>        { name
>        , value
>        , 'X2'
>        , 'X3'
>        }).
>
> This will not crash any process, since the size of resulting tuple  
> is still the same.  Unfortunately, we might run out of expansion  
> slots if we don’t allocate enough of them.  The example runs out of  
> slots once Accounting also gets their cost-basis and GL-class  
> elements added, leaving us in the same boat as before.  We simply  
> delayed the inevitable.  We might get bright and allocate the new  
> slots hierarchically by department, for instance, so Accounting gets  
> only one slot for all of its information, and we define a new record  
> for the information in that slot.
>
> -record(item,
>        { name
>        , acctg
>        , 'X2'
>        , 'X3'
>        }).
> -record(acctg_item,
>        { value
>        , cost_basis
>        , gl_class
>        , 'X1'
>        , 'X2'
>        , 'X3'
>        }).
>
> However, this approach once again only delays the inevitable.  When  
> Inventory Control and Manufacturing take up the other two expansion  
> slots, there is no room for Engineering’s data.  Plus, we have  
> multiplied this problem, since it occurs for each of our subrecords,  
> which can also run out of expansion slots.
>
> Another alternative might be to have only one expansion slot, which  
> is filled in by the next version of the item record.
>
> -record(item,
>        { name
>        , v2
>        }).
> -record(item_v2,
>        { value
>        , cost_basis
>        , gl_class
>        , v3
>        }).
>
> Now when we have more elements to add, we create an item_v3 record  
> (with a v4 element to accommodate future expansion), and so on.  The  
> problems with this, however, are that programmers need to know which  
> version of the record a certain data element is in, and that by the
> time we go through a few score enhancements and we’re up to version  
> 68, it becomes quite cumbersome, and is little better than had we  
> used a linked list.
>
> In fact, a linked list may well be better.  Instead of writing  
> functions with the record syntax, we can use lists.
>
> item_value([item, _Name, Value | _])
>        ->
>                 Value
>               .
>
> To retrieve the value, we only need to know its position in the  
> list.  This approach suffers from a couple of problems: (1) You need  
> to know the position of each element in the list; (2) This list will  
> be repeated quite frequently, so when you have 300 attributes your  
> code will be brittle, repetitive, and difficult to maintain.
>
> Perhaps an alternative approach is to define each record version  
> independently, instead of additively as we tried earlier.
>
> -record(item1,
>        { name
>        }).
> -record(item2,
>        { name
>        , value
>        , cost_basis
>        , gl_class
>        }).
>
> Now in our code, we have versions of each function matching on the  
> record structure, and a function that handles the no-match case (in  
> case you’re running v2 code when you receive a v3 record).  Once  
> again, however, we run into a couple of obstacles: (1) We must  
> implement a different version of each function for each version of  
> the record (this will get tiresome around version 68); (2) new  
> versions are not backward compatible: a node running a previous  
> version of the code will not recognize future-versioned data  
> structures, even though the only fields it needs are those from its  
> own version.
>
> Let’s borrow a page from object-oriented design principles.  Why not  
> let the item provide its own methods for data access through  
> functions contained in the structure?  We define a record “class”
> which has two slots: one for the methods, and one for the data.  By  
> doing this, items carry around their own methods and so it doesn’t  
> really matter what version of an item something is, so long as the  
> item knows how to use its own data.  First we define some  
> infrastructure.
>
> -module(class).
> -export([invoke/3]).
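> %% The class record would normally live in class.hrl so that other
> %% modules (such as item below) can include the same definition.
> -record(class,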
>        { methods
>        , data
>        }).
> invoke(Method_ID, Object = #class{methods = Methods}, Args)
>        ->
>                 Method = Methods(Method_ID),
>                 Method(Object, Args)
>               .
>
> To call a method on an object, syntax is simply “invoke(Method_ID,  
> Object, Args)”, such as
>
> X = item:new ("X"),  % Create a new item "X"
> X_Name = class:invoke(get_name, X, []),  % Returns "X"
> Y = class:invoke(set_name, X, ["Y"]).  % Changes item name
>
> This is great for encapsulation!  The implementation is  
> straightforward.
>
> -module(item).
> -export([new/1]).
> -include("class.hrl").
> -record(item,
>        { name
>        }).
>
> new(Name)
>        ->
>                 #class{ methods = fun(get_name) -> fun get_name/2
>                                   ;  (set_name) -> fun set_name/2
>                                   end
>                       , data    = #item{ name = Name }
>                       }
>               .
>
> get_name(#class{data = #item{name = Name}}, _)
>        ->
>                 Name
>               .
>
> set_name(Object = #class{data = Item}, [Name])
>        ->
>                 Object#class{data = Item#item{name = Name}}
>               .
>
> Alas, there is a fly in this ointment, too.  While it would appear  
> that the method functions are being carried around along with the  
> data (in fact, the item tuple is “{class,#Fun<item.0.96410792>, 
> {item,"X"}}”), those functions are really not carried around from  
> node to node.  Instead, Erlang only carries around references to the  
> functions.  This means if this item shows up on a node where the  
> function does not exist, an error will occur when a method is invoked.
>
> The fact that you cannot safely sling functions around with your  
> data from node to node indicates that perhaps we need a very simple  
> interface with functions that will never change.  Maybe instead of  
> using records at all, we can use basic OTP library functions to  
> associate item properties with their values.  Sounds kind of like  
> what proplists were designed for.
>
> X = [{name, "X"}],  % Create a new item "X"
> X_Name = proplists:get_value(name, X),  % Returns "X"
> Y = [{name, "Y"} | proplists:delete(name, X)].  % Changes item name
>
> A similar effect can be had with dicts, with the decision probably  
> to be made based on performance.  (Not only that, but the decision  
> can be made dynamically at run-time, since there are functions for  
> converting between the two.)
>
> X = dict:from_list([{name, "X"}]),  % Create a new item "X"
> X_Name = dict:fetch(name, X),  % Returns "X"
> Y = dict:store(name, "Y", X).  % Changes item name
>
> This approach has the advantage of being completely backward- 
> compatible with respect to my code-base.  Should a later version of  
> our inventory application add a property, it will not change the  
> operation of any previous version.  Once again, however, there are  
> problems with this approach: (1) property values cannot be used for  
> matching in function definitions; (2) these structures are not  
> easily indexed: ETS and Mnesia require record data types.  While  
> Disadvantage 1 might be easily managed by performing lookups and  
> conditionals within the function, Disadvantage 2 is probably  
> intractable.
>
> To repeat my question, gentle readers, how ought I structure my  
> data?  Is there an approach that I am missing that is both upgrade- 
> friendly and ETS/Mnesia-compatible?
>
> Thank-you.
>
> Cheers,
>
> David