[erlang-questions] beginner: Updating Data Structures

YC yinso.chen@REDACTED
Tue Oct 30 20:38:22 CET 2007


Perhaps a different way to think about the two-stage upgrade is to decouple
data definition from the database and the code?

Instead of using the same record definition in both the persistence layer
and the code, you pass the data through a transformation function.  That
way, in the database you have

-record(item2, {name, value, ...}).

But in the code, you have

-record(item1, {name, ...}).

In between, you have

load(DB) ->
  Tuple = read_from_database_structure(DB),
  convert_from_item2_to_item1(Tuple).

Thus you can first upgrade your db in a script (care is needed to deal with
locking, etc.), and once that's done, you can upgrade the code, which is in
general a faster operation.  IMO, minimizing dependencies is an effective
way to isolate and deal with change.
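A minimal sketch of such a transformation function, using the record
definitions from the thread (field names beyond `name` are assumed for
illustration):

```erlang
%% Sketch: decouple the stored format (item2) from the in-code
%% format (item1) via an explicit conversion function. Only this
%% function needs to know both shapes.
-module(item_convert).
-export([convert_from_item2_to_item1/1]).

-record(item1, {name}).
-record(item2, {name, value, cost_basis, gl_class}).

%% Database rows arrive as #item2{}; the running code keeps
%% working against #item1{} until it is upgraded in its turn.
convert_from_item2_to_item1(#item2{name = Name}) ->
    #item1{name = Name};
convert_from_item2_to_item1(Item = #item1{}) ->
    Item.  %% already in the in-code format
```
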

Cheers,

On 10/30/07, David Mercer <dmercer@REDACTED> wrote:
>
> My understanding of how a two-stage upgrade would work follows.
>
> Prior to the upgrade, we have the following record:
>
> > -record(item1,
> >        { name
> >        }).
>
> In Stage 1 of the upgrade, support for the following record structure is
> added, but the data is not itself updated:
>
> > -record(item2,
> >        { item
> >        , value
> >        , cost_basis
> >        , gl_class
> >        }).
>
> Our Stage 1 upgrade code must be written to support both item1's and
> item2's, but it does not upgrade the data structures yet.  Only once all
> 3,000 nodes of our system have been upgraded to Stage 1 can we initiate
> Stage 2: upgrade all our 'item1' data structures to 'item2'.  Optionally,
> we
> can also purge all references to the 'item1' structure.
>
> This approach seems problematic for the following reasons: (1) the time
> and
> administrative overhead required to release a new version is doubled; (2)
> we may run into the situation in which the stages take so long to
> complete
> that we have multiple upgrades happening across the system at once; (3)
> code
> has to be rewritten every release if it handles items (to accept both
> 'item1' and 'item2' structures), even if it is not directly affected by
> the
> change.
>
> For example, take the following scenario:
>
> A new release is ordered which requires a change to the item record.  All
> the code that deals with items (which is almost all of it) is duplicated
> and
> changed to allow it to work with both the old item structure and the new.
> When this is completed, a Stage 1 upgrade is ordered worldwide.  Frankfurt
> and Singapore complete their Stage 1 upgrade quickly, but unfortunately
> our
> Los Angeles operation is bound by a legal requirement to notify a
> particular client two weeks in advance of any system update.
> Meanwhile, our data center in Maputo, Mozambique has been having
> unspecified "problems" upgrading, which is putting the whole worldwide
> release of these new features on hold.  Even after the two-week North
> American hold is lifted, Maputo still has not upgraded to Stage 1.
>
> Meanwhile, while waiting to hear back from Maputo (which continues to
> demur), Software has released a new version with yet another version of
> the item record.  Now Frankfurt and Singapore have code running that works
> with all three formats, Los Angeles has two, and Maputo is still on its
> first, and *still* no new functionality has been released.
>
> Is this the best we can do?
>
> Cheers,
>
> David
>
> -----Original Message-----
> From: Sean Hinde [mailto:sean.hinde@REDACTED]
> Sent: Monday, October 29, 2007 14:25
> To: dmercer@REDACTED
> Cc: erlang-questions@REDACTED
> Subject: Re: [erlang-questions] beginner: Updating Data Structures
>
> Hi,
>
> Perhaps you could consider a two stage upgrade. First upgrade all
> software to a version that understands the new record as well as the
> old (dynamically dispatching/converting on record size). Then once
> that is done invoke the command to tell nodes to start using the new
> record (perhaps also doing a few table transforms along the way).
>
> It can sometimes help to use the fact that records are also tagged
> tuples. Kind of ugly in the code, but could be isolated to a small
> number of places.
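A sketch of what dispatching on record size might look like, exploiting the
fact that records are tagged tuples (an old item is `{item, Name}`, a new
one `{item, Name, Value}`; the default of 0 for missing values is an
assumption for illustration):

```erlang
%% Sketch: accessors that work for both the old 2-element and the
%% new 3-element item record by dispatching on tuple size.
-module(item_dispatch).
-export([name/1, value/1]).

name(Item) when element(1, Item) =:= item ->
    element(2, Item).

%% Old records carry no value yet; fall back to a default.
value(Item) when tuple_size(Item) =:= 2 -> 0;
value(Item) when tuple_size(Item) >= 3 -> element(3, Item).
```
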
>
> Another option we have with some success is to have a single extension
> field in the record that holds a tagged tuple list. It is
> extraordinary how much such a structure can be abused ;-)
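A sketch of the extension-field idea (module and function names are made up
for illustration): the record itself never changes shape, and new
attributes live in the tagged tuple list.

```erlang
%% Sketch: a fixed record with one extension slot holding a tagged
%% tuple list, so adding an attribute needs no record change.
-module(item_ext).
-export([new/1, get_ext/3, set_ext/3]).

-record(item, {name, ext = []}).

new(Name) -> #item{name = Name}.

%% Look the key up in the extension list, with a caller-supplied
%% default for attributes this item predates.
get_ext(Key, #item{ext = Ext}, Default) ->
    case lists:keyfind(Key, 1, Ext) of
        {Key, Value} -> Value;
        false        -> Default
    end.

set_ext(Key, Value, Item = #item{ext = Ext}) ->
    Item#item{ext = lists:keystore(Key, 1, Ext, {Key, Value})}.
```
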
>
> Sean
>
> On 29 Oct 2007, at 17:26, David Mercer wrote:
>
> > While an Erlang system has the ability to update its program on the
> > fly, updating data structures on the fly seems a bit more
> > difficult.  Unless you can upgrade all nodes simultaneously, some
> > nodes will be expecting the old data structure while others the
> > new.  My question, therefore, is how to structure my data?  Is there
> > an approach that I am missing that is both upgrade-friendly and ETS/
> > Mnesia-compatible?  Please see the following paragraphs for my
> > analysis so far.
> >
> > Suppose we are writing an inventory control application.  We decide
> > to create a record to contain our information about items in our
> > inventory.  Not much to say about items, really, so we're just going
> > to hold the item's name in a record.  If something else ever needs
> > to be tracked regarding these items, we can always upgrade our data,
> > right?
> >
> > -record(item,
> >        { name
> >        }).
> >
> > So we roll out our new inventory system to 3,000 nodes in our 25
> > warehouses in 6 different countries, and everything works
> > swimmingly.  For a while.
> >
> > However, some time later, our accounting department decides we need
> > a way to value our inventory, and each item should have a value
> > associated with it.  That way, we can calculate inventory value
> > simply by multiplying value by quantity at each location.
> > Unfortunately, we cannot now use our record structure.  What to do?
> >
> > Well, naïvely, we decide to just modify our item record.
> >
> > -record(item,
> >        { name
> >        , value
> >        }).
> >
> > This new record structure is incompatible with the old item record
> > structure, so we will also write some code that upgrades our items
> > in the system to the new structure when we upgrade the system.
> > Unfortunately, unless our entire worldwide operation is upgraded all
> > at once, any process using the old structure will crash when it
> > encounters a new-style item, and vice versa.  Simultaneously upgrading
> > all 3,000 nodes is impractical, so we'll have to rethink our
> > original decision.
> >
> > We could have created the original record structure with expansion
> > slots available for future use.
> >
> > -record(item,
> >        { name
> >        , 'X1'
> >        , 'X2'
> >        , 'X3'
> >        }).
> >
> > Now when Accounting wants us to add the value of the item to the
> > item record, we simply redefine one of the expansion slots.
> >
> > -record(item,
> >        { name
> >        , value
> >        , 'X2'
> >        , 'X3'
> >        }).
> >
> > This will not crash any process, since the size of the resulting
> > tuple is still the same.  Unfortunately, we might run out of expansion
> > slots if we don't allocate enough of them.  The example runs out of
> > slots once Accounting also gets their cost-basis and GL-class
> > elements added, leaving us in the same boat as before.  We simply
> > delayed the inevitable.  We might get bright and allocate the new
> > slots hierarchically by department, for instance, so Accounting gets
> > only one slot for all of its information, and we define a new record
> > for the information in that slot.
> >
> > -record(item,
> >        { name
> >        , acctg
> >        , 'X2'
> >        , 'X3'
> >        }).
> > -record(acctg_item,
> >        { value
> >        , cost_basis
> >        , gl_class
> >        , 'X1'
> >        , 'X2'
> >        , 'X3'
> >        }).
> >
> > However, this approach once again only delays the inevitable.  When
> > Inventory Control and Manufacturing take up the other two expansion
> > slots, there is no room for Engineering's data.  Plus, we have
> > multiplied this problem, since it occurs for each of our subrecords,
> > which can also run out of expansion slots.
> >
> > Another alternative might be to have only one expansion slot, which
> > is filled in by the next version of the item record.
> >
> > -record(item,
> >        { name
> >        , v2
> >        }).
> > -record(item_v2,
> >        { value
> >        , cost_basis
> >        , gl_class
> >        , v3
> >        }).
> >
> > Now when we have more elements to add, we create an item_v3 record
> > (with a v4 element to accommodate future expansion), and so on.  The
> > problems with this, however, are that programmers need to know which
> > version of the record a certain data element is, and that by the
> > time we go through a few score enhancements and we're up to version
> > 68, it becomes quite cumbersome, and is little better than had we
> > used a linked list.
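For concreteness, a sketch of what reading a field through the version
chain looks like with the records above (the deeper the version, the longer
the walk):

```erlang
%% Sketch: reaching a v2 field through the chained version records
%% {item, Name, V2} / {item_v2, Value, CostBasis, GlClass, V3}.
-module(item_chain).
-export([value/1]).

-record(item, {name, v2}).
-record(item_v2, {value, cost_basis, gl_class, v3}).

%% Every accessor must know which version its field lives in.
value(#item{v2 = #item_v2{value = Value}}) -> Value.
```
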
> >
> > In fact, a linked list may well be better.  Instead of writing
> > functions with the record syntax, we can use lists.
> >
> > item_value([item, _Name, Value | _])
> >        ->
> >                 Value
> >               .
> >
> > To retrieve the value, we only need to know its position in the
> > list.  This approach suffers from a couple of problems: (1) You need
> > to know the position of each element in the list; (2) This list will
> > be repeated quite frequently, so when you have 300 attributes your
> > code will be brittle, repetitive, and difficult to maintain.
> >
> > Perhaps an alternative approach is to define each record version
> > independently, instead of additively as we tried earlier.
> >
> > -record(item1,
> >        { name
> >        }).
> > -record(item2,
> >        { item
> >        , value
> >        , cost_basis
> >        , gl_class
> >        }).
> >
> > Now in our code, we have versions of each function matching on the
> > record structure, and a function that handles the no-match case (in
> > case you're running v2 code when you receive a v3 record).  Once
> > again, however, we run into a couple of obstacles: (1) We must
> > implement a different version of each function for each version of
> > the record (this will get tiresome around version 68); (2) new
> > versions are not backward compatible: a node running a previous
> > version of the code will not recognize future-versioned data
> > structures, even though the only fields it needs are those from its
> > own version.
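A sketch of such version-matched function clauses, using the item1/item2
records above (the error term in the fallback clause is a made-up example):

```erlang
%% Sketch: one clause per known record version, plus a no-match
%% clause for versions this code does not know about.
-module(item_versions).
-export([name/1]).

-record(item1, {name}).
-record(item2, {item, value, cost_basis, gl_class}).

name(#item1{name = Name}) -> Name;
name(#item2{item = Name}) -> Name;
name(Unknown) ->
    %% A future-versioned record (e.g. item3 on v2 code) lands here.
    erlang:error({unknown_item_version, element(1, Unknown)}).
```
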
> >
> > Let's borrow a page from object-oriented design principles.  Why not
> > let the item provide its own methods for data access through
> > functions contained on the structure?  We define a record "class"
> > which has two slots: one for the methods, and one for the data.  By
> > doing this, items carry around their own methods and so it doesn't
> > really matter what version of an item something is, so long as the
> > item knows how to use its own data.  First we define some
> > infrastructure.
> >
> > -module(class).
> > -export([invoke/3]).
> > -record(class,
> >        { methods
> >        , data
> >        }).
> > invoke(Method_ID, Object = #class{methods = Methods}, Args)
> >        ->
> >                 Method = Methods(Method_ID),
> >                 Method(Object, Args)
> >               .
> >
> > To call a method on an object, syntax is simply "invoke(Method_ID,
> > Object, Args)", such as
> >
> > X = item:new ("X"),  % Create a new item "X"
> > X_Name = class:invoke(get_name, X, []),  % Returns "X"
> > Y = class:invoke(set_name, X, ["Y"]).  % Changes item name
> >
> > This is great for encapsulation!  The implementation is
> > straightforward.
> >
> > -module(item).
> > -export([new/1]).
> > -include("class.hrl").
> > -record(item,
> >        { name
> >        }).
> >
> > new(Name)
> >        ->
> >                 #class{ methods = fun(get_name) -> fun get_name/2
> >                                   ;  (set_name) -> fun set_name/2
> >                                   end
> >                       , data    = #item{ name = Name }
> >                       }
> >               .
> >
> > get_name(#class{data = #item{name = Name}}, _)
> >        ->
> >                 Name
> >               .
> >
> > set_name(Object = #class{data = Item}, [Name])
> >        ->
> >                 Object#class{data = Item#item{name = Name}}
> >               .
> >
> > Alas, there is a fly in this ointment, too.  While it would appear
> > that the method functions are being carried around along with the
> > data (in fact, the item tuple is "{class,#Fun<item.0.96410792>,
> > {item,"X"}}"), those functions are really not carried around from
> > node to node.  Instead, Erlang only carries around references to the
> > functions.  This means if this item shows up on a node where the
> > function does not exist, an error will occur when a method is invoked.
> >
> > The fact that you cannot safely sling functions around with your
> > data from node to node indicates that perhaps we need a very simple
> > interface with functions that will never change.  Maybe instead of
> > using records at all, we can use basic OTP library functions to
> > associate item properties with their values.  Sounds kind of like
> > what proplists were designed for.
> >
> > X = [{name, "X"}],  % Create a new item "X"
> > X_Name = proplists:get_value(name, X),  % Returns "X"
> > Y = [{name, "Y"} | proplists:delete(name, X)].  % Changes item name
> >
> > A similar effect can be had with dicts, with the decision probably
> > to be made based on performance.  (Not only that, but the decision
> > can be made dynamically at run-time, since there are functions for
> > converting between the two.)
> >
> > X = dict:from_list([{name, "X"}]),  % Create a new item "X"
> > X_Name = dict:fetch(name, X),  % Returns "X"
> > Y = dict:store(name, "Y", X).  % Changes item name
> >
> > This approach has the advantage of being completely backward-
> > compatible with respect to my code-base.  Should a later version of
> > our inventory application add a property, it will not change the
> > operation of any previous version.  Once again, however, there are
> > problems with this approach: (1) property values cannot be used for
> > matching in function definitions; (2) these structures are not
> > easily indexed: ETS and Mnesia require record data types.  While
> > Disadvantage 1 might be easily managed by performing lookups and
> > conditionals within the function, Disadvantage 2 is probably
> > intractable.
> >
> > To repeat my question, gentle readers, how ought I structure my
> > data?  Is there an approach that I am missing that is both upgrade-
> > friendly and ETS/Mnesia-compatible?
> >
> > Thank-you.
> >
> > Cheers,
> >
> > David
> >
> >
> > _______________________________________________
> > erlang-questions mailing list
> > erlang-questions@REDACTED
> > http://www.erlang.org/mailman/listinfo/erlang-questions
>

