[erlang-questions] upgrading from R12 to R13 in a heterogeneous fashion

Brian Acton acton@REDACTED
Thu Mar 25 23:33:05 CET 2010


Hi guys,

On Friday night, we added R13 SMP mnesia nodes into our R12 cluster.
Everything worked fairly well.

On Sunday afternoon, we removed the R12 nodes from the cluster. We used the
following commands:
%% on R12 node:
mnesia:stop().
mnesia:delete_schema([node()]).
%% on R13 node:
mnesia:del_table_copy(schema, R12Node).
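
One way to double-check the result of those commands is to ask each remaining node what it believes the cluster looks like. The module below is a hypothetical sketch: the names cluster_check and report/0 are made up for illustration. The Mnesia calls it uses, mnesia:system_info/1 and mnesia:table_info/2, are standard. It returns a proplist rather than a map so that it also compiles on R12/R13-era releases, which predate maps.

```erlang
-module(cluster_check).
-export([report/0]).

%% Hypothetical diagnostic helper: summarises what the local node believes
%% about Mnesia cluster membership and replica placement.
report() ->
    DbNodes = mnesia:system_info(db_nodes),
    Running = mnesia:system_info(running_db_nodes),
    %% Nodes known to the schema that are not currently running Mnesia.
    Down = DbNodes -- Running,
    %% For every table: the node reads are served from, and the nodes
    %% that currently hold an active replica.
    Placement = [{Tab,
                  mnesia:table_info(Tab, where_to_read),
                  mnesia:table_info(Tab, active_replicas)}
                 || Tab <- mnesia:system_info(tables)],
    [{db_nodes, DbNodes},
     {running_db_nodes, Running},
     {down, Down},
     {placement, Placement}].
```

If the module is loaded on every remaining node, calling rpc:call(Node, cluster_check, report, []) for each node and comparing the outputs should show whether all nodes agree that the removed node is gone and that every table is read from a live replica.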

We immediately started having transactional coherence problems. Worse, we
didn't detect them until 24 hours later :( For the record, our application
performs the following meta operations:
%% process A (Record is the row being written; Oid is its {Tab, Key})
F = fun() -> mnesia:write(Record) end,
{atomic, ok} = mnesia:transaction(F),
B ! Oid,
%% process B
receive Oid -> mnesia:dirty_read(Oid) end.
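
For anyone trying to reproduce this, the pattern can be turned into a small self-contained module. The table name item, the #item{} record, and the module name write_then_notify are hypothetical, since the real schema is not shown. setup/0 creates a RAM-only table on the local node; run/1 plays process A itself and spawns a reader that plays process B.

```erlang
-module(write_then_notify).
-export([setup/0, run/1]).

%% Hypothetical record standing in for the application's real table.
-record(item, {id, value}).

%% Start Mnesia and create a RAM-only table for #item{} records.
setup() ->
    ok = mnesia:start(),
    case mnesia:create_table(item, [{attributes, record_info(fields, item)}]) of
        {atomic, ok} -> ok;
        {aborted, {already_exists, item}} -> ok
    end,
    ok = mnesia:wait_for_tables([item], 5000).

%% Process A (the caller) writes a record in a transaction, then tells
%% process B which key to read; B reads it back with a dirty read and
%% reports the result to A.
run(Value) ->
    Id = make_ref(),
    Parent = self(),
    B = spawn(fun() ->
                  receive
                      {written, Key} ->
                          Parent ! {read, mnesia:dirty_read(item, Key)}
                  end
              end),
    F = fun() -> mnesia:write(#item{id = Id, value = Value}) end,
    {atomic, ok} = mnesia:transaction(F),
    B ! {written, Id},
    receive
        {read, Result} -> Result
    after 5000 ->
        timeout
    end.
```

On a single node the dirty read returns the record just written. When B runs on a different node and reads that node's replica, note that the Mnesia documentation describes mnesia:sync_transaction/1 as waiting until the data has been committed on every involved node before returning, which mnesia:transaction/1 does not wait for; swapping it in for process A is one variant worth comparing.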

The problem is that process B started getting empty results ([]) where it
previously returned non-empty results.

Thinking that this was just a weird R12/R13 conversion bug, we rebooted our
R13 nodes and everything returned to normal.

Unfortunately, the problem resurfaced overnight. We don't see any trigger
in the Erlang logs; Mnesia simply starts returning empty results. We do
start to see a message backlog in process B, and eventually the problem
cascades until all the nodes are effectively poisoned and return empty
results.
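
The growing backlog in process B can be watched directly. The helper below is a hypothetical sketch (the module name queue_watch, check/2, and the threshold argument are illustrative); it relies only on erlang:process_info/2 with the message_queue_len item, which applies to local processes.

```erlang
-module(queue_watch).
-export([check/2]).

%% Hypothetical helper. Returns {warn, Len} when the mailbox of Pid holds
%% more than Threshold messages, {ok, Len} otherwise, or dead if the
%% process no longer exists. Pid must be local: erlang:process_info/2
%% raises badarg for a pid on another node.
check(Pid, Threshold) when is_pid(Pid), is_integer(Threshold) ->
    case erlang:process_info(Pid, message_queue_len) of
        undefined -> dead;
        {message_queue_len, Len} when Len > Threshold -> {warn, Len};
        {message_queue_len, Len} -> {ok, Len}
    end.
```

Calling it periodically (for example from a timer in a supervisor-owned process) would surface a growing queue well before the whole cluster is affected.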

Does anyone have any flashes of insight on this? I tried to condense the
problem as much as possible. Really, the only change we made was the
conversion from R12 to R13+SMP.

Hope this rings some bells. If anyone wants to discuss this face to face,
I'm at Erlang Factory SF today / tomorrow.

--b


On Tue, Feb 16, 2010 at 11:10 PM, Kenneth Lundin
<kenneth.lundin@REDACTED> wrote:

> The compatibility between major releases is intended just for the case
> when a cluster is upgraded in service, node by node.
> It will probably work well for many applications to run a heterogeneous
> cluster in steady state as well, but the general recommendation is to
> upgrade to the same version of Erlang on every node as fast as possible.
> This applies if you are using the Erlang distribution between the nodes.
> If you have invented your own communication between the nodes, it is up
> to your solution whether it is important to have the same version of
> Erlang on every node.
>
> /Kenneth Erlang/OTP, Ericsson
>
>
> On Wed, Feb 17, 2010 at 5:08 AM, Brian Acton <acton@REDACTED> wrote:
> > Hi guys,
> >
> > We are currently running an 8-node cluster on R12. We want to get all
> > the bug fixes and performance improvements from R13 (not to mention
> > better support). We've been advised in the past that we shouldn't run a
> > heterogeneous cluster with a mix of R12 and R13, as it is potentially
> > unstable.
> >
> > I'm wondering if anyone has any advice on the matter and whether it is
> > OK for us to run a heterogeneous environment, even for a short duration
> > (like 24 hours). I'm also wondering if there are any specific gotchas
> > or caveats that we should be aware of as we go down this path.
> >
> > Thanks!
> >
> > --b
> >
>
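
Given the recommendation above to get every node onto the same release as quickly as possible, a check like the one below can confirm that a mixed cluster has actually converged. It is a hypothetical sketch (the module name release_check and its functions are made up); it asks each connected node for erlang:system_info(otp_release) via rpc:call/4. Nodes that fail to answer come back as {badrpc, Reason} and are skipped by homogeneous/0.

```erlang
-module(release_check).
-export([releases/0, homogeneous/0]).

%% Hypothetical helper: returns [{Node, Release}] for this node and every
%% connected node. Release is a string, or {badrpc, Reason} on failure.
releases() ->
    [{N, rpc:call(N, erlang, system_info, [otp_release])}
     || N <- [node() | nodes()]].

%% true when every node that answered reports the same release
%% (vacuously true if no node answered).
homogeneous() ->
    Versions = [V || {_Node, V} <- releases(), is_list(V)],
    length(lists:usort(Versions)) =< 1.
```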

