[erlang-questions] gen_leader discrepancies in reporting of downed nodes across a cluster

Jeremy Raymond jeraymond@REDACTED
Fri Nov 30 03:12:33 CET 2012


+ list

--
Jeremy

On 2012-11-29, at 9:10 PM, Jeremy Raymond <jeraymond@REDACTED> wrote:

> Interesting. The API looks a bit different though, so it's not a drop-in
> replacement for gen_leader. I'll give it a go if gen_leader can't be
> sorted out.
>
> --
> Jeremy
>
> On 2012-11-29, at 8:21 PM, Geoff Cant <nem@REDACTED> wrote:
>
>> Hi there - are your tests for this stuff automated? I have an alternative gen_leader implementation ( https://github.com/ngmoco/gl_async_bully ) and I would like to compare it to the original to see how it stacks up.
>>
>> Cheers,
>> -Geoff
>>
>> Begin forwarded message:
>>
>>> From: Jeremy Raymond <jeraymond@REDACTED>
>>> Subject: Re: [erlang-questions] gen_leader discrepancies in reporting of downed nodes across a cluster
>>> Date: 30 November 2012 05:24:49 AM
>>> To: Erlang <erlang-questions@REDACTED>
>>>
>>> I gave that branch a try. I'm still seeing misreported downed nodes. I
>>> should see correct gen_leader:down/1 and gen_leader:alive/1 lists on all
>>> nodes, correct?
>>>
>>> --
>>> Jeremy
>>>
>>>
>>> On Tue, Nov 27, 2012 at 11:35 PM, Andrew Thompson <andrew@REDACTED> wrote:
>>>
>>>> On Tue, Nov 27, 2012 at 12:47:52PM -0500, Jeremy Raymond wrote:
>>>>> Hi,
>>>>>
>>>>> I'm using the gen_leader behaviour from [1] in a 3-node Erlang cluster. I'm
>>>>> running into a situation where if I down one of the nodes and bring it back
>>>>> up, when it rejoins the cluster the other nodes still see it as being down
>>>>> as reported by gen_leader:down/1. However, the cycled node itself sees the
>>>>> other two nodes as being up. If I cycle the other two nodes, then all three
>>>>> will agree again on all of the nodes being available. This doesn't happen
>>>>> every time I down a node, but quite often. Another (related?) issue I
>>>>> sometimes see is that gen_leader:down/1 reports the same node as
>>>>> being down multiple times in the returned list.
>>>>
>>>> Would you mind trying the branch at
>>>>
>>>> https://github.com/Vagabond/gen_leader_revival/tree/netsplit-tolerance
>>>>
>>>> This branch contains a bunch of work I did to work around these kinds
>>>> of issues that Basho was seeing with gen_leader.
>>>>
>>>> Andrew
>>>> _______________________________________________
>>>> erlang-questions mailing list
>>>> erlang-questions@REDACTED
>>>> http://erlang.org/mailman/listinfo/erlang-questions
>>
>> --
>> Geoff Cant

