[erlang-questions] two epmds running

Nicolas Charpentier nc@REDACTED
Wed Mar 17 11:10:14 CET 2010


Hi
If the problem only occurs when two nodes start at the same time, you  
can start epmd before any nodes.
If you are running Linux you can add an init script that starts epmd and
ensure that the other init scripts run after it.
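
As a rough sketch of that idea (not a tested init script): the function
below checks whether an epmd already answers and starts one in the
background only if none does. The function name ensure_epmd and the EPMD
override variable are made up for illustration; "epmd -names" and
"epmd -daemon" are the standard epmd options, and I am assuming that
"epmd -names" exits non-zero when no epmd is listening.

```shell
#!/bin/sh
# Sketch: make sure a single epmd is up before any Erlang node starts.
# EPMD is a hypothetical override for the path to the epmd binary.
EPMD="${EPMD:-epmd}"

ensure_epmd() {
    # Assumption: "epmd -names" fails when no epmd answers on the port.
    if "$EPMD" -names >/dev/null 2>&1; then
        echo "epmd already running"
    else
        # -daemon detaches epmd so it outlives this script.
        "$EPMD" -daemon && echo "epmd started"
    fi
}
```

Calling ensure_epmd from the "start" action of an init script that is
ordered before the scripts starting the Erlang nodes should mean the
nodes never race each other to launch epmd.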

Nicolas

On Mar 17, 2010, at 2:07, "Joseph Wayne Norton" <norton@REDACTED>  
wrote:

>
> We have faced the same behavior described by Bob.  The problem  
> occurs only when rebooting a server that has two or more Erlang  
> virtual machines started by init.  The problem when it happens can  
> easily consume a significant amount of disk space in the /var/log  
> directory by epmd's error logging.   It is unknown how to directly  
> trigger the problem.
>
>
> On Wed, 17 Mar 2010 02:13:54 +0900, Bob Ippolito <bob@REDACTED>  
> wrote:
>
>> On Tue, Mar 16, 2010 at 9:10 AM, Garrett Smith <g@REDACTED> wrote:
>>> On Tue, Mar 16, 2010 at 9:36 AM, Bob Ippolito <bob@REDACTED>  
>>> wrote:
>>>> On Tue, Mar 16, 2010 at 2:23 AM, Anthony Shipman  
>>>> <als@REDACTED> wrote:
>>>>> Sometimes it happens that I discover two epmd processes running.  
>>>>> One of
>>>>> them is in a tight loop consuming 100% of CPU time. My guess is  
>>>>> that the
>>>>> second one is started automatically because the first one is no  
>>>>> longer
>>>>> responding. Is this a known bug in epmd?
>>>>
>>>> I think we have seen this before, one of them is probably violently
>>>> logging "epmd: epmd: error in accept" as well. We have only seen  
>>>> this
>>>> on boot-up of a machine, probably due to several Erlang VMs  
>>>> trying to
>>>> start up at the same time. We don't currently have a solution for  
>>>> this
>>>> issue (mostly because we don't know the root cause yet).
>>>>
>>>> I am not sure we get two of them, it might be just one in our case.
>>>
>>> I haven't seen two running, but I've seen none running, which is a
>>> real bummer. I've written a monitor process (probably gen_fsm based)
>>> that keeps an eye on epmd and starts it and reinitializes it when it
>>> goes away. A properly functioning epmd is important enough that you
>>> might consider something similar to ensure that, in your case, that
>>> rogue process is dealt with (killed?).
>>>
>>> I suppose that's somewhat flippant -- to say write your own monitor
>>> for this, but losing epmd is like losing your network and people  
>>> go to
>>> great lengths to keep networks up.
>>
>> Yeah absolutely it needs to be killed when it's in that state. It  
>> eats
>> up a lot of CPU, spews endless crap to syslog, and breaks erlang
>> distribution on that node. We haven't seen it often enough to feel  
>> too
>> much pain yet but it's something on our roadmap to try and reproduce
>> and fix or work around it.
>>
>> When we kill it we also bring down all of the applications on that
>> node, which sucks because we can't shut them down cleanly since doing
>> that (at least by the means that our tools know how) depends on epmd
>> being up. Fortunately we have only seen this happen just after a
>> reboot.
>>
>> -bob
>>
>> ________________________________________________________________
>> erlang-questions (at) erlang.org mailing list.
>> See http://www.erlang.org/faq.html
>> To unsubscribe; mailto:erlang-questions-unsubscribe@REDACTED
>>
>
>
> -- 
> norton@REDACTED
>

