[erlang-bugs] Erlang nodes fail to communicate if user has no capability to change SO_PRIORITY socket options.
pan@REDACTED
Tue Jan 18 17:15:09 CET 2011
Hi!
On Mon, 17 Jan 2011, Serge Aleynikov wrote:
> At some point I was having a similar issue with SO_PRIORITY and SO_TOS bug in
> inet_drv when trying to open a unix domain socket and pass the open file
> descriptor to gen_tcp, which failed to function properly. After discussing
> this issue with Per Hedeland he sent me the attached patch that worked well
> to solve the issue. Perhaps it will also work in your case, and if so, it
> should be included in distribution.
Seems Per's patch covers more cases - if this solves the problem, it seems
like the best choice to take this one into dev instead of the smaller fix
I presented earlier.
Janek - have you tried this one?
Cheers,
/Patrik
>
> Serge
>
> On 1/17/2011 1:33 PM, Janek Wrobel wrote:
>> On Thu, Jan 13, 2011 at 5:22 PM, Patrik Nyblom<pan@REDACTED>
>> wrote:
>>> Hi!
>>>
>>> On Thu, 13 Jan 2011, Janek Wrobel wrote:
>>>
>>>> Hi,
>>>>
>>>> When trying to set up an Erlang cluster I was getting the following
>>>> error while spawning a function on a remote node:
>>>>
>>>> =ERROR REPORT==== 13-Jan-2011::01:00:38 ===
>>>> ** Can not start hello_world:ping,[] on 'node3@REDACTED' **
>>>>
>>>> After some investigation it turned out that nodes did not accept TCP
>>>> connections, because setting SO_PRIORITY socket option failed. Strace
>>>> follows:
>>>
>>> What kind of node (kernel version, extra options etc) do you have?
>>
>> Sorry for a late reply, but I was trying to investigate what
>> configuration option is responsible for this behavior and how to
>> reproduce it on a standard Linux box. Unfortunately without any
>> success. The problem here is that the socket gets created with a default
>> priority larger than the user running the process can set with
>> setsockopt, so the sequence of getsockopt(SO_PRIORITY, &priority),
>> setsockopt(SO_PRIORITY, priority) fails.
>>
>> I was thinking that maybe some firewall rule increasing TOS of packets
>> directed and coming from a given TCP port, or some traffic shaping
>> rule ('tc' command) can have an effect of changing default priority of
>> sockets associated with the port. It does not seem to be the case.
>> Maybe someone on this list knows if default priority of Linux sockets
>> can be somehow altered?
>>
>> One scenario in which the sequence of getsockopt(), setsockopt() can
>> fail is when socket was created by a different OS process that had
>> CAP_NET_ADMIN capability. Socket descriptor can be then passed to
>> Erlang VM running without CAP_NET_ADMIN, and used in 'listen {fd, Fd}'
>> function, causing similar error. But this is definitely not the case
>> here.
>>
>>
>>> The inet_driver has a workaround for SO_PRIORITY being destroyed by SO_TOS
>>> settings, I think that's where this fails.
>>>
>>>>
>>>> accept(7, {sa_family=AF_INET, sin_port=htons(51602),
>>>> sin_addr=inet_addr("123.123.123.123")}, [16]) = 10
>>>> fcntl(10, F_GETFL) = 0x2 (flags O_RDWR)
>>>> fcntl(10, F_SETFL, O_RDWR|O_NONBLOCK) = 0
>>>> getsockopt(7, SOL_TCP, TCP_NODELAY, [29220483580821504], [4]) = 0
>>>> getsockopt(7, SOL_SOCKET, SO_KEEPALIVE, [29220483580821504], [4]) = 0
>>>> getsockopt(7, SOL_SOCKET, SO_PRIORITY, [29220483580821504], [4]) = 0
>>>> getsockopt(7, SOL_IP, IP_TOS, [29220483580821504], [4]) = 0
>>>> getsockopt(10, SOL_SOCKET, SO_PRIORITY, [-4294967281], [4]) = 0
>>>> getsockopt(10, SOL_IP, IP_TOS, [64424509440], [4]) = 0
>>>> setsockopt(10, SOL_IP, IP_TOS, [0], 4) = 0
>>>> setsockopt(10, SOL_SOCKET, SO_PRIORITY, [15], 4) = -1 EPERM (Operation not permitted)
>>>> close(10)
>>>>
>>>> To make it work I needed to add '#undef SO_PRIORITY' to
>>>> erts/emulator/drivers/common/inet_drv.c and recompile.
>>>>
>>>> Can errors from setsockopt(..., SO_PRIORITY) be ignored? According to
>>>> the socket(7) manual, it is normal for a user not to be able to change
>>>> this option ('Setting a priority outside the range 0 to 6 requires the
>>>> CAP_NET_ADMIN capability.').
>>>
>>>
>>> I think it would be OK if you checked that you got EPERM in the exact
>>> copy-from-listen-socket-to-result-of-accept code and ignored the result
>>> then.
>>>
>>> I suspect you would have to patch the function setopt_prio_tos_trick in
>>> inet_drv.c like this:
>>> -----------------------------
>>> diff --combined erts/emulator/drivers/common/inet_drv.c
>>> index 818bc63,818bc63..0000000
>>> --- a/erts/emulator/drivers/common/inet_drv.c
>>> +++ b/erts/emulator/drivers/common/inet_drv.c
>>> @@@ -5095,6 -5095,6 +5095,9 @@@ static int setopt_prio_tos_tric
>>> SO_PRIORITY,
>>> (char *)&tmp_ival_prio,
>>> tmp_arg_sz_prio);
>>> ++ if (res != 0 && sock_errno() == EPERM) {
>>> ++ res = 0;
>>> ++ }
>>> }
>>> }
>>> }
>>> -------------------------------
>>>
>>> Try that and see if it fixes the problem.
>>
>> This fixes the problem.
>>
>> thanks,
>> Janek
>>
>