[erlang-bugs] Bug in the resolver?

Raimo Niskanen raimo+erlang-bugs@REDACTED
Thu Apr 14 14:34:20 CEST 2011


On Tue, Apr 12, 2011 at 01:46:05PM +0200, Raimo Niskanen wrote:
> On Tue, Apr 12, 2011 at 09:13:10PM +1000, Evgeniy Khramtsov wrote:
> > 12.04.2011 20:30, Raimo Niskanen wrote:
> > >
> > >You must have called inet_res:getbyname(Name, Type, infinity),
> > >and that was apparently not tested. The functions that calculate
> > >the remaining time for do_udp_recv/5 are not written for a timeout of
> > >'infinity' and crash for the subtraction of Now - 'undefined'.
> > >   
> > 
> > Strange. There is inet_res:getbyname(String, srv, 10000) actually.
> 
> Sorry, I misread the condition in the code. To end up where your stacktrace
> shows, the value of Timeout passed to inet_res:do_udp_receive/5 must be 0.
> 
> Then it seems the code accidentally loops exactly when 0 milliseconds
> remain to wait for the whole user interface timeout. If a lower level
> timeout of 5 seconds (which sounds familiar) is involved, then
> two such UDP timeouts could make the code loop after exactly
> 10 seconds and get a rest timeout of 0 ms.
> 
> Try a timeout value of 11111 ms instead.

That was rubbish. A long enough timeout seems to be necessary.

> 
> If this guess is correct the bug is more serious than I first assumed.

New findings
============

There are two timeout values involved, plus a retry limit.

The default UDP query values are a timeout of 2 s and 3 retries.

The 3rd argument to inet_res:getbyname/3 is a timeout limit for that call.
It sets an upper limit on the UDP query timeout and retry procedure.

If you do not use that 3rd argument, or set it to 'infinity', the call will
time out anyway once all queries have timed out. They time out
as follows, from the man page for inet_res:

    For UDP queries, the resolver options timeout and retry control
    retransmission. Each nameserver in the nameservers list is tried with
    a timeout of timeout / retry. Then all nameservers are tried again
    doubling the timeout, for a total of retry times.

So, with the default UDP query values, each nameserver accounts for
(666 + 1333 + 2666) = 4665 ms, i.e. N * 4665 ms for N nameservers, before
the whole call times out. If any servers are unreachable (ECONNREFUSED,
ENETUNREACH), the time decreases, since such a server is discarded after
the first failure. If inet_res has to retry with TCP, the time may increase,
since a timeout of 5 * (UDP query timeout) is used for every TCP query.
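To make the arithmetic easy to redo for other option values, here is a small model of the worst-case UDP schedule described in the man page excerpt. It is a sketch, not the inet_res code: it assumes attempt i waits (timeout * 2^i) div retry ms (which reproduces the 666/1333/2666 figures above), that every nameserver times out on every round, and it ignores discarded unreachable servers and TCP fallback. The function names are hypothetical.

```python
# Hypothetical model of the inet_res UDP retransmission schedule, assuming
# attempt i waits (timeout * 2**i) // retry ms. Worst case only: every
# nameserver times out on every round; no discards, no TCP fallback.

def udp_attempt_timeouts(timeout_ms=2000, retry=3):
    """Per-attempt UDP timeouts: timeout/retry, doubled on each round."""
    return [(timeout_ms * 2**i) // retry for i in range(retry)]

def udp_time_per_nameserver(timeout_ms=2000, retry=3):
    """Time spent waiting on one nameserver summed over all rounds."""
    return sum(udp_attempt_timeouts(timeout_ms, retry))

def udp_total_time(n_servers, timeout_ms=2000, retry=3):
    """Each round visits every nameserver, so the total scales with N."""
    return n_servers * udp_time_per_nameserver(timeout_ms, retry)

print(udp_attempt_timeouts())     # [666, 1333, 2666]
print(udp_time_per_nameserver())  # 4665
print(udp_total_time(3))          # 13995
```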

Anyway, if you pass a 3rd argument to inet_res:getbyname/3 that forces it
to cut the timeout of the last UDP query to 0, there is a bug that is
triggered by a UDP reply arriving at that late moment.

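Example: with 3 nameservers, a 3rd argument timeout of less than
3 * (666 + 1333) + (3 - 1) * 2666 = 11329 ms, combined with a UDP reply
arriving so late that it is received by the last gen_udp:recv (with
timeout 0), will trigger this bug. The 11329 ms is the point at which the
last nameserver's final-round query would start: all 3 servers in the
first two rounds plus 2 servers in the third round.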
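The threshold and the clipping can be checked with a small model. This is a hypothetical sketch under the same assumptions as the worst-case schedule (attempt i waits (timeout * 2^i) div retry ms, every earlier wait runs to its full timeout), plus one more: each wait is capped by whatever remains of the caller's overall budget. It shows when the last wait gets a timeout of 0; it does not reproduce the inet_res bug itself.

```python
# Hypothetical model (not the inet_res source) of how the caller's overall
# timeout clips the per-attempt UDP waits. Rounds visit every nameserver in
# order; each wait is min(attempt timeout, remaining call budget).

def attempt_timeouts(timeout_ms=2000, retry=3):
    return [(timeout_ms * 2**i) // retry for i in range(retry)]

def start_of_last_wait(n_servers, timeout_ms=2000, retry=3):
    """Elapsed time when the last nameserver's final-round wait begins,
    assuming every earlier wait ran to its full timeout."""
    per_round = attempt_timeouts(timeout_ms, retry)
    return n_servers * sum(per_round[:-1]) + (n_servers - 1) * per_round[-1]

def clipped_waits(n_servers, call_timeout_ms, timeout_ms=2000, retry=3):
    """(round, server, wait_ms) for every attempt; a wait of 0 means the
    receive is started with no time left, the case discussed above."""
    waits, elapsed = [], 0
    for rnd, tm in enumerate(attempt_timeouts(timeout_ms, retry)):
        for server in range(n_servers):
            remaining = max(call_timeout_ms - elapsed, 0)
            wait = min(tm, remaining)
            waits.append((rnd, server, wait))
            elapsed += wait
    return waits

print(start_of_last_wait(3))         # 11329
print(clipped_waits(3, 10000)[-3:])  # [(2, 0, 2666), (2, 1, 1337), (2, 2, 0)]
```

With a 10000 ms call timeout the last nameserver's final wait is cut to 0, while with 20000 ms (above the 13995 ms worst case for 3 servers) every wait keeps its full timeout.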

I have not yet managed to reproduce it and am not sure it can be
reproduced reliably, so this conclusion might still be wrong.

Since the bug seems to be avoidable with a long enough timeout value,
it is not very serious. I am nevertheless rewriting the code to fix the
bug and to become more confident that it works.

> 
> > 
> > -- 
> > Regards,
> > Evgeniy Khramtsov, ProcessOne.
> > xmpp:xram@REDACTED
> > 
> > _______________________________________________
> > erlang-bugs mailing list
> > erlang-bugs@REDACTED
> > http://erlang.org/mailman/listinfo/erlang-bugs
> 
> -- 
> 
> / Raimo Niskanen, Erlang/OTP, Ericsson AB

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB