[erlang-questions] Latent bugs in Erlang/OTP

Niclas Eklund nick@REDACTED
Mon Jan 15 12:27:15 CET 2007


Hello!

Actually, that function behaves exactly as intended. For those of you not
familiar with Orber you should know that this is an internal function,
which may not be used by other applications. 
Let's begin with the comment (i.e. argument datatypes), which defines what
other modules may pass but that doesn't mean that the orber_tb module may
not use other values internally. Hence, it's consistent.
If the user may define, for example, the Timeout parameter the value check
will be performed in the module handling environment parameters
(orber_env.erl). From my point of view it's rather pointless to check the
value in each place it is used if one can achieve the same thing in one
single place.

Why -1? Previously mnesia:wait_for_tables(Tables, infinity) was
always used. Hence, to be fully backward compatible but still be able to
inform the user that there is a problem loading Mnesia tables. Using -1
prevents that the following clause never match:

wait_for_tables(Tables, _Timeout, 0) ->
     error("Mnesia failed to load the some or all of the following"
           "tables:~n~p", [Tables]),
     {error, "The requested Mnesia tables not yet available."};

I.e. the same behavior as for mnesia:wait_for_tables(Tables, infinity).
But why isn't orber_tb:wait_for_tables/3 exported? When the user will be
allowed to define this parameter it will be.

You wrote:

 <quote>
  3. This -1 value in conjunction with the fact that there is no
     guarantee that Timeout is an integer is a serious bug waiting to
     happen.  Imagine the effect that a call
               
      wait_for_tables(SomeTables, infinity)
      
     will have to the system.  It will throw it to an almost endless   
     loop which, thanks to the fact that Erlang has bignum arithmetic,
     will first consume all available heap and then crash the node
     running this code.  This is a bug that is very difficult to
     discover by testing.
 </quote>

If the Timeout would be set to infinity, mnesia:wait_for_tables/2 will not
return until all tables are loaded or an error is detected, which aborts
orber_tb:wait_for_tables as well (i.e. invoked at most once).

Best regards,

Nick


On Thu, 11 Jan 2007, Kostis Sagonas wrote:

> Partly due to extending Dialyzer and partly wanting to satisfying my 
> scientific curiosity, I've been spending a significant portion of my 
> recent time looking for bugs in Erlang code, mainly in OTP.
> 
> More importantly, I am trying to get a grasp on the reasons why even the 
> most competent Erlang programmers write code that contains latent bugs. 
> This, in the hope that this knowledge can be fed back into Dialyzer or 
> some other similar tool.  Still, the things I've run across so far are 
> interesting -- perhaps even worth of a talk at some forum.
> 
> 
> Let me share with you one of the latest ones -- discovered by the 
> development version of Dialyzer which, among other things, infers 
> integer ranges.
> 
> The beginning of lib/orber/src/orber_tb.erl in R11B-2 reads:
> 
> %%----------------------------------------------------------------------
> -module(orber_tb).
> ...
> -export([wait_for_tables/1, wait_for_tables/2,
> 	 is_loaded/0, is_loaded/1, is_running/0, is_running/1,
> 	 info/2, error/2, unique/1]).
> %%----------------------------------------------------------------------
> 
> Note that two wait_for_tables functions are exported, those with arities 
> 1 and 2.  These functions are defined as follows:
> 
> %%----------------------------------------------------------------------
> %% function : wait_for_tables/1
> %% Arguments: Tables  - list of mnesia tables
> %%            Timeout - integer (no point in allowing infinity)
> %%            Attempts - integer > 0 How many times should we try
> %% Returns  :
> %% Exception:
> %% Effect   :
> %%----------------------------------------------------------------------
> wait_for_tables(Tables) ->
>      wait_for_tables(Tables, 30000, -1).
> 
> wait_for_tables(Tables, Timeout) ->
>      wait_for_tables(Tables, Timeout, -1).
> 
> wait_for_tables(Tables, _Timeout, 0) ->
>      error("Mnesia failed to load the some or all of the following"
> 	  "tables:~n~p", [Tables]),
>      {error, "The requested Mnesia tables not yet available."};
> wait_for_tables(Tables, Timeout, Attempts) ->
>      case mnesia:wait_for_tables(Tables, Timeout) of
> 	ok ->
> 	    ok;
> 	{timeout, BadTabList} ->
> 	    info("Mnesia hasn't loaded the following tables (~p msec):~n~p",
> 		 [Timeout, BadTabList]),
> 	    wait_for_tables(BadTabList, Timeout, Attempts-1);
> 	{error, Reason} ->
> 	    error("Mnesia failed to load the some or all of the following"
> 		  "tables:~n~p", [Tables]),
> 	    {error, Reason}
>      end.
> %%----------------------------------------------------------------------
> 
> Notice the comments above the functions. Obviously, they refer to all 
> three functions, not only wait_for_tables/1.  They indicate that the 
> programmer intended that the second argument of the wait_for_tables/2 
> function (the timeout) is an integer, but note that this intention is 
> nowhere reflected in the code (e.g. in the form of an is_integer/1 guard).
> 
> The third argument of wait_for_tables/3 is even more interesting. 
> (Reasons in increasing order of importance.)
> 
>    1. Its "starting" value (-1) is _inconsistent_ with the documentation
> 
>    2. This value has _NO effect whatsoever_ to the intention of the
>       programmer of counting the number of attempts.
>       (Recall that this is a non-exported function, thus a positive value
>        of "Attempts" will never be supplied for this function.)
> 
>    3. This -1 value in conjunction with the fact that there is no
>       guarantee that Timeout is an integer is a serious bug waiting to
>       happen.  Imagine the effect that a call
> 
> 	 wait_for_tables(SomeTables, infinity)
> 
>       will have to the system.  It will throw it to an almost endless
>       loop which, thanks to the fact that Erlang has bignum arithmetic,
>       will first consume all available heap and then crash the node
>       running this code.  This is a bug that is very difficult to
>       discover by testing.
> 
> 
> Now, all this is currently identified by the *development* version of 
> Dialyzer as:
> 
>      {orber_tb,wait_for_tables,3}:
> 	Type guard {integer,0} will always fail
> 	since variable has type neg_integer()!
> 
> which, admittedly, does not explain to the programmer the problem in
> exactly the same way that I did ;-).  Instead, it simply points out
> that the first clause of this function is just dead code.
> 
> There was a relatively recent comment to this mailing list by Mats 
> Cronqvist:
>      <quote>
> 	running dialyzer on well-tested code will turn up tons of
> 	errors. alas, that almost always turns out to be dead code.
>      </quote>
> 
> I guess one point I am implicitly trying to make here is that even dead 
> code might not be so harmless as its "dead" status might suggest, and 
> worth taking a closer look. Typically, competent programmers do not 
> write dead code.
> 
> 
> Feedback is welcome,
> 
> Kostis
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://www.erlang.org/mailman/listinfo/erlang-questions
> 








More information about the erlang-questions mailing list