[erlang-questions] Latent bugs in Erlang/OTP
Kostis Sagonas
kostis@REDACTED
Thu Jan 11 17:53:23 CET 2007
Partly due to extending Dialyzer and partly wanting to satisfy my
scientific curiosity, I've been spending a significant portion of my
recent time looking for bugs in Erlang code, mainly in OTP.
More importantly, I am trying to get a grasp on the reasons why even the
most competent Erlang programmers write code that contains latent bugs.
I do this in the hope that this knowledge can be fed back into Dialyzer
or some other similar tool. Still, the things I've run across so far are
interesting -- perhaps even worthy of a talk at some forum.
Let me share with you one of the latest ones -- discovered by the
development version of Dialyzer which, among other things, infers
integer ranges.
The beginning of lib/orber/src/orber_tb.erl in R11B-2 reads:
%%----------------------------------------------------------------------
-module(orber_tb).
...
-export([wait_for_tables/1, wait_for_tables/2,
         is_loaded/0, is_loaded/1, is_running/0, is_running/1,
         info/2, error/2, unique/1]).
%%----------------------------------------------------------------------
Note that two wait_for_tables functions are exported, those with arities
1 and 2. These functions are defined as follows:
%%----------------------------------------------------------------------
%% function : wait_for_tables/1
%% Arguments: Tables - list of mnesia tables
%% Timeout - integer (no point in allowing infinity)
%% Attempts - integer > 0 How many times should we try
%% Returns :
%% Exception:
%% Effect :
%%----------------------------------------------------------------------
wait_for_tables(Tables) ->
    wait_for_tables(Tables, 30000, -1).
wait_for_tables(Tables, Timeout) ->
    wait_for_tables(Tables, Timeout, -1).
wait_for_tables(Tables, _Timeout, 0) ->
    error("Mnesia failed to load the some or all of the following"
          "tables:~n~p", [Tables]),
    {error, "The requested Mnesia tables not yet available."};
wait_for_tables(Tables, Timeout, Attempts) ->
    case mnesia:wait_for_tables(Tables, Timeout) of
        ok ->
            ok;
        {timeout, BadTabList} ->
            info("Mnesia hasn't loaded the following tables (~p msec):~n~p",
                 [Timeout, BadTabList]),
            wait_for_tables(BadTabList, Timeout, Attempts-1);
        {error, Reason} ->
            error("Mnesia failed to load the some or all of the following"
                  "tables:~n~p", [Tables]),
            {error, Reason}
    end.
%%----------------------------------------------------------------------
Notice the comments above the functions. Obviously, they refer to all
three functions, not only wait_for_tables/1. They indicate that the
programmer intended that the second argument of the wait_for_tables/2
function (the timeout) is an integer, but note that this intention is
nowhere reflected in the code (e.g. in the form of an is_integer/1 guard).
The third argument of wait_for_tables/3 is even more interesting.
(Reasons in increasing order of importance.)
1. Its "starting" value (-1) is _inconsistent_ with the documentation,
   which asks for an integer > 0.
2. This value has _NO effect whatsoever_ on the programmer's intention
   of counting the number of attempts.
   (Recall that wait_for_tables/3 is a non-exported function, thus a
   positive value of "Attempts" will never be supplied to it.)
3. This -1 value, in conjunction with the fact that there is no
   guarantee that Timeout is an integer, is a serious bug waiting to
   happen. Imagine the effect that a call
       wait_for_tables(SomeTables, infinity)
   will have on the system. It will throw it into an almost endless
   loop which, thanks to the fact that Erlang has bignum arithmetic,
   will first consume all available heap and then crash the node
   running this code. This is a bug that is very difficult to
   discover by testing.
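
For illustration only, here is a minimal sketch of how the documented
intentions (integer Timeout, Attempts > 0) could be expressed in guards.
This is not the fix that went into orber; the module name wait_sketch,
the default of 3 attempts, the injected WaitFun argument and the
{tables_not_available, Tables} error term are all assumptions made so
that the retry logic can be tried without a running Mnesia.

```erlang
-module(wait_sketch).
-export([wait_for_tables/2, wait_for_tables/3, wait_for_tables/4]).

%% Hypothetical default number of attempts; not taken from orber_tb.
-define(DEFAULT_ATTEMPTS, 3).

wait_for_tables(Tables, Timeout) ->
    wait_for_tables(Tables, Timeout, ?DEFAULT_ATTEMPTS).

wait_for_tables(Tables, Timeout, Attempts) ->
    wait_for_tables(fun mnesia:wait_for_tables/2, Tables, Timeout, Attempts).

%% WaitFun is injected (an assumption of this sketch) so that the
%% countdown can be exercised with a stub instead of Mnesia.
%% The guards state what the original comments only promised:
%% Timeout is a non-negative integer and Attempts is an integer > 0.
wait_for_tables(WaitFun, Tables, Timeout, Attempts)
  when is_function(WaitFun, 2), is_list(Tables),
       is_integer(Timeout), Timeout >= 0,
       is_integer(Attempts), Attempts > 0 ->
    retry(WaitFun, Tables, Timeout, Attempts).

%% The 0 clause is reachable here because the counter starts positive.
retry(_WaitFun, Tables, _Timeout, 0) ->
    {error, {tables_not_available, Tables}};
retry(WaitFun, Tables, Timeout, Attempts) ->
    case WaitFun(Tables, Timeout) of
        ok ->
            ok;
        {timeout, BadTabList} ->
            retry(WaitFun, BadTabList, Timeout, Attempts - 1);
        {error, Reason} ->
            {error, Reason}
    end.
```

With these guards, a call that passes infinity as the timeout fails
immediately with a function_clause error instead of entering the retry
loop, and a stub that always times out makes the function give up after
the requested number of attempts.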
Now, all this is currently identified by the *development* version of
Dialyzer as:
{orber_tb,wait_for_tables,3}:
Type guard {integer,0} will always fail
since variable has type neg_integer()!
which, admittedly, does not explain to the programmer the problem in
exactly the same way that I did ;-). Instead, it simply points out
that the first clause of this function is just dead code.
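
To see why the 0 clause can never match, consider the following small
sketch of the countdown on its own. The module and function names are
hypothetical, and the MaxSteps bound exists only so that the
demonstration terminates; the real code has no such bound.

```erlang
-module(dead_clause_sketch).
-export([reaches_zero/2]).

%% Mimics the Attempts countdown of wait_for_tables/3: decrement by one
%% per simulated timeout and report whether the value 0 is ever seen
%% within MaxSteps decrements.
reaches_zero(0, _MaxSteps) ->
    true;
reaches_zero(_Attempts, 0) ->
    false;
reaches_zero(Attempts, MaxSteps) when is_integer(Attempts),
                                      is_integer(MaxSteps), MaxSteps > 0 ->
    reaches_zero(Attempts - 1, MaxSteps - 1).
```

Starting from -1, as both exported entry points do, the counter only
moves further away from 0, so reaches_zero(-1, 1000) is false, while
reaches_zero(3, 1000) is true. This is the same observation Dialyzer
makes when it infers neg_integer() for the third argument.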
There was a relatively recent comment on this mailing list by Mats
Cronqvist:
<quote>
running dialyzer on well-tested code will turn up tons of
errors. alas, that almost always turns out to be dead code.
</quote>
I guess one point I am implicitly trying to make here is that even dead
code might not be so harmless as its "dead" status might suggest, and
worth taking a closer look. Typically, competent programmers do not
write dead code.
Feedback is welcome,
Kostis