Author: Björn Gustavsson <bjorn(at)erlang(dot)org>
Status: Draft
Type: Standards Track
Created: 14-Sep-2020
Erlang-Version: 24
Post-History: 14-Oct-2020, 27-Nov-2020

EEP 54: Provide more information about errors

Abstract

This EEP proposes a mechanism for reporting more human-readable information about what went wrong when a BIF raises an exception. The same mechanism can be used by libraries or applications to provide more detailed error messages.

Specification

In OTP 23 and earlier, the shell prints a terse message when a call to a built-in function (BIF) fails:

1> element(a,b).
** exception error: bad argument
     in function  element/2
        called as element(a,b)

The bad argument message informs us that one or more of the arguments to the call were incorrect in some way (in this example, both arguments have wrong types).

We propose a mechanism that enables the shell to print more helpful error messages. Here is how the message would be printed with the reference implementation of this EEP:

1> element(a, b).
** exception error: bad argument
     in function  element/2
        called as element(a,b)
        *** argument 1: not an integer
        *** argument 2: not a tuple

Note that the exact formatting and phrasing of the messages is an implementation detail outside the scope of this EEP. What will be specified here are the APIs and conventions that make these messages possible.

Proposals in this EEP

Extending the stacktrace

The stack back-trace (stacktrace) is currently a list of tuples. For the purpose of this EEP we are only interested in the first entry in the stacktrace. It looks like {Module,Function,Arguments,ExtraInfo}, where ExtraInfo is a list of two-tuples. As an indication that extended error info is available, we propose adding an {error_info,ErrorInfoMap} tuple to ExtraInfo in the first element in the stacktrace.

The map ErrorInfoMap may contain further information about the error or hints on how to handle the error.

Currently, three optional keys have a defined meaning:

To obtain more information about the error, the function named by the values of the module and function keys can be called. Hereafter in this document, for brevity we will call that function format_error/2.

The arguments for format_error/2 are the exception reason (usually badarg for BIF calls) and the stacktrace.

Thus, if a call to element/2 fails with a badarg exception and the first entry in the stacktrace is:

{erlang,element,[1,no_tuple],[{error_info,ErrorInfoMap}]}

and assuming that the stacktrace is bound to the variable StackTrace, the following call will provide provide additional information about the error:

FormatModule = maps:get(module, ErrorInfoMap, erlang),
FormatFunction = maps:get(function, ErrorInfoMap, format_error),
FormatModule:FormatError(badarg, StackTrace)

The format_error/2 function should return a map. For each argument that was in error, there should be a map element with the argument number as the key (that is, 1 for the first argument, 2 for the second, and so on) and a unicode:chardata() term as the value.

The atoms general and reason can also be returned in the map. general represents a generic error that is not attributed for a specific argument (for example badarg of io:format("Hello") when the default device is dead). reason will tell the error pretty printer to print the returned string instead of the error reason. The value pointed to by general and reason should be a unicode:chardata() term.

As an example:

Args = [1,no_tuple],
StackTrace = [{erlang, element, Args, [{error_info,Map}]}],
erlang:format_error(badarg, StackTrace)

could return:

#{2 => <<"not a tuple">>}

And:

Args = [0, b],
StackTrace = [{erlang, element, Args, [{error_info,Map}]}],
erlang:format_error(badarg, Entry)

could return:

#{1 => <<"out of range">>, 2 => <<"not a tuple">>}

And:

Args = ["Hello"],
StackTrace = [{io, format, Args, [{error_info,Map}]}],
erlang:format_error(badarg, Entry)

could return:

#{general => "the device has terminated"}

Note that the value for the key cause, if present, in the ErrorInfoMap term is only to be used by format_error/2. The actual value for a particular error could change at any time.

The cause key will typically only be present when an error occurs in a BIF that depends on the internal state in the runtime system (such as register/2 or the ETS BIFs), or for BIFs with complex arguments (such as system_flag/2) that would make it tedious and error prone to figure out which argument was in error.

Here is one way that format_error/2 for the erlang module could be implemented:

format_error(ExceptionReason, [{erlang, F, As, Info} | _]) ->
    ErrorInfoMap = proplists:get_value(error_info, Info, #{}),
    Cause = maps:get(cause, ErrorInfoMap, none),
    do_format_error(F, As, ExceptionReason, Cause).

do_format_error(_, _, system_limit, _) ->
    %% The explanation for system_limit is clear enough, so we don't
    %% need any detailed explanations for the arguments.
    #{};
do_format_error(F, As, _, Cause) ->
    do_format_error(F, As, Cause).

do_format_error(element, [Index, Tuple], _) ->
    Arg1 = if
               not is_integer(Index) ->
                   <<"not an integer">>;
               Index =< 0; Index > tuple_size(Tuple) ->
                   <<"out of range">>;
               true ->
                   []
           end,
    Arg2 = if
               not is_tuple(Tuple) -> <<"not a tuple">>;
               true -> []
           end,
    PotentialErrors = [{1, Arg1}, {2, Arg2}],
    maps:from_list([{ArgNum, Err} ||
                       {ArgNum, Err} <- PotentialErrors,
                       Err =/= []]);

do_format_error(list_to_atom, _, _) ->
    #{1 => <<"not a flat list of characters">>};

do_format_error(register, [Name,PidOrPort], Cause) ->
    [Arg1, Arg2] =
    case Cause of
        registered_name ->
            [[],<<"this process or port already has a name">>];
        notalive ->
            [[],<<"the pid does not refer to an existing process">>];
        _ ->
            Errors =
                [if
                     Name =:= undefined -> <<"'undefined' is not a valid name">>;
                     is_atom(Name) -> [];
                     true -> <<"not an atom">>
                 end,
                 if
                     is_pid(PidOrPort) -> [];
                     is_port(PidOrPort) -> [];
                     true -> <<"not a pid or a port">>
                 end],
            case Errors of
                [[],[]] ->
                    [<<"name is in use">>];
                [_,_] ->
                    Errors
            end,
    PotentialErrors = [{1, Arg1}, {2, Arg2}],
    maps:from_list([{ArgNum, Err} ||
                       {ArgNum, Err} <- PotentialErrors,
                       Err =/= []]);
      .
      .
      .

do_format_error(_, _, _) ->
    #{}.

Note that different strategies for different BIFS are used for determining the extended error information:

Supplying extended error information using erlang:error/3

A library or application can raise an error exception with extended error information by calling erlang:error(Reason, Arguments, Options). Reason should be the error reason (for example badarg), Arguments should be arguments for the calling function, and Options should be [{error_info,ErrorInfoMap}].

The caller of erlang:error/3 should provide a format_error/2 function (not necessarily with that name if the ErrorInfoMap has a function key) that behaves as described in the previous section.

Formatting stacktraces

To make it possible for applications and libraries to format stacktraces in the same style as the shell, the functions erl_error:format_exception/3 and erl_error:format_exception/4 are provided. Here is an example how erl_error:format_exception/3 can be used:

try
    .
    .
    .
catch
    C:R:Stk ->
        Message = erl_error:format_exception(C, R, Stk),
        io:format(LogFile, "~ts\n", [Message])
end.

The erl_error:format_exception/4 function is similar but has a fourth option argument to support customizing the message. See the documentation in the reference implementation for details.

Possible future extensions

Since the error_info tuple in the stacktrace contains a map, more data could be added to the map in future extension of this EEP.

Similarly, since the return value of format_error/2 is a map, additional keys in the map could be assigned a meaning in the future.

For example, the value for the key hint could be a longer message that gives more context or provides concrete advice on how to investigate or avoid the error.

Additional examples

Let's look at some examples using ETS:

1> T = ets:new(table, []).
#Ref<0.2290824696.4161404930.5168>
2> ets:update_counter(T, k, 1).
** exception error: bad argument
     in function  ets:update_counter/3
        called as ets:update_counter(#Ref<0.2290824696.4161404930.5168>,k,1)
        *** argument 2: not a key that exists in the table

Note that when an error occurs while evaluating an expression entered in the shell, the evaluator process terminates and any ETS tables created by that process are deleted. Thus, calling update_counter a second time with the same arguments results in a different message:

3> ets:update_counter(T, k, 1).
** exception error: bad argument
     in function  ets:update_counter/3
        called as ets:update_counter(#Ref<0.2290824696.4161404930.5168>,k,1)
        *** argument 1: the table identifier does not refer to an existing ETS table

Starting over, creating a new ETS table:

4> f(T), T = ets:new(table, []).
#Ref<0.2290824696.4161404930.5205>
5> ets:insert(T, {k,a,0}).
true
6> ets:update_counter(T, k, 1).
** exception error: bad argument
     in function  ets:update_counter/3
        called as ets:update_counter(#Ref<0.2290824696.4161404930.5205>,k,1)
        *** argument 3: the value in the given position in the object is not an integer
7> ets:update_counter(T, k, bad).
** exception error: bad argument
     in function  ets:update_counter/3
        called as ets:update_counter(#Ref<0.2290824696.4161404930.5205>,k,bad)
        *** argument 1: the table identifier does not refer to an existing ETS table
        *** argument 3: not a valid update operation

Motivation

When a call to a BIF fails with the reason badarg it is not always obvious even to an experienced developer exactly which argument was "bad" and in which way. For a newcomer, having to figure out what a badarg means is another stumbling block standing in the way of mastering a new language.

Even for an experienced developer, figuring out the reason for a badarg exception for some BIFs is hard or impossible. For example, the documentation for ets:update_counter/4 at the time of writing lists 8 situations in which ets:update_counter/4 will fail. That number is too low. Missing from list are, for example, reasons such as the ETS table having been deleted or having insufficient access right.

The general return key was added in order to allow information to be given about the default I/O device in io:format("hello"). It also allows third-party error_report implementations (such as Elixir) much more freedom in what they can return.

The reason return key was added in order for third-party error_report implementations (such as Elixir) to influence what is printed to describe the actual error.

Rationale

Why not change badarg to something more informational?

An alternative way to provide more information about errors would be to introduce additional exception reasons. For example, the call:

element(a, b)

could raise the exception:

{badarg,[{1,not_integer},{2,not_tuple}]}

That change could break code that expects that BIFs should raise a badarg exception. It is less likely that existing code would match the fourth entry in the stacktrace.

A related reason is the amount of work needed to revise the error handling code for all built-in functions. Implementing building of Erlang terms in C is tedious and error prone. There would always be a risk that bugs in that code would crash the runtime system when an error occurred. The test suite would have to be extremely thorough to ensure that all bugs were found, because error handling code is typically infrequently executed.

Why can't the stacktrace contain the complete error reason?

We did consider modifying the implementation of all BIFs so that they would produce complete error information in the stacktrace when they failed. However, as mentioned earlier, building Erlang terms in C is tedious and error prone.

With the approach we have taken to let Erlang code do most of the analysis of the error reason, there is a much lower risk that error handling would crash the application or runtime system.

Why is the map key named cause and not reason?

To avoid confusion with the exception reason.

Why is the value for cause in the ErrorInfoMap undocumented?

The reason in the ErrorInfoMap is not meant to be used for programmatically figuring out why an error occurred, but only to be used by Module:format_error/2 to produce a human-readable message.

Also, for many BIFs the cause key will not be present, as the Module:format/4 function will produce the messages based solely on the name of the BIF and its arguments.

How is the module key useful?

How is the function key useful?

Backwards Compatibility

All exceptions from BIFs will now have a ExtraInfo element (called Location in the documentation for OTP 23) in the call-stack back trace (stacktrace) that includes an error_info tuple. In previous releases the ExtraInfo element would be an empty list for a failed BIF call.

Applications that explicitly do matching on the stacktrace and do assumptions of the layout of the ExtraInfo element (for example, assuming that Location is either an empty list or a list of file and line tuples in a specific order) may need modifications. Note that such assumptions have never been safe and that the documentation for error handling strongly discourages developers to rely on stacktrace entries for purposes other than debugging.

Implementation

The reference implementation includes extended error information for most BIFs implemented in C in the erlang and ets modules. It can be found in PR #2849.

Copyright

This document has been placed in the public domain.