[erlang-bugs] common_test + test_server_io errors

Wed Aug 14 11:58:06 CEST 2013

Hi Peter,

Thanks for getting back to me with this. Now I've had chance to look at it,
and to experiment with some different options, I can explain a bit more
about what I think is going on.

On 6 August 2013 16:47, Peter Andersson <peppe@REDACTED> wrote:

> Printouts from CT hook functions are "safe" in the sense that they
> execute sequentially pre/post test suite/group/case execution. It's not
> possible that a call to ct:log/2 from a hook function "comes in too soon
> or late", i.e. gets handled before/after Common Test has
> started/finished executing.
>
>
<snip>

> In general, if one knows that Common Test is running (which is the time
> between the CT hook init and terminate call, or the start_logging and
> stop_logging event message), it is safe to call ct:log/2 or ct:pal/2
> from anywhere and find the data in either the test case logs or in the
> unexpected i/o log (depending on the group leader setting).
>
>
<snip>

> The reason for the exit you reported initially, is that if a log call
> happens during startup or shutdown of Common Test, then, during a short
> window, it's possible that Common Test fails to communicate with Test
> Server and crashes.
>
>
So I've definitely been hitting a race condition in my code here. I tried
adding/removing the event hander that routes logging messages to
ct:{log,pal}/2 around the {start,stop}_logging events, however that didn't
help at all. In my ct_hook module, the call that triggers this explosion
occurs inside the init/1 function:

%% from systest_cth.erl

init(systest, Opts) ->
    case application:start(systest, permanent) of
        {error, {already_started, systest}} -> systest:reset();
        {error, _Reason}=Err                -> Err;
        ok                                  -> ok
    end,
    etc ....

%% from systest.erl

reset() ->
    %% both these operations are synchronous
    systest_watchdog:reset(),
    systest_results:reset(),
    ok.

%% systest_watchdog.erl

reset() ->
    gen_server:call(?MODULE, reset).

Further down in systest_watchdog, the handle_call/3 clause that deals with
the 'reset' signal ends up calling the systest_log, which leads to a
gen_event handler calling ct:pal/2.

According to what you've said above, about the timing of logging attempts
within the bounds of a ct_hook's init/terminate functions, that seems like
it should work shouldn't it? Attempting to trigger the logging by only
(de)registering the ct:pal handler in response to the {start,stop}_logging
events didn't though. I guess there must be another potential race there -
however I haven't attempted to do that via the hook's init/terminate
functions - I'll try that now and let you know if it resolves my issues.

Before I try to answer your question below, I need to understand better
> what you want to happen to the log printouts that take place during your
> configuration/setup phase (before the test run starts) and/or during the
> teardown phase (when Common Test has shut down).

I think/hope that all my setup/teardown is synchronous with regards each
test run - that is to say I allow for parallel test cases within suites and
I don't actually return from the hook's systest_cth:stop/4 callback until
all the resources configured at that scope (be it suite, group, testcase)
have been killed and reported (via monitors) as dead.

If Common Test - in an
> offline mode (i.e. not running) - should attempt to write incoming
> ct:log/2 strings to a file, the best it can do really, is to
> write/append them to say a circular log file in the current working
> directory. This is possible, but it will be difficult to know which
> printouts belong to which test runs when analyzing the logs, and as far
> as I understand, this is the sort of thing you're trying to avoid anyway.
>
>
Indeed. As a temporary work-around, I've stopped logging internal events to
the common_test logs and put them in a separate log file instead, but
that's not really what I wanted to end up with.

> As far as I see, it's quite possible to do something clever with log
> printouts that happen *before* Common Test starts. They could be
> buffered in a temporary file then read and resent by a CT hook init
> function so that this data ends up first in the unexpected i/o log for
> the test run. The problem here is what to do with printouts that happen
> *after* Common Test has stopped but before your teardown is finished.
> Another possibility could maybe be, if possible, to change the order of
> the whole session so that Common Test is always started before your
> configuration/setup (CT hooks can for example be added dynamically) and
> not stopped until teardown is also finished. Perhaps an init and
> terminate function in a high prio CT hook module could be used to
> synchronize this. Sounds feasible to me.
>
> Let me know if I understand your problem correctly and tell me what
> ideas/requests you have and let's move on from there.
>
>
I'm going to try again with a high prio hook to turn on/off the logging
handler and see if this works. By all accounts it sounds as if it should.
The other thing I might do to alleviate this problem is swallow any
exception from ct:pal/2, which is *cough* bad form as a rule, but in this
case might actually be the right thing to do.

Let me experiment with those two options first and get back to you. It
might be that all I end up asking for is a bit more info in the
documentation explaining the constraints.

I'll try to post back later today.

Cheers,
Tim
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-bugs/attachments/20130814/56b96a47/attachment.htm>