[erlang-bugs] common_test + test_server_io errors

Wed Aug 14 13:09:41 CEST 2013

Hi Peter,

"In general, if one knows that Common Test is running (which is the time
between the CT hook init and terminate call, or the start_logging and
stop_logging event message), it is safe to call ct:log/2 or ct:pal/2
from anywhere and find the data in either the test case logs or in the
unexpected i/o log (depending on the group leader setting)."

After some initial testing, I can confirm that this does not work as
expected. I changed my code to only add the event (log) handler which
executes ct:pal/2 inside the hook's init/2 callback, however the problem
still persists! My hook and the corresponding call chain now looks like
this:

%% systest_cth hook

init(systest, Opts) ->
    case application:start(systest, permanent) of
        {error, {already_started, systest}} -> io:format("starting ct
log!~n"),
                                               systest_ct_log:start(),
                                               systest:reset();
        {error, _Reason}=Err                -> Err;
        ok                                  -> ok
    end,
    etc ....

%% systest_ct_log:start

start() ->
    ok = systest_log:start(ct, systest_ct_log, common_test).

%% systest_log:start/3

start(Id, Mod, Output) ->
    gen_event:add_handler(systest_event_log, {?MODULE, Id}, [Id, Mod,
Output]).

%% systest_ct_log:write_log/4

write_log(EvId, _Fd, What, Args) ->
    ct:log("[" ++ as_string(EvId) ++ "] " ++ as_string(What), Args).

When I execute a test run with this code in place however, I still get the
crash, though the io:format/2 notice that I'm starting the ct log appears
first:

Common Test starting (cwd is
/home/t4/work/vmware/rabbitmq-public-umbrella/rabbitmq-test/multi-node)

starting ct log!

ct_util_server got EXIT from <0.61.0>: {noproc,
                                        {gen_server,call,
                                         [test_server_io,
                                          {print,xxxFrom,unexpected_io,
                                           [[[["<div
class=\"default\"><b>*** User 2013-08-14 12:02:36.830 ***</b>"],
                                              "\n",

[91,102,114,97,109,101,119,111,
                                               114,107,93,32,119,97,116,99,

104,100,111,103,58,32,110,111,
                                               32,112,114,111,99,115,32,116,

111,32,107,105,108,108,"\n"]],
                                             "\n","</div>"]]},
                                          infinity]}}

So it appears that the assertion that logging will work between the hook's
init and terminate callbacks isn't quite working.

On 14 August 2013 10:58, Tim Watson <watson.timothy@REDACTED> wrote:

> Hi Peter,
>
>
> Thanks for getting back to me with this. Now I've had chance to look at
> it, and to experiment with some different options, I can explain a bit more
> about what I think is going on.
>
> On 6 August 2013 16:47, Peter Andersson <peppe@REDACTED> wrote:
>
>> Printouts from CT hook functions are "safe" in the sense that they
>> execute sequentially pre/post test suite/group/case execution. It's not
>> possible that a call to ct:log/2 from a hook function "comes in too soon
>> or late", i.e. gets handled before/after Common Test has
>> started/finished executing.
>>
>>
> <snip>
>
>
>> In general, if one knows that Common Test is running (which is the time
>> between the CT hook init and terminate call, or the start_logging and
>> stop_logging event message), it is safe to call ct:log/2 or ct:pal/2
>> from anywhere and find the data in either the test case logs or in the
>> unexpected i/o log (depending on the group leader setting).
>>
>>
> <snip>
>
>
>> The reason for the exit you reported initially, is that if a log call
>> happens during startup or shutdown of Common Test, then, during a short
>> window, it's possible that Common Test fails to communicate with Test
>> Server and crashes.
>>
>>
> So I've definitely been hitting a race condition in my code here. I tried
> adding/removing the event hander that routes logging messages to
> ct:{log,pal}/2 around the {start,stop}_logging events, however that didn't
> help at all. In my ct_hook module, the call that triggers this explosion
> occurs inside the init/1 function:
>
> %% from systest_cth.erl
>
> init(systest, Opts) ->
>     case application:start(systest, permanent) of
>         {error, {already_started, systest}} -> systest:reset();
>         {error, _Reason}=Err                -> Err;
>         ok                                  -> ok
>     end,
>     etc ....
>
> %% from systest.erl
>
> reset() ->
>     %% both these operations are synchronous
>     systest_watchdog:reset(),
>     systest_results:reset(),
>     ok.
>
> %% systest_watchdog.erl
>
> reset() ->
>     gen_server:call(?MODULE, reset).
>
>
> Further down in systest_watchdog, the handle_call/3 clause that deals with
> the 'reset' signal ends up calling the systest_log, which leads to a
> gen_event handler calling ct:pal/2.
>
> According to what you've said above, about the timing of logging attempts
> within the bounds of a ct_hook's init/terminate functions, that seems like
> it should work shouldn't it? Attempting to trigger the logging by only
> (de)registering the ct:pal handler in response to the {start,stop}_logging
> events didn't though. I guess there must be another potential race there -
> however I haven't attempted to do that via the hook's init/terminate
> functions - I'll try that now and let you know if it resolves my issues.
>
> Before I try to answer your question below, I need to understand better
>> what you want to happen to the log printouts that take place during your
>> configuration/setup phase (before the test run starts) and/or during the
>> teardown phase (when Common Test has shut down).
>
>
> I think/hope that all my setup/teardown is synchronous with regards each
> test run - that is to say I allow for parallel test cases within suites and
> I don't actually return from the hook's systest_cth:stop/4 callback until
> all the resources configured at that scope (be it suite, group, testcase)
> have been killed and reported (via monitors) as dead.
>
> If Common Test - in an
>> offline mode (i.e. not running) - should attempt to write incoming
>> ct:log/2 strings to a file, the best it can do really, is to
>> write/append them to say a circular log file in the current working
>> directory. This is possible, but it will be difficult to know which
>> printouts belong to which test runs when analyzing the logs, and as far
>> as I understand, this is the sort of thing you're trying to avoid anyway.
>>
>>
> Indeed. As a temporary work-around, I've stopped logging internal events
> to the common_test logs and put them in a separate log file instead, but
> that's not really what I wanted to end up with.
>
>
>> As far as I see, it's quite possible to do something clever with log
>> printouts that happen *before* Common Test starts. They could be
>> buffered in a temporary file then read and resent by a CT hook init
>> function so that this data ends up first in the unexpected i/o log for
>> the test run. The problem here is what to do with printouts that happen
>> *after* Common Test has stopped but before your teardown is finished.
>> Another possibility could maybe be, if possible, to change the order of
>> the whole session so that Common Test is always started before your
>> configuration/setup (CT hooks can for example be added dynamically) and
>> not stopped until teardown is also finished. Perhaps an init and
>> terminate function in a high prio CT hook module could be used to
>> synchronize this. Sounds feasible to me.
>>
>> Let me know if I understand your problem correctly and tell me what
>> ideas/requests you have and let's move on from there.
>>
>>
> I'm going to try again with a high prio hook to turn on/off the logging
> handler and see if this works. By all accounts it sounds as if it should.
> The other thing I might do to alleviate this problem is swallow any
> exception from ct:pal/2, which is *cough* bad form as a rule, but in this
> case might actually be the right thing to do.
>
> Let me experiment with those two options first and get back to you. It
> might be that all I end up asking for is a bit more info in the
> documentation explaining the constraints.
>
> I'll try to post back later today.
>
> Cheers,
> Tim
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-bugs/attachments/20130814/68a24a37/attachment.htm>