[erlang-bugs] Fwd: Race condition in cover.erl

Siri Hansen erlangsiri@REDACTED
Mon Apr 7 11:28:14 CEST 2014


Hi Andrew,

we would really appreciate a fix for this issue. We think that cover should
at least work across maintenance releases within the same major release,
but we would also prefer a clean solution without the extra complexity that
would be necessary for keeping the backwards compatibility. This means that
we can only include this in the next major release, i.e. Erlang/OTP 18. I
hope this is ok with you!

I can't think of any good reason for not using gen_servers in cover, so
please feel free to refactor the code using these behaviours.

Best regards
/siri



2014-04-04 19:03 GMT+02:00 Andreas Schumacher <andreas@REDACTED>:

> Thank you for the report and your initial investigation. We appreciate
> your offer to fix the issue and will come back with answers to your
> questions next week.
>
> Andreas Schumacher, Erlang/OTP, Ericsson AB
>
>
> -----Original Message-----
> From: erlang-bugs-bounces@REDACTED [mailto:erlang-bugs-bounces@REDACTED] On Behalf Of Andrew Thompson
> Sent: 3 April 2014 02:01
> To: erlang-bugs@REDACTED
> Subject: [erlang-bugs] Race condition in cover.erl
>
> I've been doing some pretty extreme coverage reporting lately in an effort
> to help understand the coverage provided by Basho's integration test suite
> riak_test.
>
> As part of that work I've seen, several times now, an error like this:
>
> 2014-04-02 15:47:23 =ERROR REPORT====
> Error in process <0.80.0> on node 'riak_test@REDACTED' with exit value:
>
> {function_clause,[{cover,'-sync_compiled/2-lc$^0/1-0-',[ok],[{file,"cover.erl"},{line,1077}]},{cover,sync_compiled,2,[{file,"cover.erl"},{line,1077}]},{cover,main_process_loop,1,[{file,"cover.erl"},{line,819}]}]}
>
> However, this is *extremely* hard to reproduce with my use case, taking
> upwards of 15 hours, and it only happens on slower machines.
>
> I've added some debug prints, and the result of
> remote_call(Node,{remote,get_compiled}) is coming back as 'ok'.
>
> Looking at the code for that, we can see that this should be impossible:
>
> https://github.com/erlang/otp/blob/maint/lib/tools/src/cover.erl#L893
>
> #remote_state.compiled is always a list, so where is the 'ok' coming from?
>
> At first I thought the async reply from collect,remote was the source of
> the errant 'ok', but re-reading that code, it is using the 'from'
> syntax, so the collect,remote replies are going to a particular pid, not
> the registered cover ?SERVER.
>
> The problem remains, however, that cover.erl plays fast and loose with the
> mailbox: requests and replies are not tagged with a ref (as they are in
> gen_server), so it is possible for the receive in remote_call to get a
> reply for a request it did not make:
>
> https://github.com/erlang/otp/blob/maint/lib/tools/src/cover.erl#L570
>
> I am pretty sure that is what is happening here, although I cannot spot
> the exact cause. Mismatched requests/replies could happen quite frequently
> in this module, given that most of the commands simply return 'ok' anyway.
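
For what it's worth, the ref-tagging idea mentioned above can be sketched
roughly as follows. This is only an illustration and not the code in
cover.erl: the module name tagged_call, the {request, ...} / {reply, ...}
message shapes and server_loop/1 are all made up for the example. The point
is that the caller only accepts a reply carrying the ref it created, so a
stray 'ok' meant for some other request stays in the mailbox instead of
being mistaken for the answer.

```erlang
-module(tagged_call).
-export([call/3, server_loop/1, demo/0]).

%% Send Request to Pid and wait only for the reply that carries our ref.
%% The monitor ref doubles as the unique tag, so a dead server is noticed.
call(Pid, Request, Timeout) ->
    MRef = erlang:monitor(process, Pid),
    Pid ! {request, {self(), MRef}, Request},
    receive
        {reply, MRef, Reply} ->
            erlang:demonitor(MRef, [flush]),
            {ok, Reply};
        {'DOWN', MRef, process, _Pid, Reason} ->
            {error, Reason}
    after Timeout ->
        erlang:demonitor(MRef, [flush]),
        {error, timeout}
    end.

%% Hypothetical server loop: it echoes the caller's ref back with each reply.
server_loop(Compiled) ->
    receive
        {request, {From, Ref}, get_compiled} ->
            From ! {reply, Ref, Compiled},
            server_loop(Compiled);
        {request, {From, Ref}, _Other} ->
            From ! {reply, Ref, ok},
            server_loop(Compiled);
        stop ->
            ok
    end.

%% Put a stray reply with a different ref in the mailbox first, then make a
%% real call. The stray message is skipped because its ref does not match.
demo() ->
    Pid = spawn(?MODULE, server_loop, [[mod_a, mod_b]]),
    self() ! {reply, make_ref(), ok},
    Result = call(Pid, get_compiled, 5000),
    Pid ! stop,
    Result.
```

With this shape, tagged_call:demo() returns {ok,[mod_a,mod_b]} even though
an unrelated 'ok' reply was already queued. gen_server:call does essentially
the same bookkeeping for you, which is one argument for the refactoring.
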
>
> I'm happy to put some more time into debugging and fixing this, but I need
> some more information on what I can and can't do.
>
> 1 - Can I change the messaging protocol in a backwards compatible way?
>     Is running coverage across multiple nodes at once expected to work
>     across OTP versions? Can I change the protocol if I keep things
>     backwards compatible (by enumerating something on the spawned
>     remotes to see if they can use the new protocol)?
>
> 2 - Why is this code not a gen_server? Is there some reason or is it
>     just because of the age of the code? Would it be permissible to
>     refactor cover.erl into 2 gen_servers (main and remote cover
>     servers)?
>
> Andrew
> _______________________________________________
> erlang-bugs mailing list
> erlang-bugs@REDACTED
> http://erlang.org/mailman/listinfo/erlang-bugs