<div dir="ltr">Hi Andrew,<div><br></div><div>we would really appreciate a fix for this issue. We think that cover should at least work across maintenance releases within the same major release, but we would also prefer a clean solution without the extra complexity needed to keep backwards compatibility. This means that we can only include the fix in the next major release, i.e. Erlang/OTP 18. I hope this is OK with you!</div>
<div><br></div><div>I can't think of any good reason for not using gen_servers in cover, so please feel free to refactor the code using this behaviour.</div><div><br></div><div>Best regards</div><div>/siri</div><div>
<br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">2014-04-04 19:03 GMT+02:00 Andreas Schumacher <span dir="ltr">&lt;<a href="mailto:andreas@erlang.org" target="_blank">andreas@erlang.org</a>&gt;</span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Thank you for the report and your initial investigation. We appreciate your offer to fix the issue and will come back with answers to your questions next week.<div>
<br></div><div>Andreas Schumacher, Erlang/OTP, Ericsson AB<div><div class="h5"><br>
<div class="gmail_quote"><br>
-----Original Message-----<br>
From: <a href="mailto:erlang-bugs-bounces@erlang.org" target="_blank">erlang-bugs-bounces@erlang.org</a> [mailto:<a href="mailto:erlang-bugs-bounces@erlang.org" target="_blank">erlang-bugs-bounces@erlang.org</a>] On Behalf Of Andrew Thompson<br>
Sent: 3 April 2014 02:01<br>
To: <a href="mailto:erlang-bugs@erlang.org" target="_blank">erlang-bugs@erlang.org</a><br>
Subject: [erlang-bugs] Race condition in cover.erl<br>
<br>
I've been doing some pretty extreme coverage reporting lately in an effort to help understand the coverage provided by Basho's integration test suite riak_test.<br>
<br>
As part of that work I've seen, several times now, an error like this:<br>
<br>
2014-04-02 15:47:23 =ERROR REPORT====<br>
Error in process &lt;0.80.0&gt; on node '<a href="mailto:riak_test@127.0.0.1" target="_blank">riak_test@127.0.0.1</a>' with exit value:<br>
{function_clause,[{cover,'-sync_compiled/2-lc$^0/1-0-',[ok],[{file,"cover.erl"},{line,1077}]},{cover,sync_compiled,2,[{file,"cover.erl"},{line,1077}]},{cover,main_process_loop,1,[{file,"cover.erl"},{line,819}]}]}<br>
<br>
However, this is *extremely* hard to reproduce with my use case, taking upwards of 15 hours, and it only happens on slower machines.<br>
<br>
I've added some debug prints, and the result of<br>
remote_call(Node,{remote,get_compiled}) is coming back as 'ok'.<br>
<br>
Looking at the code for that, we can see that this is clearly impossible:<br>
<br>
<a href="https://github.com/erlang/otp/blob/maint/lib/tools/src/cover.erl#L893" target="_blank">https://github.com/erlang/otp/blob/maint/lib/tools/src/cover.erl#L893</a><br>
<br>
#remote_state.compiled is always a list, so where is the 'ok' coming from?<br>
<br>
At first I thought the async reply from collect,remote was the source of the errant 'ok', but re-reading that code, it is using the 'from'<br>
syntax, so the collect,remote replies are going to a particular pid, not the registered cover ?SERVER.<br>
<br>
The problem remains, however, that cover.erl plays fast and loose with the mailbox: requests and replies are not tagged with a ref (as they are in<br>
gen_server), so it is possible for the receive in remote_call to get a reply to a request it did not make:<br>
<br>
<a href="https://github.com/erlang/otp/blob/maint/lib/tools/src/cover.erl#L570" target="_blank">https://github.com/erlang/otp/blob/maint/lib/tools/src/cover.erl#L570</a><br>
<br>
I am pretty sure that is what is happening here, although I cannot spot the exact cause. Mismatched requests/replies could happen quite frequently in this module, given that most of the commands simply return 'ok' anyway.<br>
<br>
I'm happy to put some more time into debugging and fixing this, but I need some more information on what I can and can't do.<br>
<br>
1 - Can I change the messaging protocol in a backwards compatible way?<br>
Is running coverage across multiple nodes at once expected to work<br>
across OTP versions? Can I change the protocol if I keep things<br>
backwards compatible (by enumerating something on the spawned<br>
remotes to see if they can use the new protocol)?<br>
<br>
2 - Why is this code not a gen_server? Is there some reason or is it<br>
just because of the age of the code? Would it be permissible to<br>
refactor cover.erl into 2 gen_servers (main and remote cover<br>
servers)?<br>
<br>
Andrew<br>
_______________________________________________<br>
erlang-bugs mailing list<br>
<a href="mailto:erlang-bugs@erlang.org" target="_blank">erlang-bugs@erlang.org</a><br>
<a href="http://erlang.org/mailman/listinfo/erlang-bugs" target="_blank">http://erlang.org/mailman/listinfo/erlang-bugs</a><br>
</div><br></div></div></div></div>
<br></blockquote></div><br></div>