[erlang-bugs] Allow code upgrade for the code_server itself

Siri Hansen erlangsiri@REDACTED
Wed Jul 2 11:54:36 CEST 2014


Stavros,

you are right! Of course the change_code system message does not help. The
only thing it does is to call system_code_change/4 in the new version of
the module, but when it returns, the process keeps executing the old code.

Your suggested change, to make the system_continue call on line 184
qualified, will solve the problem. It will, however, also make (almost) any
system message trigger a code change (if new code_server code is
loaded), e.g.

Erlang/OTP 18 [DEVELOPMENT] [erts-7.0] [source-a6de62b] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V7.0  (abort with ^G)
1> l(code_server).
{module,code_server}
2> sys:get_status(code_server).
{status,<0.20.0>,
        {module,code_server},
        [[{any_native_code_loaded,false}],
         running,<0.11.0>,[],
         {state,<0.11.0>,"/ldisk/siri/git/otp",
                [".","/ldisk/siri/git/otp/lib/kernel/ebin",
                 "/ldisk/siri/git/otp/lib/stdlib/ebin",
                 [...]|...],
                13,4110,no_cache,interactive,[]}]}
3> erlang:check_process_code(whereis(code_server),code_server).
false

I don't know if this is a good thing... On the other hand, the same thing
would happen for e.g. gen_server or any of the other gen* behaviours, so
maybe it is ok.
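
To illustrate the difference, here is a minimal sketch. It is not the real
code_server source: the module name qualified_demo, the simplified message
format and the ping/stop messages are assumptions made for the example. The
only point is the last call in handle_system_msg/3.

```erlang
-module(qualified_demo).
-export([start/0, loop/1, system_continue/3]).

%% Hypothetical stand-in for a process that handles system messages
%% itself instead of delegating them to sys.
start() ->
    spawn(?MODULE, loop, [[]]).

loop(State) ->
    receive
        {system, From, Request} ->
            handle_system_msg(Request, From, State);
        {ping, Pid} ->
            Pid ! pong,
            loop(State);
        stop ->
            ok
    end.

handle_system_msg(Request, {Pid, Ref}, State) ->
    Pid ! {Ref, {handled, Request}},
    %% A local call, system_continue(undefined, [], State), would keep the
    %% process in the version of the module it is already running.
    %% The fully qualified call below always enters the most recently
    %% loaded version, so any system message completes a code change.
    ?MODULE:system_continue(undefined, [], State).

system_continue(_Parent, _Debug, State) ->
    loop(State).
```

With the local call, the process would only leave the old code if some
other path made a qualified call into the module.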

Another potential problem is that if you do the suspend - change_code -
resume sequence, then system_code_change would be called in the new code,
but then the old code would run again after its return and until the resume
message was handled. Depending on which part of the code was actually
changed, this could be fatal. In the gen* behaviour case this is not a
problem since the code running in between is in sys and not in gen*. A bit
more code, and exporting the suspend_loop function, could of course solve
this problem...
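
For comparison, a sketch of the gen*-like case, where a special process
delegates system messages to sys. The module upgrade_demo, its integer
state and the {get, Pid} message are made up for the example; only the
proc_lib and sys calls are real OTP functions. Between the return of
system_code_change/4 and the handling of the resume message, the process
sits in the suspend loop inside sys, not in upgrade_demo.

```erlang
-module(upgrade_demo).
-export([start/0, upgrade/1, init/1]).
-export([system_continue/3, system_terminate/4, system_code_change/4]).

%% Start a special process; returns {ok, Pid}.
start() ->
    proc_lib:start(?MODULE, init, [self()]).

init(Parent) ->
    proc_lib:init_ack(Parent, {ok, self()}),
    loop(Parent, sys:debug_options([]), 0).

loop(Parent, Debug, State) ->
    receive
        {system, From, Request} ->
            %% sys takes over here, including the suspended state.
            sys:handle_system_msg(Request, From, Parent, ?MODULE,
                                  Debug, State);
        {get, Pid} ->
            Pid ! {state, State},
            loop(Parent, Debug, State);
        stop ->
            ok
    end.

%% Called by sys with a qualified call when the process resumes.
system_continue(Parent, Debug, State) ->
    loop(Parent, Debug, State).

system_terminate(Reason, _Parent, _Debug, _State) ->
    exit(Reason).

%% Called by sys in the newest loaded version of the module.
%% The state is incremented only to make the call visible.
system_code_change(State, _Module, _OldVsn, _Extra) ->
    {ok, State + 1}.

%% The suspend - change_code - resume sequence discussed above.
upgrade(Pid) ->
    ok = sys:suspend(Pid),
    ok = sys:change_code(Pid, ?MODULE, undefined, []),
    ok = sys:resume(Pid).
```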

I think I need to discuss this a bit further with my team - if it is
something we would want to do or not. Please let me know if such a change
is at all desirable. Comments from others on the list are of course also
welcome.

Regards
/siri


2014-06-27 14:11 GMT+02:00 Stavros Aronis <aronisstav@REDACTED>:
>
> Hi Siri!
>
> You are right. I had a modification on the actual code_server.erl, which
> I hadn't removed when generating the previous report. With a clean R17 I
> cannot reproduce it so apologies for the false alarm.
>
> Back to the main topic, this sequence does not seem to work either, as
> the code_server is still stuck in old code:
>
> ~$ diff ~/git/otp/lib/kernel/src/code_server.erl ~/code_server.erl
> 1263a1264
> >                     erlang:display(foo),
> ~$ erlc code_server.erl
> ~$ erl -nostick
> Erlang/OTP 17 [erts-6.0] [source-07b8f44] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]
>
> Eshell V6.0  (abort with ^G)
> 1> l(code_server).
> {module,code_server}
> 2> code:which(code_server).
> "/home/stavros/code_server.beam"
> 3> erlang:check_old_code(code_server).
> true
> 4> code:soft_purge(code_server).
> false
> 5> erlang:check_process_code(whereis(code_server),code_server).
> true
> 6> sys:suspend(code_server).
> ok
> 7> sys:change_code(code_server, code_server, ignored, ignored).
> ok
> 8> sys:resume(code_server).
> ok
> 9> erlang:check_process_code(whereis(code_server),code_server).
> true
>
> Regards,
>
> Stavros
>
>
> On Fri, Jun 27, 2014 at 1:31 PM, Siri Hansen <erlangsiri@REDACTED> wrote:
>>
>> Hi Stavros!
>>
>> 2014-06-23 11:48 GMT+02:00 Stavros Aronis <aronisstav@REDACTED>:
>>
>>> a lot of time has passed indeed, and I have put in place a different
>>> kind of instrumentation, which works for what I want to do. Unfortunately I
>>> currently don't have the time to test again, but I suspect that the issue
>>> remains, for the reasons I explained.
>>
>>
>> Yes, I assume nothing has changed - my point was that when receiving a
>> change_code system message, code_server does a qualified call to
>> system_code_change. This is where the actual code change is expected to
>> happen - not in the call to system_continue at line 184.
>>
>> I would think that the reason that sys uses a qualified call is that the
>> module in question is not ?MODULE, but rather a callback.
>>
>>>
>>>
>>> Here is another bug, admittedly unrelated (since I am not trying to
>>> update the code of the code_server *process*):
>>>
>>> 1) Copied code_server.erl to my home directory and added an
>>> "erlang:display(foo)" call before the single erlang:load_module/1 call in
>>> the module.
>>> 2) Run the following:
>>> $ erlc code_server.erl
>>> $ erl -nostick
>>> Erlang/OTP 17 [erts-6.0] [source-07b8f44] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]
>>>
>>> Eshell V6.0  (abort with ^G)
>>> 1> l(code_server).
>>> {module,code_server}
>>> 2> code:which(code_server).
>>> "/home/stavros/code_server.beam"
>>> 3> erlang:check_old_code(code_server).
>>> true
>>> 4> code:soft_purge(code_server).
>>> true
>>> 5> l(te *TAB*
>>> Crash dump was written to: erl_crash.dump
>>> Internal error: Invalid reference count found on #Fun<code_server.0.416>:  About to erase fun still referred by code.
>>> Aborted
>>>
>>> I would expect the soft_purge to fail and the system not to crash.
>>
>>
>> Yes, I would also expect that. And I cannot reproduce the problem :(
>>
>> $ erlc code_server.erl
>> $ erl -nostick
>> Erlang/OTP 17 [erts-6.0] [source-07b8f44] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]
>>
>> Eshell V6.0  (abort with ^G)
>> 1> l(code_server).
>> {module,code_server}
>> 2> code:which(code_server).
>> "/home/siri/code_server.beam"
>> 3> erlang:check_old_code(code_server).
>> true
>> 4> code:soft_purge(code_server).
>> false
>>
>> I also have
>> 5> erlang:check_process_code(whereis(code_server),code_server).
>> true
>>
>> Do you have any other patches, or is it a plain 17.0? (Sorry - I don't
>> have any other ideas right now :( )
>>
>> /siri
>>
>>>
>>>
>>> On Thu, Jun 19, 2014 at 11:26 AM, Siri Hansen <erlangsiri@REDACTED> wrote:
>>>>
>>>> Hi Stavros, I'm sorry for the very long delay! Are you still
>>>> struggling with this problem or did you find a way around it? Would a
>>>> change_code system message while the process is suspended possibly solve
>>>> the problem?
>>>> Regards
>>>> /siri
>>>>
>>>>
>>>> 2014-03-10 23:02 GMT+01:00 Stavros Aronis <aronisstav@REDACTED>:
>>>>>
>>>>> Hello!
>>>>>
>>>>> I am playing around with code instrumentation and trying to hack the
>>>>> code server so that it applies some transformation whenever it loads *any*
>>>>> module. The hack itself is relatively simple:
>>>>>
>>>>> 1) Instrument and reload any already loaded modules (come to think
>>>>> about it, during this process more modules may be loaded, but let's assume
>>>>> a fixpoint). This is to avoid the case where in order to load A, you have
>>>>> to instrument A, and the instrumenter itself needs a call to X, which is
>>>>> not yet loaded so you have to load X, so you have to instrument X, etc...
>>>>> 2) Get the Core Erlang code of the codeserver and wrap the second
>>>>> argument of erlang:load_module (Line 1264) with a call to my instrumenter
>>>>> (which is a function from binary() -> binary())
>>>>> 3) Load the patched code_server code
>>>>> 4) Move the code_server process from the old code to the new one.
>>>>>
>>>>> I am having trouble with the last step. As far as I understand it,
>>>>> the reason is that the call to system_continue (Line 184) is not qualified,
>>>>> unlike the similar call in sys.erl (Line 324).
>>>>>
>>>>> Is there a reason why this is so? Is there any possibility for this
>>>>> to be patched?
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Stavros
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> erlang-bugs mailing list
>>>>> erlang-bugs@REDACTED
>>>>> http://erlang.org/mailman/listinfo/erlang-bugs
>>>>>
>>>>
>>>
>>
>