[erlang-bugs] NIF .so reload issues

David Buckley isreal-erlang-bugs-at-erlang.org@REDACTED
Fri Dec 18 18:07:33 CET 2015


On Fri, Dec 18, 2015 at 05:41:24PM +0100, Sverker Eriksson wrote:
> Hi David,
> 
> Yes, this is a dlopen restriction and also an ambiguity, as I've heard
> different behaviour reported depending on the OS.
> 
> My Linux man page for dlopen says "If the same library is loaded again with
> dlopen(), the same file handle is returned". But it does not specify what
> "the same" actually means.
> 
> The Erlang VM has to keep the old .so file loaded until the module is safely
> purged [*], as there may still be Erlang processes lingering in the old code.
> Trying to execute unloaded native code does not behave well.
> 
> When you call load_nif with the same library name as the not-yet-purged
> one, dlopen thinks it's "the same" library and just returns the same handle
> again.
> 
> What to do?
> 
> Rename the .so library, give it a version number. Or maybe putting it
> in a different directory will work (?).
> 
> Add something about this problem to the erl_nif docs. Yes, that would be
> nice.
> 
> I'm hesitant to recommend purging in on_load. The on_load feature
> is still experimental and we have some known problems with bad
> behaviour, especially in the error cases when on_load fails.
> To fix that we may have to limit what you are allowed
> to do in on_load and code purging might be such a limitation.
> 
> 
> [*] Purging may actually not be enough. If the NIF library has created
> resource objects with a destructor callback, it will not be unloaded until
> the last resource object has been garbage collected.

Hmmm, I was going to create resources!

I guess for development I'll add a hack that just creates a link to the
file with a temporary name before loading it, so that a new handle to it
is created each time. There /is/ a secret RTLD_PRIVATE flag for dlopen to
get a private instance, but it is apparently not supported on any OS
mentioned on the first page of Google.
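The development hack could be sketched roughly like this. One assumption worth flagging: as far as I know, glibc's loader recognises an already-loaded object not only by name but also by device and inode, so a hard link or symlink to the same file may still hand back the old handle. This sketch therefore copies the library into a fresh directory created with `mkdtemp`, which gives it both a new path and a new inode. The helper names `copy_file` and `stage_fresh_copy` and the `/tmp/nif_stage_XXXXXX` template are hypothetical, not part of any Erlang API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Copy the bytes of `src` to `dst`, creating or truncating `dst`.
   Returns 0 on success, -1 on any I/O error. */
static int copy_file(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    if (in == NULL)
        return -1;
    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    char buf[8192];
    size_t n;
    int rc = 0;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            rc = -1;
            break;
        }
    }
    if (ferror(in))
        rc = -1;
    fclose(in);
    if (fclose(out) != 0)
        rc = -1;
    return rc;
}

/* Hypothetical dev-only helper: create a fresh directory from `dirtmpl`
   (which must end in "XXXXXX"; mkdtemp rewrites it in place) and copy
   `src` into it as `base`. The full path of the copy is written to `out`.
   The copy is a new file, so it has a new path AND a new inode, unlike a
   hard link or symlink. Returns 0 on success, -1 on failure. */
int stage_fresh_copy(const char *src, char *dirtmpl, const char *base,
                     char *out, size_t outlen)
{
    assert(src != NULL && dirtmpl != NULL && base != NULL && out != NULL);
    size_t len = strlen(dirtmpl);
    if (len < 6 || strcmp(dirtmpl + len - 6, "XXXXXX") != 0)
        return -1;
    if (mkdtemp(dirtmpl) == NULL)
        return -1;
    int n = snprintf(out, outlen, "%s/%s", dirtmpl, base);
    if (n < 0 || (size_t)n >= outlen) {
        rmdir(dirtmpl);
        return -1;
    }
    if (copy_file(src, out) != 0) {
        unlink(out); /* may not exist; failure here is harmless */
        rmdir(dirtmpl);
        return -1;
    }
    return 0;
}
```

Each reload would stage a fresh copy (e.g. `/tmp/nif_stage_AbC123/my_nif.so`) and pass that path minus its `.so` extension to `erlang:load_nif/2`. The staged directories accumulate and would need cleaning up separately, so this only makes sense as a development aid.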

For production, versioning the library ought to be fine. Most system
libraries already carry version numbers in their filenames, and I
suppose this is part of why. It's only reloading for rapid development
that is causing pain here!
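For the production route, the versioned name only has to be chosen consistently by the build and by the Erlang side. The sketch assumes a hypothetical `<dir>/<name>_v<N>` scheme and a hypothetical helper `versioned_nif_stem`. Two facts it relies on: `erlang:load_nif/2` takes the path without the OS-specific extension (`.so` on Unix), and, as I understand it, the library is matched to its Erlang module by the name given to `ERL_NIF_INIT`, not by its file name, so renaming the file should not break loading.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the path stem for a hypothetical "<dir>/<name>_v<version>" scheme.
   The build would write the library to stem + ".so"; erlang:load_nif/2 is
   given the stem itself, since it appends the OS-specific extension.
   Returns 0 on success, -1 if `name` contains a '/' or `buf` is too small. */
int versioned_nif_stem(char *buf, size_t len, const char *dir,
                       const char *name, unsigned version)
{
    assert(buf != NULL && dir != NULL && name != NULL);
    if (strchr(name, '/') != NULL)
        return -1; /* name must be a bare file name, not a path */
    int n = snprintf(buf, len, "%s/%s_v%u", dir, name, version);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For example, a build that has just written `priv/my_nif_v3.so` would have the Erlang side call `erlang:load_nif("priv/my_nif_v3", 0)`. The next build bumps the number, so dlopen sees a name (and a file) it has not loaded before.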

Is the old dlopen handle bound to the old (Erlang) code? That is, if I
use this hack and somehow leak resources while reloading often, will I
have problems reloading the module, will processes be violently uprooted
as with purge, or will I simply leak dlopen handles until I clean up?
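At the plain dlopen level, the behaviour both messages rely on can be checked directly: opening the same path twice returns the same handle, and each open bumps a reference count that only a matching dlclose() drops again. This says nothing about how the Erlang VM itself tracks handles. The sketch assumes a glibc-based Linux system where `libm.so.6` exists; on glibc older than 2.34 it also needs `-ldl` at link time. The function name `same_handle_for_same_path` is hypothetical.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Open `path` twice and report whether the dynamic loader returned the
   same handle both times (1), different handles (0), or failed (-1).
   `path` is assumed to name an existing shared object, e.g. "libm.so.6"
   on glibc Linux. Each successful dlopen() increments the object's
   reference count, so each one is paired with its own dlclose(). */
int same_handle_for_same_path(const char *path)
{
    assert(path != NULL);
    void *first = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (first == NULL) {
        fprintf(stderr, "dlopen(%s): %s\n", path, dlerror());
        return -1;
    }
    void *second = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (second == NULL) {
        fprintf(stderr, "dlopen(%s): %s\n", path, dlerror());
        dlclose(first);
        return -1;
    }
    int same = (first == second);
    /* Only when every dlopen() has a matching dlclose() may the object be
       unloaded (objects loaded at program startup never are). */
    dlclose(second);
    dlclose(first);
    return same;
}
```

In the reload scenario from the thread, the first open corresponds to the handle the VM still holds for the unpurged module, which is why a second `load_nif` with the same name gets the old code back.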

Is there any chance of purge/soft_purge being extended to cover NIF
resources?

-- 
David Buckley


