[erlang-questions] Reference counting in NIFs

ANTHONY MOLINARO <>
Thu Aug 15 02:12:06 CEST 2013


Hi,

So I've been using re2 for some time now.  I want to compile large regexes and use them from multiple processes.  This, however, leads to a bottleneck, because I need a gen_server that holds onto the NIF resource returned by re2:compile/2 and passes it to calls to re2:match/3.  The re2 engine runs in a few microseconds, but going through the gen_server's message queue can take milliseconds, and under high concurrency its message queue often grows and it takes even longer.

The idea I had was to add an option to re2:compile/2 called {named_pattern, atom()}, which, when given an atom(), keeps a copy of the re2 object on the C++ side (in a map<> at the moment) and then allows you to pass the atom() as the regex argument to re2:match/3.  This works fine except when you want to recompile the regex for a name (which happens every so often).

Sometimes the recompile works and other times it segfaults.  I'm pretty sure the segfaults occur during the swap, since a call like

old_handle = named_patterns[copts.name];
named_patterns[copts.name] = new_handle;
enif_release_resource (old_handle);

occurring in one thread can conflict with another thread calling

handle = named_patterns[copts.name];
…
// use re2 from within handle

since the handle could be released (and, I assume, freed) between the lookup and its use.
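A minimal sketch of a race-free version of this registry, in plain C++ and under stated assumptions: std::regex stands in for an RE2 object, std::shared_ptr stands in for the resource reference count, and `Pattern` and `NamedPatterns` are hypothetical names, not code from the re2 NIF. The swap and the lookup take the same mutex, and a lookup leaves with its own reference, so a recompile cannot free a pattern that another thread is still matching against.

```cpp
#include <atomic>
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <regex>
#include <string>
#include <utility>

// Hypothetical compiled pattern; std::regex stands in for RE2 here.
// `live` counts instances so we can observe when each one is freed.
struct Pattern {
    static inline std::atomic<int> live{0};
    std::regex re;
    explicit Pattern(const std::string& src) : re(src) { ++live; }
    ~Pattern() { --live; }
};

// Hypothetical name -> pattern registry guarded by a single mutex.
class NamedPatterns {
public:
    // Compile (or recompile) `src` under `name`. Compilation happens
    // outside the lock; only the pointer swap is serialized.
    void compile(const std::string& name, const std::string& src) {
        std::shared_ptr<const Pattern> fresh = std::make_shared<Pattern>(src);
        std::shared_ptr<const Pattern> old;
        {
            std::lock_guard<std::mutex> lock(mu_);
            old = std::exchange(patterns_[name], std::move(fresh));
        }
        // `old` drops the map's reference here, outside the lock; the
        // pattern is destroyed only if no matcher still holds a copy.
    }

    // Copy the shared_ptr under the lock, so the caller owns a reference
    // before any concurrent compile() can replace the entry.
    std::shared_ptr<const Pattern> lookup(const std::string& name) const {
        std::lock_guard<std::mutex> lock(mu_);
        auto it = patterns_.find(name);
        if (it == patterns_.end()) return nullptr;
        return it->second;
    }

    // Match without holding the lock; `p` pins the pattern meanwhile.
    bool match(const std::string& name, const std::string& subject) const {
        auto p = lookup(name);
        return p && std::regex_search(subject, p->re);
    }

private:
    mutable std::mutex mu_;
    std::map<std::string, std::shared_ptr<const Pattern>> patterns_;
};
```

Keeping compilation outside the critical section means a slow recompile of a large regex does not block concurrent matchers; they only contend on the brief pointer copy.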

So, I've been trying various things, but ideally I'd just use the resource reference counting and the GC to make sure I don't leak memory.  However, from the example in the documentation it's a little unclear when the references are incremented and decremented.

The documentation seems to suggest that calling enif_make_resource will add to the reference count, and that enif_release_resource will decrement it.  It also appears that enif_keep_resource will increment it, but I'm not sure whether enif_alloc_resource also increments it (the documentation doesn't mention it).
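A toy model of how the count moves, assuming the rules stated in the erl_nif documentation: the caller of enif_alloc_resource owns one reference, a term created by enif_make_resource holds its own reference that only the garbage collector removes, enif_keep_resource adds one, and each enif_release_resource must pair with an earlier alloc or keep. The `toy_*` functions and `ToyResource` are hypothetical plain-C++ stand-ins, not the NIF API.

```cpp
#include <cassert>

// Toy model (NOT the real erl_nif implementation) of one resource's
// reference count. The real destructor would run where `destroyed`
// is set, i.e. when the count reaches zero.
struct ToyResource {
    int refs = 0;
    bool destroyed = false;
    void add() { ++refs; }
    void drop() {
        assert(refs > 0 && !destroyed);  // over-release would be a bug
        if (--refs == 0) destroyed = true;
    }
};

// alloc: the new object starts with one reference, owned by the caller.
ToyResource toy_alloc()            { ToyResource r; r.add(); return r; }
// make_resource: the returned term holds its own reference.
void toy_make_term(ToyResource& r) { r.add(); }
// keep: an extra reference, e.g. for a pointer stored in a C++ map.
void toy_keep(ToyResource& r)      { r.add(); }
// release: drops a reference obtained from alloc or keep.
void toy_release(ToyResource& r)   { r.drop(); }
// GC: the garbage collector drops the term's reference.
void toy_gc_term(ToyResource& r)   { r.drop(); }
```

Under this model, the common alloc, make term, release sequence leaves the term as the only owner, and a pointer kept in a C++ map needs its own keep so that the object outlives the term.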

I tried a scheme where I call enif_keep_resource at the beginning of the re2:match/3 call and enif_release_resource at the end, to try to keep the resource around, but I still see some segfaults.
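One plausible reason this scheme still crashes (an assumption, not verified against the re2 code): if the pointer is read from the map without a lock and the keep happens afterwards, a concurrent swap can release and free the handle in between. A minimal sketch of closing that window, assuming one mutex around the map: the lookup and the keep happen under the same lock as the swap. `Handle`, `acquire`, `replace` and `stress` are hypothetical names, and `keep()`/`release()` only model enif_keep_resource/enif_release_resource in plain C++.

```cpp
#include <atomic>
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical handle with a manual reference count, standing in for a
// NIF resource. keep()/release() model enif_keep_resource and
// enif_release_resource; they are not the real API.
struct Handle {
    static inline std::atomic<int> live{0};
    std::atomic<int> refs{1};  // the creator's reference
    int version;
    explicit Handle(int v) : version(v) { ++live; }
    ~Handle() { --live; }
    void keep() { refs.fetch_add(1); }
    void release() {
        if (refs.fetch_sub(1) == 1) delete this;  // last reference gone
    }
};

std::mutex registry_mu;
std::map<std::string, Handle*> named_patterns;  // map owns one ref each

// Read the pointer AND keep it under the lock, so no swap can release
// the handle between the lookup and the keep.
Handle* acquire(const std::string& name) {
    std::lock_guard<std::mutex> lock(registry_mu);
    auto it = named_patterns.find(name);
    if (it == named_patterns.end()) return nullptr;
    it->second->keep();
    return it->second;
}

// Swap under the lock, then drop only the map's reference to the old
// handle; readers that pinned it keep it alive until they release.
void replace(const std::string& name, Handle* fresh) {
    Handle* old = nullptr;
    {
        std::lock_guard<std::mutex> lock(registry_mu);
        auto& slot = named_patterns[name];
        old = slot;
        slot = fresh;
    }
    if (old) old->release();
}

// Reader threads pin, "match", and unpin while this thread swaps.
void stress(int readers, int swaps) {
    replace("p", new Handle(0));
    std::vector<std::thread> ts;
    for (int i = 0; i < readers; ++i)
        ts.emplace_back([swaps] {
            for (int n = 0; n < swaps; ++n) {
                Handle* h = acquire("p");
                assert(h && h->version >= 0);  // safe to use while pinned
                h->release();
            }
        });
    for (int v = 1; v <= swaps; ++v) replace("p", new Handle(v));
    for (auto& t : ts) t.join();
}
```

After the run, every superseded handle should have been freed exactly once and only the handle currently in the map should remain alive.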

So I guess my questions are:

1. Which enif_* functions change the reference count, and how?
2. Is there a better/safer way?

I'd like to get this working, as basho_bench shows throughput increasing from 12,000 ops/sec to 17,000 ops/sec and the mean latency decreasing from 0.08 ms to 0.02 ms (the 99.9th percentile increases, however, so there must be more outliers with my scheme for some reason).

Sorry for the long email,

-Anthony
