[erlang-questions] JInterface publishPort() issue
Thu Jul 31 03:59:53 CEST 2008
(Erlang and erlang-questions newbie here!)
I just spent 3 days tracking down a bug in my code that was related to
JInterface. I thought I would let other people learn from my mistake.
The symptom was that a registered OtpNode stopped responding to messages at
some point. A lot of digging (e.g. running epmd with debugging turned on)
finally revealed that the name was being unregistered because the
connection was closing on the JInterface side. Further investigation showed
that this happened because the garbage collector was reaping the Socket
created in OtpNode.r4_publish().
The problem turned out to be that after creating an OtpNode, I was
explicitly calling publishPort(node). But an OtpNode already calls
publishPort() as part of its initialization.
Because the port was already published, the second call to r4_publish() was
returning null, and that null was being assigned to node.epmd, clobbering
the previously stored Socket. Once the GC noticed that nothing was
referring to the Socket any more, it closed the connection.
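The failure sequence can be reproduced with a small self-contained model. This is a sketch, not JInterface code: FakeEpmd, Registration and ModelNode are hypothetical stand-ins for epmd, the published Socket and the node's epmd field, and the explicit close() call only simulates what the GC did to the unreferenced Socket.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model, not JInterface: these classes only mimic the
// behaviour described above.
class FakeEpmd {
    private final Set<String> names = new HashSet<>();

    // Stands in for r4_publish(): a live handle, or null if the name is taken.
    Registration publish(String name) {
        return names.add(name) ? new Registration(name, this) : null;
    }

    // epmd drops a name when its connection closes.
    void unregister(String name) { names.remove(name); }

    boolean isRegistered(String name) { return names.contains(name); }
}

// Stands in for the Socket that keeps the epmd registration alive.
class Registration {
    private final String name;
    private final FakeEpmd epmd;

    Registration(String name, FakeEpmd epmd) {
        this.name = name;
        this.epmd = epmd;
    }

    void close() { epmd.unregister(name); }
}

class ModelNode {
    final String name;
    Registration epmd; // stands in for node.epmd

    ModelNode(String name) { this.name = name; }
}

class ClobberDemo {
    // Mirrors the unguarded pattern: store whatever came back, even null.
    static boolean publishPort(FakeEpmd epmd, ModelNode node) {
        Registration r = epmd.publish(node.name);
        node.epmd = r;
        return r != null;
    }

    public static void main(String[] args) {
        FakeEpmd epmd = new FakeEpmd();
        ModelNode node = new ModelNode("java_node");

        publishPort(epmd, node);                 // done during node initialization
        Registration live = node.epmd;           // the only handle to the connection
        boolean again = publishPort(epmd, node); // the redundant explicit call

        System.out.println("second publish returned " + again);
        System.out.println("node.epmd after second call: " + node.epmd);

        // node.epmd no longer refers to the handle; close() here simulates
        // the GC finalizing the real Socket and closing the connection.
        live.close();
        System.out.println("still registered: " + epmd.isRegistered(node.name));
    }
}
```

Running main prints `false`, `null` and `false` on the three lines: the second call fails, wipes the stored handle, and once that handle is closed the name disappears from the model epmd.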
So lesson 1: don't call OtpEpmd.publishPort() on an OtpNode.
Lesson 2: Check the return value of functions (had I realized that
publishPort() was returning false on the second call, I could have tracked
down the bug immediately).
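Checking the result can be as simple as turning a `false` into an exception at the call site. In this sketch PublishCall is a hypothetical interface standing in for a call like OtpEpmd.publishPort(node), assumed here to return a boolean (as described above) and possibly throw IOException; publishOrFail is likewise an illustrative name.

```java
import java.io.IOException;

// Hypothetical: stands in for a boolean-returning call that may throw
// IOException, such as OtpEpmd.publishPort(node).
@FunctionalInterface
interface PublishCall {
    boolean call() throws IOException;
}

class CheckedPublish {
    // Turns a silent "false" into a loud failure at the point of the call.
    static void publishOrFail(PublishCall publish, String nodeName) throws IOException {
        if (!publish.call()) {
            throw new IOException("publishPort failed for " + nodeName
                    + " (is the name already registered?)");
        }
    }

    public static void main(String[] args) {
        try {
            publishOrFail(() -> true, "java_node");  // first call succeeds
            publishOrFail(() -> false, "java_node"); // simulated second call
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Failing fast at the second call points straight at the redundant publish, instead of surfacing later as a node that silently stops receiving messages.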
Lesson 3: OtpEpmd.publishPort() should probably check to see if node.epmd
is already set, and if so, just return (maybe with a warning), rather than
clobbering the existing Socket.
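A guard along those lines might look like the following sketch. GuardedNode and GuardedEpmd are hypothetical stand-ins rather than JInterface classes, and an Object plays the role of the Socket kept in node.epmd. The guard returns early with a warning when a handle is already stored, and otherwise only ever stores a non-null handle.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical stand-ins, not JInterface classes; Object stands in for
// the Socket stored in node.epmd.
class GuardedNode {
    final String name;
    Object epmd;

    GuardedNode(String name) { this.name = name; }
}

class GuardedEpmd {
    private static final Logger LOG = Logger.getLogger(GuardedEpmd.class.getName());
    private final Set<String> registered = new HashSet<>();

    // Stands in for r4_publish(): a fresh handle, or null if the name is taken.
    private Object rawPublish(String name) {
        return registered.add(name) ? new Object() : null;
    }

    boolean publishPort(GuardedNode node) {
        if (node.epmd != null) {
            // Already published: keep the existing handle instead of clobbering it.
            LOG.warning(node.name + " is already published; keeping existing registration");
            return true;
        }
        Object handle = rawPublish(node.name);
        if (handle != null) {
            node.epmd = handle; // only ever store a live handle
        }
        return handle != null;
    }

    public static void main(String[] args) {
        GuardedEpmd epmd = new GuardedEpmd();
        GuardedNode node = new GuardedNode("java_node");
        epmd.publishPort(node);
        Object first = node.epmd;
        boolean again = epmd.publishPort(node);
        // Prints "true true": the repeat call succeeds and the handle survives.
        System.out.println(again + " " + (node.epmd == first));
    }
}
```

Returning true on a repeat call treats "already published" as success; returning false with the warning would be an equally reasonable choice, as long as the stored handle is left alone.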
Lesson 4: Open source rocks. This bug would have been impossible to track
down without being able to read and modify the source of JInterface and epmd.
Sent from the Erlang Questions mailing list archive at Nabble.com.