[erlang-questions] Trouble with JInterface
Tue Jan 13 15:39:03 CET 2015
We are having a problem with Jinterface and Erlang 17.2. Essentially we
have a Java/Clojure application that is using a home-brewed, Erlang,
two-node, in-memory cache. We think the problem is either a bug in
Jinterface or a misunderstanding of how to use it.
On the Java side, we create a node and 1 to 12 unnamed mailboxes. We have a
receive loop on each mailbox: receiving messages from Erlang, doing the
right things with those messages, catching any errors and looping back to
the receive. In general, everything works fine.
When one of the Erlang cache-nodes goes down, the application recreates its
mailboxes, connects to the “other” Erlang node via the new mailboxes and
attempts to carry on. Here is where we have the problem:
The Failover Protocol:
For each mailbox
1. Receive loop gets OtpErlangExit
2. Existing mailbox is closed
3. New unnamed mailbox created
4. An application-appropriate message is sent to the failover node via this new mailbox
5. Loop to the receive
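
The steps above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the actual application code: `Mailbox`, `PeerExit`, `ScriptedMailbox`, the node names and the "subscribe" message are all hypothetical stand-ins (in a real application the mailbox would be a Jinterface `OtpMbox`, whose `receive(long timeout)` returns null on timeout, and the exit would be an `OtpErlangExit`). The loop is bounded only so the sketch terminates.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

final class FailoverSketch {

    /** Hypothetical stand-in for Jinterface's OtpErlangExit. */
    static final class PeerExit extends Exception {
        PeerExit(String reason) { super(reason); }
    }

    /** Hypothetical subset of a mailbox API; not the Jinterface types themselves. */
    interface Mailbox {
        Object receive(long timeoutMs) throws PeerExit; // null means the timeout elapsed
        void send(String node, Object msg);
        void close();
    }

    /** Runs a bounded receive loop with failover; returns the number of messages handled. */
    static int receiveLoop(Supplier<Mailbox> mailboxes, String primaryNode, String failoverNode,
                           long timeoutMs, int maxAttempts, Consumer<Object> handler) {
        Mailbox mbox = mailboxes.get();
        mbox.send(primaryNode, "subscribe");             // assumed initial handshake
        int handled = 0;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                Object msg = mbox.receive(timeoutMs);    // step 5: (loop back to) the receive
                if (msg == null) {
                    continue;                            // timed out; try again
                }
                handler.accept(msg);
                handled++;
            } catch (PeerExit exit) {                    // step 1: receive loop gets the exit
                mbox.close();                            // step 2: close the existing mailbox
                mbox = mailboxes.get();                  // step 3: create a new mailbox
                mbox.send(failoverNode, "subscribe");    // step 4: message the failover node
            }
        }
        mbox.close();
        return handled;
    }

    /** In-memory fake that replays scripted events so the sketch runs without a cluster. */
    static final class ScriptedMailbox implements Mailbox {
        private final String name;
        private final List<String> log;
        private final Deque<Object> script;

        ScriptedMailbox(String name, List<String> log, Object... events) {
            this.name = name;
            this.log = log;
            this.script = new ArrayDeque<>(Arrays.asList(events));
        }

        @Override public Object receive(long timeoutMs) throws PeerExit {
            Object event = script.poll();                // null once the script is exhausted
            if (event instanceof PeerExit) {
                throw (PeerExit) event;
            }
            return event;
        }

        @Override public void send(String node, Object msg) { log.add(name + " -> " + node); }

        @Override public void close() { log.add(name + " closed"); }
    }

    /** Replays one failover: the first mailbox sees an exit, the second carries on. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        Deque<ScriptedMailbox> pool = new ArrayDeque<>();
        pool.add(new ScriptedMailbox("mbox1", log, "a", new PeerExit("noconnection")));
        pool.add(new ScriptedMailbox("mbox2", log, "b"));
        receiveLoop(pool::poll, "cache1@host", "cache2@host", 100, 4, m -> log.add("handled " + m));
        return log;
    }
}
```

In this fake, the message scripted for the second mailbox is delivered on the very next receive after the failover, which is the behaviour we expected from the real mailboxes as well.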
If the receive loop has a timeout, no message is received by the new
mailbox until AFTER the timeout fires and the loop iterates back to the
receive. If there is no timeout specified, the receive hangs (seemingly)
forever. We have confirmed that the failover Erlang node received the
message (step 4) sent via the new mailbox, and that the failover Erlang
node is sending messages to the PID representing the new mailbox:
inspecting the server using Observer, we can see that it is sending the
expected message to the correct PID.
Our workaround is to use a short timeout in our mailbox receive loop so
that we don’t hang for too long, but we are troubled by the thought that we
might not understand JInterface best practices, as we are skeptical that the
JInterface code is anything but perfect :)
Is this a known issue, or does someone have any advice on how to proceed?