[erlang-questions] prim_inet:ignorefd/2 affecting erts_poll_wait()

Mon Aug 25 06:25:03 CEST 2014

On 08/24/2014 09:10 AM, Michael Santos wrote:
> On Fri, Aug 22, 2014 at 06:01:56PM -0700, Michael Truog wrote:
>> To be a little clearer, my prim_inet:ignorefd/2 usage is to use a
>> file descriptor created in the inet source code after passing it
>> through gen_tcp:fdopen/2. So, the sequence is:
>> ...
>> {ok, FileDescriptorInternal} = prim_inet:getfd(Socket),
>> ok = prim_inet:ignorefd(Socket, true),
>> {ok, NewSocket} = gen_tcp:fdopen(FileDescriptorInternal, SocketOptions),
>> ...
>> While this might seem odd, this allows me to dup2 the file
>> descriptor without causing obvious problems and handle UNIX domain
>> sockets, which are currently unsupported within inet (by avoiding
>> the internal checking that prevents their use).
> Well, if you bypass the internal checks ... :)
>
> This doesn't answer your question but I would suggest only using PF_INET
> and PF_INET6 file descriptors with inet. The inet driver assumes it is
> working with sockets described by a sockaddr_in structure. Using anything
> else can lead to undefined behaviour.
>
> For example, something trivial like calling getpeername(2) on a PF_UNIX
> socket would cast the sockaddr_un as a sockaddr_in and presumably return
> 4 bytes of the Unix socket path as an IPv4 address.
I understand.  I do have the same source code working with a normal inet TCP connection.  The dup2 trick is done separately, so that the same code path also supports UNIX domain sockets without making the inet source code use any of the socket data structures (dup2 happens last, just before {active, once} turns on the flow control).  So, I know I am doing something a little dirty.  My suspicion is that having the old socket's Erlang port alongside the new fdopen socket port somehow keeps the erts event loop from timing out, either because of an fd error that is ignored or because of an event that doesn't get consumed; I just am not sure how.
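To make the getpeername(2) concern above concrete, here is a small illustration. The module sockaddr_overlay and the function as_inet/1 are hypothetical names made up for this note. It assumes the common layout where both structures start with a 2-byte family field, followed in sockaddr_in by a 2-byte port in network byte order and a 4-byte IPv4 address, and it only shows which path bytes would land in those fields if a sockaddr_un were reinterpreted that way. It does not claim to reproduce what inet_drv actually returns.

```erlang
-module(sockaddr_overlay).
-export([as_inet/1]).

%% Hypothetical illustration: given the sun_path bytes of a sockaddr_un,
%% return what the sockaddr_in fields at the same offsets would hold.
%% Assumes both structs begin with a 2-byte family field (omitted here),
%% so sin_port overlays path bytes 1-2 and sin_addr overlays bytes 3-6.
as_inet(Path) when is_binary(Path), byte_size(Path) >= 6 ->
    <<Port:16/big, A, B, C, D, _Rest/binary>> = Path,
    {Port, {A, B, C, D}}.
```

For example, as_inet(<<"/tmp/sock">>) gives {12148, {109,112,47,115}}: the "port" comes from "/t" and the "address" from "mp/s".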
>
> The alternative is passing the file descriptor into a port:
>
>      FD = 7,
>      Port = erlang:open_port({fd, FD, FD}, [stream,binary]).
>
> The disadvantage of using this method is the lack of flow control. If that
> is a concern, a process could monitor the rate of incoming messages
> and close the port if it crossed some threshold, re-opening it when the
> messages have been processed. Closing the port will not close the file
> descriptor so the sender will eventually block.
Yeah, I am still trying to use the inet Erlang/OTP flow control.  It is unfortunate that this is not an easy task with unix domain sockets.
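The port-closing idea above could look roughly like the following sketch. It is not code from the thread: the module name fd_rate_reader, the {MaxMsgs, WindowMs} limit and the Handler callback are hypothetical. It assumes OTP 18 or later for erlang:monotonic_time/1, and it relies on the statement in the thread that closing an {fd, In, Out} port leaves the descriptor open, so the sender eventually blocks while the port is closed.

```erlang
-module(fd_rate_reader).
-export([start/3, update_window/4]).

%% Hypothetical sketch, not code from the thread: read an inherited file
%% descriptor through an {fd, In, Out} port and pause by closing the port
%% when more than MaxMsgs messages arrive within WindowMs milliseconds.
%% Relies on the thread's statement that closing the port does not close
%% the descriptor. Assumes OTP 18+ for erlang:monotonic_time/1.

%% Limits is {MaxMsgs, WindowMs}; Handler is a fun/1 called per binary.
start(FD, Limits, Handler) ->
    spawn(fun() -> open(FD, Limits, Handler) end).

open(FD, Limits, Handler) ->
    %% The port is linked to this process; an abnormal port exit kills it.
    Port = erlang:open_port({fd, FD, FD}, [stream, binary]),
    loop(Port, FD, Limits, Handler, now_ms(), 0).

loop(Port, FD, {MaxMsgs, WindowMs} = Limits, Handler, Start, Count) ->
    receive
        {Port, {data, Data}} ->
            Handler(Data),
            case update_window(now_ms(), {Start, Count}, MaxMsgs, WindowMs) of
                {pause, _} ->
                    %% Stop reading; unread data stays in the kernel and
                    %% the sender eventually blocks.
                    erlang:port_close(Port),
                    drain(Port, Handler),
                    open(FD, Limits, Handler);
                {continue, {Start1, Count1}} ->
                    loop(Port, FD, Limits, Handler, Start1, Count1)
            end
    end.

%% Process data messages that were already queued before the port closed.
drain(Port, Handler) ->
    receive
        {Port, {data, Data}} ->
            Handler(Data),
            drain(Port, Handler)
    after 0 ->
        ok
    end.

%% Fixed-window counter: start a new window when the old one has expired,
%% otherwise count this message and ask to pause once MaxMsgs is exceeded.
update_window(Now, {Start, _Count}, _MaxMsgs, WindowMs)
  when Now - Start >= WindowMs ->
    {continue, {Now, 1}};
update_window(_Now, {Start, Count}, MaxMsgs, _WindowMs)
  when Count + 1 > MaxMsgs ->
    {pause, {Start, Count + 1}};
update_window(_Now, {Start, Count}, _MaxMsgs, _WindowMs) ->
    {continue, {Start, Count + 1}}.

now_ms() ->
    erlang:monotonic_time(millisecond).
```

For example, fd_rate_reader:start(7, {1000, 1000}, fun(Bin) -> io:format("~p~n", [Bin]) end) would read fd 7 and pause once more than 1000 messages arrive within one second. Re-opening right after draining follows the suggestion in the thread; waiting longer before re-opening is another option.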
>
> Otherwise, it's simple to write a small driver that uses erts to poll
> the fd on your behalf. I wrote one to use with a unix socket library so
> I could retrieve the socket ancillary data:
>
> https://github.com/msantos/inert
Thank you for mentioning the inert project.  I wasn't aware of it, though I have seen procket, which is cool.  If I have to go down that path, I will, but I still wanted to try to utilize the inet flow control, if it is possible to get something maintainable.

Thanks,
Michael
>
> This is very likely old news to you and you have good reasons for using
> gen_tcp but maybe it will be helpful for other people going down this
> path.
>
>> On 08/22/2014 05:46 PM, Michael Truog wrote:
>>> I have been seeing the erts_poll_wait() thread consume 100% CPU
>>> when my configuration makes prim_inet:ignorefd/2 ignore an fd
>>> (inet_descriptor has is_ignored set to true) created outside of
>>> inet (10 file descriptors created this way).  I don't have this
>>> problem with the same configuration when the inet code creates
>>> the TCP sockets and prim_inet:ignorefd/2 is not used. When
>>> setting "#define INET_DRV_DEBUG 1" in
>>> "./erts/emulator/drivers/common/inet_drv.c" and "#define
>>> ERTS_POLL_DEBUG_PRINT" in "./erts/emulator/sys/common/erl_poll.c"
>>> all the debugging output looks the same when exercising the file
>>> descriptors in the same way.  The only difference seems to be the
>>> "Entering erts_poll_wait(), timeout=NUMBER" output has non-zero
>>> timeout values more often when prim_inet:ignorefd/2 is not used
>>> than when it is used.  Also, NUMBER doesn't seem to go above
>>> 1000 for me when prim_inet:ignorefd/2 is used, but it can go
>>> above 1000 when prim_inet:ignorefd/2 is not used.
>>>
>>> Why would the erts_poll_wait() loop be refusing to time out?  Is
>>> this expected behaviour?  Is there an erts configuration flag
>>> which is meant to address the problem?
>>>
>>> Thanks,
>>> Michael
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-questions