[erlang-questions] Troubling gen_tcp.send/3 performance

Matthew Shapiro me@REDACTED
Sun Dec 4 04:42:14 CET 2016


On a whim (sorry, I probably should have tried this earlier) I used ffmpeg as
my playback client instead of VLC.  For some reason, when using ffmpeg
there's zero backlog shown in observer (even when sending each packet
immediately) and the I/O chart in the load tab has the Output line exactly
tracking the Input line (which is what I would expect for a 1 in 1 out
connection).

So I guess VLC is doing something funky on localhost connections that is
causing prim_inet:send/3 to block.
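
For anyone hitting the same symptom: a send on a gen_tcp socket suspends the
caller while the socket's outgoing queue is busy, which is what a peer that is
not reading fast enough produces, and the default send timeout is infinity.
Below is a minimal sketch of how one might bound that wait and check how many
bytes are still queued. It assumes an already connected `socket`; the
`SendDiag` module and its function names are made up for illustration and are
not part of the mmids code.

```elixir
defmodule SendDiag do
  # Hypothetical helper module, for illustration only.

  # Limit how long :gen_tcp.send/2 may block on this socket.
  # With send_timeout_close: true the socket is closed when the timeout hits.
  def configure(socket, timeout_ms \\ 1_000) do
    :inet.setopts(socket, send_timeout: timeout_ms, send_timeout_close: true)
  end

  # Number of bytes the socket still has waiting to be sent.
  def pending_bytes(socket) do
    {:ok, [send_pend: bytes]} = :inet.getstat(socket, [:send_pend])
    bytes
  end

  # Send and report a timed-out send as a slow reader instead of hanging.
  def send_or_drop(socket, iodata) do
    case :gen_tcp.send(socket, iodata) do
      :ok -> :ok
      {:error, :timeout} -> {:error, :slow_reader}
      {:error, reason} -> {:error, reason}
    end
  end
end
```

Logging `SendDiag.pending_bytes(socket)` next to the process's message queue
length should show whether the backlog is sitting in the socket (the viewer is
slow to read) or in the process itself.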

On Sat, Dec 3, 2016 at 9:45 PM, Matthew Shapiro <me@REDACTED> wrote:

> Well, this is on Windows 10, and all applications are running locally on
> 127.0.0.1.  I suppose I could throw this on a Linux box somewhere and test
> to see if localhost is just broken.
>
> On Sat, Dec 3, 2016 at 9:36 PM, <felixgallo@REDACTED> wrote:
>
>> Another avenue you could check would be your operating system, NIC
>> driver, and the physical layer. For example, I traced issues with symptoms
>> much like yours to buffer problems inside a consumer wifi access point.
>> It's possible there is a misconfiguration or limitation somewhere between
>> you and the receiver.
>>
>> Anecdotally, Erlang is commonly used to ship many hundreds of concurrent
>> video streams from a single box, so your failure at 1 stream is not
>> expected.
>>
>> F.
>>
>> On Dec 3, 2016, at 5:47 PM, Matthew Shapiro <me@REDACTED> wrote:
>>
>> I posted this question in the Elixir forums a day or so ago, but I wanted
>> to put it here as well to gain visibility by people who have more
>> experience with the internals of Erlang, since my question is related more
>> to the Erlang libraries rather than Elixir itself.
>>
>> ## Summary
>>
>> I am trying to create a media streaming server in Elixir, with an initial
>> focus on RTMP publishing and playback.  I chose Elixir/Erlang because it
>> seemed like a perfect candidate but I seem to be having trouble.
>>
>> The testing setup is 3 applications: 1 RTMP publisher (3rd party OBS
>> Studio), 1 RTMP viewer (VLC), and my Elixir server.  Both the publisher and
>> viewer connect to my Elixir server over localhost.  The publisher sends the
>> Elixir server video and audio data and each packet gets relayed off to the
>> viewer, all over TCP.  The publisher is currently set to send 2500kbps, and
>> network traffic shows it pretty close to this.
>>
>> When running the test I notice the video is stuttering a lot.  VLC debug
>> messages show it's receiving frames inconsistently and trying to compensate
>> for it.
>>
>> After getting help from people in IRC and looking through observer, I
>> think I have pretty much pinpointed the issue to the `:gen_tcp.send()`
>> calls being slow, so slow in fact I have observed up to 5-10 seconds just
>> to push out an individual send call.
>>
>> Since I know Erlang is heavily used in switches, I can't believe that the
>> performance I'm getting is normal.  Lowering my video's bitrate to 500kbps
>> does show smoother playback but I can still tell there is an issue.
>>
>> For reference, the code I have so far is up at
>> https://github.com/KallDrexx/mmids-temp.  Note that this is a temporary
>> repository; I plan to split each of the apps up into their own
>> repositories, slap an MIT license on them, then upload them to hex once I
>> have this thing stabilized.
>>
>> Based on diagnostics I coded, a 2500kbps video is averaging 200-250
>> messages per second going from the publisher to the viewing client.
>>
>> # What is the architecture?
>>
>> The general architecture I have right now is that when any type of client
>> connects I utilize `ranch` to spawn a `gen_server`.  This server receives
>> TCP binary (using `active_once` and `raw` flags), attempts to deserialize
>> any RTMP messages contained in it, reacts to messages that can/should be
>> reacted to, and sends any responses back to the client.  This all
>> occurs within a single `gen_server` and no other processes are involved.
>>
>> For demonstration purposes when a viewing client requests playback I use
>> `pg2` to subscribe to a specific channel for audio and video data.
>> Publishing clients that are publishing a/v data on that same stream key
>> push that data to all subscribed clients.  The viewing clients then receive
>> the a/v data, serialize them into RTMP messages, serialize them into
>> binary, then send them off across the network pipe.
>>
>> # What have I tried?
>>
>> First I tried utilizing `:os.system_time(:milli_seconds)` to determine
>> how long any audio/video data packet took from deserialization from the
>> publisher to right before binary serialization for the client.  I noticed
>> that it would start out extremely fast and then pauses would occur (long
>> 5-10 second pauses) and then batches of packets would get processed, then
>> another pause, etc...
>>
>> Then I was reminded about observer, and I loaded it and saw the following
>> graph: https://dl.dropboxusercontent.com/u/6753359/observer1.PNG.  The
>> I/O graph told me that while inbound traffic was smooth, outbound was being
>> staggered.
>>
>> I then opened the process for the server managing the viewing client.  I
>> noticed the message queue length was constantly increasing, never
>> decreasing, and the process was constantly stuck in the `prim_inet:send/3`
>> function.
>>
>> In doing some Googling I came across [this
>> thread](http://erlang.2086793.n4.nabble.com/why-is-gen-tcp-send-slow-td2106954.html)
>> talking about slow `send()` performance, and while it didn't have a
>> definite fix it did mention batching up the binary for the `send()` call
>> so I wasn't calling it 200 times every second.
>>
>> The first thing I tried was to utilize a timer.  Instead of calling
>> `send()` every message I put the binary in an iodata queue held in the
>> gen_server's state.  I then added `:timer.send_interval(100, :send_queue)`
>> to my initialization thinking I could send data once every 100ms.
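
A minimal sketch of the batching buffer described in the paragraph above, for
readers who want to reproduce it. The `SendQueue` module, its struct fields,
and its function names are hypothetical and not taken from the mmids
repository; only the idea (accumulate iodata, flush on a `:send_queue` timer
message) comes from the post.

```elixir
defmodule SendQueue do
  # Hypothetical batching buffer, for illustration only.
  defstruct iodata: [], bytes: 0

  def new, do: %__MODULE__{}

  # Appending by nesting lists is cheap: no binary is copied here.
  def push(%__MODULE__{iodata: io, bytes: n} = queue, bin) when is_binary(bin) do
    %{queue | iodata: [io, bin], bytes: n + byte_size(bin)}
  end

  # Returns everything accumulated so far plus a fresh, empty queue.
  def flush(%__MODULE__{iodata: io}), do: {io, new()}
end
```

In the `gen_server`, the `handle_info(:send_queue, state)` clause would call
`SendQueue.flush/1` and pass the resulting iodata to a single
`:gen_tcp.send/2` call. Since `:gen_tcp.send/2` accepts iodata directly, the
batch never has to be flattened into one binary first.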
>>
>> This did not give any better results outside of managing the message
>> queue better.  What I noticed with observer and this timer was odd: I
>> would keep pressing the refresh hotkey and see my queue keep growing for
>> up to 5-10 seconds, and then go down to zero again.  This repeated over
>> and over, and on every refresh the process was still stuck in
>> `prim_inet:send/3`.  This suggests to me that send is just taking a
>> ridiculous amount of time.  Changing the timer interval up or down did not
>> really help noticeably.
>>
>> The next thing I tried was to stop the interval and instead send once
>> every X messages, allowing me to batch messages together but in smaller
>> batches than the interval method produced.  This didn't help by a
>> noticeable amount either, and was worse for managing the message queue.
>>
>> Finally I tried tweaking the watermark values (even having them go up to
>> 64k) but I could not stop prim_inet:send/3 from causing my process to wait
>> upwards of 15 seconds.
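
For reference, a sketch of one way to adjust the watermarks mentioned above and
read them back to confirm they took effect. It assumes a connected TCP
`socket`; the `Watermarks` module name and the 64 KiB / 32 KiB defaults are
illustrative choices, not values taken from the mmids code.

```elixir
defmodule Watermarks do
  # Hypothetical helper, for illustration only.
  # Sets the socket's high and low watermarks (in bytes), then reads them back.
  def raise_limits(socket, high \\ 65_536, low \\ 32_768) do
    :ok = :inet.setopts(socket, high_watermark: high, low_watermark: low)
    :inet.getopts(socket, [:high_watermark, :low_watermark])
  end
end
```

Raising the watermarks only changes how much may be queued on the socket before
senders are suspended. If the peer keeps reading more slowly than data arrives,
the queue should still fill up eventually, which would be consistent with the
tweak not helping here.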
>>
>> # So what now?
>>
>> I'm not quite sure how to proceed from here.  I can't believe that
>> sending data via TCP is really that bad on a VM that I hear so much
>> praise for regarding low latency and soft-realtime behavior.
>>
>> At the end of the day, when the final system is built, I am hoping to
>> get 50 inputs sending data to 150 outputs (based on current performance
>> I've seen from other third party products), each connection (in and out)
>> dealing with around 3Mbps of audio/video data.  So it's a bit disconcerting
>> that I can't even get 1 in 1 out working reliably.
>>
>> Does anyone have any advice on where I go from here?
>>