[erlang-questions] Troubling gen_tcp.send/3 performance

felixgallo <>
Sun Dec 4 03:36:09 CET 2016


Another avenue to check would be your operating system, NIC driver, and the physical layer. For example, I traced issues with symptoms much like yours to buffer problems inside a consumer Wi-Fi access point. It's possible there is a misconfiguration or limitation somewhere between you and the receiver.

Anecdotally, Erlang is commonly used to ship many hundreds of concurrent video streams from a single box, so your failure at 1 stream is not expected.
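
A rough way to see how a receiver (or some hop in between) that stops draining data can by itself make `:gen_tcp.send` stall, independent of anything else in your server, is sketched below. `SendStallProbe` and its arguments are hypothetical names made up for this illustration; the only library calls used are `:gen_tcp`, `:inet`, and `:timer.tc`. The `send_timeout` option makes a blocked send return `{:error, :timeout}` instead of waiting indefinitely, which is handy for diagnosis.

```elixir
defmodule SendStallProbe do
  # Hypothetical helper, not taken from the original post or its repository.
  # Opens a loopback connection whose receiving side never reads, then sends
  # fixed-size chunks until :gen_tcp.send/2 stops returning :ok.
  def run(chunk_bytes \\ 65_536, send_timeout_ms \\ 200, max_chunks \\ 10_000) do
    {:ok, listen} = :gen_tcp.listen(0, [:binary, active: false])
    {:ok, port} = :inet.port(listen)

    {:ok, sender} =
      :gen_tcp.connect({127, 0, 0, 1}, port,
        [:binary, active: false, send_timeout: send_timeout_ms])

    # Accept the connection but never call recv on it: a stand-in for a
    # receiver or network hop that has stopped draining data.
    {:ok, stalled_peer} = :gen_tcp.accept(listen, 1_000)

    chunk = :binary.copy(<<0>>, chunk_bytes)
    result = send_until_stalled(sender, chunk, 0, max_chunks)

    Enum.each([sender, stalled_peer, listen], &:gen_tcp.close/1)
    result
  end

  # Safety stop so the probe cannot loop forever if nothing ever stalls.
  defp send_until_stalled(_sender, _chunk, sent, 0), do: {:never_stalled, sent}

  defp send_until_stalled(sender, chunk, sent, remaining) do
    # Time each individual send call, in microseconds.
    {micros, reply} = :timer.tc(fn -> :gen_tcp.send(sender, chunk) end)

    case reply do
      :ok -> send_until_stalled(sender, chunk, sent + 1, remaining - 1)
      {:error, reason} -> {:stalled, reason, sent, div(micros, 1_000)}
    end
  end
end
```

Calling `SendStallProbe.run()` returns a tuple with the reason the send stopped, how many chunks went out first, and how long the last call took in milliseconds. If your real viewer connection shows the same pattern, the place to look is whatever is supposed to be reading on the other end.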

F. 

> On Dec 3, 2016, at 5:47 PM, Matthew Shapiro <> wrote:
> 
> I posted this question in the Elixir forums a day or so ago, but I wanted to put it here as well so it is seen by people who have more experience with the internals of Erlang, since my question relates more to the Erlang libraries than to Elixir itself.
> 
> ## Summary
> 
> I am trying to create a media streaming server in Elixir, with an initial focus on RTMP publishing and playback.  I chose Elixir/Erlang because it seemed like a perfect candidate but I seem to be having trouble.
> 
> The testing setup is 3 applications: 1 RTMP publisher (3rd party OBS Studio), 1 RTMP viewer (VLC), and my Elixir server. Both the publisher and viewer connect to my Elixir server over localhost. The publisher sends the Elixir server video and audio data, and each packet gets relayed off to the viewer, all over TCP. The publisher is currently set to send 2500 kbps, and network traffic shows it pretty close to this.
> 
> When running the test I notice the video is stuttering a lot.  VLC debug messages show it's receiving frames inconsistently and trying to compensate for it.  
> 
> After getting help from people in IRC and looking through observer, I think I have pretty much pinpointed the issue to the `:gen_tcp.send()` calls being slow, so slow in fact I have observed up to 5-10 seconds just to push out an individual send call.
> 
> Since I know Erlang is heavily used in switches, I can't believe that the performance I'm getting is normal. Lowering my video's bitrate to 500 kbps does show smoother playback, but I can still tell there is an issue.
> 
> For reference, the code I have so far is up at https://github.com/KallDrexx/mmids-temp. Note that this is a temporary repository; I plan to split each of the apps up into their own repositories, slap an MIT license on them, then upload them to hex once I have this thing stabilized.
> 
> Based on diagnostics I coded, a 2500 kbps video averages 200-250 messages per second going from the publisher to the viewing client.
> 
> # What is the architecture?
> 
> The general architecture I have right now is that when any type of client connects, I use `ranch` to spawn a `gen_server`. This server receives TCP binary data (using the `active_once` and `raw` flags), attempts to deserialize any RTMP messages contained in it, reacts to the messages that can or should be acted on, and sends any responses back to the client. This all occurs within a single `gen_server`; no other processes are involved.
> 
> For demonstration purposes when a viewing client requests playback I use `pg2` to subscribe to a specific channel for audio and video data.  Publishing clients that are publishing a/v data on that same stream key push that data to all subscribed clients.  The viewing clients then receive the a/v data, serialize them into RTMP messages, serialize them into binary, then send them off across the network pipe.
> 
> # What have I tried?
> 
> First I tried utilizing `:os.system_time(:milli_seconds)` to determine how long any audio/video data packet took from deserializing from the publisher to right before binary serialization of the client.  I noticed that it would start out extremely fast and then pauses would occur (long 5-10 second pauses) and then batches of packets would get processed, then another pause, etc...
> 
> Then I was reminded about observer, and I loaded it and saw the following graph: https://dl.dropboxusercontent.com/u/6753359/observer1.PNG.  The I/O graph told me that while inbound traffic was smooth, outbound was being staggered.  
> 
> I then opened the process for the server managing the viewing client.  I noticed the message queue length was constantly increasing, never decreasing, and the process was constantly stuck in the `prim_inet:send/3` function.  
> 
> In doing some Googling I came across [this thread](http://erlang.2086793.n4.nabble.com/why-is-gen-tcp-send-slow-td2106954.html) talking about slow `send()` performance, and while it didn't have a definite fix it did mention batching up the binary for the send() call so I wasn't calling it 200 times every second.  
> 
> The first thing I tried was to utilize a timer. Instead of calling `send()` for every message, I put the binary in an iodata queue held in the gen_server's state. I then added `:timer.send_interval(100, :send_queue)` to my initialization, thinking I could send data once every 100ms.
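
For anyone following along, the timer-based batching described above could look roughly like the sketch below. This is not the code from the linked repository: `BatchedSender`, `enqueue/2`, and the state layout are hypothetical names chosen for illustration, and the sketch assumes the socket was opened elsewhere and handed to the process.

```elixir
defmodule BatchedSender do
  use GenServer

  # Hypothetical module for illustration; not from the mmids-temp repository.
  def start_link(socket, interval_ms \\ 100) do
    GenServer.start_link(__MODULE__, {socket, interval_ms})
  end

  # Queue data for the next flush instead of sending it immediately.
  def enqueue(pid, data), do: GenServer.cast(pid, {:enqueue, data})

  @impl true
  def init({socket, interval_ms}) do
    # Delivers :send_queue to this process every interval_ms milliseconds.
    {:ok, _tref} = :timer.send_interval(interval_ms, :send_queue)
    {:ok, %{socket: socket, queue: []}}
  end

  @impl true
  def handle_cast({:enqueue, data}, state) do
    # iodata may be nested, so appending is a cheap list wrap with no copying.
    {:noreply, %{state | queue: [state.queue, data]}}
  end

  @impl true
  def handle_info(:send_queue, %{queue: []} = state), do: {:noreply, state}

  def handle_info(:send_queue, state) do
    # One send call per tick for everything queued since the last flush.
    case :gen_tcp.send(state.socket, state.queue) do
      :ok -> {:noreply, %{state | queue: []}}
      {:error, reason} -> {:stop, reason, state}
    end
  end
end
```

Note that the flush still calls `:gen_tcp.send` from inside the process, so if that call blocks, both `:enqueue` casts and timer ticks pile up in the mailbox until it returns. Batching reduces the number of send calls but does not change what happens when a single call stalls.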
> 
> This did not give any better results outside of managing the message queue better. What I noticed with observer and this timer was odd: I would keep pressing the refresh hotkey and see my queue keep growing for up to 5-10 seconds, and then go down to zero again. This repeated over and over, and on every refresh the process was still stuck in `prim_inet:send/3`. This suggests to me that send is just taking a ridiculous amount of time. Changing the timer interval up or down did not really help noticeably.
> 
> The next thing I tried was to stop the interval and instead send on every Xth attempt to send a message, allowing me to batch messages together but in smaller batches than the interval method produced. This didn't help by a noticeable amount either, and was worse for managing the message queue.
> 
> Finally I tried tweaking the watermark values (even having them go up to 64k) but I could not stop prim_inet:send/3 from causing my process to wait upwards of 15 seconds.
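
As a point of reference, adjusting and reading back the watermarks on a connected socket can be sketched as follows. `SocketTuning.raise_watermarks/3` is a hypothetical helper named for this example; `high_watermark`, `low_watermark`, and `sndbuf` are standard `:inet` socket options, and the 64k/32k values are only assumptions mirroring the figure mentioned above.

```elixir
defmodule SocketTuning do
  # Hypothetical helper for illustration only.
  # Sets the watermarks on the VM-side send queue for a socket, then reads
  # them back together with the kernel send buffer size for comparison.
  def raise_watermarks(socket, high \\ 65_536, low \\ 32_768) do
    :ok = :inet.setopts(socket, high_watermark: high, low_watermark: low)
    :inet.getopts(socket, [:high_watermark, :low_watermark, :sndbuf])
  end
end
```

As far as I understand it, the watermarks only govern how much data may sit in the VM's own queue for the socket before senders are suspended. Raising them lets more data accumulate before `send` blocks, but it does not make the other end read any faster.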
> 
> # So what now?
> 
> I'm not quite sure how to proceed from here. I can't believe that sending data via TCP is really that bad on a VM that I hear so much praise for regarding low latency and soft real-time behavior.
> 
> At the end of the day, when the final system is built, I am hoping to get 50 inputs sending data to 150 outputs (based on current performance I've seen from other third party products), with each connection (in and out) dealing with around 3 Mbps of audio/video data. So it's a bit disconcerting that I can't even get 1 in, 1 out working reliably.
> 
> Does anyone have any advice on where I go from here?