[erlang-questions] Troubling gen_tcp.send/3 performance

Matthew Shapiro
Sun Dec 4 02:47:47 CET 2016

I posted this question in the Elixir forums a day or so ago, but I wanted
to put it here as well so it reaches people who have more experience with
the internals of Erlang, since my question is related more to the Erlang
libraries than to Elixir itself.

# Summary

I am trying to create a media streaming server in Elixir, with an initial
focus on RTMP publishing and playback.  I chose Elixir/Erlang because it
seemed like a perfect candidate but I seem to be having trouble.

The testing setup is three applications: one RTMP publisher (third-party
OBS Studio), one RTMP viewer (VLC), and my Elixir server.  Both the
publisher and viewer connect to my Elixir server over localhost.  The
publisher sends the Elixir server video and audio data, and each packet
gets relayed off to the viewer, all over TCP.  The publisher is currently
set to send 2500kbps, and network traffic shows it pretty close to this.

When running the test I notice the video is stuttering a lot.  VLC debug
messages show it's receiving frames inconsistently and trying to compensate
for it.

After getting help from people in IRC and looking through observer, I think
I have pretty much pinpointed the issue to the `:gen_tcp.send()` calls
being slow.  They are so slow, in fact, that I have observed a single send
call take up to 5-10 seconds to complete.

Since I know Erlang is heavily used in switches, I can't believe that the
performance I'm getting is normal.  Lowering my video's bitrate to 500kbps
does show smoother playback, but I can still tell there is an issue.

For reference, the code I have so far is up at
https://github.com/KallDrexx/mmids-temp.  Note that this is a temporary
repository; I plan to split each of the apps up into their own
repositories, slap an MIT license on them, then upload them to hex once I
have this thing stabilized.

Based on diagnostics I coded, a 2500kbps video averages 200-250 messages
per second going from the publisher to the viewing client.

# What is the architecture?

The general architecture I have right now is that when any type of client
connects I utilize `ranch` to spawn a `gen_server`.  This server receives
TCP binary (using `active_once` and `raw` flags), attempts to deserialize
any RTMP messages contained in it, reacts to messages that can/should be
reacted to, and sends any responses back to the client.  This all
occurs within a single `gen_server` and no other processes are involved.
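
For readers unfamiliar with the pattern, here is a minimal sketch of such a
receive loop.  It assumes the socket was opened in binary mode; in
`:gen_tcp` terms the two flags are the options `active: :once` and
`packet: :raw`.  The module name `ActiveOnceSketch` and its helper
functions are hypothetical and not taken from the repository, and RTMP
parsing is reduced to a comment.

```elixir
defmodule ActiveOnceSketch do
  use GenServer

  # `socket` is an already-accepted :gen_tcp socket (ranch would normally
  # provide it and hand over ownership).
  def start_link(socket), do: GenServer.start_link(__MODULE__, socket)

  # Call once this process owns the socket (:gen_tcp.controlling_process/2).
  def activate(pid), do: GenServer.cast(pid, :activate)

  # Returns every byte received so far; only here to make the sketch observable.
  def received(pid), do: GenServer.call(pid, :received)

  @impl true
  def init(socket), do: {:ok, %{socket: socket, received: <<>>}}

  @impl true
  def handle_cast(:activate, state) do
    # Ask the driver for exactly one {:tcp, socket, data} message.
    :ok = :inet.setopts(state.socket, active: :once)
    {:noreply, state}
  end

  @impl true
  def handle_call(:received, _from, state), do: {:reply, state.received, state}

  @impl true
  def handle_info({:tcp, socket, data}, state) do
    # RTMP deserialization and any replies would happen here.
    state = %{state | received: state.received <> data}

    # Re-arm the socket so the next chunk is delivered as a message.
    case :inet.setopts(socket, active: :once) do
      :ok -> {:noreply, state}
      {:error, _reason} -> {:stop, :normal, state}
    end
  end

  def handle_info({:tcp_closed, _socket}, state), do: {:stop, :normal, state}
end
```

Because the socket is only re-armed after each chunk has been handled, a
slow handler (for example one blocked in a send) also delays further reads
on that connection.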

For demonstration purposes when a viewing client requests playback I use
`pg2` to subscribe to a specific channel for audio and video data.
Publishing clients that are publishing a/v data on that same stream key
push that data to all subscribed clients.  The viewing clients then receive
the a/v data, serialize them into RTMP messages, serialize them into
binary, then send them off across the network pipe.

# What have I tried?

First I tried utilizing `:os.system_time(:milli_seconds)` to determine how
long any audio/video data packet took from deserialization on the
publisher side to right before binary serialization for the client.  I
noticed that it would start out extremely fast and then pauses would occur
(long 5-10 second pauses) and then batches of packets would get processed,
then another pause, etc...
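
A minimal sketch of this kind of measurement, under stated assumptions: the
module `LatencyProbe` is hypothetical, and it swaps in
`System.monotonic_time/1` for `:os.system_time/1` because the monotonic
clock cannot jump when the wall clock is adjusted.

```elixir
defmodule LatencyProbe do
  # Tag a payload with the time it was deserialized.
  def stamp(payload), do: {System.monotonic_time(:millisecond), payload}

  # Milliseconds elapsed since the payload was stamped; call this right
  # before serializing for the viewing client.
  def elapsed_ms({stamped_at, _payload}) do
    System.monotonic_time(:millisecond) - stamped_at
  end
end
```

Both calls must run on the same node, since monotonic timestamps are not
comparable across VM instances.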

Then I was reminded about observer, and I loaded it and saw the following
graph: https://dl.dropboxusercontent.com/u/6753359/observer1.PNG.  The I/O
graph told me that while inbound traffic was smooth, outbound was being

I then opened the process for the server managing the viewing client.  I
noticed the message queue length was constantly increasing, never
decreasing, and the process was constantly stuck in the `prim_inet:send/3`
function.

In doing some Googling I came across a thread talking about slow `send()`
performance, and while it didn't have a definite fix, it did mention
batching up the binary for the `send()` call so I wasn't calling it 200
times every second.

The first thing I tried was to utilize a timer.  Instead of calling
`send()` for every message, I put the binary in an iodata queue held in the
gen_server's state.  I then added `:timer.send_interval(100, :send_queue)`
to my initialization, thinking I could send data once every 100ms.
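
A rough sketch of this timer-based batching, not the code from the
repository.  The module name `TimedBatchSender` is hypothetical, and
`send_fun` is an assumed one-argument function standing in for something
like `&:gen_tcp.send(socket, &1)` so that the sketch runs without a socket.

```elixir
defmodule TimedBatchSender do
  use GenServer

  def start_link(send_fun, interval_ms \\ 100) do
    GenServer.start_link(__MODULE__, {send_fun, interval_ms})
  end

  # Queue a serialized message instead of sending it immediately.
  def enqueue(pid, binary), do: GenServer.cast(pid, {:enqueue, binary})

  @impl true
  def init({send_fun, interval_ms}) do
    # Delivers :send_queue to this process every interval_ms milliseconds.
    {:ok, _timer_ref} = :timer.send_interval(interval_ms, :send_queue)
    {:ok, %{send_fun: send_fun, queue: []}}
  end

  @impl true
  def handle_cast({:enqueue, binary}, state) do
    # Nesting as [queue, binary] preserves order without copying binaries.
    {:noreply, %{state | queue: [state.queue, binary]}}
  end

  @impl true
  def handle_info(:send_queue, %{queue: []} = state), do: {:noreply, state}

  def handle_info(:send_queue, state) do
    # One send call for everything queued since the last tick.
    :ok = state.send_fun.(state.queue)
    {:noreply, %{state | queue: []}}
  end
end
```

If the send call itself blocks, the process cannot handle further
`:enqueue` casts in the meantime, so its mailbox keeps growing until the
call returns.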

This did not give any better results outside of managing the message queue
better.  What I noticed with observer and this timer was odd: I would keep
pressing the refresh hotkey and see my queue keep growing for up to 5-10
seconds, and then go down to zero again.  This repeated over and over, and
on every refresh the process was still stuck on `prim_inet:send/3`.  It
seems to me that send is just taking a ridiculous amount of time.  Changing
the timer interval up or down did not really help noticeably.

The next thing I tried was to stop the interval and instead send on every
Xth message, allowing me to batch messages together but make smaller
batches than the interval method caused.  This didn't help by a noticeable
amount either, and was worse for managing the message queue.
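
A small sketch of count-based batching along these lines.  `CountBatcher`
and its return tuples are hypothetical names chosen for illustration; the
caller would pass the flushed iodata to `:gen_tcp.send/2`.

```elixir
defmodule CountBatcher do
  # batch_size is the "X" above: how many messages to collect per send.
  def new(batch_size) when batch_size > 0 do
    %{batch_size: batch_size, count: 0, queue: []}
  end

  # Returns {:hold, batcher} while collecting, or {:flush, iodata, batcher}
  # once batch_size messages have accumulated.
  def push(batcher, binary) do
    queue = [batcher.queue, binary]
    count = batcher.count + 1

    if count >= batcher.batch_size do
      {:flush, queue, %{batcher | count: 0, queue: []}}
    else
      {:hold, %{batcher | count: count, queue: queue}}
    end
  end
end
```

Unlike the timer variant, a partially filled batch stays queued until
enough further messages arrive, which adds latency on a quiet stream.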

Finally, I tried tweaking the watermark values (even having them go up to
64k), but I could not stop `prim_inet:send/3` from causing my process to
wait upwards of 15 seconds.
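
For reference, a sketch of how the watermarks can be set and read back with
`:inet.setopts/2` and `:inet.getopts/2`.  The module name
`WatermarkSketch` is hypothetical and the byte values are illustrative
assumptions, not recommendations.  With the default socket driver, a sender
is suspended once the driver's outgoing queue exceeds `high_watermark` and
is resumed when it drains below `low_watermark`.

```elixir
defmodule WatermarkSketch do
  # high/low are in bytes; 64k matches the upper value mentioned above.
  def tune(socket, high \\ 65_536, low \\ 32_768) do
    # The high mark is listed first so the low mark never exceeds it.
    :inet.setopts(socket, high_watermark: high, low_watermark: low)
  end

  # Reads the current values back as a keyword list.
  def current(socket) do
    :inet.getopts(socket, [:high_watermark, :low_watermark])
  end
end
```

Raising the watermarks only lets more data queue inside the VM before the
sender blocks; it does not make the receiving side read any faster.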

# So what now?

I'm not quite sure how to proceed from here.  I can't believe that sending
data via TCP is really that bad on a VM that I hear so much praise for
regarding low latency and soft-realtime behavior.

At the end of the day, when the final system is built, I am hoping to
get 50 inputs sending data to 150 outputs (based on current performance
I've seen from other third party products), with each connection (in and
out) dealing with around 3Mbps of audio/video data.  So it's a bit
disconcerting that I can't even get 1 in 1 out working reliably.

Does anyone have any advice on where I go from here?