Hi. I'll try to share our experience of hunting one strange performance bug.

One of our (Flussonic) customers asked us for help because his web UI panel started working very slowly after an update. A quick look in Chrome's network inspector showed that the blocker was an HTTP API call to the method that parses the config.

Our config parser (generated from a sub-optimal neotoma grammar) isn't blazing fast, but a small config was taking 3 minutes (!) to parse. 3 minutes of running Erlang with some ets calls and a lot of code. Normally that amount of text parses in about 1 second.

We logged in to the server and saw in the console that this API call does something strange to the VM: for several minutes there are 400-700 active processes:

    > erlang:statistics(total_active_tasks).
    740

When such a scheduling storm comes, the CPU used by Flussonic drops from 1000% to 400%.

I will tell in another post how we successfully used lcnt; this one is a story about pure in-Erlang performance degradation.

After several hours of bisecting Flussonic versions and minor commits, we found the commit that broke everything.

Flussonic on that server was handling a moderate number of cameras (350 IP cameras over RTSP) and 500 Mbit/s of input traffic. This data flow is recorded to disk and restreamed to several clients (maybe 50 Mbit/s of output).

Some versions ago there was code that activated sockets with {active, N}, but this code was removed (I don't remember why I did it, but git blame ruthlessly tells me that I did) and replaced with {active, once}.

We immediately replaced this code with another tricky version:

    handle_info({tcp, _Socket, Bin}, #rtsp{} = State) ->
        State1 = handle_data(Bin, State),
        %% Re-arm the socket only after 100 ms, giving the kernel buffer
        %% time to accumulate data before the next {tcp, ...} message.
        erlang:send_after(100, self(), activate_tcp),
        {noreply, State1};

    handle_info(activate_tcp, #rtsp{socket = Socket} = State) ->
        inet:setopts(Socket, [{active, once}]),
        {noreply, State}.

These few lines immediately made everything work OK: ~900% CPU and immediate responses from the API.

Here is our interpretation of the situation.

When I changed activation to once, the RTSP socket was delivering one message per 1460 bytes, exactly as the IP camera sends them. The trick here is that IP cameras write data into a TCP socket the same way they would send it over UDP: all video frames are fragmented into pieces that fit into the MTU.

So when an IP camera makes write(fd, data, len), len is never more than 1460. The camera produces not large bursts but a steady packet flow.

As a result, Flussonic started receiving 500 Mbit/s / 1460 bytes ≈ 40K messages per second — about 120 messages per second per process on a 24-core Xeon server.

After adding this simple trick with delayed activation, we give the gen_tcp socket some time to fill its buffer: the average message size rose from 1460 to about 50,000 bytes, and the message rate dropped to roughly 2,000 per second.
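By the way, you can check such numbers without adding any bookkeeping to the handler. This is just a sketch of one way to observe the effect (not from our actual code): inet:getstat/2 reports per-socket counters, including the number of received packets and their average size.

    %% Ask the socket itself how it has been fed so far.
    {ok, Stats} = inet:getstat(Socket, [recv_cnt, recv_avg]),
    io:format("packets: ~p, avg packet size: ~p bytes~n",
              [proplists:get_value(recv_cnt, Stats),
               proplists:get_value(recv_avg, Stats)]).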
Perhaps when I changed to active,once we got a situation where each RTSP-handling process was rescheduled immediately after calling inet:setopts(Socket, [{active, once}]), because new data had already arrived while the previous packet was being handled. This rescheduling storm steals time from all other processes. When we increased the average packet size, the situation became better: each RTSP-handling process started spending more of its time on actual reductions and less on scheduling.

So it looks like batch handling of data is cheaper than frequent rescheduling.

What is still unclear to me: why was active,N better than active,once? I haven't checked, but I think the average size of a TCP message and the number of messages were the same as in the active,once case.
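For reference, here is a minimal sketch of what the {active, N} variant could look like (the value 100 is my assumption, not the original code). With {active, N} the port delivers up to N messages, then sends {tcp_passive, Socket} and switches the socket to passive mode, so the process re-arms it once per batch instead of once per message:

    handle_info({tcp, _Socket, Bin}, #rtsp{} = State) ->
        {noreply, handle_data(Bin, State)};

    %% Sent by the port once the active counter drops to zero.
    handle_info({tcp_passive, Socket}, #rtsp{socket = Socket} = State) ->
        inet:setopts(Socket, [{active, 100}]),
        {noreply, State}.

The difference from active,once is that only one round-trip through setopts happens per N packets, even though each packet still arrives as a separate message.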
The next post will be about locks and ets tables. It is really interesting!