Port locks with high time under LCNT

Valentin Micic v@REDACTED
Sat Jan 4 09:17:22 CET 2020


Just my 2c (for what is) worth…

There are some versions of Linux kernel that are known to cause performance problems. Recently, we had a situation where a particular OS kernel patch caused a serious performance degradation. We were lucky, as the same software ran on multiple machines with almost the same hardware configuration, thus we did not have to doubt Erlang, or our code to that matter. 
BTW: the worst performing machine was running at 30% capacity relative to the best performing machine. After a brief wild goose chase with a hardware vendor, they (the vendor) have established that the problem was not related to the Intel CPU (Speedstep Technology), as *we* initially suspected, but actual OS kernel version.

When you become desperate enough, you could check your OS kernel version/patch level as well.

V/ 

> On 03 Jan 2020, at 19:39, Frank Muller <frank.muller.erl@REDACTED> wrote:
> 
> Hi Daniel
> 
> Thanks for the feedback. Sorry for being unclear, but this thread discontinued when switched to 2020. I gave some context when I started it at:
> http://erlang.org/pipermail/erlang-questions/2019-December/098910.html <http://erlang.org/pipermail/erlang-questions/2019-December/098910.html>
> 
> More context:
> 1. I’m building a Hugo (https://gohugo.io/ <https://gohugo.io/>) like service in Erlang.
> Actually, it’s much more simpler than Hugo. You ask for a file, and you get it (or not).
> 
> 2. The design is simple. One Erlang process handles one incoming connection.
> Thus, it’s not possible to mess up with TCP sockets here. Whether the file exists and gets delivered to the caller or not.
> 
> 3. Average file size is ~150KB. Max file size: ~3MB
> 
> 4. I’m using Erlang 22.2 on both Linux (prod system) and MacOS (dev).
> 
> 5. The service is caping at 4'000 req/sec.
> 
> 6. @Mikael Pettersson suggested to use prim_file:sendfile/8 which gaves us a boost of ~3% in throughput.
> 
> 7. For now, I’m only using HTTP (no HTTPS).
> 
> 8. Linux sysctl settings are good and tuned (https://medium.com/@pawilon/tuning-your-linux-kernel-and-haproxy-instance-for-high-loads-1a2105ea553e <https://medium.com/@pawilon/tuning-your-linux-kernel-and-haproxy-instance-for-high-loads-1a2105ea553e>):
> $ ulimit -n
> 655360
> 
> Anything else you’d like to know?
> 
> I completely agree with you regarding the Port locks. But when 1 collision occurs, it tooks 72’690'401usec.
> That’s my concern :-/
> 
> Can these Port locks be avoided?
> Is there any undocumented feature I can try?
> 
> I’m open to test any idea and/or change my design architecture if needed. 
> 
> Thanks in advance.
> 
> /Frank
> 
> 
> Fri 3 jan 2020 at 10:21, Dániel Szoboszlay <dszoboszlay@REDACTED <mailto:dszoboszlay@REDACTED>> wrote :
> Hi Frank,
> 
> I think some context would help us to better understand your problem. For example what kind of problems do these lock conflicts cause in your system? Are there some Erlang processes that got stuck for several seconds? What are they trying to do at that time? Or is it only that the throughput you experience stays below your expectations?
> 
> The lock stats in the gists don't seem that wrong to me. I don't have much experience with lock counters and can't figure out from the gists what do these ports are (the image files? tcp connections?) and how they are used, but my guess is they are the tcp connections. Each port lock is tried 4 or 5 times and there's only 1 (albeit often very long) collision. This is how I'd expect a web server to handle a tcp connection: do one read, one write, close it - a handful of operations only. And if you have an unresponsive client or bad network, one operation may easily take 6-7 seconds, blocking the next one. So just a guess, but is it possible you attempt to use the same tcp socket from two different processes?
> 
> Cheers,
> Daniel
> 
> On Fri, 3 Jan 2020 at 00:40, Frank Muller <frank.muller.erl@REDACTED <mailto:frank.muller.erl@REDACTED>> wrote:
> The more load we have, the more locks contention we get:
> 
> https://gist.github.com/frankmullerl/7fb9470e22869312d97011c0faf0046b <https://gist.github.com/frankmullerl/7fb9470e22869312d97011c0faf0046b>
> 
> /Frank
> 
> 
> Hi Mikael
> 
> Thanks for pointing out prim_inet. To my knowledge, it’s undocumented I can be changed by the OTP team. That’s why I’ve avoided it at the first place. 
> 
> By the way, you probably mean prim_inet:sendfile/4 not 8, correct?
> 
> I made a quick the change to my app and found that the throughput is a little bit better (3% faster).
> 
> But these port locks are still there :-/
> Any other ideas?
> 
> /Frank
> 
> On Thu, Jan 2, 2020 at 5:38 PM Frank Muller <frank.muller.erl@REDACTED <mailto:frank.muller.erl@REDACTED>> wrote:
> >
> > Thanks to @Peti Gömöri, I was able to identify these Port locks:
> > https://gist.github.com/frankmullerl/008174c6594ca27584ac7f4e6724bee5 <https://gist.github.com/frankmullerl/008174c6594ca27584ac7f4e6724bee5>
> >
> > Some of them are taking up to 6.7sec (6707846 usec) :-/
> >
> > My application is serving static images by calling file:sendfile/2 (https://erlang.org/doc/man/file.html#sendfile-2 <https://erlang.org/doc/man/file.html#sendfile-2>).
> >
> > Can someone please explain how I can avoid these locks or at least make their impact lesser?
> 
> Using the file module has been known to cause synchronization
> overheads.  Often we (Klarna) use prim_file instead,
> but it has sendfile/8 not /2.  Anyway, that may be an avenue worth
> investigating.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20200104/a04aac89/attachment.htm>


More information about the erlang-questions mailing list