[erlang-questions] Weird Client SSL Behavior / Performance

Mon May 21 16:45:00 CEST 2012

Hi!

We are currently working on eliminating some bottlenecks in ssl. If you
want to beta-test, I can drop you a mail privately in a couple of days
with a git branch.

Regards, Ingela
Erlang/OTP team, Ericsson AB

2012/5/20, John-Paul Bader <hukl@REDACTED>:
> Hey,
>
>
> recently we had to implement a service that would send about 50 - 500
> https requests per second to a 3rd party API that has an average latency
> of about 400-800ms.
>
> Therefore we had to open a lot of parallel connections to the destination
> host. At first we used lhttpc, but after some time we observed really
> weird behavior. While lhttpc would use some kind of connection pooling,
> it was constantly leaking processes that were stuck in prim_inet:recv.
> Netstat showed a continuously growing number of connections stuck in WAIT
> or TIME_WAIT for port 443. After some time all sockets on the machine
> were used and no communication was possible anymore.
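
For anyone trying to confirm the same kind of leak, a minimal sketch of how one might list the processes currently executing inside prim_inet from a shell on the affected node. The module name recv_check and the function in_prim_inet/0 are hypothetical; only erlang:processes/0 and erlang:process_info/2 are standard calls. Matching on the whole prim_inet module rather than one function is an assumption, since the exact internal function a blocked process sits in can differ between releases.

```erlang
-module(recv_check).
-export([in_prim_inet/0]).

%% Return [{Pid, {prim_inet, Function, Arity}}] for every process whose
%% current function lives in the prim_inet module.
%% process_info/2 returns undefined for a process that has already
%% exited; the pattern in the generator silently skips those.
in_prim_inet() ->
    [{Pid, {prim_inet, F, A}}
     || Pid <- erlang:processes(),
        {current_function, {prim_inet, F, A}}
            <- [erlang:process_info(Pid, current_function)]].
```

Calling length(recv_check:in_prim_inet()) at intervals would show whether that number keeps growing.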
>
> After trying out various settings for lhttpc without any improvement on
> the situation we added ibrowse instead. ibrowse behaved much nicer as it
> was not leaking processes and also did not accumulate the WAIT or
> TIME_WAIT connections that showed up before in netstat. We told it to
> open 300 connections and that was the exact amount it opened.
>
> Now this went fine like it should for about 15 or 20 minutes before _no_
> traffic was going through those 300 connections. The VM accumulated load
> and basically was not communicating with the 3rd party API.
> Interestingly enough netstat still showed that all 300 connections were
> established.
>
> We started to investigate the true cause of this behavior. At some
> point we set up a local nginx with ssl and tried it there to rule out
> the 3rd party API. The behavior was the same.
>
> Then we tried to use http instead of https and boom - that went super
> smooth just like it should. 300 connections, minimal load on the VM, no
> leaking processes and it kept running for more than 15 minutes.
>
> Then we switched back to the real API and back to https and tried to
> figure out what part in the VM was holding us back but we could not
> really find a solution. We only saw that a lot of messages were
> accumulating in the outer gen_fsm:loop of ssl after those 15 minutes.
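
A minimal sketch of one way to spot that kind of message build-up on a running node, assuming shell access to it. The module name queue_check and the function long_queues/1 are hypothetical; the calls used (erlang:processes/0, erlang:process_info/2, lists:keysort/2) are standard.

```erlang
-module(queue_check).
-export([long_queues/1]).

%% Return [{Pid, QueueLen, CurrentFunction}] for every process whose
%% message queue is longer than Threshold, longest queue first.
long_queues(Threshold) ->
    Found = [{Pid, Len, current_fun(Pid)}
             || Pid <- erlang:processes(),
                {message_queue_len, Len}
                    <- [erlang:process_info(Pid, message_queue_len)],
                Len > Threshold],
    lists:reverse(lists:keysort(2, Found)).

%% process_info/2 returns undefined if the process has already exited.
current_fun(Pid) ->
    case erlang:process_info(Pid, current_function) of
        {current_function, MFA} -> MFA;
        undefined -> undefined
    end.
```

For example, queue_check:long_queues(1000) lists every process holding more than 1000 queued messages together with the function it is currently executing.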
>
> In the end, after multiple days of investigating and experiments, we set
> up stunnel on the same machine in client mode that was connecting to the
> 3rd party API. This way our erlang service just talked plain http with
> stunnel. This has been performing extremely well for multiple days now.
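
A setup like the one described could look roughly like the stunnel configuration below. This is a sketch under assumptions, not the configuration from the original mail: the service name, the local port and the remote host name are placeholders.

```ini
; stunnel in client mode: accept plain HTTP locally, speak TLS upstream.
; The service name, port and host below are placeholders.
client = yes

[api]
accept = 127.0.0.1:8080
connect = api.example.com:443
```

The Erlang service would then send plain http requests to 127.0.0.1:8080 and leave the TLS handshake and encryption to stunnel.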
>
> This is somewhat unsatisfactory though.
>
> First of all I'd really like to know why the problem occurs in the first
> place and why it creates so much load in general. Stunnel handles the
> same traffic with a third, or even less, of the load that the Erlang VM
> needs. For now at least I'm a little bit underwhelmed by Erlang's ssl
> stability and performance (I was bitten by the R14B03 bug as well).
>
> Secondly, lhttpc's socket handling is not really that great. I'm sure it
> works fine for http or in low-load https scenarios, but in our case
> leaking processes, using up all sockets and thereby stopping the service
> altogether was super bad.
>
> Is there anybody with similar observations or maybe even solutions?
>
>
> ~ John