locked up system using :ets.match_object

Sat Jan 18 08:01:38 CET 2020

The table is a mnesia table, so ets:info/2 does not seem to work. I narrowed it down, and it did indeed seem to be match_object costing too much CPU time and perhaps locking the table. I ended up rewriting the table-scanning algorithm (instead of running match_object around 100 * 2000 times, dump the full table once and use the process dictionary to manipulate / filter / organize the data) and building a cache.
The runtime seems stable now. It would still be interesting to diagnose those locks: does mnesia have something similar to ets:info/2?
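
A minimal sketch of the "dump once, then filter in memory" idea described above. It is an illustration under assumptions, not the actual rewrite: it reads a plain ETS table rather than a mnesia table, it keeps the grouped records in a map instead of the process dictionary, and the module name `scan_once` and the `{Id, Colour, Value}` record shape are hypothetical.

```erlang
-module(scan_once).
-export([index_by/2, demo/0]).

%% Read the whole table once and group the records by one field.
%% Tab is an ETS table; Pos is the 1-based tuple position to group on.
index_by(Tab, Pos) ->
    ets:foldl(
      fun(Rec, Acc) ->
              Key = element(Pos, Rec),
              maps:update_with(Key, fun(Recs) -> [Rec | Recs] end, [Rec], Acc)
      end,
      #{},
      Tab).

%% Hypothetical data: {Id, Colour, Value}.
demo() ->
    Tab = ets:new(items, [set, public]),
    true = ets:insert(Tab, [{1, red, 10}, {2, blue, 20}, {3, red, 30}]),
    Index = index_by(Tab, 2),
    %% Each later "query" is now a map lookup instead of a table scan.
    Reds = lists:sort(maps:get(red, Index, [])),
    true = ets:delete(Tab),
    Reds.
```

The table is traversed once per index build, however many lookups follow, instead of once per match_object call.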
On Friday, January 17, 2020, 03:06:07 p.m. EST, Sverker Eriksson <sverker@REDACTED> wrote:

Have you tried without read_concurrency?
What does ets:info(T, stats) return after running for a while?
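
A small sketch of how the two suggestions could be tried side by side. The module name `ets_report`, the table names and the test data are hypothetical. The return value of the `stats` item is treated as opaque here, since its shape is not assumed.

```erlang
-module(ets_report).
-export([report/1, demo/0]).

%% Collect a few ets:info/2 items for one table.
%% The value returned for 'stats' is passed through without interpretation.
report(Tab) ->
    [{Item, ets:info(Tab, Item)}
     || Item <- [type, size, memory, read_concurrency, write_concurrency, stats]].

%% Build the same data with and without read_concurrency and compare reports.
demo() ->
    Plain = ets:new(plain_tab, [set, public]),
    Tuned = ets:new(tuned_tab, [set, public, {read_concurrency, true}]),
    Rows = [{N, N rem 7} || N <- lists:seq(1, 10000)],
    true = ets:insert(Plain, Rows),
    true = ets:insert(Tuned, Rows),
    Reports = {report(Plain), report(Tuned)},
    true = ets:delete(Plain),
    true = ets:delete(Tuned),
    Reports.
```

Running the real workload against each variant in turn, and calling `report/1` afterwards, would give the comparison asked for above.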

On fre, 2020-01-17 at 19:27 +0000, Vans S wrote:

I really want to measure this so I can have some facts; IMO the performance is degrading way too much for such a small workload. Each of these 3000 processes does 1 write to the table every 15 minutes, so about 3.3 writes per second in total (the processes start at different times). The processes call match_object on the table about 30000 times per second, but in bursts: 10 operations can happen in a single function, which then backs off for a few seconds or more.

On Friday, January 17, 2020, 02:20:05 p.m. EST, Sverker Eriksson <sverker@REDACTED> wrote:

 On fre, 2020-01-17 at 20:09 +0200, Led wrote:

I am having some performance trouble in a system that does a few queries on a small ets table of around 10,000 records.

Basically, with around 500 concurrent processes everything is fine; at 1500 I start to notice some small degradation; and at around 3000 concurrent processes the schedulers grind to a halt. top shows system CPU usage at around 50%, but Erlang scheduler usage (scheduler:utilization) is 100%, capped out on all 40 threads.

I am guessing the schedulers are all waiting on locks on the ETS table. I thought match_object and ETS were quite optimized these days (I am using R22), so I am wondering whether there are some synchronization/locking issues that could be addressed. At 3000 processes, each hitting that table maybe 10 times per second on average, the load does not seem like much: 30k match_objects per second, with ongoing inserts.

Also, would there be a way to debug this and pinpoint whether it is the exact issue? I just did A/B testing where I turned off parts of the system; when I turned off the part that does the match_objects on the ETS table, the system ran fine and never deadlocked at 100% scheduler usage. It's also hard to profile, as the system is so locked up that the profiler barely runs.
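
One possible way to look for lock contention, sketched under assumptions: OTP ships a lock-counting tool, lcnt, in the tools application. It needs a runtime built with lock counting, typically started with `erl -emu_type lcnt`. Whether a given install includes that emulator is an assumption, as is the `db_tab` lock name used below to drill into the ETS table locks. The 10-second window is arbitrary.

```erlang
%% Run in a shell started with: erl -emu_type lcnt
%% (assumes the lock-counting emulator is available on this install)
lcnt:clear(),            % reset the runtime's lock counters
timer:sleep(10000),      % let the real workload run for a sample window
lcnt:collect(),          % copy the counters into the lcnt server
lcnt:conflicts(),        % print the lock classes with the most collisions
lcnt:inspect(db_tab).    % assumption: db_tab is the ETS table lock class
```

If the ETS locks dominate the conflicts listing during the slow period, that would support the lock-contention guess. If they do not, the time is probably going somewhere else, for example into the scans themselves.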

For now it seems the solution is to rework the architecture and add a second ETS table as a cached view, so the match_objects can be replaced with key lookups. The cache gets filled by a single process that pulls via match_object from the main table.
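
A rough sketch of that cached-view design, with loudly stated assumptions: the module name `cache_view`, the `{Id, Group, Value}` record shape, the fixed list of groups and the refresh interval are all hypothetical. The real system would choose its own keys and its own refresh trigger.

```erlang
-module(cache_view).
-export([start_refresher/3, lookup/1, demo/0]).

%% Start the single process that owns and fills the cache table.
%% Main holds {Id, Group, Value} tuples (hypothetical shape).
start_refresher(Main, Groups, IntervalMs) ->
    Parent = self(),
    Pid = spawn_link(
            fun() ->
                    Cache = ets:new(cache_view,
                                    [named_table, set, protected,
                                     {read_concurrency, true}]),
                    refresh(Main, Cache, Groups),
                    Parent ! {cache_ready, self()},
                    loop(Main, Cache, Groups, IntervalMs)
            end),
    receive {cache_ready, Pid} -> {ok, Pid} end.

%% Only this process runs match_object on the main table.
refresh(Main, Cache, Groups) ->
    lists:foreach(
      fun(Group) ->
              Recs = ets:match_object(Main, {'_', Group, '_'}),
              true = ets:insert(Cache, {Group, Recs})
      end,
      Groups).

loop(Main, Cache, Groups, IntervalMs) ->
    receive
        stop -> ok
    after IntervalMs ->
            refresh(Main, Cache, Groups),
            loop(Main, Cache, Groups, IntervalMs)
    end.

%% Readers do a key lookup on the cache instead of scanning the main table.
lookup(Group) ->
    case ets:lookup(cache_view, Group) of
        [{Group, Recs}] -> Recs;
        [] -> []
    end.

demo() ->
    Main = ets:new(main_tab, [set, public]),
    true = ets:insert(Main, [{1, a, x}, {2, b, y}, {3, a, z}]),
    {ok, Pid} = start_refresher(Main, [a, b], 1000),
    Result = lists:sort(lookup(a)),
    Pid ! stop,
    Result.
```

The trade-off is staleness: readers can see data up to one refresh interval old, which is acceptable only if the workload tolerates it.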

You didn't specify parameters of your table.

And what's the frequency of those inserts that you mention?
ets:match_object is a read-only operation and should only inflict lock contention with other write operations, such as ets:insert.

/Sverker
