[erlang-questions] fast tuple sorting

Thu Jun 12 13:26:55 CEST 2014

# Dmitry Kolesnikov 2014-06-12:
> [...]
>
> There is a flow of request to the system. Each request is set of integer
> on interval [0, 2^32]. The set cardinality varies from 100 to 300 elements.
> Some of those integers has a special meaning, they are associated with list
> of tuples in format {integer(), float()}. The length of tuple lists over
> 200 elements and there are finite number of those key integers, about ~6M.
> The system filters ?key integers? from incoming request. For each "key
> integer" it retrieves a list of tuples associated with it and returns
> this list to client. The list of tuples has to be sorted by first element.
> It is very typical scenario that resulting list is 30k - 50k tuples.
> As I said previously, the timing criteria is hard 6ms to sort the list of
> tuples.
>
> Profiling says that sorting is bottleneck.      
> 
> [...]
>
> I am afraid that problem is bond to data copy.  
> 
> The best possible option I came out is
>  * sort tuple lists before they written to ets bag
>  * use merge sort to build resulting list.

Yes, absolutely -- you don't won't to burn cycles doing the same
computations over and over and over again. :-)

Presumably the first element in your {Integer, Float} tuples fits
a fixed width (32bit or 64bit), in which case you can encode the
whole list as a packed binary: <<Integer:32, Float:64/float, ...>>,
this will avoid all the ETS copying which is probably hurting you
quite a bit.

You'll need a custom merge sort to work with that, but that's okay
and you should be able to spread the whole sorting business across
multiple cores (this, and the ordering requirement implied below,
would require slightly more detailed look then I'm prepared to give
it right now, but I trust it will prove straightforward). This is
also a good place to cheat with impunity and accelerate things with
a simple NIF if necessary.

You mention fairly huge lists of results, so hopefully you're not
actually building them in memory, then encoding to ouput format and
sending back in bulk; but rather encoding and sending out result
chunks as they become available. If the protocol you're using doesn't
allow for such mode of operation that's a flaw that should be fed
back into the design process if at all possible.

Just to be a bit clearer on previous point, this is the signalling
sequence you probably want:

    Cli 	Srv
    -+-----------+-
     |           |
     +-----------> lookup_initiate(Ref, [Integer]).
     |           |
     <-----------+ report_results(Ref, 1, [{Integer, Float}]).
     |           |
	  ...
     |           |
     <-----------+ report_results(Ref, N, [{Integer, Float}]).
     |           |
     <-----------+ lookup_terminated(Ref).

That reduces resources used by the server during lookup; whilst also
fooling with client's perception of operation latency in a good way.

Hope these ideas are pertinent and not duplicate, I wasn't following
this thread very closely.

BR,
	-- Jachym