[erlang-questions] Need help with async disk IO and thread pool on many devices (more than 1 Gbit/s )

Patrik Nyblom <>
Wed Apr 3 11:19:14 CEST 2013


On 04/03/2013 10:56 AM, Max Lapshin wrote:
> You are trying to solve another problem.
>
> Hardware can _always_ be bad. But whole system must not go down if one 
> disk is slow.
>
> Currently, whole erlang is becoming useless if only one hard drive 
> from 16 doesn't respond.
>
> This is the problem.
Wouldn't you rather need one async pool per device? Or the option to 
bind certain files to specific async threads? The thing with dynamic 
async threads is that the implementation relies on requests for the 
same file descriptor being routed to the same thread for the whole of 
the descriptor's lifespan. If an async thread hangs, the other file 
descriptors already routed to that thread have to wait. You would need 
to have the "good" mapping between threads and devices from the beginning.

Something like

open_a_file(Name) ->
     PoolId = get_pool_id_from_name(Name),
     %% {async_pool_id, PoolId} is the proposed new option; file:open/2
     %% does not accept it today.
     {ok, FD} = file:open(Name, [{async_pool_id, PoolId}]),
     FD.

where get_pool_id_from_name/1 would look at the file name, figure out the 
device and return a pool id that is only used for that particular 
device? It could even be a specific thread id: you can always find 
out how many async threads you have, you know how many devices you have, 
and you could round-robin files between the threads that should handle 
that particular device. In that case you would only need to add an 
interface where you can specify the thread when opening the file (and 
remember which thread is connected to each port). Sounds like a fairly 
feasible hack...
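
To make the mapping idea a bit more concrete, here is a rough sketch of 
the bookkeeping only. Everything in it is an assumption for illustration: 
the module name, the idea that each device can be recognised by a mount 
prefix, and the even split of threads between devices. It does not touch 
the file driver, it only computes which thread id a file would be bound to.

```erlang
-module(async_pool_map).
-export([pool_id_from_name/2, thread_for/3]).

%% Devices is a list of {MountPrefix, DeviceIndex} pairs, for example
%% [{"/mnt/disk0/", 0}, {"/mnt/disk1/", 1}] (hypothetical layout).
%% Returns the index of the first device whose prefix matches Name,
%% or the atom 'default' when no prefix matches.
pool_id_from_name(Name, Devices) ->
    case [Idx || {Prefix, Idx} <- Devices, lists:prefix(Prefix, Name)] of
        [First | _] -> First;
        []          -> default
    end.

%% Give each device its own contiguous group of async threads and
%% round-robin inside that group. Seq is a per-device counter that the
%% caller increments for every file it opens on that device.
thread_for(DeviceIndex, Seq, {NThreads, NDevices})
  when is_integer(DeviceIndex), DeviceIndex >= 0,
       is_integer(Seq), Seq >= 0,
       NThreads >= 1, NDevices >= 1 ->
    PerDevice = max(1, NThreads div NDevices),
    (DeviceIndex * PerDevice + (Seq rem PerDevice)) rem NThreads.
```

NThreads can be read with erlang:system_info(thread_pool_size). If there 
are fewer threads than devices, some devices end up sharing a thread, so 
a hung disk can still stall its neighbours in that case.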

On a side note, are you using sendfile? That might at least make the 
ongoing connections run smooth while another device is blocked... Not 
that it addresses the real problem, but just wondering if you have tried 
using it - it might be good for performance in your application.
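
In case you have not tried it, a minimal sketch of what I mean, using 
file:sendfile/2 on an already connected gen_tcp socket. The module and 
function names are made up for the example, and the error tuple is just 
one way to report which file failed.

```erlang
-module(sendfile_sketch).
-export([send/2]).

%% Send the whole file at Path over Socket (a connected gen_tcp socket).
%% Returns {ok, BytesSent} on success, or an error tuple that names the
%% file so the caller can tell which device was involved.
send(Socket, Path) ->
    case file:sendfile(Path, Socket) of
        {ok, BytesSent} ->
            {ok, BytesSent};
        {error, Reason} ->
            {error, {sendfile_failed, Path, Reason}}
    end.
```

Whether this keeps other connections running while one disk is blocked 
depends on the platform and on how the call is executed underneath, so 
treat it as something to measure rather than a fix.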

Another, less compelling, solution would be to route only the writes 
through another process, so that you open a small "writer program" and 
pump the data to it through the spawn driver. Not a general solution 
either, but much simpler than hacking the file driver...
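
A very rough sketch of the writer idea, assuming a Unix system where sh 
and cat are available and using cat as a stand-in for the "writer 
program". The module and function names are invented for the example. 
The point is only that a write stuck on a bad disk then blocks an 
external OS process instead of an async thread in the VM.

```erlang
-module(writer_port).
-export([open/1, write/2, close/1]).

%% Start one external writer per output file. The shell command runs
%% "cat" with stdout redirected to the path passed as $1.
open(Path) ->
    Sh = os:find_executable("sh"),
    open_port({spawn_executable, Sh},
              [{args, ["-c", "cat > \"$1\"", "sh", Path]},
               binary, stream]).

%% Pump data to the writer. If the external program stops reading, the
%% port becomes busy and only the calling process is suspended.
write(Port, Data) ->
    port_command(Port, Data).

%% Closing the port closes the writer's stdin, so cat sees end of file
%% and exits. This sketch does not wait for the writer to finish and
%% does not confirm that the data reached the disk.
close(Port) ->
    port_close(Port).
```

A real writer program would want to report errors back and acknowledge 
completed writes, which cat obviously does not do.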

The async thread pool leaves a lot to be desired - we have planned to 
replace it with something called "dirty schedulers" (internal name...), 
which is on our roadmap for R17. In the end they should solve a lot of 
the problems with the current implementation. Not that that helps you 
now, but it makes me wonder whether a "dirty" solution might be the right 
way to solve it for the moment, as another solution is in the pipeline...
Cheers,
/Patrik