[erlang-questions] why is mmap so darn difficult

Tim Watson watson.timothy@REDACTED
Wed Jun 12 10:35:33 CEST 2013


On 12 Jun 2013, at 09:02, Max Lapshin wrote:
> Of course, mmap is not a fast write.
> 

I think the idea was not to flush from application code, but to rely on the OS page cache to gain some performance at the cost of reliability (since without fsync, data loss is possible).
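
To make that trade-off concrete, here is a minimal sketch using ordinary file writes rather than mmap (OTP has no mmap in its standard library, so this only illustrates the "let the page cache handle it" versus "force it to disk" choice). The module name, function names and paths are hypothetical; file:write/2 and file:datasync/1 are the stock OTP calls.

```erlang
-module(pagecache_sketch).
-export([write_buffered/2, write_durable/2]).

%% Hand the data to the OS and return. The bytes sit in the page cache
%% until the kernel writes them back, so a crash or power loss before
%% then can lose them.
write_buffered(Path, Data) ->
    {ok, Fd} = file:open(Path, [write, raw, binary]),
    ok = file:write(Fd, Data),
    ok = file:close(Fd).

%% Same write, but block until the OS reports that the file data has
%% reached stable storage. Safer, and typically much slower per call.
write_durable(Path, Data) ->
    {ok, Fd} = file:open(Path, [write, raw, binary]),
    ok = file:write(Fd, Data),
    ok = file:datasync(Fd),
    ok = file:close(Fd).
```

Both functions leave the same bytes in the file; the only difference is whether the caller waits for the flush.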

> Look at mongodb: it becomes absolutely unpredictable database, when
> you need to write data due to mmap.
> 
> You never know how much time write will take.
> 

Fascinating observation. But again, isn't that only a problem for individual writes? I'm not arguing for one thing or another here, just trying to understand the variables. If I want to write a stream to disk as fast as possible, isn't letting the OS deal with paging/caching/flushing likely to offer better overall throughput?

> So, file:pwrite, file:pread are ok. Don't be afraid of them.
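
For anyone following along, the positional calls mentioned here look roughly like this. It is only a sketch: the module name, the 4096-byte offset and the payload are made up for illustration, while file:pwrite/3 and file:pread/3 are the standard OTP functions.

```erlang
-module(pio_sketch).
-export([roundtrip/1]).

%% Write a small binary at an explicit byte offset, then read it back
%% from the same offset.
roundtrip(Path) ->
    %% [read, write] opens for update without truncating an existing file.
    {ok, Fd} = file:open(Path, [read, write, raw, binary]),
    ok = file:pwrite(Fd, 4096, <<"hello">>),
    {ok, Data} = file:pread(Fd, 4096, 5),
    ok = file:close(Fd),
    Data.
```

Calling pio_sketch:roundtrip("/tmp/pio.bin") returns <<"hello">>. Neither call forces the data to disk; that would take a separate file:sync/1 or file:datasync/1.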

We already have code that does regular reads/writes, but the usage pattern is currently quite random, and throughput isn't in the 100s of kHz, though that's not entirely due to file I/O either. Part of the problem space I'm looking at is that the data store here might be terabytes large, so the assumption is/was that mmap would offer real benefits for that situation, as it appears to do for, e.g., Varnish and Apache Kafka.

There are other possibilities of course, such as embedding http://symas.com/mdb/, a product using mmap that claims phenomenally high throughput for concurrent readers and writers. Of course it's perhaps not so easy to take data out of that and send it to a client with sendfile, but perhaps *regular* socket writes are fine. There is a driver for that database going around, but it's a NIF and therefore I'm not convinced it's usable, due to the effect a long call can have on scheduling. Of course a linked-in driver can share binaries with the emulator and process things out of band using the async thread pool, then send them back to any process in the emulator (without copying) using erl_drv_send_term.

Perhaps that's a better choice than trying to go low-level and use mmap directly, but it does raise the question of why that software is able to get such good "headline rates" using mmap when others (such as MongoDB, cited above) do not.

Cheers,
Tim

