Reducing number of copies of bulk network data

Tue Feb 1 11:17:53 CET 2000

Jim Larson <jim@REDACTED> writes:

> 
> Hello!
> 
> I'm working on an Erlang application that needs to handle a large
> bandwidth of network data, multiplexing it among a variety of other
> (UNIX) processes, attached over TCP connections on the loopback
> device.
> 
> We use binary sockets to get the bulk data into Erlang binary form,
> thus avoiding copies during intra-Erlang message-passing.
> 
> When analyzing the performance of the application, I've noticed
> that the current Internet socket driver
> (erts/emulator/drivers/common/inet_drv.c) copies the data once
> during reception and once during sending.
> 
> UDP reception works fine, receiving data directly into its Binary
> buffer.
> 
> TCP reception, at least when packetizing is turned on, makes one
> copy of the data.  Is there any way to eliminate this copy?
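To make the copy concrete, here is a simplified model of packetized reception. It assumes {packet, 2}-style framing (a 2-byte big-endian length before each packet). The rx_buf structure and the take_packet() function are hypothetical and are not taken from inet_drv.c. Bytes accumulate in a fixed receive buffer, and each complete packet is then copied out into its own allocation. That is the kind of copy being asked about.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical receive buffer; not inet_drv.c's real data structure. */
struct rx_buf {
    unsigned char data[4096];
    size_t len;                 /* bytes currently buffered */
};

/* If a complete packet (2-byte big-endian length + payload) is buffered,
 * copy its payload into a new allocation, shift any remaining bytes to
 * the front, and return the payload (caller frees it).
 * Returns NULL if the packet is incomplete or allocation fails. */
static unsigned char *take_packet(struct rx_buf *rx, size_t *out_len)
{
    if (rx->len < 2)
        return NULL;
    size_t plen = ((size_t)rx->data[0] << 8) | rx->data[1];
    if (rx->len < 2 + plen)
        return NULL;

    unsigned char *pkt = malloc(plen ? plen : 1);
    if (pkt == NULL)
        return NULL;
    memcpy(pkt, rx->data + 2, plen);            /* the extra copy */
    memmove(rx->data, rx->data + 2 + plen, rx->len - 2 - plen);
    rx->len -= 2 + plen;
    *out_len = plen;
    return pkt;
}
```

Avoiding the memcpy() would mean reading the header first and then reading the payload straight into a buffer of the right size, at the cost of more system calls and more state in the driver.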

I know of no obvious way to eliminate the copy. It might be possible
to do so by making the code in the inet driver smarter and more complicated.

We have received some extensive patches to the inet driver and other
parts of the runtime system. Among other things, the patches will eliminate
the Erlang processes currently needed to use sockets. We are evaluating
the patches to see whether we can include them in R7. Until we have done
that, we don't want to make any changes to the inet driver that would make
it more complicated.

> 
> In our application, the binary received from the network is split
> and reassembled into a new packet which is then sent back out over
> another interface.  However, for anything but a single binary, the
> runtime system makes a copy of the I/O list into a contiguous
> buffer, then passes that buffer to the driver.  Is there any way
> to eliminate this copy?
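A rough model of that copy follows. For simplicity it represents the I/O list as an array of POSIX struct iovec. The runtime's own representation of I/O lists differs, and flatten_iov() is a hypothetical name. Every segment is copied once into a new contiguous buffer before the driver sees it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* Gather all segments into one newly allocated contiguous buffer.
 * Every byte is copied once; this is the cost an iovec-aware output
 * path avoids. Caller frees the result; returns NULL on allocation failure. */
static char *flatten_iov(const struct iovec *iov, int iovcnt, size_t *out_len)
{
    size_t total = 0;
    for (int i = 0; i < iovcnt; i++)
        total += iov[i].iov_len;

    char *buf = malloc(total ? total : 1);
    if (buf == NULL)
        return NULL;

    size_t off = 0;
    for (int i = 0; i < iovcnt; i++) {
        memcpy(buf + off, iov[i].iov_base, iov[i].iov_len);
        off += iov[i].iov_len;
    }
    *out_len = total;
    return buf;
}
```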
> 
> I've noticed an "outputv" driver entry point which seems to accept
> an iovec argument, instead of a simple pointer and length.  This
> would help our application tremendously.  Is this entry point
> well-supported in the runtime system?  Is it mature enough for a
> driver to use it?

Yes, it should be. We have used it in some drivers.
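Once a driver has its output as separate segments, it can often pass them directly to the POSIX writev() call and let the kernel gather them. The sketch below assumes the segments have already been expressed as struct iovec. send_gathered() is a hypothetical helper, and the exact vector type handed to an outputv callback is not shown. Because writev() may write only part of the data, the helper skips past what was written and retries.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Write every segment of iov to fd without concatenating them first.
 * Modifies the iov array as it advances past written data.
 * Returns 0 on success, -1 on error (errno set by writev). */
static int send_gathered(int fd, struct iovec *iov, int iovcnt)
{
    while (iovcnt > 0) {
        ssize_t n = writev(fd, iov, iovcnt);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        /* Drop fully written segments... */
        while (iovcnt > 0 && (size_t)n >= iov->iov_len) {
            n -= (ssize_t)iov->iov_len;
            iov++;
            iovcnt--;
        }
        /* ...then trim the partially written one, if any. */
        if (iovcnt > 0) {
            iov->iov_base = (char *)iov->iov_base + n;
            iov->iov_len -= (size_t)n;
        }
    }
    return 0;
}
```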

> 
> We'd eventually like to use shared memory to communicate with the
> clients of our Erlang application that are running on the same
> machine.  To get true zero-copy, we'd need to be able to:
> 
> 	- incorporate buffers from an mmap()'ed file as Erlang binary
> 	  objects;
> 
> 	- receive data from a network socket directly into a buffer
> 	  in mmap()'ed space;
> 
> 	- allocate all new binary objects, or buffers created by
> 	  the runtime system as concatenations of byte lists, into
> 	  buffers in mmap()'ed space.
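The second requirement can be sketched with plain POSIX calls: map a shared region, then pass a pointer into it to recv(). map_shared_region() and recv_into_region() are hypothetical names. MAP_ANONYMOUS is a common extension and is not part of every POSIX version; a file mapped with MAP_SHARED behaves the same way. The kernel still copies from its socket buffer into the mapping. What disappears is any intermediate user-space copy.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Map an anonymous region that child processes sharing it will also see.
 * Returns NULL on failure. */
static void *map_shared_region(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* recv() places incoming bytes straight into the mapped region;
 * no user-space staging buffer is involved. */
static ssize_t recv_into_region(int sock, void *region, size_t size)
{
    return recv(sock, region, size, 0);
}
```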
> 
> The easiest way to do this seems to be to:
> 
> 	- modify the runtime system's malloc() wrappers to call
> 	  malloc() replacements that use shared memory (note that
> 	  this puts the entire Erlang heap into shared memory,
> 	  which may be extreme);
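In its simplest form, a shared-memory malloc() replacement could be a bump allocator over one MAP_SHARED mapping. This minimal sketch uses hypothetical names (shm_arena, shm_alloc). It never frees memory and has no locking, and a real replacement would need both. Also note that mmap() normally chooses where the region is placed, which interacts with the constraint on heap addresses discussed next.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical arena carved out of a single shared mapping. */
struct shm_arena {
    unsigned char *base;
    size_t size;
    size_t used;
};

static int shm_arena_init(struct shm_arena *a, size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    a->base = p;
    a->size = size;
    a->used = 0;
    return 0;
}

/* malloc()-like bump allocation, 16-byte aligned; memory is never reused.
 * Returns NULL when the arena is exhausted. */
static void *shm_alloc(struct shm_arena *a, size_t n)
{
    size_t start = (a->used + 15) & ~(size_t)15;
    if (start > a->size || n > a->size - start)
        return NULL;
    a->used = start + n;
    return a->base + start;
}
```

The alignment is computed as an offset from base. That is sufficient because mmap() returns page-aligned addresses.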

The tagging scheme currently used dictates that heaps must be located
in the lowest gigabyte of memory (or in one other 1 GB region elsewhere
in memory if you use EXTRA_POINTER_BITS).

We are considering switching to a two-bit tagging scheme (allowing
access to all memory), but it is unlikely that it will be done in R7.
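The 1 GB limit follows from how many address bits remain once the tag is stored in the word. The two layouts below are assumptions for illustration and are not the emulator's documented format. Layout A keeps a 4-bit tag in the top bits of a 32-bit word and stores the pointer as a word address in the remaining 28 bits (2^28 words * 4 bytes = 2^30 bytes = 1 GB). Layout B keeps a 2-bit tag in the low bits, which are always zero in a 4-byte-aligned pointer, so the full 32-bit address space stays usable. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Layout A (assumed): 4-bit tag in bits 28..31, word address in bits 0..27. */
static uint32_t high_tag_encode(uint32_t addr, uint32_t tag)
{
    assert(tag < 16u);
    assert((addr & 3u) == 0);        /* pointers are 4-byte aligned */
    assert(addr < (1u << 30));       /* 2^28 words * 4 bytes = 1 GB */
    return (tag << 28) | (addr >> 2);
}

static uint32_t high_tag_addr(uint32_t word)
{
    return (word & 0x0FFFFFFFu) << 2;
}

/* Layout B (assumed): 2-bit tag in the always-zero low bits. */
static uint32_t low_tag_encode(uint32_t addr, uint32_t tag)
{
    assert(tag < 4u);
    assert((addr & 3u) == 0);
    return addr | tag;
}

static uint32_t low_tag_addr(uint32_t word)
{
    return word & ~(uint32_t)3;
}
```

With layout A, encoding an address at or above 0x40000000 trips the assertion. Layout B accepts any 4-byte-aligned address.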

> 
> 	- create a new driver which can incorporate shared memory
> 	  buffers as new Erlang binaries.

This seems feasible. It would only require minor changes to the
runtime system to ensure that mmap'ed binaries are deallocated correctly
when their reference counts reach zero.
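One way to let mmap'ed memory sit behind an ordinary reference-counted binary is to store a release function alongside the data. Dropping the last reference then unmaps the memory instead of calling free(). This sketch uses hypothetical names and does not reflect the emulator's actual binary structures. It also omits the atomic reference counting that concurrent access would require.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical reference-counted binary with a pluggable release hook. */
struct rc_binary {
    unsigned refc;
    void *data;
    size_t size;
    void (*release)(struct rc_binary *);   /* how to dispose of data */
};

static int unmapped_count;                 /* for demonstration only */

/* Release hook for binaries whose data came from mmap(). */
static void release_mmap(struct rc_binary *b)
{
    munmap(b->data, b->size);
    b->data = NULL;
    unmapped_count++;
}

static void bin_incref(struct rc_binary *b)
{
    b->refc++;
}

/* Drop one reference; the last one calls the binary's own release hook
 * rather than assuming the data came from malloc(). */
static void bin_decref(struct rc_binary *b)
{
    assert(b->refc > 0);
    if (--b->refc == 0)
        b->release(b);
}
```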

> 
> Are there any other ideas on how to do this?
> 
> Lastly, will the upcoming binary syntax bring along an iovec-style
> internal representation of binaries?  This would allow us to
> concatenate binaries with zero copies.

Binary syntax and binary representation are actually separate issues.
Our prototype implementation of the binary syntax (in R7) uses the "good old"
reference-counted binaries.

Our plans for binary representation are as follows:

We plan to add heap binaries (small binaries, up to say 64 bytes, that
are stored on the heap) and sub binaries (small heap-based objects that
can point to part of a reference-counted binary or a heap binary).
Actually, we did most of the hard work in R6B by changing the
representation of binaries to include sub-tags.
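The two representations can be modelled roughly as follows. The structures and the 64-byte limit are illustrative assumptions, not the layout the emulator will use. The limit echoes the "say, 64 bytes" above, and only sub binaries of reference-counted binaries are modelled. A heap binary holds its bytes inline. A sub binary holds only a pointer to the original binary plus an offset and a length, so taking a slice copies no payload.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HEAP_BIN_LIMIT 64   /* illustrative size limit for heap binaries */

/* Hypothetical heap binary: small, bytes stored inline. */
struct heap_bin {
    size_t size;
    unsigned char bytes[HEAP_BIN_LIMIT];
};

/* Hypothetical reference-counted binary: large, shared, stored elsewhere. */
struct refc_bin {
    unsigned refc;
    size_t size;
    unsigned char *bytes;
};

/* Hypothetical sub binary: names a slice of a reference-counted binary. */
struct sub_bin {
    struct refc_bin *orig;
    size_t offset;
    size_t size;
};

static int make_heap_bin(const unsigned char *src, size_t n, struct heap_bin *out)
{
    if (n > HEAP_BIN_LIMIT)
        return -1;               /* too big: would need a refc binary */
    memcpy(out->bytes, src, n);  /* small data is simply copied */
    out->size = n;
    return 0;
}

static int make_sub_bin(struct refc_bin *orig, size_t offset, size_t size,
                        struct sub_bin *out)
{
    if (offset > orig->size || size > orig->size - offset)
        return -1;
    orig->refc++;                /* the slice keeps the original alive */
    out->orig = orig;
    out->offset = offset;
    out->size = size;
    return 0;
}

static const unsigned char *sub_bin_data(const struct sub_bin *s)
{
    return s->orig->bytes + s->offset;
}
```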

We think that segmented binaries, as found in the original implementation
of the binary syntax, are too complicated to be worth doing. As an
alternative, we are considering "lazy concatenation" of binaries: not
concatenating binaries until it is really needed. This will probably not
be done in R7.
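The reply does not say how lazy concatenation would be implemented. One possible model, sketched here with hypothetical names and a fixed limit on the number of parts for brevity, records the parts of a concatenation. It copies them into one buffer only when contiguous bytes are actually required.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PARTS 16

/* Hypothetical lazily concatenated binary: it only records its parts. */
struct lazy_bin {
    const unsigned char *part[MAX_PARTS];
    size_t len[MAX_PARTS];
    int nparts;
    size_t total;
};

/* "Concatenation" is O(1): remember the pointer, copy nothing.
 * The caller must keep the parts alive. Returns -1 when full. */
static int lazy_append(struct lazy_bin *b, const void *data, size_t n)
{
    if (b->nparts == MAX_PARTS)
        return -1;
    b->part[b->nparts] = data;
    b->len[b->nparts] = n;
    b->nparts++;
    b->total += n;
    return 0;
}

/* Only when contiguous bytes are really needed are the parts copied,
 * once, into a single buffer. Caller frees the result. */
static unsigned char *lazy_force(const struct lazy_bin *b)
{
    unsigned char *out = malloc(b->total ? b->total : 1);
    if (out == NULL)
        return NULL;
    size_t off = 0;
    for (int i = 0; i < b->nparts; i++) {
        memcpy(out + off, b->part[i], b->len[i]);
        off += b->len[i];
    }
    return out;
}
```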

BTW, are you aware that you can combine binaries by simply building a list
of the binaries? You can send the list directly to a port, and if the driver
supports the outputv entry point, there will be no copying of the binaries.
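On the Erlang side this is just a list such as [Header, Body]. The C sketch below models what an iovec-capable driver can then do with such a list: record a pointer and a length for each binary, copying no payload bytes. bin_list and list_to_iov() are hypothetical. A real outputv driver is handed a vector prepared by the runtime rather than walking a list itself.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical cons-cell list of binaries, standing in for an Erlang list. */
struct bin_list {
    const void *data;
    size_t size;
    const struct bin_list *next;
};

/* Fill iov with one entry per binary. Only pointers and lengths are
 * recorded, so no payload bytes are copied. Returns the number of
 * entries, or -1 if iov has fewer than the needed max slots. */
static int list_to_iov(const struct bin_list *l, struct iovec *iov, int max)
{
    int n = 0;
    for (; l != NULL; l = l->next) {
        if (n == max)
            return -1;
        iov[n].iov_base = (void *)l->data;  /* writev() only reads it */
        iov[n].iov_len = l->size;
        n++;
    }
    return n;
}
```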

> 
> Thanks,
> 
> Jim Larson
> jim@REDACTED
> 

/Bjorn
-- 
Björn Gustavsson            Ericsson Utvecklings AB
bjorn@REDACTED      ÄT2/UAB/F/P
			    BOX 1505
		 	    125 25 Älvsjö