[erlang-questions] eep: New gen_stream module

Jay Nelson jay@REDACTED
Mon Dec 10 07:53:31 CET 2007


I just posted a formal eep for a new module.  The reference
implementation is on my website at
http://www.duomark.com/erlang/proposals/gen_stream.html
which contains a description of the module, some sample code to call
it, and links for the original eep text and a tar.gz file with the
implementation and unit test code.

Apparently between my mailer and editor I got the number of columns  
wrong and/or left in some tabs as it doesn't wrap properly on the  
mailing list.  I will attempt to fix those anomalies with the first  
revision.

As developed and described, the implementation provides a simple  
method for obtaining a binary serial stream in "chunks".  The module  
is intended to be used when a raw file is too big to fit in memory,  
or when an accumulated binary can be handled more efficiently a piece at a  
time.  I included a technique for generating a serial stream as a  
behaviour.  It can be used to generate non-binary serial streams, but  
everything is carefully controlled to be a true _serial_ stream, so  
it is not recommended that you attempt to use the behaviour to create  
random access streams or other types of streams.  The behaviour is  
limited to a simple API for ease of use and for consistency with the  
binary and file approaches.
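
To make the idea of consuming a binary as a serial stream of chunks concrete, here is a minimal sketch in plain Erlang. It does not use gen_stream and is not its API; the module name chunk_sketch, the functions next_chunk/2 and chunks/2, and the end_of_stream atom are all hypothetical names chosen for illustration. It only shows the strictly front-to-back access pattern described above.

```erlang
-module(chunk_sketch).
-export([next_chunk/2, chunks/2]).

%% Hypothetical helper, not part of gen_stream.
%% Take up to ChunkSize bytes from the front of Bin.
%% Returns {Chunk, Rest}, or end_of_stream when nothing is left.
next_chunk(<<>>, _ChunkSize) ->
    end_of_stream;
next_chunk(Bin, ChunkSize)
  when is_binary(Bin), is_integer(ChunkSize), ChunkSize > 0 ->
    case Bin of
        <<Chunk:ChunkSize/binary, Rest/binary>> ->
            {Chunk, Rest};
        _ ->
            %% Fewer than ChunkSize bytes remain: return the short tail.
            {Bin, <<>>}
    end.

%% Walk the whole stream in order and collect the chunks.
chunks(Bin, ChunkSize) ->
    chunks(Bin, ChunkSize, []).

chunks(Bin, ChunkSize, Acc) ->
    case next_chunk(Bin, ChunkSize) of
        end_of_stream -> lists:reverse(Acc);
        {Chunk, Rest} -> chunks(Rest, ChunkSize, [Chunk | Acc])
    end.
```

For example, chunk_sketch:chunks(<<"abcdefgh">>, 3) returns [<<"abc">>, <<"def">>, <<"gh">>].  The caller only ever sees the next chunk and the remainder, so there is no way to seek backwards or skip ahead.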

In addition there is an option to make fixed-size streams circular so  
that they can be used for infinite or repeatable test streams.  If a  
stream has an indeterminate size (an arbitrary term rather than an  
integer), it may not be made circular.
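
The circular behaviour can be pictured with another small sketch.  Again this is not the gen_stream implementation: the module circular_sketch and the function next_chunk/3 are hypothetical, and I am assuming the stream is a non-empty binary of known size with the caller tracking the current offset.  When a chunk runs past the end of the data, it simply continues from the beginning.

```erlang
-module(circular_sketch).
-export([next_chunk/3]).

%% Hypothetical helper, not part of gen_stream.
%% Bin is the fixed-size source, Pos the current byte offset,
%% ChunkSize the number of bytes wanted.
%% Returns {Chunk, NewPos}, wrapping to offset 0 at the end of Bin.
next_chunk(Bin, Pos, ChunkSize)
  when is_binary(Bin), byte_size(Bin) > 0,
       is_integer(Pos), Pos >= 0, Pos < byte_size(Bin),
       is_integer(ChunkSize), ChunkSize > 0 ->
    take(Bin, Pos, ChunkSize, []).

%% Nothing more needed: assemble the collected parts in order.
take(_Bin, Pos, 0, Acc) ->
    {iolist_to_binary(lists:reverse(Acc)), Pos};
take(Bin, Pos, Need, Acc) ->
    Avail = byte_size(Bin) - Pos,
    Len = erlang:min(Need, Avail),
    Part = binary:part(Bin, Pos, Len),
    %% rem wraps the offset back to 0 when the end is reached.
    take(Bin, (Pos + Len) rem byte_size(Bin), Need - Len, [Part | Acc]).
```

For example, circular_sketch:next_chunk(<<"abcde">>, 3, 4) returns {<<"deab">>, 2}.  The wrap-around depends on knowing byte_size(Bin) up front, which is why a stream whose size is not an integer cannot be treated this way.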

The main benefit is that you can write your code to consume a stream  
in bite-size chunks using 1 process and 1 buffer.  Once it is working  
as you intend, changing the gen_stream:start_link options allows you  
to measure performance and memory usage with multiple chunking  
processes, multiple buffers per process, or both.  In addition, the  
chunk_size may also be varied declaratively.  This approach allows a  
running system to change performance characteristics without the  
delivery of modified code, or to adapt to changing conditions in real  
time.

The restriction that enforces the "serial" nature of the stream must  
hold regardless of how many processes are used.  Changing  
the options should not change the nature of the stream other than how  
large each chunk is (and hopefully how responsive the system is to  
requests for the next_chunk).  This is why you must be careful in  
attempting to bend the results when using a behaviour module.

Any and all suggestions, comments, ridicule or abuse are graciously  
welcomed.  Coding tips are especially hoped for as this is the very  
first public airing of the module.  Anyone willing to do performance  
comparisons is encouraged to post them and comment on the behavior  
observed.  I have no information on whether the hoped-for performance  
characteristics have been achieved.

jay



