[erlang-questions] eep: New gen_stream module
Jay Nelson
jay@REDACTED
Mon Dec 10 07:53:31 CET 2007
I just posted a formal EEP for a new module. The reference
implementation is on my website at
http://www.duomark.com/erlang/proposals/gen_stream.html, which
contains a description of the module, some sample code that calls it,
and links to the original EEP text and to a tar.gz file with the
implementation and unit test code.
Apparently, between my mailer and my editor, I either set the wrong
number of columns or left in some tabs, because the text does not
wrap properly on the mailing list. I will try to fix those anomalies
in the first revision.
As developed and described, the implementation provides a simple way
to consume a serial binary stream in "chunks". The module is intended
for cases where a raw file is too big to fit in memory, or where an
accumulated binary can be handled more efficiently one piece at a
time. I also included a way to generate a serial stream from a
behaviour module. A behaviour can produce non-binary serial streams,
but everything is carefully controlled to yield a true _serial_
stream, so I do not recommend using the behaviour to create
random-access or other kinds of streams. The behaviour API is kept
deliberately small, for ease of use and for consistency with the
binary and file sources.
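To make the chunking idea concrete, here is a minimal sketch in Python. It is not the gen_stream API; the functions `binary_chunks` and `file_chunks` are hypothetical names used only to show the two sources (an in-memory binary and a file) producing the same serial sequence of fixed-size chunks, with the file version never holding more than one chunk in memory.

```python
import io
from typing import BinaryIO, Iterator


def binary_chunks(data: bytes, chunk_size: int) -> Iterator[bytes]:
    """Hypothetical helper: yield successive chunk_size pieces of an
    in-memory binary, strictly in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(data), chunk_size):
        # The final chunk may be shorter than chunk_size.
        yield data[offset:offset + chunk_size]


def file_chunks(f: BinaryIO, chunk_size: int) -> Iterator[bytes]:
    """Hypothetical helper: yield successive chunks read from a binary file
    object, so the whole file never has to be resident in memory."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = f.read(chunk_size)
        if not chunk:  # b"" signals end of file
            return
        yield chunk


# Usage: both sources produce the same serial sequence of chunks.
print(list(binary_chunks(b"abcdefg", 3)))            # [b'abc', b'def', b'g']
print(list(file_chunks(io.BytesIO(b"abcdefg"), 3)))  # [b'abc', b'def', b'g']
```

Because the consumer only ever asks for "the next chunk", the same consuming code works unchanged whichever source is behind it.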
In addition, there is an option to make fixed-size streams circular,
so that they can serve as infinite or repeatable test streams. A
stream of indeterminate size (one whose size is given as an arbitrary
term rather than an integer) cannot be made circular.
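The circular option can be sketched the same way, again in Python and not as the gen_stream implementation. The function `circular_chunks` and its `size` parameter are hypothetical: `size` stands in for the declared stream size, an integer for a fixed-size stream and any other value for an indeterminate one. This sketch assumes that each pass ends with a possibly short chunk and the next pass restarts at the beginning; the real module may handle the wrap-around differently.

```python
from itertools import islice
from typing import Iterator


def circular_chunks(data: bytes, chunk_size: int, size: object) -> Iterator[bytes]:
    """Hypothetical helper: yield chunks of the first `size` bytes of data
    forever, restarting from the beginning after each full pass."""
    # Mirror the restriction described above: only a stream whose size is
    # a known integer may be made circular.
    if isinstance(size, bool) or not isinstance(size, int):
        raise ValueError("a stream of indeterminate size cannot be circular")
    if size <= 0 or size > len(data) or chunk_size <= 0:
        raise ValueError("need 0 < size <= len(data) and chunk_size > 0")
    while True:
        # One full pass over the fixed-size stream; the last chunk of a
        # pass may be shorter than chunk_size.
        for offset in range(0, size, chunk_size):
            yield data[offset:min(offset + chunk_size, size)]


# Usage: take six chunks from an endless, repeatable test stream.
print(list(islice(circular_chunks(b"abcde", 2, 5), 6)))
# [b'ab', b'cd', b'e', b'ab', b'cd', b'e']
```

Since the checks live inside a generator, they fire on the first request for a chunk rather than when the stream is created.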
The main benefit is that you can write your code to consume a stream
in bite-size chunks using one process and one buffer. Once it works
as you intend, changing the gen_stream:start_link options lets you
measure performance and memory usage with multiple chunking
processes, multiple buffers per process, or both. The chunk_size can
also be varied declaratively. This approach lets a running system
change its performance characteristics without delivering modified
code, or adapt to changing conditions in real time.
The restriction that enforces the "serial" nature of the stream must
hold regardless of how many processes are used. Changing the options
should not change the nature of the stream, other than how large each
chunk is (and, hopefully, how responsive the system is to requests
for the next_chunk). This is why you must be careful about bending
these rules when writing a behaviour module.
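The serial guarantee can be illustrated with one more sketch, in Python rather than Erlang and not taken from the gen_stream code. The function `parallel_chunks` and its `workers` and `buffers` parameters are hypothetical, loosely modelled on the options described above, and threads stand in for Erlang processes. Requests are queued in FIFO order and the consumer always waits on the oldest one, so changing the number of workers or buffers changes only how many chunks are in flight, never the order in which they arrive.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator


def parallel_chunks(data: bytes, chunk_size: int,
                    workers: int = 1, buffers: int = 1) -> Iterator[bytes]:
    """Hypothetical helper: produce chunks with a pool of worker threads
    while delivering them strictly in stream order."""
    if chunk_size <= 0 or workers <= 0 or buffers <= 0:
        raise ValueError("chunk_size, workers and buffers must be positive")

    def fetch(offset: int) -> bytes:
        # Stand-in for per-chunk work such as reading or decoding.
        return data[offset:offset + chunk_size]

    offsets = iter(range(0, len(data), chunk_size))
    window = workers * buffers  # maximum number of chunks in flight
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = deque()
        # Prime the window with the first few chunk requests.
        for offset in offsets:
            pending.append(pool.submit(fetch, offset))
            if len(pending) >= window:
                break
        while pending:
            # Always wait on the oldest request first: this FIFO is what
            # keeps the stream serial however many workers are running.
            chunk = pending.popleft().result()
            nxt = next(offsets, None)
            if nxt is not None:
                pending.append(pool.submit(fetch, nxt))
            yield chunk


# Usage: the chunk sequence is identical for every workers/buffers setting.
data = bytes(range(256)) * 4
baseline = list(parallel_chunks(data, 100))
print(list(parallel_chunks(data, 100, workers=4, buffers=2)) == baseline)  # True
```

Tuning `workers`, `buffers`, or `chunk_size` is then purely a performance decision; the consumer's view of the stream stays the same.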
Any and all suggestions, comments, ridicule or abuse are graciously
welcomed. Coding tips are especially hoped for, as this is the very
first public airing of the module. Anyone willing to do performance
comparisons is encouraged to post them and comment on the behavior
observed. I have no information on whether the hoped-for performance
characteristics have been achieved.
jay