[erlang-questions] eep: New gen_stream module
Per Gustafsson
per.gustafsson@REDACTED
Mon Dec 10 13:21:05 CET 2007
This is a very good idea, but there are some additional things that I
would like to have, and for some parts it is unclear to me how they
would work. My comments are inline in the EEP text.
Jay Nelson wrote:
> EEP: XXX
> Title: gen_stream behaviour
> Version: $Revision: 14 $
> Last-Modified: $Date: 2007-12-10 07:17:01 +0200 (Mon, 10 Dec 2007) $
> Author: Jay Nelson <jay at duomark.com>
> Status: Draft
> Type: Standards Track
> Content-Type: text/plain
> Created: 09-Dec-2007
> Erlang-Version: R12B-2
> Post-History: 09-Dec-2007
>
>
> Abstract
>
> An optimized behaviour module is needed to simplify the handling of
> large streams of (typically binary) data for application
> developers.
>
>
> Specification
>
> Module name:
> gen_stream
>
> Implementation:
> A gen_server which delivers "chunks" of the stream in an efficient
> manner so that line-oriented processing or the handling of streams
> much bigger than memory (possibly even infinite) may be absorbed by
> an application.
>
> Behaviour callbacks:
These are not really the behaviour callbacks, but rather the interface
to the gen_stream module. I was a little bit confused by this at first,
but the code seems to indicate that the actual callbacks for a
gen_stream behaviour are:
init/3,
terminate/1,
stream_length/0,
stream_length/1,
extract_block/3,
extract_split_block/4,
extract_final_block/3,
inc_progress/2
I guess that the EEP also needs to define what these functions should
do, so that it is possible to implement gen_stream behaviours.
> start, start_link as in gen_server
>
> init(Args, Options) -> Same as gen_server plus list of Options:
>
> {stream, {file, path_to_file()} |
> {binary, binary()} |
> {behaviour, atom(), ExtraArgs}}
I think it would be nice to add a fourth lightweight option:
{generator, fun(() -> {binary(), fun()} | end_of_stream)}
That is, a fun which returns either a binary together with a new fun
that will produce the next chunk, or an end_of_stream marker. This might
not fit with the OTP framework, though.
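To make the generator idea concrete, here is a minimal sketch of such a fun and a loop that drains it. The module name gen_sketch and the helpers from_list/1 and drain/1 are made up for illustration and are not part of the proposal or its reference implementation.

```erlang
-module(gen_sketch).
-export([from_list/1, drain/1]).

%% Hypothetical helper: build a generator from a list of binaries.
%% Calling the returned fun yields {Chunk, NextFun} or end_of_stream.
from_list([]) ->
    fun() -> end_of_stream end;
from_list([Chunk | Rest]) when is_binary(Chunk) ->
    fun() -> {Chunk, from_list(Rest)} end.

%% Pull chunks until the generator is exhausted and return them in order.
drain(Gen) when is_function(Gen, 0) ->
    drain(Gen, []).

drain(Gen, Acc) ->
    case Gen() of
        end_of_stream ->
            lists:reverse(Acc);
        {Chunk, Next} when is_binary(Chunk), is_function(Next, 0) ->
            drain(Next, [Chunk | Acc])
    end.
```

A server using this option would presumably call the fun once per next_chunk request and keep the returned fun as its new state, instead of draining everything at once as drain/1 does.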
>
> {chunk_size, integer()} returned sub-binary size, default is ~8K
It would be nice to have a chunk terminator such as newline rather than
an explicit size. Or would this be implemented using a gen_stream
behaviour?
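As a rough sketch of what terminator-based chunking could look like, the following splits a binary that is already in memory at newlines. The module and function names are made up for illustration. A real implementation would also have to handle lines that span the server's internal buffer boundaries, which this sketch ignores.

```erlang
-module(line_chunk_sketch).
-export([next_line/1, lines/1]).

%% Split off the first newline-terminated chunk of a binary.
%% Returns {Line, Rest} with the newline dropped, {Tail, <<>>} when the
%% remaining data has no newline, or end_of_stream on empty input.
next_line(<<>>) ->
    end_of_stream;
next_line(Bin) when is_binary(Bin) ->
    scan(Bin, 0).

%% Try each offset N in turn until a newline is found at that position.
scan(Bin, N) when N >= byte_size(Bin) ->
    {Bin, <<>>};
scan(Bin, N) ->
    case Bin of
        <<Line:N/binary, $\n, Rest/binary>> -> {Line, Rest};
        _ -> scan(Bin, N + 1)
    end.

%% Collect all lines of a binary, mainly to show the calling pattern.
lines(Bin) ->
    case next_line(Bin) of
        end_of_stream -> [];
        {Line, Rest} -> [Line | lines(Rest)]
    end.
```

The return shape deliberately mirrors next_chunk/1, so a terminator could perhaps be offered as an alternative to chunk_size rather than requiring a separate behaviour.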
> {chunks_per_proc, integer()} num of internal chunks, default is 1
> {circular, false | true} whether stream repeats, default is false
> {num_processes, integer()} num_processes used, default 1
>
It is not clear to me what this means. Is this the number of processes
which will communicate with the server or the number of processes that
the server will spawn?
> next_chunk(Server::pid()) -> binary() | end_of_stream
> pct_complete(Server::pid()) -> integer() | atom()
> stream_size(Server::pid()) -> integer() | atom()
Can these return any atom, or only specific ones, e.g. 'infinite' or
'error'?
> stream_pos(Server::pid()) -> integer()
> stop(Server::pid()) -> ok
>
> Usage:
> Client starts the gen_stream by providing at least a stream
> option. The stream option indicates whether the source of the
> stream is a file, a binary or a function. When using a
> socket, port or other source, the client needs to implement
> the behavior to feed the buffers on demand.
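For illustration, the client side of this usage could look roughly like the sketch below. Since gen_stream itself is not part of OTP, the server here is a made-up stand-in: a plain process that serves a binary in fixed-size chunks and answers next_chunk requests with binary() | end_of_stream, as in the interface above. The message protocol, the module name and the timeout are assumptions, not the reference implementation.

```erlang
-module(chunk_client_sketch).
-export([start/2, next_chunk/1, stop/1, count_bytes/1]).

%% Stand-in for a gen_stream server (hypothetical, not the EEP code):
%% serves Bin in chunks of at most ChunkSize bytes.
start(Bin, ChunkSize)
  when is_binary(Bin), is_integer(ChunkSize), ChunkSize > 0 ->
    spawn(fun() -> serve(Bin, ChunkSize) end).

serve(Bin, ChunkSize) ->
    receive
        {next_chunk, From, Ref} ->
            case Bin of
                <<>> ->
                    From ! {Ref, end_of_stream},
                    serve(Bin, ChunkSize);
                <<Chunk:ChunkSize/binary, Rest/binary>> ->
                    From ! {Ref, Chunk},
                    serve(Rest, ChunkSize);
                Tail ->
                    %% Fewer than ChunkSize bytes left: send them all.
                    From ! {Ref, Tail},
                    serve(<<>>, ChunkSize)
            end;
        stop ->
            ok
    end.

%% Same shape as the proposed next_chunk/1: binary() | end_of_stream.
next_chunk(Server) ->
    Ref = make_ref(),
    Server ! {next_chunk, self(), Ref},
    receive
        {Ref, Reply} -> Reply
    after 5000 ->
        exit(timeout)
    end.

stop(Server) ->
    Server ! stop,
    ok.

%% Client loop: pull chunks until end_of_stream, summing their sizes.
count_bytes(Server) ->
    count_bytes(Server, 0).

count_bytes(Server, Acc) ->
    case next_chunk(Server) of
        end_of_stream -> Acc;
        Chunk when is_binary(Chunk) ->
            count_bytes(Server, Acc + byte_size(Chunk))
    end.
```

The point of the loop is that the client never holds more than one chunk at a time, which is what makes streams much bigger than memory workable.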
>
> Motivation
>
> There are many ways to get binary data into an erlang node,
> however, historically it has been recommended that the data be
> converted to a list and processed. There are many situations
> where leaving the binary data in its original form is preferable
> for space or conversion efficiency reasons (e.g., when merely
> filtering data in a relaying router process or when performing
> statistics on raw stream data). Providing a gen_server idiom
> makes the default approach to processing a binary stream an
> abstraction that is closer to an application developer's view of
> the problem solution.
>
> The recent Wide Finder project [1] challenged the erlang community by
> highlighting the slowness of standard I/O functions, forcing
> developers to use raw binary handling. This approach seems to be a
> common need in web service applications, yet it is quite easy to do in
> a very inefficient manner. Providing a reference implementation that
> exposes a simpler behaviour interface would increase the class of
> problems that erlang can solve in the hands of beginning to
> intermediate developers. It would also push implementers in the
> direction of an OTP compliant application without sacrificing
> efficiency.
>
> In addition, there has been a call on the email list for a
> string_stream implementation so that a buffer of data (e.g., an SMTP
> message, HTTP request, HTML page, multi-record socket protocol packet,
> raw text database, comma-delimited file, etc.) could be treated as a
> stream of binary elements rather than a single block of data.
>
> Finally, testing systems often need a generative source of data that
> can be replayed or repeated in a precise manner to trigger a fault or
> test a patch to same. The circular binary stream allows infinite
> streams of generative data, and the behaviour stream allows a
> functionally generated stream of data to be emitted.
>
>
> Rationale
>
> There are a few common idioms that are used when efficiently
> handling a binary data source:
>
> 1) "Chunking" the data to smaller sub-binaries
> 2) Buffering the chunks for efficient I/O
> 3) Few of the standard idioms are OTP-compliant
>
> A gen_server implementation seemed the most straight-forward
> method for making an OTP-compliant method for chunking a serial
> stream. A behaviour was created so that streams could be computed
> and generated rather than requiring a pre-constructed file or
> binary as a source.
>
>
> Reference Implementation
>
> A working version is available at the DuoMark Website [2].
>
>
> References
>
> [1] Tim Bray's weblog
> http://www.tbray.org/ongoing/
>
> [2] http://www.duomark.com/erlang/proposals/gen_stream.html
>
> Copyright
>
> This document is released to the public domain.
>
>
>
> Local Variables:
> mode: indented-text
> indent-tabs-mode: nil
> sentence-end-double-space: t
> fill-column: 70
> coding: utf-8
> End:
>