[erlang-questions] Question on ETS/DETS and log_mf_h

Fri Aug 17 13:13:30 CEST 2007

Hi Erlang users

I am enjoying my Erlang programming very much so far. I like the simplicity of the language and how the OTP part lets me construct concurrent programs. However, I have run into some dilemmas, which I think are due to my using the primitives in an unintended way. So let me ask for some views on the problems I have:

*** log_mf_h ***

I like log_mf_h together with the SASL application's rb module a lot: let your program run and let it gather problem reports in a binary, round-robin (circular) set of log files. Then, much later, use the rb module to look into the problem reports. The main part of log_mf_h (OTP R11B-5) is:

Bin = term_to_binary(tag_event(Event)),
Size = size(Bin),
NewState =
        if
           % .... rotate logs if necessary
        end,
[Hi,Lo] = put_int16(Size),
file:write(NewState#state.cur_fd, [Hi, Lo, Bin]),
{ok, NewState#state{curB = NewState#state.curB + Size + 2}};

That is: we tag the problem report Event, convert it into a binary, take its size, and store the report in the file as <<Size:16/integer, Bin/binary>>.
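To make the limit concrete, here is a minimal sketch of that framing in a hypothetical module frame16 (not part of log_mf_h or rb). It writes the header with the bit syntax instead of put_int16, assuming both produce the same 2-byte big-endian size. A 16-bit integer segment silently drops the higher bits, so a term whose encoding is 65536 bytes or longer gets a header that is too small, and a reader that trusts the header takes only a prefix of the payload.

```erlang
%% Hypothetical module for illustration only; not part of OTP.
-module(frame16).
-export([encode/1, decode/1]).

%% Frame a term as in the excerpt: a 16-bit big-endian size header
%% followed by the term_to_binary/1 payload.
encode(Term) ->
    Bin = term_to_binary(Term),
    Size = byte_size(Bin),
    %% Size:16 keeps only the low 16 bits of Size, so a payload of
    %% 65536 bytes or more silently gets a wrong header.
    <<Size:16/big-unsigned-integer, Bin/binary>>.

%% Read one framed term back, trusting the header as a reader must.
%% With a wrong header, Bin is only a prefix of the real payload and
%% binary_to_term/1 will typically fail with badarg.
decode(<<Size:16/big-unsigned-integer, Rest/binary>>)
  when byte_size(Rest) >= Size ->
    <<Bin:Size/binary, Tail/binary>> = Rest,
    {ok, binary_to_term(Bin), Tail};
decode(_) ->
    {error, truncated}.
```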

My problem: I have some processes which are start_link'ed with a lot of state, and while they run they accumulate quite a bit more: far more than the 65535 bytes (64 KB) that fit in the 16-bit size field. The size header of such a report is therefore wrong, and when rb tries to read the terms of a PROGRESS or CRASH report, things go very wrong indeed. The PROGRESS report problem is mostly due to the big state transfer in the start_link call. Is it a bad idiom to shove that much data into a starting process, or should I transfer it in a message just after the process starts (handling the serialization issues neatly, of course)? The CRASH report is more problematic, since I'd like to have the full state for debugging.
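A minimal sketch of the second option, assuming a gen_server; the module big_state_server and its functions load/2 and size_of_state/1 are hypothetical. The process starts with no bulky arguments, so the start arguments recorded in a supervisor's PROGRESS report stay small, and the large state is handed over in a call right after startup. This does not help with crashes: the report gen_server writes when it terminates abnormally still includes the server state.

```erlang
%% Hypothetical module for illustration only.
-module(big_state_server).
-behaviour(gen_server).

-export([start_link/0, load/2, size_of_state/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% Start with no bulky arguments, so the start arguments that end up
%% in a supervisor's PROGRESS report stay small.
start_link() ->
    gen_server:start_link(?MODULE, [], []).

%% Hand the large state (here a gb_tree) over in a separate call
%% after startup.
load(Pid, Tree) ->
    gen_server:call(Pid, {load, Tree}).

%% Small query used to check that the state arrived.
size_of_state(Pid) ->
    gen_server:call(Pid, size).

init([]) ->
    {ok, gb_trees:empty()}.

handle_call({load, Tree}, _From, _OldTree) ->
    {reply, ok, Tree};
handle_call(size, _From, Tree) ->
    {reply, gb_trees:size(Tree), Tree}.

handle_cast(_Msg, Tree) -> {noreply, Tree}.
handle_info(_Info, Tree) -> {noreply, Tree}.
terminate(_Reason, _Tree) -> ok.
code_change(_OldVsn, Tree, _Extra) -> {ok, Tree}.
```

The catch is that the process can receive other requests between start_link and load, so it has to handle an empty state sensibly.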

I *could* alter log_mf_h to store a 4-byte size; 4 GB seems a reasonable upper limit for quite some time. I also wonder whether it would be an advantage to trade CPU cycles for disk I/O and make it use term_to_binary(tag_event(Event), [compressed]), at least as an option.
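A minimal sketch of that change, again as a hypothetical module frame32 rather than a patch to log_mf_h: a 32-bit size header, and an option list passed straight through to term_to_binary/2. binary_to_term/1 recognizes a compressed encoding by itself, so the reader does not need to know which option the writer used. rb would presumably need the matching change to read a 4-byte header.

```erlang
%% Hypothetical module for illustration only; not part of OTP.
-module(frame32).
-export([encode/2, decode/1]).

%% Opts is passed to term_to_binary/2, e.g. [] or [compressed].
%% A 32-bit header allows payloads up to 4 GB - 1 bytes.
encode(Term, Opts) ->
    Bin = term_to_binary(Term, Opts),
    <<(byte_size(Bin)):32/big-unsigned-integer, Bin/binary>>.

%% binary_to_term/1 handles both plain and compressed encodings.
decode(<<Size:32/big-unsigned-integer, Rest/binary>>)
  when byte_size(Rest) >= Size ->
    <<Bin:Size/binary, Tail/binary>> = Rest,
    {ok, binary_to_term(Bin), Tail};
decode(_) ->
    {error, truncated}.
```

Whether [compressed] pays off depends on the data; large, repetitive states such as long lists of similar tuples usually shrink a lot, while already dense data gains little for the extra CPU time.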

I could also just throw log_mf_h away and use standard logging to file via SASL, but that really defeats the purpose of log_mf_h. Is there a tool I am missing that does the same as log_mf_h without the size problem?

What nags me is that the size field was not bumped long ago. This makes me think I am using Erlang wrong, or using log_mf_h wrong. The process uses some nice data structures, dict and gb_trees (the latter to get a simple way to traverse the keys in order), which give fast lookups (O(log n) for gb_trees). This leads me to question #2:

*** ETS ***

For another process with the same problem, I found it more beneficial to shove its data into an ETS table. But what is a tuple(), type-wise, as described in the ets man page? Of course it is a tuple {e1, e2, e3, ..., eN}, but are there any limits on the elements eK (1 <= K <= N)? Can I just store arbitrary Erlang terms like gb_trees and dicts as the eK elements, and can I do lookups on them? Are there any limitations on keys? It would give a possible workaround where I can store most of the state inside an ETS table. Though:

* I will probably serialize access to the ETS table. Goodbye parallelism in that part of the code. I don't think it will matter, but it doesn't really please me.
* I suspect that ETS tables are much faster than Erlang-based data structures. Am I right?
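A small sketch of storing such terms, in a hypothetical module ets_terms: the key (by default the first element) and the other tuple elements can be arbitrary terms, including gb_trees and dicts. A protected table, the default access mode, lets one owner process do all the writes while other processes read directly, so serializing writes through one process does not have to serialize reads.

```erlang
%% Hypothetical module for illustration only.
-module(ets_terms).
-export([demo/0]).

demo() ->
    %% protected (the default access): only the owning process may
    %% write, but any process may read.
    T = ets:new(state_tab, [set, protected]),
    Tree = gb_trees:from_orddict([{a, 1}, {b, 2}]),
    Dict = dict:from_list([{x, 10}]),
    %% The key and the other elements may be arbitrary terms.
    true = ets:insert(T, {{user, 42}, Tree, Dict}),
    %% Lookup is by key and copies the whole stored tuple out of the
    %% table; the gb_tree and dict are opaque values to ETS.
    [{{user, 42}, Tree2, Dict2}] = ets:lookup(T, {user, 42}),
    %% Another process can read the same table directly.
    Parent = self(),
    spawn(fun() -> Parent ! {reader, ets:lookup(T, {user, 42})} end),
    Seen = receive {reader, Rows} -> length(Rows) end,
    Result = {gb_trees:lookup(b, Tree2), dict:fetch(x, Dict2), Seen},
    true = ets:delete(T),
    Result.
```

On speed: ets:lookup/2 copies the matching tuple into the caller's heap, so storing one big gb_tree as a single element means copying the whole tree on every read. Spreading the state over one row per key avoids that, and ETS then does the key lookup itself.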

Other options are also welcome.

Thanks in advance.

  J.