Telecom & Mnesia - Call Detail Record Collection

Thu Apr 11 10:55:54 CEST 2002

On Wed, 10 Apr 2002, Martin J. Logan wrote:

>Hello All,

>    I am working for a small telephony company here in the
>States. The only method that we have for billing our customers
>is via call detail records. The system that we are using now is
>a rather poor one. My boss and I propose that we can
>revolutionize the way that we do this by employing erlang and
>mnesia.

Erlang -- great; I'm not sure about using mnesia for this,
though.

>The requirements for this project could be in excess of 8 gig of
>data storage with 500,000 records coming in on a daily basis. I
>have always thought of mnesia as a real time db for storing
>smaller amounts of data. I would like to build this cdr
>collection and reporting system entirely in erlang though and
>remove the need for the sql db if possible. The more erlang the
>better off we all are.

Question: is this a CDR collection or generation system, or both?
In a CDR collection system (which presumably also does analysis
and produces bills), you certainly want a powerful database
system of sorts. You'll probably end up doing correlation, rules
lookup, taking into account discount rules, coupon days, etc.

In a CDR generation system, you usually want to transfer CDRs to
the collection system as efficiently as possible, either in
real-time (a la RADIUS accounting), or in batch (normal for
telecom systems). For this, I'd use disk_log, and perhaps ftp.
Basically, disk_log will efficiently throw the CDRs onto disk,
and you'd probably want to configure it as a halt log (as opposed
to a wrap log), since you don't want to automatically overwrite
old CDRs.
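
A minimal sketch of what that could look like, assuming CDRs are
plain Erlang terms. The module name cdr_log, the log name and the
function names are made up for illustration; only the disk_log
calls are standard OTP. The log is opened with type halt, the
kind that never wraps around.

```erlang
-module(cdr_log).
-export([open/1, append/2, read_all/1, close/1]).

%% Open (or reopen) a halt log. A halt log keeps growing in one file
%% and never wraps, so old CDRs are not overwritten.
open(File) ->
    Opts = [{name, cdr_log}, {file, File},
            {type, halt}, {format, internal}],
    case disk_log:open(Opts) of
        {ok, Log} -> {ok, Log};
        %% The log was not closed properly last time and was repaired.
        {repaired, Log, _Recovered, _BadBytes} -> {ok, Log};
        {error, Reason} -> {error, Reason}
    end.

%% Append one CDR and force it to disk before returning.
append(Log, Cdr) ->
    ok = disk_log:log(Log, Cdr),
    disk_log:sync(Log).

%% Read every term back, in the order it was written.
read_all(Log) ->
    read_all(Log, start, []).

read_all(Log, Cont, Acc) ->
    case disk_log:chunk(Log, Cont) of
        eof -> lists:reverse(Acc);
        {Cont2, Terms} -> read_all(Log, Cont2, lists:reverse(Terms, Acc));
        %% Only returned when the log is opened read-only.
        {Cont2, Terms, _Bad} -> read_all(Log, Cont2, lists:reverse(Terms, Acc));
        {error, Reason} -> erlang:error(Reason)
    end.

close(Log) ->
    disk_log:close(Log).
```

Syncing after every record is the cautious choice; a real system
would more likely sync once per batch or on a timer.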

Using Erlang is probably still a good idea, since it offers
advantages both in generation and collection:

- if you need to scale by adding processors, Erlang will handle
  much of that in a nearly transparent fashion.
- if you want to write sophisticated filtering functions for
  correlation of CDRs, Erlang's declarative style will probably
  come in handy.

>The questions I have are:
>
>1. Has anyone stored such a vast amount of data in mnesia? I
>could not find much on the list.

Chandru and Per B answered. I've been storing a few million
records in disc_copies (which use disk_log for disk storage, but
also keep the table in RAM). One needs _lots_ of RAM for this
type of setup (I have 1 GB, and that's really not sufficient.)

>2. How does mnesia perform under large loads? I need to do
>reporting on the data.

It depends on how you store data and what your access patterns
are. For reporting, I guess you do mostly reads. If you're
reading/traversing tables that are loaded into RAM, mnesia
shines. On disk-based accesses, most decent DBMSs will probably
run circles around it. Mnesia is still mainly a RAM-based DBMS,
with facilities for disk storage.

If you're continuously running transactions with lots of writes
on disk-based tables, you may want to use the transaction type
'sync_transaction' (mnesia:activity(sync_transaction, Fun)).
Otherwise, mnesia will log your writes to a transaction log, and
periodically "dump" the log information into the actual tables.
This will make the transactions appear cheaper than they really
are, and you may eventually overload the system.
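
As a rough sketch of the sync_transaction variant, assuming a
disc_copies table on a single node: the module name cdr_store and
the record fields are invented for the example, and init/0
expects that mnesia has not already been started without a disk
schema.

```erlang
-module(cdr_store).
-export([init/0, store/1]).

%% Hypothetical CDR layout, for illustration only.
-record(cdr, {id, caller, callee, duration_s}).

%% Create a disk schema and a disc_copies table on the local node.
init() ->
    %% Returns {error, _} if a schema already exists; that is fine here.
    _ = mnesia:create_schema([node()]),
    ok = mnesia:start(),
    TabDef = [{disc_copies, [node()]},
              {attributes, record_info(fields, cdr)}],
    case mnesia:create_table(cdr, TabDef) of
        {atomic, ok} -> ok;
        {aborted, {already_exists, cdr}} -> ok
    end,
    ok = mnesia:wait_for_tables([cdr], 5000).

%% Write one CDR in a synchronous transaction. The call returns ok
%% on success and exits if the transaction is aborted.
store(Cdr = #cdr{}) ->
    mnesia:activity(sync_transaction, fun() -> mnesia:write(Cdr) end).
```

The only difference from an ordinary transaction is the first
argument to mnesia:activity/2, so it is easy to measure both.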

>3. What are the most effective mnesia configurations for maxing
>out its data storage potential? What is its potential?

I think this is a moving target. If you use fragmented tables,
the potential for storing large amounts of data in mnesia is
huge, but it probably doesn't suit all applications. I recall
that Håkan Mattsson verified linear transaction processing
scalability of mnesia in a 50 node system. If your application
allows the database to be partitioned, and accesses can be
localized to single fragments, it's mostly a question about how
much hardware you can throw at the solution (well, with a hard
BEAM limit of 256 nodes).
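
For what it's worth, a fragmented table is set up roughly as
below. This is a sketch under assumptions: the table and module
name frag_cdr, the record fields and the single-node pool are
invented, and mnesia is assumed to be running already with a disk
schema. The frag_properties option and the mnesia_frag access
module are the standard mnesia ones.

```erlang
-module(frag_cdr).
-export([create/1, store/1, lookup/1]).

%% Hypothetical CDR layout, for illustration only.
-record(frag_cdr, {id, caller, callee, duration_s}).

%% Create a table split into NFragments fragments, each with one
%% disc copy, spread over the node pool (here just the local node).
create(NFragments) ->
    mnesia:create_table(frag_cdr,
        [{attributes, record_info(fields, frag_cdr)},
         {frag_properties, [{n_fragments, NFragments},
                            {node_pool, [node()]},
                            {n_disc_copies, 1}]}]).

%% All access goes through the mnesia_frag access module, which
%% hashes the key to pick the fragment that holds the record.
store(Cdr = #frag_cdr{}) ->
    Write = fun() -> mnesia:write(Cdr) end,
    mnesia:activity(sync_transaction, Write, [], mnesia_frag).

lookup(Id) ->
    Read = fun() -> mnesia:read(frag_cdr, Id) end,
    mnesia:activity(transaction, Read, [], mnesia_frag).
```

With more nodes in node_pool the fragments are distributed across
them, which is where the scaling comes from.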

>4. Has anyone out there used mnesia/erts/otp for call detail
>record collection? If so can you give me any recommendations for
>my project?

We use Erlang for CDR collection and transfer to a (non-Erlang)
billing system. I tried to find a public document that describes
how we do it, but have so far been unsuccessful. At least, we
support 90,000 voice calls per processor, and are able to turn on
charging on each call. We're also able to output CDRs in a robust
manner. I believe that we've dimensioned the intermediate storage
of CDRs for 720,000 records, which should be enough for a while,
given that we periodically output records to one of two redundant
Billing Systems.

The Billing system is not Erlang-based, and is Somebody Else's
Problem (tm). I can't give you any hints there. (:

/Uffe