[erlang-questions] Theoretically Stuck
Jay Nelson
jay@REDACTED
Wed Sep 6 05:46:33 CEST 2006
Rudolph van Graan had questions about modeling data and erlang...
I was a bit confused when I read your post because you have mixed
several concepts together. The post title indicated to me you were
interested in expressing a problem most closely to its description,
while the body of the message indicated you were concerned with
performance.
> Restatement of some of the original message:
The original problem posed involved database records which evolved over
time.
-record(person, {name, surname}).
What happens when new fields are added to some records?
> End restatement
I would pose a few questions for you to consider so that you can decide
what your primary goal is:
1) Suppose you had no programming language, just an SQL database.
a) How would you model the original person table?
b) What would you do when a new column is needed?
c) Would you really create a separate table every time you added fields?
2) If you have three people who all have name, surname, but they each
have different other attributes, would you use them interchangeably in
some method invocations if you were using an OO language?
3) If you had 1M people, each of which had different attributes, would
you expect to be able to use them interchangeably?
4) Would you expect performance to be the same in all three cases?
There are several concepts intertwined:
1) How do I store data in a database that I know will migrate to a new
schema?
2) What do I do when the schema changes?
3) How do I deal with statically typed languages when my data is dynamic?
4) Inheritance seems like an efficient way to avoid repeating code (DRY
principle). How can I use it in erlang?
5) What whizzy language feature of erlang makes these problems go away?
What are you more worried about?
A) The problem is accurately modeled.
B) Code clarity, lack of repetition and ease of modification.
C) Performance and size of application.
D) Ability to morph the data structures on a per instance basis.
E) Migration of the architecture over the course of years.
F) Database management and performance.
G) Language specific constructs that make the code minimal and beautiful.
Not all of these are orthogonal, but you need to constrain the problem.
Designing and coding consists of a series of tradeoffs. You can't get
all of A-G with a single best design or coding style.
Your post title said to me, "Forget performance, what is the _correct_
way philosophically to overcome my problem" (read: A is most important,
although B could apply just as well). If the main problem is that the
data model changes regularly and the instances can all be differently
shaped (as in D), then:
- Use proplists or dictionaries. You may want to store your objects in
a dets table, or in an erlang term file read back with file:consult/1.
Avoid using a structured database like SQL.
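A minimal sketch of the proplist approach, assuming each person is stored as a plain proplist and the whole collection is kept in a term file. The module name person_terms and its functions are hypothetical; proplists:get_value/2, lists:keystore/4, file:write_file/2 and file:consult/1 are standard library calls. Two instances can carry different keys, and adding a field touches only the instance that needs it.

```erlang
-module(person_terms).
-export([surname/1, add_attr/3, save/2, load/1]).

%% Each person is a proplist, so two instances may carry different keys.
%% Returns undefined if this instance has no surname.
surname(Person) ->
    proplists:get_value(surname, Person).

%% Adding (or replacing) a field changes only this instance; there is no schema.
add_attr(Key, Value, Person) ->
    lists:keystore(Key, 1, Person, {Key, Value}).

%% Write one term per line, each terminated by ".", so that
%% file:consult/1 can read the file back as a list of terms.
save(File, People) ->
    file:write_file(File, [io_lib:format("~p.~n", [P]) || P <- People]).

%% Returns {ok, People} or {error, Reason}.
load(File) ->
    file:consult(File).
```

The cost is that every lookup is a linear scan of the list and you give up matching on fields in function heads, which is the trade the post accepts for case D.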
If the data changes less frequently, but the attribute set changes when
they do and you want efficient database management (as in F):
- Use an OO database with schema and object versions. I know of no
erlang adaptors, so pick a different language.
If your case is E, don't worry so much about rigid data structures:
- Use records "properly" (i.e., normalize the SQL tables and have
corresponding records for each table), migrate all data in your schema.
You could do this online or offline depending on your requirements (or
node by node even). Just write the extra code every time it changes;
code is less important than a clean database structure.
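The "extra code every time it changes" can be as small as one conversion function per schema change. This is a sketch assuming a hypothetical version 2 of the person record that adds an email field; because records are tuples, old rows can still be recognized by their size. If the data lives in Mnesia, a fun like this is the kind of thing mnesia:transform_table/3 expects, though that wiring is not shown here.

```erlang
-module(person_migrate).
-export([migrate/1, migrate_all/1]).

%% Hypothetical version 2 of the schema: an email column was added.
-record(person, {name, surname, email}).

%% Version 1 rows were stored as the 3-tuple {person, Name, Surname}.
migrate({person, Name, Surname}) ->
    #person{name = Name, surname = Surname};
%% Rows already in the version 2 shape pass through unchanged.
migrate(#person{} = P) ->
    P.

%% Migrate a whole table's worth of rows in one pass.
migrate_all(Rows) ->
    [migrate(R) || R <- Rows].
```

Keeping one clause per old shape means the function documents the schema history, and it can be retired once every node has migrated.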
For option C, speed of access and reusable functions matter more. Code
the additional features in separate modules.
Create your original records using a grouping record:
-record(person, {std_attrs, moda_attrs, modb_attrs}).
Then in each of std, module a and module b you can define a record that
is consistent for the functions coded in the module. To extend, add a
new module and a new field on the person record.
-module(std).   % in std.erl
-record(std_attrs, {name, surname}).
-module(moda).  % in moda.erl
-record(moda_attrs, {type}).
-module(modb).  % in modb.erl
-record(modb_attrs, {stuff}).
This can be mapped to an SQL database structure that changes
periodically as in case E. Case F is probably best covered in this way
if you don't want to use records/SQL "properly". Here you aren't stuck
using slow proplists or hashtables, but can tune and structure the data
for efficient functional access inside each module.
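A sketch of what one of those modules might look like, assuming the grouping record would normally live in a shared .hrl file (it is defined inline here so the block compiles on its own). The module name person_std and its functions are hypothetical. The module only ever looks inside its own std_attrs slot, so moda and modb can evolve without touching it.

```erlang
-module(person_std).
-export([new/2, full_name/1, set_surname/2]).

%% The grouping record from the post: one slot per feature module.
%% In a real system this would come from a shared header file.
-record(person, {std_attrs, moda_attrs, modb_attrs}).

%% The record owned by this module; other modules never look inside it.
-record(std_attrs, {name, surname}).

%% Create a person with only the standard attributes filled in.
new(Name, Surname) ->
    #person{std_attrs = #std_attrs{name = Name, surname = Surname}}.

%% Plain pattern matching on fields still works inside the owning module.
full_name(#person{std_attrs = #std_attrs{name = N, surname = S}}) ->
    N ++ " " ++ S.

%% Update the standard attributes, leaving the other modules' slots untouched.
set_surname(#person{std_attrs = Std} = P, Surname) ->
    P#person{std_attrs = Std#std_attrs{surname = Surname}}.
```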
ROK's dictionaries cover G. Right now they are not available. If you
try one of the above techniques, maybe you will learn enough about the
problem to be able to contribute to the effort to implement them, or at
least will have specific examples of code savings and clarity that might
help convince the OTP team to implement them.
There are probably lots of other options available (e.g., implement
ROK's dictionaries using the equivalent existing functions he
describes). The limiting factor for implementation will always be the
choice of data structure.
I don't think your example of using OO inheritance is a good approach.
You will eventually make a mess of the type system and will have to
restructure everything once it is already spaghetti. If objects can be
used interchangeably, they can be subclasses, but if you add or subtract
methods you will have a heck of a time dealing with a collection of
randomly related person objects, some of which have dispatch methods and
some of which don't.
(The approach you describe is what I would call implementing dynamic
data by hacking a static typing system. A more philosophically correct
way would be to implement schema versioning and a declarative attribute
set using a hash and then encapsulate the version and the instance in a
single object -- which could be done in erlang with ets or proplists and
a record defining the version and the instance attributes at the expense
of pattern matching on fields and values.)
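That versioned-object idea can be sketched with a record holding the version and a proplist of attributes, plus a declarative attribute set per version. Everything here (versioned_person, vobj, schema/1 and its two versions) is a hypothetical illustration, not an existing library.

```erlang
-module(versioned_person).
-export([new/1, version/1, attr/2, set/3, upgrade/2]).

%% One object = a schema version plus a proplist of instance attributes.
-record(vobj, {version, attrs = []}).

%% Hypothetical declarative schemas: the attribute set each version allows.
schema(1) -> [name, surname];
schema(2) -> [name, surname, email].

new(Version) -> #vobj{version = Version}.

version(#vobj{version = V}) -> V.

%% Returns undefined for attributes this instance does not carry.
attr(Key, #vobj{attrs = Attrs}) ->
    proplists:get_value(Key, Attrs).

%% Refuses attributes that the object's schema version does not declare.
set(Key, Value, #vobj{version = V, attrs = Attrs} = Obj) ->
    case lists:member(Key, schema(V)) of
        true  -> {ok, Obj#vobj{attrs = lists:keystore(Key, 1, Attrs, {Key, Value})}};
        false -> {error, {unknown_attribute, Key, V}}
    end.

%% Moves an object to a new version, dropping attributes the new schema lacks.
upgrade(NewVersion, #vobj{attrs = Attrs} = Obj) ->
    Allowed = schema(NewVersion),
    Obj#vobj{version = NewVersion,
             attrs = [KV || {K, _} = KV <- Attrs, lists:member(K, Allowed)]}.
```

The expense mentioned above shows up directly: callers go through attr/2 instead of matching #vobj fields in function heads.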
I would take these steps in architecting a new system:
1) Identify the key characteristics ranked in importance (as in a subset
of A-G above)
2) Determine what features you really need from a database (can you use
flat files, dets, or file:consult-style term files?).
3) Don't worry about performance or efficiency until you have the code
working.
4) OK, if you know you need 1M objects in memory at once, you can think
about performance, but really don't sweat it yet.
5) Measure and tweak performance.
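For step 5, timer:tc/1 is a simple way to get a first number before changing a data structure. This is a rough sketch comparing field access on a record against a proplist lookup; the module name access_bench is hypothetical, and microbenchmarks like this are noisy, so treat the output as a hint rather than a verdict.

```erlang
-module(access_bench).
-export([compare/1]).

-record(person, {name, surname}).

%% Times N surname lookups against a record and against a proplist.
%% Returns the elapsed microseconds for each, as reported by timer:tc/1.
compare(N) ->
    Rec = #person{name = "Ann", surname = "Lee"},
    Props = [{name, "Ann"}, {surname, "Lee"}],
    {RecUs, _} = timer:tc(fun() -> rec_loop(Rec, N, 0) end),
    {PropUs, _} = timer:tc(fun() -> prop_loop(Props, N, 0) end),
    [{record_us, RecUs}, {proplist_us, PropUs}].

%% The accumulator uses each lookup's result, which helps keep the
%% compiler from discarding the work being measured.
rec_loop(_Rec, 0, Acc) -> Acc;
rec_loop(Rec, N, Acc) ->
    rec_loop(Rec, N - 1, Acc + length(Rec#person.surname)).

prop_loop(_Props, 0, Acc) -> Acc;
prop_loop(Props, N, Acc) ->
    prop_loop(Props, N - 1, Acc + length(proplists:get_value(surname, Props))).
```

Calling access_bench:compare(1000000) in the shell gives both timings side by side; run it a few times before drawing conclusions.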
Generally you want to start with goals of Clarity, Succinctness and then
Performance. Maintainability will come along if you can achieve those.
jay