[erlang-questions] 1000+ users; 30+ data tables/user

Tue Nov 11 22:28:19 CET 2014

Thank you, Joe,

You've broadened my horizons and I love the simplification implied.

For the sake of clarification and deeper understanding:

-- Would, then, one directory and n files per user work?

      One respondent to my question cited possible limits imposed by inodes and atoms. But I
      see that my workstation provides 30138368 inodes and the Erlang docs tell me that Erlang
      allows 1048576 atoms by default, so I'm not convinced that these limits are a problem.

-- Would one directory and n dets tables per user work?

-- Can sufficient availability be achieved with conventional backup strategies?

Yes, I know the answers are application specific, but are there metrics/rules of thumb/guidelines that can give me back-of-the-envelope answers short of all-out test-and-measurement?
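
The atom figure cited above can be checked on a running node rather than taken from the docs. A minimal sketch, assuming OTP 20 or later (where the atom_count and atom_limit items of erlang:system_info/1 are available); the module name `limits` and the shape of the returned map are illustrative only:

```erlang
-module(limits).
-export([atoms/0]).

%% Report the atom limit and the current atom usage of the running VM.
%% The map keys (limit, used, free) are made up for this sketch.
atoms() ->
    Limit = erlang:system_info(atom_limit),
    Count = erlang:system_info(atom_count),
    #{limit => Limit, used => Count, free => Limit - Count}.
```

Calling limits:atoms() before and after a load test shows whether per-user data creates atoms at all. File names are strings or binaries, so one file per user need not consume any atoms unless user names are converted with list_to_atom/1.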

Many, many thanks for your insights.

Lloyd

-----Original Message-----
From: "Joe Armstrong" <erlang@REDACTED>
Sent: Tuesday, November 11, 2014 3:02pm
To: "Garrett Smith" <g@REDACTED>
Cc: "Lloyd R. Prentice" <lloyd@REDACTED>, "erlang-questions" <erlang-questions@REDACTED>
Subject: Re: [erlang-questions] 1000+ users; 30+ data tables/user

On Tue, Nov 11, 2014 at 7:41 PM, Garrett Smith <g@REDACTED> wrote:
> Hi Lloyd,
>
> Sorry for the late reply here - I was interested in this thread last
> week when you sent it but was in sleep deprived conference mode and
> never got back to it.
>
> On Tue, Nov 4, 2014 at 8:49 PM, Lloyd R. Prentice <lloyd@REDACTED> wrote:
>> Hello,
>>
>> This is a naive question reflecting my inexperience with databases.
>>
>> I'm planning to offer my users a set of management/planning tools. Each user would be storing/retrieving user-specific data involving as many as 30 data tables.
>>
>> --- Data fits well into Erlang records.
>> --- We're not talking huge volumes of data per user.
>> --- Nor do I expect much data analysis.
>> --- Data integrity and availability are essential.
>> --- Users may, however, wish to bundle up their data at some point and migrate to a different system.
>>
>> I'm attracted to mnesia because of its tight integration with Erlang and its replication features. I'm also considering riak.
>>
>> My first thought was that every user would own his/her own database. But this seems to rule out
>> mnesia since:
>>
>> "Mnesia is a truly distributed DBMS and the schema is a system table that is replicated on all nodes in a Mnesia system. The function will fail if a schema is already present on any of the nodes in NodeList."
>> http://www.erlang.org/doc/apps/mnesia/Mnesia_chap3.html
>>
>> An option would be to store data for all users in each of the 30 tables. But is there a better solution altogether?
>>
>> I'd much appreciate suggestions and guidance from wiser heads.
>
> I think we've all had this problem - a new project and nothing to hold
> us back but our own imagination. So the question... which database to
> pick. Which indeed? There are like 100 amazing options!
>
> My suggestion here is to stop this line of thinking immediately :)
>
> I would instead plan to throw your early work away. Pick something
> that is the fastest and easiest imaginable for you to make progress on
> your app. Treat it as a "this will almost certainly not be what I end
> up with".
>
> _For me_ this means one of these:
>
> - Hard coded values or config files
> - Dets
> - SQLite
> - MySQL
>
> The point is to keep it as simple as possible and just get stuff working.

Excellent advice.

For many systems I use one file per user. The file contains term_to_binary(X),
where X is whatever I feel like using to represent the user data.
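
The one-file-per-user scheme can be sketched roughly as follows. This is an illustration, not code from the thread: the module name `user_store`, the ".bin" suffix, and the write-to-temp-then-rename step are all assumptions added here, and UserId is assumed to be a plain string that is safe to use as a file name.

```erlang
-module(user_store).
-export([save/3, load/2]).

%% Hypothetical layout: one file per user, named after the user id.
path(Dir, UserId) ->
    filename:join(Dir, UserId ++ ".bin").

%% Serialise any Erlang term and store it in the user's file.
%% Writing to a temporary file and renaming it is an added precaution
%% (not mentioned in the thread) so a crash mid-write leaves the old file.
save(Dir, UserId, Data) ->
    File = path(Dir, UserId),
    Tmp = File ++ ".tmp",
    ok = filelib:ensure_dir(File),
    ok = file:write_file(Tmp, term_to_binary(Data)),
    file:rename(Tmp, File).

%% Read the user's file back and decode the stored term.
load(Dir, UserId) ->
    case file:read_file(path(Dir, UserId)) of
        {ok, Bin} -> {ok, binary_to_term(Bin)};
        {error, Reason} -> {error, Reason}
    end.
```

For example, user_store:save("data", "lloyd", Records) followed by user_store:load("data", "lloyd") should return {ok, Records}, and loading an unknown user returns {error, enoent}.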

(or you can use text files - then you can run amazing things like grep and find
on them :-)
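
The text-file variant might look like the sketch below, which writes the term in Erlang syntax so that it stays readable and greppable. The module and function names are hypothetical; file:consult/1 is the standard call for reading back terms written this way.

```erlang
-module(user_text_store).
-export([save/2, load/1]).

%% Write a term as readable Erlang text, terminated by a full stop.
save(File, Data) ->
    file:write_file(File, io_lib:format("~p.~n", [Data])).

%% file:consult/1 returns every "Term." in the file as a list;
%% this sketch expects exactly one term per file.
load(File) ->
    case file:consult(File) of
        {ok, [Data]} -> {ok, Data};
        {ok, Other} -> {error, {unexpected_terms, Other}};
        {error, Reason} -> {error, Reason}
    end.
```

This only round-trips terms that have a readable text form (atoms, numbers, strings, tuples, lists, maps); pids, references and funs print but cannot be parsed back.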

The OS caches file access and I can easily analyse/dump the files.

I've *never* got to the point where I need to change the file system for
a database (but then again I've not built a really big system - and
this works fine for several thousand files/users).

If and when the design problems are solved you can change representations
*if it is necessary* - choosing a database right at the start is
"premature optimisation" - if you ever get to this point then the
choice of representation should be dictated by measurement and not
guesswork.

/Joe

> When I'm starting on something new, I just don't know enough about
> anything to make the right decision - so I deliberately make the right
> _wrong_ decision - that is, the decision that will let me move forward
> quickly and get to the real problems. I might throw it away later, or
> I might keep it. But in any case, I'm sure as hell not going to spend
> a lot of time on it. Not until I'm facing real, hard, visible problems
> that I can use to inform my next steps.
>
> Garrett
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions