[erlang-questions] Update with complicated data structure
Richard A. O'Keefe
Wed Nov 25 05:13:23 CET 2015
On 24/11/2015, at 7:51 pm, YuanZhiqian <on-your-mark@REDACTED> wrote:
> Hi Richard,
> Thanks for your elaborative comments. Some of the advices are helpful and I appreciate a lot for that; but some of them are not quite on the track ...
Well, if I'm not _shown_ the track it's hard to stay on it.
> For example, regarding the array size issue, you see, it's just a sample where my point is focused on the assignment expression, so I use an array instead of a hash table which I actually have used in my project, for that would make the code messed up with less important parts. I am certainly not a C language tard = =
"Tard"? What does that mean?
> And please allow me to clarify my purpose of this question, this is not an erlang v.s. c&c++ debate,
That was understood; no clarification needed.
> I am just facing an actual problem in real scenario, and need some practical advice.
While it's not a C vs Erlang debate, it *is* a matter
of trying to do things the C way in an Erlang program.
> So it is not that I insist in making erlang behave the same as imperial language, but that it is a real requirement in the project that I'm trying to solve.
I doubt that very much. Requirements talk about what a program
does, as seen from outside. How the collection of companies is
held, whether as a list, as a gb tree, as a map, as an ETS
table, or even in the process dictionary (for safety, that
had better be the process dictionary of a separate process)
should be entirely up to you. I have serious trouble
believing in requirements that say
"You must use a data structure that makes search
and update take a long time."
Your requirements may well say
"Update the consumption of company X by D"
but of course you haven't shown us the requirements.
One thing may have struck you already.
If you are using a relational data base,
changing one field of one record at a time is
a good way to run quite slowly. One way to get
good performance out of RDBs is *batching*.
Now if you kept the list of N companies *sorted* on increasing
company_id, and if you had a *batch* of M companies to be
updated, also in a list sorted on increasing company_id,
then you can *merge* those two lists and produce a new list
in O(N+M) time. (This is what old-timers used to do with
tape drives before even CODASYL data bases.) But if you do
it one company at a time, it's going to take O(N*M) time.
If you use a gb_tree, O(lg(N)*M).
So there is a really serious question about whether you really
*HAVE* to update one field of one record, or whether you have
a *batch* of fields, and possibly multiple updates.
> As for the naming issue ... ok you are right, but does it really matter?
Yes, it does.
> At least not as important in my case ...
> And you are asking me why I am using size_t instead of int, I don't know if I am right, but size_t seems an intended type for array indices,
(a) It cannot possibly be. C existed long before size_t did.
(b) The name *tells* you what it's meant for: it is meant for
the *sizes* of objects.
(c) Kernighan & Ritchie, "The C programming language", 2nd edition
consistently uses int for array indexing. But what do they
know? They only invented the language.
> since it is unassigned,
I think you mean 'unsigned', and that's actually bad.
> and codes from google use size_t as well.
Google invented Go. They did not invent C and are
not authorities on how it was meant to be used.
> Try to compile using int, but configure your error level to the most strict mode, turn on "warning as error" option, then you won't get the code compiled.
I have no idea what you are talking about.
Code that indexes arrays using int gets through gcc, clang,
lint, and splint at the highest levels with no warnings.
Good C compilers warn about using *char* as an array index,
because the programmer has no way of telling beforehand
whether char is signed or unsigned, so
a[(char)255] might or might not involve a negative subscript.
But invalidating int would mean breaking almost all the C
code ever written for Unix.
I just checked the source code for (original) vi, and yep,
it uses int subscripts, including "tp[-1]", which is not
actually an unusual thing to do in C (tp being a pointer
after the beginning of an array), and no usable C compiler is
going to treat that as an error.
> So the point is, thanks again for your help, but unfortunately I am getting more confused with too much irrelevant comments.
How about telling us what the actual requirements really
How about explaining why you are using a list?
I advised you to use a gb_tree (or something of that sort).
Someone else suggested a new-fangled 'map'.
Both are good suggestions. What's wrong with them?
How many companies are you carrying information about?
More information about the erlang-questions