[erlang-questions] List comprehension puzzler

Antonio SJ Musumeci trapexit@REDACTED
Tue Sep 20 23:25:04 CEST 2016


"Choosing a smart algorithm that is intrinsically efficient is not
something that I consider to be an optimisation but rather good design."
-Joe

This can not be pointed out enough. A significant number of slow systems in
my experience are due to authors not properly understanding or recognizing
the costs, in memory and cpu, of the functions and data structures they
use. My example was less about speed and more about understanding that 2 *
O(n) on an unbounded list is dangerous and there are reasonably simple ways
to handle the situation that are arguably easier to read, clearly faster,
and have upperbounds on costs. That original function should hopefully
stick out as a possibly dangerous and costly design in almost an
instinctive way.

To take my paranoia regarding such things further... in a situation like
this, depending on the source of the data, it may be better to not use
lists. Maybe tuples. Or carry the length around separately if originally
stored in a way where it was already calculated. And it's not just for
speed but not throwing away information unnecessarily. Not making
assumptions about how the data will be used such that getting that
information back later could be costly. In this case it's very similar to
char* strings in C. Often the length is known at some point but almost
always thrown away leading to many functions needing to recalculate it.


On Tue, Sep 20, 2016 at 4:34 PM, Joe Armstrong <erlang@REDACTED> wrote:

> On Tue, Sep 20, 2016 at 9:55 PM, Jesper Louis Andersen
> <jesper.louis.andersen@REDACTED> wrote:
> >
> > On Tue, Sep 20, 2016 at 6:52 PM, Lloyd R. Prentice <
> lloyd@REDACTED>
> > wrote:
> >>
> >> Question: how can we time the proposed solutions to compare performance?
> >
> >
> > One solution is https://github.com/jlouis/eministat
> >
> > But one has to weigh the most efficient solution against two things:
> >
> > * Which solution is the most readable and elegant. It is more likely to
> be
> > correct.
> > * How many ISBN numbers per second are we looking at?
> >
> > Modern computers are unfairly quick at computation once data are on the
> CPU
> > itself. So unless you have a very large count of ISBN numbers to verify,
> I
> > would perhaps spend my time elsewhere in the code base. Your systems
> overall
> > efficiency is likely to suffer from other factors than a single ISBN
> > verification.
>
> Agreed - if you think about it, the ISBN number must have come from
> *somewhere* - if they have come from disk then at a guess
> most of the time will be spent reading and parsing the input. If they come
> from a database most of the time will be in format conversions.
>
> Once the ISBN is in RAM then any processing involved will be fast.
> Modern processors do Giga instructions/second so converting
> millions of ISBN/second should not be a problem - reading and *parsing*
> millions of numbers per second would be a problem)
>
> The 'old truth' was that most time in most applications was spent in I/O
> not in computation (apart from computational heavy problems)
>
> The problem with comparing different versions of the routine
> isbn_format_valid that you don't see the big picture. If isbn_format_valid
> only takes 1% if the total time then it doesn't matter if you
> optimize it.
>
> The old advice was write as clearly as possible, measure, then
> optimize the least efficient part *if necessary*.
>
> In my experience optimization is hardly ever needed - of all the code I've
> ever written only a tiny fraction ever really needed optimizing.
>
> By optimization I mean choosing complex code that is fast, rather than
> simple
> code that is easy to understand. Choosing a smart algorithm that is
> intrinsically efficient is not something that I consider to be an
> optimisation
> but rather good design.
>
> The advice - "write first - then measure" is difficult to follow since
> measurements are difficult to make and interpret. The only thing I
> know is
> that my intuition as to where the time really goes is almost always
> wrong (apart from in I/O which is always slow)
>
> /Joe
>
> >
> >
> > --
> > J.
> >
> > _______________________________________________
> > erlang-questions mailing list
> > erlang-questions@REDACTED
> > http://erlang.org/mailman/listinfo/erlang-questions
> >
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20160920/b77f387d/attachment.htm>


More information about the erlang-questions mailing list