[erlang-questions] Must and May convention

Thu Sep 28 10:01:12 CEST 2017

> I really wish Dialyzer accepted (and checked) explicit declarations of
> purity.

I could not agree more;
that would be a useful feature and an amazing time saver!

I am currently working on a tool that creates a DB of function properties,
and the motivation was exactly finding non-pure functions in any given
project.

On Thu, Sep 28, 2017 at 3:50 AM, zxq9 <zxq9@REDACTED> wrote:

> On Wednesday, 27 September 2017 12:46:19 Loïc Hoguin wrote:
> > On 09/27/2017 11:08 AM, Joe Armstrong wrote:
> > > For several years I've been using a convention in my hobby
> > > projects. It's what I call the must-may convention.
> > >
> > > I'm wondering if it should be widely used.
> > >
> > > What is it?
> > >
> > > There are two commonly used conventions for handling bad arguments to
> > > a function. We can return {ok, Val} or {error, Reason} or we can
> > > return a value if the arguments are correct, and raise an exception
> > > otherwise.
> > >
> > > The problem is that when I read code and see a function like
> > > 'foo:bar(a,12)' I have no idea if it obeys one of these conventions or
> > > does something completely different. I have to read the code to find
> > > out.
> > >
> > > My convention is to prefix the function name with 'must_' or 'may_'
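A minimal sketch of how the must_/may_ prefixes could look in Erlang, assuming the must_ variant returns a naked value or raises, and the may_ variant returns {ok, Val} | {error, Reason}. The module `mm_example`, both function names and the `not_an_integer` reason are hypothetical, chosen only to illustrate the naming.

```erlang
-module(mm_example).
-export([must_parse_int/1, may_parse_int/1]).

%% must_: returns a naked value on success and raises on bad input
%% (here badarg, straight from list_to_integer/1).
-spec must_parse_int(string()) -> integer().
must_parse_int(S) ->
    list_to_integer(S).

%% may_: never raises for bad input; the caller must match on the result.
-spec may_parse_int(string()) -> {ok, integer()} | {error, not_an_integer}.
may_parse_int(S) ->
    try list_to_integer(S) of
        N -> {ok, N}
    catch
        error:badarg -> {error, not_an_integer}
    end.
```

Reading `mm_example:must_parse_int(X)` at a call site then tells you, without opening the module, that a bad X will crash the caller.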
> >
> > I've been debating this in my head for a long time. I came to the
> > conclusion that 99% of the time I do not want to handle errors.
> > Therefore 99% of the functions should not return an error.
>
> Taking this observation a step further...
>
> I've got a guideline that has never made it into English yet (along with
> some coding guidelines and a few other things I should re-write...) that
> states that programs must always be refactored iteratively to aggregate
> side effects where possible and leave as much code functionally pure as can
> be managed.
>
> The Rules:
> - Pure functions are always crashers.
> - Side-effecty functions return the type `{ok, Value} | {error, Reason}`
> - A side effect is anything that touches a resource external to the
> current function.
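A sketch of the three rules applied to one hypothetical module (`rules_example`; every name here is made up for illustration). order_total/1 is pure and simply crashes on bad input; load_order_total/1 touches the file system, so it returns {ok, Value} | {error, Reason}, and it checks the terms it read before handing them to the pure function.

```erlang
-module(rules_example).
-export([order_total/1, load_order_total/1]).

-type line() :: {string(), number(), number()}.

%% Pure: the result depends only on the argument. A malformed line
%% raises function_clause; a non-numeric field raises badarith.
-spec order_total([line()]) -> number().
order_total(Lines) ->
    lists:foldl(fun({_Name, Qty, Price}, Acc) -> Acc + Qty * Price end,
                0, Lines).

%% Side-effecty: reads a file of Erlang terms, so it returns
%% {ok, Total} | {error, Reason} and validates before calling pure code.
-spec load_order_total(file:filename()) -> {ok, number()} | {error, term()}.
load_order_total(Path) ->
    case file:consult(Path) of
        {ok, Lines} ->
            case lists:all(fun valid_line/1, Lines) of
                true  -> {ok, order_total(Lines)};
                false -> {error, bad_order_line}
            end;
        {error, Reason} ->
            {error, Reason}
    end.

valid_line({_Name, Qty, Price}) -> is_number(Qty) andalso is_number(Price);
valid_line(_) -> false.
```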
>
> Some programs are full of side effects -- doing lots of network and file
> system I/O while running a GUI. Others are not so side-effecty. The case
> where you REALLY get viral side-effect proliferation is use of ETS tables
> (shared binaries are actually another case, but are not included in the rule
> because the abstraction generally holds well enough). But even in these
> cases we can usually break the pure bits out somewhat cleanly, at least
> once we understand what the program really needs to do.
>
> That bit right there, "understand what the program really needs to do", is
> the truly hard part of getting any of this right. Or anything right.
>
> When a project starts from scratch you don't understand the details yet,
> otherwise typing speed would equate to development time and that's just
> never the case. So we start out with a very high proportion of {ok, V} |
> {error, R} type functions initially because we don't know anything about
> anything and side effects wind up getting scattered about because we just
> didn't have a very clear view of what was going on. When inheriting messy,
> legacy code you understand even LESS because you don't understand what the
> program should do and you also don't understand whatever it is currently
> doing until you diddle with it a bit.
>
> And that's totally OK.
>
> But only at first.
>
> That's just to get us over the hump so that something works, the task is
> handled, and if a bus hit That One Guy tomorrow then we could continue
> along and at least have something running.
>
> To avert a lifetime of nightmares, lost hair and broken marriages due to a
> death-march-style maintenance cycle, though, we pre-emptively attack the
> program again with a refactoring effort aimed specifically at unifying
> types and side-effect hygiene. It is common that you'll have two flavors of
> basically the same thing in different areas, especially if you've got more
> than two people working on the project. That's got to get fixed. It is also
> common, as noted above, that side effects are scattered about for various
> reasons.
>
> Once we've shaken the easy bits out we sometimes add a list of pure
> functions to the top of each module as a module attribute:
>
> -pure([foo/1, bar/2, baz/0]).
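One way such an attribute could be consumed, assuming a hypothetical module `pure_example`: in recent OTP releases the compiler keeps unknown module attributes and turns Name/Arity pairs into {Name, Arity} tuples, so the list can be read back at runtime from module_info(attributes). Nothing enforces the declaration here; it is just data that a tool could pick up.

```erlang
-module(pure_example).
-export([foo/1, bar/2, baz/0, pure_functions/1]).

%% Custom attribute: stored as {pure, [{foo,1}, {bar,2}, {baz,0}]}.
-pure([foo/1, bar/2, baz/0]).

%% foo/1 crashes with function_clause on a non-integer, as a pure
%% "crasher" should.
foo(X) when is_integer(X) -> X * 2.
bar(X, Y) -> X + Y.
baz() -> 42.

%% Collect every -pure declaration a compiled module carries
%% (a module may declare the attribute more than once).
-spec pure_functions(module()) -> [{atom(), arity()}].
pure_functions(Mod) ->
    lists:append([Fs || {pure, Fs} <- Mod:module_info(attributes)]).
```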
>
> Those should not only be pure, provable, and excellent targets for the
> initial property testing effort to get warmed up on, but are also known to
> crash when they get bad inputs. And of course everything should, by this
> point, Dialyze cleanly. Also, it isn't impossible to write a tool that
> keeps track of impure calls and checks that the pure functions only ever
> make calls to other pure functions (vast swathes of the stdlib define
> abstract data types, and nearly all of these calls are pure).
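A minimal sketch of the checking half of such a tool, assuming the call graph has already been extracted (a real tool might get it from xref or from compiled abstract code; here it is passed in as a hand-built map). The module `purity_check` and its data layout are hypothetical. It only inspects direct calls, but since every declared-pure function is checked, purity carries over transitively as long as each callee is itself declared pure.

```erlang
-module(purity_check).
-export([violations/2]).

%% CallGraph maps a function {M, F, A} to the list of MFAs it calls.
%% Pure lists the MFAs declared (or known) to be pure.
%% Returns {Caller, Callee} pairs where a pure function calls
%% something that is not itself known to be pure.
-spec violations(#{mfa() => [mfa()]}, [mfa()]) -> [{mfa(), mfa()}].
violations(CallGraph, Pure) ->
    PureSet = sets:from_list(Pure),
    [{Caller, Callee}
     || Caller <- Pure,
        Callee <- maps:get(Caller, CallGraph, []),
        not sets:is_element(Callee, PureSet)].
```

For example, if `{m, helper, 1}` is declared pure but calls `{io, format, 2}`, the result contains `{{m, helper, 1}, {io, format, 2}}`.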
>
> What are the impure functions?
>
> The service loop is always impure -- it deals with the mailbox. Socket
> handling functions (which may be the service loop as well). Anything that
> writes to a file. Anything that sends a message. Interface functions that
> wrap message passing. Anything that talks to wx or console I/O. Etc.
>
> The outcome is that side effects traditionally get collected in:
> - Interface functions
> - Service loops (and mini-service loops / service states)
> - File I/O wrapper
> - Socket I/O wrapper
> - User interface I/O wrapper
>
> The last three are "wrappers" because by writing a wrapper we give
> ourselves a place to aggregate side effecty calls instead of making them in
> deeper code (out at the edges of the call graph). A message may come over a
> socket or into the service loop that requires some processing and then
> writing to a file, for example, but this doesn't mean that we write to the
> file out there at the bottom of the call graph. Instead, we call to get the
> processing done, then return the value back into the service loop (or
> somewhere close to it, like a message interpreter function), and then the
> next line will be a call to either write the file or a call to a file
> writer that is known to be side-effecty.
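A minimal sketch of that shape, using a plain spawned process instead of a gen_server to keep it short. The module `loop_example`, its message format and the output line format are hypothetical. process/1 is pure and crashes on bad input; write_result/2 is the file I/O wrapper and the only place the loop touches the disk.

```erlang
-module(loop_example).
-export([start/1, process/1]).

%% Spawn the service process; Path is the file results are appended to.
-spec start(file:filename()) -> pid().
start(Path) ->
    spawn(fun() -> loop(Path) end).

%% Service loop: the impure edge. It calls the pure function first,
%% then hands the result to the file wrapper on the next line, then replies.
loop(Path) ->
    receive
        {data, From, Bin} ->
            Line = process(Bin),               % pure: crashes on bad input
            Reply = write_result(Path, Line),  % side effect happens here
            From ! {self(), Reply},
            loop(Path);
        stop ->
            ok
    end.

%% Pure: describe a binary as a text line. A non-binary argument
%% fails the guard and raises function_clause.
-spec process(binary()) -> binary().
process(Bin) when is_binary(Bin) ->
    Size = integer_to_binary(byte_size(Bin)),
    <<"bytes=", Size/binary, "\n">>.

%% Side-effecty wrapper around file I/O.
-spec write_result(file:filename(), binary()) ->
          {ok, non_neg_integer()} | {error, term()}.
write_result(Path, Data) ->
    case file:write_file(Path, Data, [append]) of
        ok -> {ok, byte_size(Data)};
        {error, Reason} -> {error, Reason}
    end.
```

If process/1 crashes, the loop process dies with it and no reply is sent; under a supervisor it would simply be restarted, which is the "crasher" behaviour described here.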
>
> Just about everything else can be pure. Most of the time. (Of course,
> "processing a value" may involve network communication, or RPC, or asking
> some other stateful process to do the processing, and any of these can
> prevent a function from being pure. But it is rare that these are the
> majority of functions in a program.) That means almost everything can be
> written as a crashable function -- because the ones that return {ok, V} |
> {error, R} should have already known what they were dealing with before
> they called the pure functions.
>
> One side effect of this overall process is that, at least in writing
> customer facing software, we discover errors straight away and fix them.
> Most of the bugs are the really simple kind:
>
> "If I enter a number for a name, the window disappears and reappears with
> empty fields."
> (The window's process crashed and restarted back where it was.)
>
> or, more often
>
> "If I enter a number as a name the name disappears after I click the
> 'submit' button."
> (Something deeper in the system crashed and the final update to the GUI
> was never sent.)
>
> We IMMEDIATELY know that we didn't type check there properly and some
> other part of the code died with the "bad" data once it was noticed -- and
> the user just saw a momentary hiccup and fixed whatever was wrong on their
> own. So this wasn't the end of the world or a big scary red X showing up on
> the screen with mysterious numbers and inscrutable error messages or
> whatever. But it WAS bad and unexpected behavior for the most important
> person in the program's universe. A quick check of the crash log bears out
> what we thought, and that problem is from then on handled properly and
> never heard from again.
>
> This sort of problem becomes really confusing to debug in cases
> where we've gotten too fancy with exception handling and played loose with
> types. That input value may have traveled quite far into the system before
> something died, and figuring it out is trickier without a
> dead-obvious crash message letting you know about it.
>
> Blah blah blah...
>
> We are all looking at roughly the same things here. Joe likes to prefix
> function names. That's probably a good system, but it doesn't work well for
> people who use autocompletion (people still do that?). Is that a tooling
> conflict? Aren't Joe's function names THEMSELVES a sort of tool? How about
> the -pure declaration? That's great -- but what we really want, actually,
> is a way to declare a function pure so that Dialyzer could know about it,
> as part of the -spec for a function. That would be awesome. What happens
> for us is that functions near the top of a module tend to be side-effecty
> and functions at the bottom tend to be pure -- so we just sort of know what
> terrain we are navigating because we know the layout that results as an
> outcome of following our little side-effect focused refactoring. Also, in
> documentation we know the difference immediately because of our own return
> typing convention: anything that returns naked values is a crasher, period.
>
> It looks like none of the approaches is particularly perfect, though. I
> really wish Dialyzer accepted (and checked) explicit declarations of
> purity. I don't know what syntax would be good for this, but it's something
> I would like to have. Also -- it would allow for people to maybe use their
> pure functions in guards, which is a request I hear quite
> often.
>
> -Craig
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>