[erlang-questions] if vs. case (long)

Thu Mar 13 06:27:51 CET 2008

There were a few points I wanted to make about the if vs. case
discussion and about binding multiple variables in case statements,
but I couldn't find all the relevant messages, so I started a new
thread.

I personally almost never use 'if'.  I think it was an unfortunate
name choice because it is so ingrained in any imperative programmer's
toolset, but it has a different meaning in Erlang.  I would not miss
it if it were removed; however, I do see Sean's points about other
people's style.

I think someone else showed that the alternatives in a function  
definition can be used in place of both 'if' and 'case' so neither is  
really necessary, but I think case does make code more concise if it  
is not abused with too much complexity.
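As a minimal sketch of that equivalence, here is the same decision
written three ways.  The module and function names (three_ways,
sign_if, sign_case, sign_clauses) are made up for illustration and do
not come from the earlier thread:

```erlang
-module(three_ways).
-export([sign_if/1, sign_case/1, sign_clauses/1]).

%% 'if' evaluates guard expressions in order; 'true' is the catch-all.
sign_if(N) ->
    if
        N < 0 -> negative;
        N =:= 0 -> zero;
        true -> positive
    end.

%% 'case' matches patterns (optionally with guards) against one value.
sign_case(N) ->
    case N of
        _ when N < 0 -> negative;
        0 -> zero;
        _ -> positive
    end.

%% Function clauses express the same branches with no inline construct.
sign_clauses(N) when N < 0 -> negative;
sign_clauses(0) -> zero;
sign_clauses(_N) -> positive.
```

All three return the same atom for any integer argument, so the
choice between them is a matter of readability rather than capability.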

My default instinct in erlang is to use case because I know it can  
always be easily expanded to include more branches, and it guarantees  
that all legal cases are enumerated so nothing is hidden in the order  
of branches:

Var = computeStuff(),
case DebugOn of
    true -> log(Var);
    false -> ok
end,
continue(Var).

While this may seem like extra typing, it is clear that there are
only two legal cases.  I don't often find myself in this situation,
though.  It has something to do with thinking in imperative terms
versus thinking in functional terms.  I would probably end up with
the following log function instead:

%% Standard clauses for logging.
log(What, true) -> log(What);
%% Use _NotTrue instead of false if other values should be ignored.
log(_What, false) -> ok.

And then I would pass around a debug logging flag.  Even though it
looks inefficient, the compiler should do a good job of optimizing
this.  It also allows tracing, as others have said, and gives the
option of adding special cases of What that apply to all code
locations:

%% Placed as first clause of function, catches special cases...
log({special_case, Value} = What, true) ->
     report_special_case(Value),
     log(What);

The added advantage is that the inline code has no branches because  
the case is eliminated, so it is easier to read in a functional  
manner. I tend to prefer to call out to a function which may have  
several alternatives, but always returns a common value (or no value)  
which hides the complexity of what needs to be done.

Var = computeStuff(),
log(Var, DebugOn),
continue(Var).
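
Put together as one compilable module, the pieces above might look
like the sketch below.  The module name debug_log, the run/1 wrapper,
and the bodies of log/1, report_special_case/1, computeStuff/0 and
continue/1 are hypothetical stand-ins, since the post only names them:

```erlang
-module(debug_log).
-export([run/1, log/2]).

%% Special cases come first so they match before the general clause.
log({special_case, Value} = What, true) ->
    report_special_case(Value),
    log(What);
log(What, true) -> log(What);
log(_What, false) -> ok.

%% Hypothetical helper bodies, only here to make the sketch compile.
log(What) -> io:format("debug: ~p~n", [What]).
report_special_case(Value) -> io:format("special case: ~p~n", [Value]).
computeStuff() -> {special_case, 42}.
continue(Var) -> {done, Var}.

%% The calling code has no branch of its own.
run(DebugOn) ->
    Var = computeStuff(),
    log(Var, DebugOn),
    continue(Var).
```

Calling debug_log:run(false) skips all output, while
debug_log:run(true) reports the special case and then logs the value;
both return the result of continue/1.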

------------------------------------------------

Case is my default when differentiating on a single value.  I resort  
to if when there are different reasons for each branch, which depend  
on different bindings:

if
    StopLight =:= red -> stop();
    StopLight =:= yellow andalso CopPresent =:= true -> stop();
    PedestrianInCrosswalk =:= true -> stop();
    true -> go()
end

Without an if statement, I would turn this into a function:

drive(StopLight, CopPresent, PedestrianInCrosswalk) ->
   go_or_stop(StopLight, CopPresent, PedestrianInCrosswalk).

go_or_stop(red, _, _) -> stop();
go_or_stop(yellow, true, _) -> stop();
go_or_stop(_, _, true) -> stop();
go_or_stop(_, _, _) -> go().

The confusion on which argument represents which value can be  
documented in one of two ways:

1) Use labels  =>  (yellow = _LightColor, true = _CopPresent,
   _ = _PedestrianInCrosswalk)
2) Pass the args as a record  =>  #intersection_state{color = yellow,
   cop = true, ped = false}
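
Both documentation styles can be sketched in one module.  This is an
illustration under a few assumptions: the module name traffic, the
new_state/3 constructor and the record defaults are invented here,
and the clauses return the atoms stop and go instead of calling
stop() and go() so that the example is self-contained:

```erlang
-module(traffic).
-export([go_or_stop/3, go_or_stop/1, new_state/3]).

%% Field names follow the record example in the list above.
-record(intersection_state, {color, cop = false, ped = false}).

%% 1) Labelled patterns: the unused variable names document each slot.
go_or_stop(red = _LightColor, _CopPresent, _PedestrianInCrosswalk) ->
    stop;
go_or_stop(yellow = _LightColor, true = _CopPresent, _PedestrianInCrosswalk) ->
    stop;
go_or_stop(_LightColor, _CopPresent, true = _PedestrianInCrosswalk) ->
    stop;
go_or_stop(_LightColor, _CopPresent, _PedestrianInCrosswalk) ->
    go.

%% 2) Record argument: each clause names only the fields it cares about.
go_or_stop(#intersection_state{color = red}) -> stop;
go_or_stop(#intersection_state{color = yellow, cop = true}) -> stop;
go_or_stop(#intersection_state{ped = true}) -> stop;
go_or_stop(#intersection_state{}) -> go.

%% Hypothetical constructor so callers need not include the record.
new_state(Color, Cop, Ped) ->
    #intersection_state{color = Color, cop = Cop, ped = Ped}.
```

The record version has the side benefit that adding a field later
does not force every clause head to change.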

------------------------------------------------

There was another discussion about adding variables to case  
statements.  The starting clauses were:

{ValA, ValB} = case Var of
     2 -> {5, computeB(2)};
     4 -> {computeA(4), 5};
     _Other -> {5,5}
end,
...

Over time, more cases and variables are added to get something like  
the following:

{ValA, ValB, ValC, ValD, ValE} = case Var of
     ...  12 different branches ...
end,
...

I would argue that the code has long ago evolved past the original  
construct, and should have been refactored.  How it is refactored  
depends on how related the variable bindings are.

If the variables are being set independently of one another, or some  
of them are being set based on bindings other than the discriminator  
(in this case, Var), they should be split from the rest.  One  
alternative is to use a separate function for those that are  
independent of Var:

ValA = computeA(AltVar),
ValB = computeB(AltVar, AnotherVar),
{ValC, ValD} = case Var of
    ...
end,
ValE = computeE(Var, AltVar),
...

The separate functions for computing can each have a different number  
of branches.

If the 5 variable bindings are related to each other, use  
encapsulation techniques to show their relationship just as you would  
in an OO language: create a record type that holds the values.  Then  
call a separate function that takes all the parameters needed to  
construct the bound record instance:

Obj = make_new_object(AltVar, AnotherVar, Var),
...

make_new_object(AltVar, AnotherVar, Var) ->
    {ValC, ValD} = compute_dependent_vars(Var),
    ValA = computeA(AltVar),
    ValB = computeB(AltVar, AnotherVar),
    ValE = computeE(Var, AltVar),
    #object{a=ValA, b=ValB, c=ValC, d=ValD, e=ValE}.
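
For completeness, here is a sketch of the supporting pieces this
function needs in order to compile.  The module name object_builder
and every helper body are placeholders; only the function names, the
record fields and the three-way split on Var come from the examples
above:

```erlang
-module(object_builder).
-export([make_new_object/3]).

-record(object, {a, b, c, d, e}).

make_new_object(AltVar, AnotherVar, Var) ->
    {ValC, ValD} = compute_dependent_vars(Var),
    ValA = computeA(AltVar),
    ValB = computeB(AltVar, AnotherVar),
    ValE = computeE(Var, AltVar),
    #object{a=ValA, b=ValB, c=ValC, d=ValD, e=ValE}.

%% Only these two values depend on the discriminator, so only this
%% function carries the branches.  The numbers are placeholders.
compute_dependent_vars(2) -> {5, 10};
compute_dependent_vars(4) -> {20, 5};
compute_dependent_vars(_Other) -> {5, 5}.

%% Placeholder arithmetic; each helper may grow its own clauses.
computeA(AltVar) -> AltVar * 2.
computeB(AltVar, AnotherVar) -> AltVar + AnotherVar.
computeE(Var, AltVar) -> Var - AltVar.
```

Each helper can now gain or lose branches without touching the
others, which is the point of splitting up the original case.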

Related to this, someone mentioned that things get complicated when a
throw is used inside one of the branches of the complex case
statement.  I tend to avoid throw and rely on crashes instead.  It is
best not to test for cases that have to be handled in a special way
if there is an easier mechanism to get around them.  If you really
have to throw, I think you will find that the helper function
approach (either of the last two examples above) more easily
accommodates a traceable throw coming from one of the support
functions.
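
A rough sketch of what that can look like, assuming a helper that
throws on input it cannot handle.  The module name throw_demo, the
build/1 wrapper, the {bad_var, Value} term and the helper's return
values are all invented for illustration:

```erlang
-module(throw_demo).
-export([build/1]).

%% The caller wraps one helper call, so the origin of the throw is
%% easy to trace.
build(Var) ->
    try compute_dependent_vars(Var) of
        {ValC, ValD} -> {ok, ValC + ValD}
    catch
        throw:{bad_var, Bad} -> {error, {bad_var, Bad}}
    end.

compute_dependent_vars(2) -> {5, 10};
compute_dependent_vars(4) -> {20, 5};
compute_dependent_vars(Other) when is_integer(Other) -> {5, 5};
%% Deleting this last clause gives the "let it crash" behaviour
%% instead: a function_clause error raised at the helper.
compute_dependent_vars(Other) -> throw({bad_var, Other}).
```

With the final clause removed there is nothing to catch, and the
crash report already points at the helper that rejected the value.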

----------------------------------------------------

I think Mats's suggestion to put a Good vs. Bad coding style manual
in the docs would help alleviate a lot of complaints that are not
language issues, but poor usage approaches.

jay