[erlang-questions] Nested Case Statements v.s. multiple functions

Mon Sep 25 17:11:00 CEST 2017

On 2017年09月25日 月曜日 09:25:48 code wiget wrote:
> Hello everyone,
> 
> As I get further into Erlang, I am starting to realize that some of my functions have been getting pretty ugly with nested case statements. For example, I had a nested case statement that looked something like this:
> 
> Send_update(Arg1) ->
> 	case do this(Arg1) of
> 		{ok, [Val1, Val2]} ->  
> 			case do_that(Val1, Val2) of
> 				{ok, [Val3, Val4]} ->
> 					case do_this2(…) of 
> ….
> 
> It continued into this for another few functions, you get the picture - its ugly, and it is hard to read. 

This happens a lot when you're first starting out. Don't worry, it gets better.

And you're headed in the right direction, asking the right questions, clearly sensitive to the right smells.

> So I went and I converted it to a top level function that would then call lower level functions like so:
> 			
> 
> 
>  send_update(Arg1) ->  
>      case ... of
>          {ok, [Val1, Val2]} ->  
>              send_update(check_indices, {Arg1, Val1, Val2});
>          Else ->  
>              lager:error("ERROR: ..")
>      end.
>  send_update(check_indices, {Arg1, Arg2, Arg3}) ->
>      case check_indices(Arg2, Arg3)of 
>          true ->
>              send_update(get_values, {Arg1, Arg3});
>          false ->
>              lager:error("EMERGENCY: ….")
>      end;
>  send_update(get_values, {Arg1, Arg2}) ->
>    ...
>      case ... of  
>          {ok, [Val1, Val2, VAl3]} ->
>              send_update(send_value, {Arg1, Val1, Val2, Val3});
>          Error ->  
>              lager:error("ERROR: …")
>      end;
>  send_update(send_value, {Arg1, Arg2, Arg3, Arg4}) ->
>     …
>      Do_something(Args),
>      ok. 
> 
> 
> Now that I look at it though, both don’t look right. They don’t look like something I would write in any other language where I would just have if’s and else’s.
> 
> Is this the proper way to write Erlang? I know everyone has their own style, but I assume there is some accepted form of writing functional programs with deep nests.

This is the universe sending you a message:
1- Some of your messages are supposed indicate dispatch to specific functions.
2- VERY often having a bunch of nested cases means that you have awkward data structures and need to rethink what your program is actually doing.
3- Occasionally you will have a series of checks you need to perform in a specific order -- and there we can use a special form of a fold somtimes called a "pipeline" to normalize this.

Case 1:

Cutting your example down:

send_update(Arg1) ->
  ... .

send_update(check_indices, {Arg1, Arg2, Arg3}) ->
  ... ;
send_update(get_values, {Arg1, Arg2}) ->
  ... ;
send_update(send_value, {Arg1, Arg2, Arg3, Arg4}) ->
  ... .

Why are these all called `send_update/1,2`? They aren't SENDING any UPDATES as far as I can tell. They are DISPATCHING based on the shape of the received input, and that's a totally different thing. In fact, this particular example is recursing over itself in a pretty weird way that confuses the call graph in my mind. Imagine if it had, say, 20 more clauses!

It is quite common to find yourself in need of something analogous to gen_server:handle_cast/2 or handle_call/3.
Quite a lot of the time when you need to reinvent this on your own, however, you are dealing with either unsanitized data, or data with lots of variable arguments. If you have, say, raw messages coming over the wire to a pure Erlang process (a process hand-written using proc_lib instead of being a gen_server or whatever) you'll often have something like:

handle(Incoming, State = #s{buffer = Buffer}) ->
    case interpret(Incoming) of
        {ok, Message, Rest} ->
            dispatch(Message, State#s{buffer = add_to(Buffer, Rest)});
        {error, Error} ->
            ok = log(warning, "Got bad things: ~tp", [Error]),
            terminate()
    end.

dispatch(Message, State = #s{socket = Socket}) ->
    Result =
        case Message of
            {check_indices, {A1, A2, A3}} ->
                check_indices(A1, A2, A3);
            {get_values, {A1, A2}} ->
                get_values(A1, A2);
            {send_value, Values} ->
                send_value(Values, Socket);
            Unknown ->
                ok = log(error, "Probably should avoid this clause because we should die here."),
                terminate()
        end,
    NewState = some_state_update_or_whatever(State),
    {Result, NewState}.

You might need the above procedure to be larger or smaller, have a return send back through the socket inserted somewhere, enter a discreet receive loop as a reaction (which defines a special state of the process as a state machine), have another function do more pre-processing or more post-processing, or whatever. That's not the point. The point is that dispatching and recursively calling different clauses of the same function that have SEMANTICALLY DIFFERENT MEANING is just asking for code that is really hard to read.

Now consider the origin of that convolution -- you were putting way too much in a single clause of a function, and then you resorted to putting way too much into several clauses of the same function, which is a different form of the same confusion.

Break things down, dispatch to specific handlers, pull the result to be returned back to the dispatcher, handle the result in a common way. This isn't always possible, but it nearly always is, which is the reason it is possible to write event loops at all.

Case 2:

When you've never really thought much about what your program is doing it is pretty easy to get rather cavalier with data and wind up with a lot of nested data that is hard to get at, and then you suddenly find that you want to get at the inner data only some of the time and you can shortcut excessive checks with `case` and start turning your nested data problem into a nested `case` problem as well.

The biggest culprit there that I've seen are nested maps and nested records -- and nested maps of nested records and vice-versa.

Whenever you find yourself nesting a lot of data and the data structures themselves start feeling arbitrary and awkward to manipulate -- JUST STOP WHAT YOU'RE DOING. Before you write another universally shared record inside some .hrl file STOP.

Once this point has been reached the universe is telling you that you really need to be writing a module that is an abstract data structure. Write a module that contains an inner state structure of some kind, and (basically) setter/getter/updater functions for it. That's how quite a bit of the stdlib is written, and it is a major headache saver.

Make the exported data type be opaque to external callers so that Dialyzer can catch you when you start getting sneaky and trying to be too intimate with the hidden structure. The reason for this is simple to understand: you didn't understand your data initially so it is likely you still don't quite get it, so you're going to eventually modify the shape of things later. If you are too intimate with the data internals instead of hiding them behind functions then you'll wind up blowing up a lot of code (as you proabably already know). This will leave you with a choice: in the interest of time do you just write another clause next to the existing one, add another `case` layer somewhere, or actually rewrite everything to be "correct"? You'll almost never get the time or motivation to do it write later -- so another level of cases or clause cancer it will usually be.

Ugh. Just evil.

Case 3:

If you have a need to check whether some input is good, the permissions of a user, the context of the request, and the status of some external element before proceeding with an action, how could we do this without 4 levels of `case`? Especially if we want to return a properly detailed error message?

We could trickle-down through a series of functions that essentially curry out the stuff we've dealth with:

handle_request(Request, User, Context, ExternalThingy) ->
    case interpret(Request) of
        {ok, Action} -> handle_request2(Action, User, Context, ExternalThingy);
        Error        -> Error
    end.

handle_request2(Action, User, Context, ExternalThingy) ->
    case check_permission(User, Action) of
        ok    -> handle_request3(Action, User, Context, ExternalThingy);
        Error -> Error
    end.

handle_request3(Action, User, Context, ExternalThingy) ->
    case dispatch_target(Action, Context) of
        {ok, Target} -> handle_request4(Target, User, Context, ExternalThingy);
        Error        -> Error
    end.

handle_request(Target, User, Context, ExternalThingy) ->
    case check_condition(ExternalThingy) of
        ok    -> dispatch(Target, User, Context);
        Error -> Error
    end.

% ...and so on.

Now that's not really so bad, and in some cases it is really the right sort of solution because you might have a lot of variance in what is going on in each step. The case above, however, is so boringly vanilla that WE DON'T EVEN HAVE SPECIAL FUNCTION NAMES. That's a sign of something dumb going on. It is pretty rare that you should ever be writing functions named the same thing plus some incrementing number at the end, or find yourself misunderstanding single assignment to such a degree that you write a long line of updating variables that look like State1, State2, State3, State4. That sort of thing is one more hint to yourself that you're both not understanding the problem and probably trying to write, say, Python in Erlang.

But consider how little variance we had above. Maybe we could do something more like...

pre_check(Action, User, Context, ExternalThingy) ->
    Checks =
        [fun check_request/2,
         fun check_permission/2,
         fun check_dispatch_target/2,
         fun check_condition/2],
    Args = {Action, User, Context, ExternalThingy},
    Harness =
        fun
            (Check, ok)    -> Check(Args);
            (_,     Error) -> Error
        end,
    case lists:foldl(Harness, ok, Checks) of
        ok    -> dispatch(Action, User, Context);
        Error -> Error
    end.

Now the above example is pretty short, only having four checks to perform, but you can probably see quite quickly how we could have eliminated the Args variable entirely there and written a more general form of this function as a list operation (specifically, a non-backtracking, non-transforming pipeline). In this case we have a static thingy (your collection of initial arguments) and a list of thingies we want to run through (your checks). A normal fold would be a list of things to run a single function over, so in this case we have just swapped that around and have a static set of values we want to run a list of operations over. But values are just values, whether they are functions or not, so this is STILL a fold, if you choose to structure it that way.

Not everything lends itself to this kind of structure, of course, but quite often when you see a big chain of `case` statements pushing out past the right side of the screen you've either got a strong indication a pipeline is needed (or that you're looking at machine generated code).

There are several kinds of pipelines you can craft, depending on the situation, and the harness doesn't always have to be a closure (transforming pipelines don't have any need for that, for example).

Anyway....

While I've been typing this out, way past my bedtime and with copious mistakes, bad grammar and typos, I see that some others have already homed in on the first thing I mentioned: those initial atom-based-clause switches should be their own functions. Once you begin to refactor in this direction you will begin to see which sort of situation you are in as far as determining execution flow, and you will also subconsciously begin to EVOLVE A BETTER MODEL of your actual program, the problem it handles, and will probably stumble on a way to make most of the code either move somewhere else or collapse tremendously.

Simplicity tends to follow complexity. Whatever you write, get something working, then refactor it!

-Craig