[erlang-questions] Looking for the Secure Coding Guide

Sun Mar 3 14:22:41 CET 2019

>
>  (yes, I did
> find some secure coding recommendations - like "do not program
> defensively" - http://www.erlang.se/doc/programming_rules.shtml#HDR11,
> but didn't find the advice compelling). So, does a secure coding guide
> exist and if so, could I get a copy of it? If one does not exist,
> is there something in development and when will it be available?
>
>
While I do recognize that we should have a commented version of the OWASP
guide from the viewpoint of Erlang and other functional programming
languages such as Haskell or OCaml, the rule quoted above requires some
elaboration as to why it is a guideline.

There are two kinds of input validation in a system for a given function.
One is when you have an untrusted source (colloquially known as "the
enemy"), the other is when the source is trusted. For the case where the
enemy provides data, it should usually be validated with a function looking
something like:

-spec validate(Input :: term()) ->
    {ok, Canonicalized :: term()} | {error, validation}.
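
As a minimal sketch of a function with that shape, assume the enemy sends a
TCP port number as a decimal binary. The module name validate_demo, the
port-number scenario and the 5-byte limit are illustrative assumptions, not
something taken from the guidelines:

```erlang
-module(validate_demo).
-export([validate/1]).

%% Hypothetical boundary check: accept a decimal port number supplied as a
%% binary by an untrusted peer and return it in canonical integer form.
-spec validate(Input :: term()) ->
          {ok, Canonicalized :: term()} | {error, validation}.
validate(Input) when is_binary(Input), byte_size(Input) =< 5 ->
    %% The size guard bounds the work done on enemy-controlled data.
    try binary_to_integer(Input) of
        Port when Port >= 1, Port =< 65535 -> {ok, Port};
        _OutOfRange -> {error, validation}
    catch
        error:badarg -> {error, validation}
    end;
validate(_Other) ->
    {error, validation}.
```

The caller can then match on the result, e.g. validate_demo:validate(<<"0080">>)
returns {ok, 80}, while anything that is not a well-formed port number comes
back as {error, validation} instead of crashing the caller.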

This also generally holds true for library design. An application boundary
should generally behave somewhat nicely and be able to return errors as
values so one can match on them and act accordingly. The rule is that you
should punt decisions to the caller of the library, though it does have its
exceptions.

However, this is not what the coding guidelines suggest. They pertain to
the case of trusted input. In that case, writing protective routines (i.e.,
defensive code) is usually a mistake in an Erlang system. There is a
supervision tree ready to handle this problem if it occurs, and you cannot
always fully reflect on each and every possible failure scenario. In fact,
trying to mitigate a failure you have not seen in the wild might introduce
bugs into the program when it does happen. So defensive code is not a
priori a safer option. For trusted sources, it is better to program for the
happy path only, and let the errors be logged so they can be handled
properly by rewriting the code.
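
To make the happy-path style concrete, here is a small sketch. The module
happy_path_demo and the shape tuples are made up for illustration; the point
is only that there is no catch-all clause and no attempt to "repair" bad
input coming from trusted callers:

```erlang
-module(happy_path_demo).
-export([area/1]).

%% Hypothetical internal helper: callers are trusted code in the same
%% application, so only the shapes we actually expect are handled.
-spec area({circle, number()} | {rect, number(), number()}) -> number().
area({circle, R}) when R >= 0 ->
    math:pi() * R * R;
area({rect, W, H}) when W >= 0, H >= 0 ->
    W * H.
%% No catch-all clause: anything else raises function_clause, the calling
%% process crashes, and the crash report shows exactly which term arrived.
```

If the process calling area/1 sits under a supervisor, the crash is logged
and the process is restarted according to the supervisor's strategy, which
is the behaviour the paragraph above relies on.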

In general, catching all mistakes, trying to mitigate them, and then
propagating the error upwards in the call stack is not always a desirable
thing. You might not have thought about the particular failure scenario, so
it is impossible to mitigate the error before having seen it in the wild.
Or you might have assumed that the particular scenario never occurs in
practice, in which case the error report is a subtle hint and teaching
moment. I often hedge by leaving out some error handling early on, and then
provoke the error cases to install the correct handling for the ones that
do occur in the real world.

There are some reasons this approach is good:

* Writing for the happy path first allows a much faster time-to-market for
a given feature. Some languages force you to envision a lot of defensive
additional code before the code is marketable. And as laid out above, that
code is rarely executed. Hence, if it contains a bug, you might not know
early on.

* If an error occurs in the production system, you don't have to handle the
error immediately if it is rare. By construction, a crashing
process cannot tie up resources in the system (caveats: ETS, disk storage
resources, etc.). So you can often leave the error in there until the next
service window or deploy. This is particularly strong in the modern world
where people continuously deploy code, because you can deploy code and then
hotfix the code for the missing corner cases subsequently. This leads to a
much faster development cycle overall.

* In a concurrent system, there are certain failure scenarios you might
want to detect rather than handle. Detection is often far easier than
properly mitigating. So if the occurrence is rare, you can just restart the
program and try again.
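
The "detect, then restart and try again" idea from the last point can be
sketched with a plain monitor. In a real system a supervisor would normally
do this job; the module retry_demo and its run/2 function are hypothetical
names used only to show the mechanism:

```erlang
-module(retry_demo).
-export([run/2]).

%% Run Fun in a separate monitored process. If that process dies for any
%% reason, detect it through the 'DOWN' message and try again, up to
%% Retries additional attempts. The worker reports its result by exiting
%% with {done, Result}, so success and failure arrive on the same channel.
-spec run(fun(() -> term()), non_neg_integer()) ->
          {ok, term()} | {error, term()}.
run(Fun, Retries)
  when is_function(Fun, 0), is_integer(Retries), Retries >= 0 ->
    {Pid, Ref} = spawn_monitor(fun() -> exit({done, Fun()}) end),
    receive
        {'DOWN', Ref, process, Pid, {done, Result}} ->
            {ok, Result};
        {'DOWN', Ref, process, Pid, _Reason} when Retries > 0 ->
            run(Fun, Retries - 1);
        {'DOWN', Ref, process, Pid, Reason} ->
            {error, Reason}
    end.
```

Note that the caller never inspects why the worker failed before retrying.
It only detects that it failed, which is much less code than trying to
mitigate each possible cause.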

Finally, there are some overarching security considerations:

* The atom table is limited to about a million atoms by default, and atoms
are never garbage collected. Do not let the enemy create arbitrary atoms
(see, e.g., erlang:binary_to_existing_atom(..) for a mitigation technique).
* ETS is not garbage collected, so make sure to clean up. Having a janitor
process doing the cleanup and using the concept of monitors to manage the
lifetime of other processes storing data in ETS can help a lot.
* The security boundary is the distributed Erlang cluster, not an
individual node. Once a node is breached, the whole cluster is lost.
* You can mark processes as containing sensitive information (see
erlang:process_flag(sensitive, true)), which makes them unable to be
inspected by accident by an operator. This is useful if a
process contains the current table of law enforcement wiretaps. There are
ways around it, but the idea is that you don't accidentally stumble into
information that is sensitive.
* The gen_server callback Module:format_status(..) can be used to scrub
password information from error logs.
* Erlang is a generally memory safe language, so many errors pertaining to
bounds checks are not possible. Something like Heartbleed cannot occur,
though something like the Bleichenbacher attacks can.
* However, do put sensible resource limits on enemies. Don't let them create
a 2-gigabyte binary in memory. Load-regulate heavy-weight
computation in the node and only allow a limited number of those processes.
* Integer overflow/underflow doesn't occur in the language since integers
have arbitrary size. But when writing them out into a fixed-size integer
field, care must be taken, because the value silently wraps around there.
* And finally, do not use the dynamic features of the system to let the
enemy execute arbitrary code on your end.
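
A few of the points above (atoms, resource limits, fixed-size integers) can
be sketched as small boundary helpers. The module boundary_demo and its
function names are hypothetical, and the size limit is left to the caller,
so treat this as an illustration under those assumptions rather than a
complete solution:

```erlang
-module(boundary_demo).
-export([to_known_atom/1, check_size/2, encode_u32/1]).

%% Convert enemy-supplied text to an atom only if that atom already exists,
%% so the atom table cannot be grown from the outside.
-spec to_known_atom(binary()) -> {ok, atom()} | {error, unknown_atom}.
to_known_atom(Bin) when is_binary(Bin) ->
    try
        {ok, binary_to_existing_atom(Bin, utf8)}
    catch
        error:badarg -> {error, unknown_atom}
    end.

%% Reject payloads above a caller-chosen limit before doing any work on
%% them.
-spec check_size(binary(), non_neg_integer()) -> ok | {error, too_large}.
check_size(Bin, Max) when is_binary(Bin), byte_size(Bin) =< Max ->
    ok;
check_size(Bin, _Max) when is_binary(Bin) ->
    {error, too_large}.

%% Range-check before packing into 32 bits. Without the guard, <<N:32>>
%% silently keeps only the low 32 bits of N.
-spec encode_u32(integer()) -> {ok, binary()} | {error, out_of_range}.
encode_u32(N) when is_integer(N), N >= 0, N =< 16#FFFFFFFF ->
    {ok, <<N:32/unsigned-big>>};
encode_u32(N) when is_integer(N) ->
    {error, out_of_range}.
```

For comparison, the unchecked expression <<16#100000000:32>> evaluates to
<<0,0,0,0>>, which is the wrap-around the integer point warns about.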

-- 
J.