[erlang-questions] Non-reproducible bug on a live erlang system
Angel J. Alvarez Miguel
clist@REDACTED
Thu Jan 14 17:56:28 CET 2010
On Thursday, 14 January 2010 17:25:47 Kaiduan Xie wrote:
> Thanks Jayson and Attila for throwing light on this.
>
> To be more specific, this is a call processing system: it processes
> incoming messages and sends messages out. The customer reports a call
> failure, but no crash report is generated, so it is a programming
> logic error. As I mentioned, this is a non-reproducible issue, or at
> least one that is hard to reproduce.
>
> 1. If this only happens to a particular user, then Erlang's built-in
> tracing can help with this.
>
> 2. Otherwise, what to do?
>
> Has anyone encountered this before? How did you solve it?
>
> Thanks,
>
> kaiduan
>
> On Thu, Jan 14, 2010 at 10:35 AM, Attila Rajmund Nohl
> <attila.r.nohl@REDACTED> wrote:
> > 2010/1/14, Kaiduan Xie <kaiduanx@REDACTED>:
> >> Hi, all,
> >>
> >> Consider the following case: you have a live/busy Erlang system in
> >> production which handles thousands of transactions per second and
> >> millions of users, and a customer reports a non-reproducible bug. The
> >> problem is non-reproducible, or intermittent, or very hard to
> >> reproduce both in the live system and in the lab.
> >
> > Does this bug involve a crash report with a stack trace? You can
> > always add some assert-like statements (e.g. if you know that a
> > variable must not be bound to the 'undefined' atom at a certain point
> > in the code, you can add something like 'Variable /= undefined') where
> > you think something is wrong.
>
"A software bug is the common term used to describe an error, flaw, mistake,
failure, or fault in a computer program or system that produces an incorrect
or unexpected result, or causes it to behave in unintended ways."
A misbehaving system can still be (or pretend to be) fully functional in the
sense that no exceptions are triggered. I'm with the others: you need to add
more assertions to the code, so that the Erlang runtime itself trips on the
faulty condition.
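A minimal sketch of such an assertion, only as an illustration: the module
name, the route_call/1 function and the 'callee' key are hypothetical and not
taken from the system discussed in this thread. The idea is that a match
against 'true' turns a silent logic error into a badmatch crash that shows up
in the logs.

```erlang
-module(assert_sketch).
-export([route_call/1]).

%% Hypothetical example: Call is a property list describing one call.
route_call(Call) ->
    %% proplists:get_value/2 returns 'undefined' when the key is missing.
    Callee = proplists:get_value(callee, Call),
    %% Assertion: if Callee is 'undefined' this match fails with
    %% {badmatch,false} instead of letting the bad value travel further.
    true = (Callee =/= undefined),
    {routed, Callee}.
```

With this in place, route_call([]) crashes right at the assertion, while
route_call([{callee, "alice"}]) goes through unchanged.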
Holes in the software specification allow (Type 1?) errors that are difficult
to trap. The mere fact that the system still handles millions of users without
severe degradation suggests this is the case.
I just remember some discussion on patterns like
    case file:open(..) of
        {ok, Fd} -> ...;
        _ -> ...
    end
versus the (I think) more idiomatic (Prolog-inherited?)
    {ok, Fd} = file:open(..), ...
where the former is perhaps more flexible (and prone to misbehaving), while
the latter is rigid and safer (and needs a "try ... catch" wrapper to deal
with errors in the same process, or another process to watch for errors).
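To make the contrast concrete, here is a rough sketch of the two styles plus
the "try ... catch" wrapper. The module and function names and the
opened/skipped/failed return atoms are made up for illustration; only
file:open/2 and file:close/1 are real library calls.

```erlang
-module(open_sketch).
-export([lenient_open/1, strict_open/1, guarded_open/1]).

%% Flexible style: the catch-all clause swallows every failure, so the
%% caller never learns why the file could not be opened.
lenient_open(Path) ->
    case file:open(Path, [read]) of
        {ok, Fd} ->
            ok = file:close(Fd),
            opened;
        _ ->
            skipped
    end.

%% Rigid style: anything other than {ok, Fd} crashes with badmatch.
strict_open(Path) ->
    {ok, Fd} = file:open(Path, [read]),
    ok = file:close(Fd),
    opened.

%% Same-process wrapper: the crash becomes an explicit error value.
guarded_open(Path) ->
    try
        strict_open(Path)
    catch
        error:{badmatch, {error, Reason}} ->
            {failed, Reason}
    end.
```

For a path that does not exist, lenient_open/1 quietly returns skipped,
strict_open/1 crashes, and guarded_open/1 reports the reason that file:open/2
gave.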
/Angel