[erlang-questions] Non-reproducible bug on a live erlang system

Kaiduan Xie kaiduanx@REDACTED
Thu Jan 14 19:38:55 CET 2010


"I'm with the others that you need to put
more assertions in the code, just to let the Erlang runtime trigger on the faulty
condition."

Very good point, Angel, just let it crash!
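
For illustration, a minimal sketch of such an assertion. The module
assert_sketch and the function route_call/1 are hypothetical names, not
taken from the real system. The strict match makes the process fail with
badmatch as soon as the callee is the atom 'undefined', instead of
carrying the bad value further into call processing.

```erlang
-module(assert_sketch).
-export([route_call/1]).

%% Hypothetical call-processing step (an assumption for illustration):
%% Callee must already be resolved before the call is routed.
%% The strict match on 'true' acts as the assertion. If Callee is the
%% atom 'undefined', the match fails and the process raises
%% {badmatch, false} at this exact line.
route_call(Callee) ->
    true = (Callee =/= undefined),
    {routed, Callee}.
```

With this in place, assert_sketch:route_call(undefined) raises an error
of class 'error' with reason {badmatch,false}, so the stack trace points
at the violated assumption rather than at a later symptom.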

kaiduan

2010/1/14 Angel J. Alvarez Miguel <clist@REDACTED>:
> On Thursday, 14 January 2010 17:25:47 Kaiduan Xie wrote:
>> Thanks Jayson and Attila for throwing light on this.
>>
>> To be more specific, this is a call processing system: it processes
>> incoming messages and sends messages out. A customer reports call
>> failures, but the system does not generate a crash report, so it is a
>> programming logic error. As I mentioned, the issue is non-reproducible,
>> or at least hard to reproduce.
>>
>> 1. If this only happens to a particular user, then Erlang's built-in
>> tracing can help with this.
>>
>> 2. Otherwise, what to do?
>>
>> Has anyone encountered this before? How did you solve it?
>>
>> Thanks,
>>
>> kaiduan
>>
>> On Thu, Jan 14, 2010 at 10:35 AM, Attila Rajmund Nohl
>> <attila.r.nohl@REDACTED> wrote:
>> > 2010/1/14, Kaiduan Xie <kaiduanx@REDACTED>:
>> >> Hi, all,
>> >>
>> >> Consider the following case: you have a live, busy Erlang system in
>> >> production which handles thousands of transactions per second and
>> >> millions of users, and a customer has reported a bug. The
>> >> problem is non-reproducible, or intermittent, or very hard to
>> >> reproduce both in the live system and in the lab.
>> >
>> > Does this bug involve a crash report with a stack trace? You can
>> > always add some assert-like statements (e.g. if you know that a
>> > variable must not be bound to the 'undefined' atom at a certain point in
>> > the code, you can add something like 'Variable /= undefined') where
>> > you think something is wrong.
>>
>> ________________________________________________________________
>> erlang-questions mailing list. See http://www.erlang.org/faq.html
>> erlang-questions (at) erlang.org
>
> "A software bug is the common term used to describe an error, flaw, mistake,
> failure, or fault in a computer program or system that produces an incorrect
> or unexpected result, or causes it to behave in unintended ways."
>
>
> A misbehaving system can still be (or pretend to be) fully functional in the
> sense that no exceptions are triggered. I'm with the others that you need to put
> more assertions in the code, just to let the Erlang runtime trigger on the faulty
> condition.
>
> Holes in the software specifications allow (Type 1?) errors that are difficult
> to trap. The mere fact that the system still handles millions of users without
> severe degradation makes clear this is the case.
>
> I remember some discussion of patterns like
>
> case file:open(..) of
>        {ok,Fd} -> ...;
>        _ -> true
> end
>
> vs the (I think) more idiomatic (Prolog-inherited?)
> {ok,Fd} = file:open()...
>
> where the former is perhaps more flexible (and prone to misbehaving), while the
> latter is rigid and safer (and needs a "try ... catch" container to deal with
> errors in the same process, or another process to watch for errors).
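
A hedged sketch of the two styles side by side, plus the "try ... catch"
container. The module open_sketch and its function names are made up for
illustration. Only file:open/2, file:position/2 and file:close/1 are
real OTP calls.

```erlang
-module(open_sketch).
-export([lenient_size/1, strict_size/1, guarded_size/1]).

%% Lenient style: the catch-all clause swallows every failure, so the
%% caller gets 'true' back and never learns why the open failed.
lenient_size(Path) ->
    case file:open(Path, [read, binary]) of
        {ok, Fd} ->
            Result = file:position(Fd, eof),
            ok = file:close(Fd),
            Result;
        _ ->
            true
    end.

%% Strict style: anything other than {ok, Fd} raises badmatch on the
%% spot, with the error tuple carried in the exception reason.
strict_size(Path) ->
    {ok, Fd} = file:open(Path, [read, binary]),
    {ok, Size} = file:position(Fd, eof),
    ok = file:close(Fd),
    Size.

%% The strict version inside a try ... catch, for the case where the
%% same process has to deal with the error itself.
guarded_size(Path) ->
    try
        {ok, strict_size(Path)}
    catch
        error:{badmatch, {error, Reason}} ->
            {error, Reason}
    end.
```

For a path that does not exist, lenient_size/1 returns true and hides
the problem, strict_size/1 crashes with {badmatch,{error,enoent}}, and
guarded_size/1 returns {error,enoent}.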
>
> /Angel
