[erlang-questions] Non-reproducible bug on a live erlang system

Kaiduan Xie kaiduanx@REDACTED
Thu Jan 14 17:25:47 CET 2010


Thanks Jayson and Attila for shedding light on this.

To be more specific, this is a call processing system: it processes
incoming messages and sends messages out. A customer reports call
failures, but no crash report is generated, so it is a programming
logic error. As I mentioned, the issue is non-reproducible, or at
least very hard to reproduce.

1. If this only happens to a particular user, then Erlang's built-in
tracing can help with this.

2. Otherwise, what to do?
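
For case 1, a minimal sketch of per-user tracing with dbg could look
like the following. It assumes the user id is the first argument of a
handler function; call_handler:handle_message/2 is a hypothetical name
used only for illustration, not something from the real system.

```erlang
-module(user_trace).
-export([start/1, stop/0, match_spec/1]).

%% Match spec that only fires when the first argument equals UserId.
%% {return_trace} additionally traces the function's return value.
match_spec(UserId) ->
    [{[UserId, '_'], [], [{return_trace}]}].

%% Start tracing calls made on behalf of one user.
%% call_handler:handle_message/2 is a hypothetical module/function.
start(UserId) ->
    {ok, _} = dbg:tracer(),
    %% Enable call tracing (with timestamps) in all processes.
    {ok, _} = dbg:p(all, [c, timestamp]),
    %% Trace local and exported calls that match the spec.
    {ok, _} = dbg:tpl(call_handler, handle_message, 2, match_spec(UserId)),
    ok.

%% Stop the tracer when done.
stop() ->
    dbg:stop().
```

Because the match spec filters on the user id, only that user's calls
produce trace messages, which should keep the overhead low on a busy
node. It is still worth trying on a test node first.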

Has anyone encountered this before? How did you solve it?

Thanks,

kaiduan

On Thu, Jan 14, 2010 at 10:35 AM, Attila Rajmund Nohl
<attila.r.nohl@REDACTED> wrote:
> 2010/1/14, Kaiduan Xie <kaiduanx@REDACTED>:
>> Hi, all,
>>
>> Consider the following case: you have a live, busy Erlang system in
>> production which handles thousands of transactions per second and
>> millions of users, and a customer reports a non-reproducible bug.
>> The problem is intermittent, or very hard to reproduce, both in the
>> live system and in the lab.
>
> Does this bug involve a crash report with a stack trace? You can
> always add some assert-like statements (e.g. if you know that a
> variable must not be bound to the 'undefined' atom at a certain point
> in the code, you can add something like 'Variable /= undefined')
> where you think something is wrong.
>
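
A minimal sketch of such an assert-like statement follows. Note that a
bare comparison only evaluates to true or false; matching the result
against true is what makes the process fail with a badmatch error at
that point. The module name, route_call/1 and the callee key are
hypothetical and only serve as an illustration.

```erlang
-module(assert_sketch).
-export([route_call/1]).

%% Call is assumed to be a property list describing one call.
route_call(Call) ->
    Callee = proplists:get_value(callee, Call),
    %% Assert-like check: fails with {badmatch, false} right here
    %% if Callee was never set, instead of failing silently later.
    true = (Callee =/= undefined),
    {route, Callee}.
```

For example, assert_sketch:route_call([]) fails with {badmatch,false},
so a logic error that would otherwise go unnoticed shows up as a crash
with a stack trace pointing at the check.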

