[erlang-questions] Debug support on for guards?
Henning Diedrich
hd2010@REDACTED
Tue May 4 04:47:50 CEST 2010
Mega < 0, Sec < 0, Micro < 0 is wrong, of course: the commas mean "and". It should be "or".
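To spell the correction out, here is a minimal sketch of the two clauses with the "or" in place. It is only a sketch under assumptions: the module name is made up for the example, and ?BILLION is given the value 10^9 because the mail below never shows its definition. It uses orelse rather than `;` because `;` would split the guard into alternatives, and the is_integer tests would then only cover the first alternative.

```erlang
-module(wire_time_sketch).
-export([wire_time/1]).

%% Assumption: ?BILLION is not defined anywhere in the mail; 10^9 is used
%% here because one megasecond is 10^9 milliseconds.
-define(BILLION, 1000000000).

%% Raise a descriptive error as soon as ANY element is negative ("or").
wire_time({Mega, Sec, Micro})
  when is_integer(Mega), is_integer(Sec), is_integer(Micro),
       (Mega < 0 orelse Sec < 0 orelse Micro < 0) ->
    erlang:error(negative_input_value);

%% All three elements are non-negative integers: encode as 64-bit millis.
wire_time({Mega, Sec, Micro})
  when is_integer(Mega), is_integer(Sec), is_integer(Micro) ->
    MilliEpoch = Mega * ?BILLION + Sec * 1000 + trunc(Micro / 1000),
    <<MilliEpoch:64>>.
```

Anything that is not a 3-tuple of integers still ends in the plain function_clause error, as before.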
Henning Diedrich wrote:
> Richard,
>
> thanks a lot for the detailed discussion, I follow almost all of it,
> here's where I am not sure.
>
> For context, the function is part of the encoding and decoding of data
> that is coming in from and going out to a VoltDB server. It will be
> used at a low level to read out and create the contents of
> arrays and SQL result table data. It is certain to run into invalid
> input at some point, due to errors in my programming of this
> API, errors on the part of the business logic programmer, or errors in
> transmission when processing serialized data. There is potential for
> range errors, too, because {M,S,U} has a broader range than what
> VoltDB can store. Disallowing any negative values in M,S,U means for
> now disallowing timestamps before Christ. Theoretically, VoltDB could
> store timestamps ~4000 B.C., and at some point this range should be
> covered, with negative values treated as leniently as appropriate.
>
> The encoder function in question would receive time data from the
> domain logic into the API. It may be in any kind of shape if somebody
> did calculations with it. It may also be used to convert serialized data.
>
> 1
>
> I do not produce the function_clause Error Reason myself, but this is
> what Erlang produces if I have the >= 0 checks in the guard.
>
> My original question was whether there were means in the language to
> enhance the message that the guards throw.
>
> At this point I have come to believe that this would be a useful thing
> and actually the best solution, if it were possible. It would keep
> the advantages of literate programming that you point out, with the
> least, well, bloat.
>
> Because if I can't, but still want a custom message to be
> thrown, I would have to pull the check out of the guards, into the code,
> which is the first step to bloat, and give up the elegance of the
> guard and the literate code structure.
>
> Or, as I understood from recent posts, split as below, which is
> another kind of bloat, adding boilerplate in the form of function heads
> and repetitive guards. (And not quite the way of splitting diagnosis
> and job that you proposed.)
>
> But despite the function head repetition, my understanding is growing
> that this is the way to do it in Erlang (I won't assume this to be
> important but I also come to like it):
>
> [untested]
>
> wire_time({Mega, Sec, Micro}) when is_integer(Mega),
>     is_integer(Sec), is_integer(Micro),
>     Mega < 0, Sec < 0, Micro < 0 ->
>     erlang:error(negative_input_value);
>
> wire_time({Mega, Sec, Micro}) when is_integer(Mega),
>     is_integer(Sec), is_integer(Micro) ->
>
>     MilliEpoch = Mega * ?BILLION + Sec * 1000 + trunc(Micro / 1000), % (*)
>     <<MilliEpoch:64>>.
>
>
>
> --- Instead of: ---
>
> wire_time({Mega, Sec, Micro}) when is_integer(Mega),
>     is_integer(Sec), is_integer(Micro) ->
>
>     if (Mega >= 0), (Sec >= 0), (Micro >= 0) -> ok;
>        true -> throw(negative_input_value) end,
>     MilliEpoch = Mega * ?BILLION + Sec * 1000 + trunc(Micro / 1000), % (*)
>     <<MilliEpoch:64>>.
>
> --- Or: ---
>
> wire_time({Mega, Sec, Micro}) when is_integer(Mega),
>     is_integer(Sec), is_integer(Micro),
>     Mega >= 0, Sec >= 0, Micro >= 0 ->
>     MilliEpoch = Mega * ?BILLION + Sec * 1000 + trunc(Micro / 1000), % (*)
>     <<MilliEpoch:64>>.
>
> However, once a better error detection is put into place, possibly
> accepting a mix of positive and negative values in the tuple, then it
> cannot be in the guards anymore in any event. It has to be calculated
> and then the result tested against the limits, instead of simply
> testing the tuple elements for >= 0.
>
> Would this then best be:
>
> [untested]
>
> -define(WIRE_MIN, ...).
> -define(WIRE_MAX, ...).
>
> wire_time({Mega, Sec, Micro}) when is_integer(Mega),
>     is_integer(Sec), is_integer(Micro) ->
>     wire_time(milli_epoch({Mega, Sec, Micro}));
>
> wire_time(MilliEpoch) when MilliEpoch < ?WIRE_MIN ->
>     erlang:error(time_underrun);
>
> wire_time(MilliEpoch) when MilliEpoch > ?WIRE_MAX ->
>     erlang:error(time_overrun);
>
> wire_time(MilliEpoch) when is_integer(MilliEpoch) ->
>     <<MilliEpoch:64>>.
>
> milli_epoch({Mega, Sec, Micro}) ->
>     Mega * ?BILLION + Sec * 1000 + trunc(Micro / 1000).
>
> --- Or, with briefer variable names: ---
>
> -define(WIRE_MIN, ...).
> -define(WIRE_MAX, ...).
>
> wire_time({M,S,U}=N) when is_integer(M), is_integer(S),
>     is_integer(U) ->
>     wire_time(milli_epoch(N));
>
> wire_time(T) when T < ?WIRE_MIN ->
>     erlang:error(time_underrun);
>
> wire_time(T) when T > ?WIRE_MAX ->
>     erlang:error(time_overrun);
>
> wire_time(T) when is_integer(T) ->
>     <<T:64>>.
>
> milli_epoch({M,S,U}) ->
>     M * ?BILLION + S * 1000 + trunc(U / 1000).
>
> This would be split up then, though again not into (pre-)diagnosis and
> job. It shows a lot of boilerplate repetition of function heads, doesn't it?
>
> 2
>
> I maintain that in this case I am able to predict a wrong, naive
> interpretation of the error message. Or rather, that I can predict
> what will NOT come to mind as error source for the given Reason
> 'function_clause', until one inspects the source, or the docs; namely:
> that a part of the time tuple is negative.
>
> It's a less ambitious prediction than what you are proposing and
> defeating.
>
> But a programmer other than me who uses this function and happens to
> feed negative values into it will, on getting 'function_clause' back,
> not know what to look for without reading the implementation of the
> function and parsing the error message quite closely. Giving
> "negative_time_value" instead of "function_clause" will be a
> legitimate help to speed things up for her, and not wrong.
>
> Trying to respond back to the server would mostly affect other
> functions, and not this one as I had erroneously proposed.
>
> I am willing to assume that even if I write a doc (and I will) it will
> not be read, or the remarks about the use of this function forgotten.
> I think such is life and it is forgivable. Giving a bit of a hint in
> errors can go a long way to boost GDP I think. Well, ok, reduce pain.
>
> Regards,
> Henning
>
>
>
> Richard O'Keefe wrote:
>>
>> On May 3, 2010, at 11:53 PM, Henning Diedrich wrote:
>>
>>> Thank you for the advice Richard!
>>>
>>> The function is to be called often and directly on non-tested data
>>> coming in over the network.
>>>
>>> So at first sight, splitting up seems to bloat things instead of
>>> making them clearer in this case. There is no moment in time, long
>>> up front, when the diagnosis should be done. I'll give splitting up
>>> a try to learn how that looks.
>>>
>>
>> Funny, I thought the style I proposed was all about *removing* bloat.
>>
>> One of the reasons I got interested in literate programming, back when
>> that was a new thing, was the fact that in typical imperative code,
>> the normal case could easily get lost in code dedicated to checking
>> arguments and reporting errors, and with literate programming I could
>> at least reduce that to
>>     <<check arguments>>
>>     if (ok) {
>>         <<do the real work>>
>>     } else {
>>         <<report some sort of error>>
>>     }
>> and then <<do the real work>> could be uncluttered.
>>
>>>> Why? What is the programmer supposed to do about it?
>>>> More to the point, what is the *program* supposed to do about it?
>>> 1) Discard the data he/it is trying to parse and tell the data
>>> source that it was no good, and why.
>>
>> But to what level of detail?
>> Recalling that the data is supposed to have the form
>> {M,S,U} where M,S,U are all non-negative integers,
>> "why" a term T might be no good could be
>>
>> - T is not a tuple
>> - T is a tuple but not of arity 3
>> - T is a tuple of arity >= 1
>>   whose first element is not a number
>> - T is a tuple of arity >= 1
>>   whose first element is not an integer
>> - T is a tuple of arity >= 1
>>   oh heck there are just TOO many possibilities.
>>
>> And the whole thing is ambiguous.
>> Take {-1.2,fred,999999999999999999999}.
>> Is it wrong
>> - because the first argument ISN'T AN INTEGER
>> - because the first argument IS NEGATIVE
>> - because the second argument IS AN ATOM
>> - because the third argument IS OUT OF RANGE
>> (I don't think the code we've seen so far bothered to check that
>> the arguments were within reasonable bounds, but it should have.)
>>
>> I've been through this once, when I designed the exception handling
>> facility in Quintus Prolog and then had to go through all the source
>> code ensuring that every single operation available to users
>> reported reasonable errors. I'm going through it again in my
>> Smalltalk system, gradually replacing
>> "self error: 'I do not like thee Dr Fell'"
>> with
>> (NastyLecturerError subject: drFell critic: self) signal
>> and the answer is that as soon as there is more than one value to
>> inspect you *can't* report every possible error in a perfect way.
>> Take something like
>> anArray copyFrom: firstIndex to: lastIndex
>> which does what it looks like (inclusive bounds).
>> If lastIndex - firstIndex < -1, which one of them is at fault?
>> Any guess you make might be wrong.
>>
>> I therefore respectfully suggest that any attempt to provide
>> fine-grained diagnosis of "why" is misguided. To start with,
>> if someone sets up a data source that is supposed to be sending
>> you well formed time stamps, THEY MUST KNOW what a well formed
>> time stamp is supposed to look like. You *can* tell them
>> "<this> is not a well formed timestamp". And that's *all* you
>> can or should tell them. THEY have all the information they
>> need to make a diagnosis in a way that makes sense for their
>> situation. You do NOT.
>>
>> So here's what you should be doing.
>>
>> You have a protocol between you and your data source.
>> Typically, you will have {Action,Operand1,...,Operandn}.
>> Define types for the operands (in the UBF sense, which
>> includes ranges). If an operand does not conform to its
>> type, report that fact, AND NO MORE DETAIL THAN THAT.
>>
>> Better still, use UBF, and let UBF do the checking.
>>
>> You *can't* put the reporting in the function that
>> decodes a timestamp, for the simple reason that it doesn't
>> know who to report the error *to*, or how. The function
>> that decodes a timestamp has that job and no other.
>> It could possibly do
>>
>>     decode({M,S,U}) when ... -> {ok,...};
>>     decode(T) -> {error,invalid_timestamp,T}.
>>
>> But no more.
>>
>>>
>>> 2) If the data was self-generated, e.g. for testing, the programmer
>>> to use this function may make assumptions that are wrong and the
>>> programmer to program this function (me) wants a way to inform him
>>> in case he runs into trouble.
>>
>> WHY might the programmer make wrong assumptions?
>> Is there something desperately wrong with your documentation?
>>
>> What is it about your situation that lets someone know "this
>> operation needs a timestamp" but NOT know what a timestamp
>> is supposed to look like? If they don't know what a timestamp
>> is supposed to look like, how are they supposed to make sense
>> of your detailed diagnostic? Above all, HOW ON EARTH are they
>> supposed to make sense of a diagnostic that talks about which
>> test in some guard failed, when they should never even see the
>> guard in question?
>>
>> If you are testing, why can't you use UBF or a similar scheme of
>> your own devising?
>>
>>> By providing more information than function_clause. Especially if I
>>> can imagine what the wrong assumption may be.
>>
>> My absolutely uniform experience has been that when people imagine what
>> the wrong assumption may be, they are WRONG. It is worse than useless
>> to make such guesses. Don't ever waste anyone's time or bandwidth on
>> imagined causes of errors, just report the errors themselves simply and
>> clearly.
>>
>> You should not be reporting function_clause back to your data source in
>> the first place. You should be reporting {type_error,timestamp,T} or
>> something like that which is framed in terms of your PROTOCOL between
>> your program and the data source, which neither reveals nor depends on
>> ANY internal aspect of your program whatever.
>>
>> It may be that there is more about your program that you could tell us.
>>
>>
>
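For reference, the tagged-result shape Richard suggests in the quoted text (decode returning {ok,...} or {error,invalid_timestamp,T}) could look roughly like this. It is a minimal sketch under assumptions, not his code: the module name, the millisecond payload inside {ok, ...} and the 10^9 value for ?BILLION are all made up for the example, since the quoted version elides the guard and the result.

```erlang
-module(timestamp_sketch).
-export([decode/1]).

%% Assumption: ?BILLION is never shown in the thread; 10^9 is used because
%% one megasecond is 10^9 milliseconds.
-define(BILLION, 1000000000).

%% Accept only {M,S,U} tuples of non-negative integers.
%% Assumption: the success payload is the millisecond epoch value.
decode({M, S, U})
  when is_integer(M), is_integer(S), is_integer(U),
       M >= 0, S >= 0, U >= 0 ->
    {ok, M * ?BILLION + S * 1000 + trunc(U / 1000)};

%% Everything else gets one uniform error that carries the offending term,
%% with no attempt to say which part of it was at fault.
decode(T) ->
    {error, invalid_timestamp, T}.
```

The caller, which knows who the data source is, can then turn the error tuple into a protocol-level reply such as {type_error,timestamp,T} without function_clause ever leaving the module.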