Erlang discussions? - a call for a new format

Joachim Durchholz joachim.durchholz@REDACTED
Thu Jan 15 16:40:14 CET 2004

WILLIAMS Dominic wrote:
>> 2) News software is better at filtering out uninteresting threads.
>> E.g.  in Mozilla, I can mark an entire thread as uninteresting in
>> news but not in mail.
> There is an interesting (new?) family of software called Bayesian
> filters that is becoming quite popular; I think the concept was
> invented, or at least promoted, by www.PaulGraham.com.

Yes, I have followed the practical applications. I can't claim to have
seen everything, but I did a few Google searches at the beginning of
the year.

My findings:
* All implementations use Bayesian filtering to distinguish spam from
non-spam. Actually, you can use it for any binary distinction (e.g. I
use it to distinguish between "interesting" and "uninteresting" mail,
and the "uninteresting" category catches not only spam but also things
like acknowledgements from moderated lists).
* None of the implementations I found use it for multi-way choices. Some
claim that Bayesian filtering is not very useful for multi-way choices
anyway; I didn't find their reasons entirely convincing, but I didn't
follow the arguments closely enough to say anything definite about it.
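For readers who haven't looked inside such a filter, the core is small. Below is a minimal multinomial naive Bayes sketch in Python. It is an illustration only, not the code of any filter mentioned here (Graham's scheme, for one, combines per-token probabilities differently), and the tokenizer, add-one smoothing and category names are assumptions. Nothing in the arithmetic limits it to two categories: a binary filter is just the case with two labels, so whether multi-way sorting works well is a question of accuracy, not of mechanics.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed tokenizer: lowercase runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # messages seen per category
        self.word_counts = defaultdict(Counter)  # token counts per category
        self.vocab = set()                       # all tokens seen in training

    def train(self, text, category):
        tokens = tokenize(text)
        self.doc_counts[category] += 1
        self.word_counts[category].update(tokens)
        self.vocab.update(tokens)

    def classify(self, text):
        # Returns the category with the highest log posterior
        # (None if nothing has been trained yet).
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            n_tokens = sum(counts.values())
            # log prior + sum of smoothed log likelihoods; smoothing keeps
            # an unseen word from zeroing out the whole product
            score = math.log(n_docs / total_docs)
            for tok in tokenize(text):
                score += math.log((counts[tok] + 1) / (n_tokens + vocab_size))
            if score > best_score:
                best, best_score = category, score
        return best
```

Trained with one category per mail folder, the same code would sort into folders; how accurate that is on real mail is exactly the open question.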

BTW Bayesian filtering is unsuitable for killing uninteresting threads.
Sometimes a thread has all the interesting keywords, but I don't have
the time to follow that discussion right now. Or the discussion is a
rehash of one that took place just a week ago. Or the discussion was
started by a troll, and I don't wish to participate.
Whatever the case, some reasons for skipping a thread have nothing to do
with the messages' contents, and a content-based Bayesian filter cannot
capture them.
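A thread kill of the kind newsreaders offer needs no content analysis at all; it only follows the reply chain. A minimal sketch, assuming messages carry the standard Message-ID, In-Reply-To and References headers with whitespace-separated IDs; the ThreadKiller class is hypothetical and not how Mozilla actually implements it.

```python
from email import message_from_string


class ThreadKiller:
    """Skip whole threads by following the reply chain, ignoring content."""

    def __init__(self):
        self.killed = set()  # Message-IDs known to belong to ignored threads

    @staticmethod
    def _ids(msg):
        # The message's own ID plus every ID it refers back to.
        own = (msg.get("Message-ID") or "").strip()
        refs = ((msg.get("References") or "") + " "
                + (msg.get("In-Reply-To") or "")).split()
        return own, refs

    def kill(self, msg):
        # Mark this message and all its ancestors, so sibling branches
        # that cite the same root are skipped as well.
        own, refs = self._ids(msg)
        if own:
            self.killed.add(own)
        self.killed.update(refs)

    def is_killed(self, msg):
        own, refs = self._ids(msg)
        hit = own in self.killed or any(r in self.killed for r in refs)
        if hit and own:
            # Remember it, so later replies that only cite this message
            # (e.g. with a truncated References header) are caught too.
            self.killed.add(own)
        return hit


if __name__ == "__main__":
    tk = ThreadKiller()
    tk.kill(message_from_string("Message-ID: <1@x>\nSubject: flame\n\nbody"))
    reply = message_from_string("Message-ID: <2@x>\nReferences: <1@x>\n\nre")
    print(tk.is_killed(reply))
```

Killing a thread's first message and then passing every incoming message through is_killed() drops the replies, including deep ones whose References were truncated, as long as their parents were seen first.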

> I have not tried yet, but I understand they work by learning
> gradually from the way you sort your own email into different
> folders. This works for spam as well as for different interest
> categories or levels.

Many people believe this, but I have yet to see it implemented for
more than two categories. The knowledgeable people seem to doubt that it
would work well, so it will probably stay unimplemented until somebody
actually does the work (and then we'll see whether the pessimists or the
optimists were right).

>> 3) I'm tired of getting the same response twice, once directly and 
>> once from the mailing list.
> Oh, I totally agree. Not only do we get two responses, but the 
> ordering often gets messed up.
> The (very simple) solution to this is twofold:
> 1) People should post *only* to the list, no CC-ing to anyone 
> directly. It is bad netiquette to post to a list without subscribing 
> to it, anyway.

In Mozilla, replying only to the list is a manual cut-and-paste job on
the address fields; other clients make it even more difficult.

> 2) The mailing list software should be configured so that the 
> "reply-to" field is the list's address.

The problem with header rewriting is that it clobbers the intentions of
the original poster. E.g. if the original poster had a Reply-To: field,
it will get overwritten.
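The clobbering is easy to see with Python's standard email package. Both functions below are hypothetical stand-ins for list-server behaviour, not the code of any real list manager: the first rewrites Reply-To unconditionally, the second only adds it when the poster didn't set one.

```python
from email.message import EmailMessage


def munge_reply_to(msg, list_addr):
    # Unconditional rewrite: whatever Reply-To the poster set is lost.
    del msg["Reply-To"]  # deleting a missing header is a no-op
    msg["Reply-To"] = list_addr
    return msg


def add_reply_to_if_absent(msg, list_addr):
    # Conservative variant: an existing Reply-To is left alone.
    if msg["Reply-To"] is None:
        msg["Reply-To"] = list_addr
    return msg


if __name__ == "__main__":
    m = EmailMessage()
    m["From"] = "alice@example.org"
    m["Reply-To"] = "alice-private@example.org"
    munge_reply_to(m, "list@example.org")
    print(m["Reply-To"])  # the poster's choice is gone
```

The conservative variant preserves the poster's intent, but replies to such messages then go wherever the poster pointed them, so it only partly delivers what point 2 asks for.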

>> I don't understand why mailing lists are so popular.
> Maybe because email is still the most accessible medium. I assure you
> many, many people only have access to email. I don't know how any
> company in their right mind thinks that programmers can keep on top
> of their field without complete internet access, but it happens a
> lot.

Netnews access is just a matter of setting up a client and registering
with a news server (there are free servers on the Internet; I'm using
one).
You get cut off if, and only if, the firewall blocks the NNTP ports;
maybe that is indeed the case at such companies.
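Whether a firewall is in the way can be checked without a newsreader: try opening a TCP connection to the news server's NNTP port (119, or 563 for NNTP over TLS). A minimal sketch; the host name in the usage line is a placeholder, and a successful connect only shows that the port is open, not that the server will accept you.

```python
import socket

NNTP_PORT = 119   # plain NNTP
NNTPS_PORT = 563  # NNTP over TLS


def port_reachable(host, port=NNTP_PORT, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out (typical of a firewall that
        # silently drops packets), or the host name did not resolve.
        return False


if __name__ == "__main__":
    # Placeholder host; substitute your actual news server.
    print(port_reachable("news.example.com"))
```

A timeout rather than an immediate refusal usually points to a filtering firewall rather than a server that simply isn't listening.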

Currently looking for a new job.

More information about the erlang-questions mailing list