Never trust a statistics you didn't forge yourself

Joe Armstrong (AL/EAB) joe.armstrong@REDACTED
Wed Feb 22 13:09:28 CET 2006


 

> -----Original Message-----
> From: owner-erlang-questions@REDACTED 
> [mailto:owner-erlang-questions@REDACTED] On Behalf Of Michael Suess
> Sent: den 21 februari 2006 22:45
> To: Marc van Woerkom
> Cc: leopold@REDACTED; erlang-questions@REDACTED
> Subject: Re: Never trust a statistics you didn't forge yourself
> 
> On Tuesday 21 February 2006 13:52, Marc van Woerkom wrote:
> > Hello,
> 
> Dear Marc, dear members of the Erlang Mailing List,
> 
> I am the main author of the study that has been discussed in 
> this thread and also in this one:
> http://www.erlang.org/ml-archive/erlang-questions/200602/msg00355.html
> 
> I have thought long about whether or not I should try to 
> defend our study here, especially since we seem to have 
> raised a lot of hostility here for some reason. I will 
> therefore start with answering the points you made in this 
> mail, and secondly add some opinions to the points raised in 
> the other thread. 
> 
> > I just got a copy of your survey results and I am upset about it:

Me too. 

> You are of course free to feel that way. I knew that I would 
> upset some people with this study, especially the ones whose 
> "pet system" has not gotten many votes. What surprises me 
> though is, that the only flames I got until now come from 
> the Erlang-community, although Erlang has gotten way more 
> than its share of publicity in our paper. 
> 
> > First you write:
> > > To raise more interest for the survey and generate submissions, we
> > > have sent messages to several discussion groups, once shortly after
> > > the start of the survey and once shortly before the end:
> > > - Usenet: comp.parallel, comp.parallel.mpi, comp.parallel.pvm,
> > >   comp.sys.super, aus.computers.parallel, comp.programming.threads
> > > - Mailing Lists: beowulf@REDACTED, lam@REDACTED, omp@REDACTED,
> > >   compunity@REDACTED, bspall@REDACTED
> > > - Forums: CodeGuru Forum, Intel Developer Services Forum
> > > - Websites: slashdot.org
> >
> > then
> >
> > > We know of one incident, where
> > > an influential member of the Erlang community requested members of
> > > their mailing list to show their support for Erlang by participating
> > > in the survey, which they promptly did.
> > > Incidents like this one have of course influenced the results as
> > > well.
> >
> > This implies that some kind of guru is able to motivate his minions.
> 
> This is your interpretation, and I certainly did not mean it 
> that way. I have paid careful attention to make it very 
> clear, how the study was carried out and whom I have 
> contacted. I have tried to contact a mailing list or forum 
> for every parallel programming system that was explicitly 
> mentioned in the survey question two. Erlang is not 
> explicitly mentioned there, and I have therefore not posted 
> to this list. Why was it not explicitly mentioned there? 
> Because I had to make a subjective choice about what systems 
> interested me most, and about what systems I thought were 
> most widely used. And like it or not, Erlang was not on this list.
> 
> Fact is, that Joe posted about the survey on this mailing 
> list, you can still find his post in the archives:
> http://www.erlang.org/ml-archive/erlang-questions/200510/msg00171.html

And what did I say here? - 

The complete post was:

              If you use Erlang, why not tell

                   http://www.plm.eecs.uni-kassel.de/parasurvey/

              About it

              /Joe

Just compare this for a moment to what was posted to
the LAM/MPI General User's Mailing List

http://www.lam-mpi.org/MailArchives/lam/2005/10/11389.php

I quote:

      > More than 50 people have filled out the survey this far, and therefore I will
      > be evaluating the results shortly (it will close in two weeks, on November
      > the 5th!). But before I do, please consider filling out the survey to make
      > the results even more valuable. Of course, I will make the results available
      > to everyone who participated. And before I forget to mention it: two gift
      > certificates from amazon.com are being awarded to everyone who participated.
      >
      > Thank you for your cooperation,
      > Best Regards,
      > Michael Suess

The bit about the gift certificates goes totally unmentioned in your
paper.

You say "An influential member of the Erlang community requested members
of their mailing list to show their support for Erlang by participating
in the survey" - which is a false claim. I never said anything about
support - I asked the people on this list to tell you about their
experiences.

And then you omit to say that gift certificates were offered to members
of other mailing lists, and finally, when you get an unexpectedly
positive response from the members of the Erlang mailing list, you
dismiss this result since I asked members of this list to participate
in your survey.

Usually in an academic paper it is considered de rigueur to describe your
experimental procedure. Omitting to mention that you offered gift
certificates to people who filled in the survey is rather strange, since
it will probably bias the results.

Note that by doing so you may well have biased the results in favour of
MPI - MPI got the highest rating among parallel programming systems - was
this because you offered them gift certificates?

If you are going to make unsubstantiated claims in your paper about the
supposed influence of any mailing that I made to this list - then you
should also mention the ways in which your other results might be biased.

And by the way - you haven't sent me a gift certificate - was the offer
only open
to people on the MPI list? - did the other people on this list get any
gift certificates?

	
> Do any of you disagree, that Erlang has gotten a higher 
> number of votes because of this? I do not think so. And let 
> me get this straight: I am not mad at Joe for doing it or 
> anything, these votes are as valid as anyone elses and I am 
> thankful to anyone who provided me with his/her opinion. But 
> I have an explanation for the relatively high number of votes 
  *********************
> for Erlang and I have been open about all other aspects of 
> this survey. Why the heck should I keep this one secret??

It's not an explanation - it's a hypothesis. This is not science;
it is your personal opinion with no evidence to support it.

I can think of one or two other hypotheses.

To start with Erlang is unique in your survey in the sense that
concurrency is part of the language and not the OS.

Let me give you some examples:

	C++ is a language but NOT a concurrent programming system
	PVM is a concurrent programming system but NOT a language

This is true of ALL the languages/systems in your paper EXCEPT Erlang.

      Erlang is a language AND a concurrent programming system.

      And thus it belongs in both figures 2 and 5.

Figure 2 is also incorrectly labelled - the caption is "Parallel
programming languages", yet here you mention C, C++, Fortran, Java, and
something called functional and logical.

This is very misleading: functional and logical are NOT languages, they
are classes of languages.

C and Fortran are NOT parallel languages.

Now there are parallel languages (Erlang, Oz, Parlog, occam, ...) but
none of the ones you mention are parallel languages. In C, C++, Java etc.
the concurrency is provided by the OS and NOT the language.

Now back to hypothesis building - why did Erlang users respond
to your survey when users of PVM, C++ etc. did not?

Here are some hypotheses

	1) There are more Erlang users than we realise - it is spreading
	   but we don't know about it
	2) They are more active - evangelical - whatever
	3) An influential person influenced the vote

Now you say in your paper

	"An influential member of the Erlang community requested members
	of their mailing list to show their support for Erlang by
	participating in the survey"

     I did not "request members .. to show their support"

     I said "If you use Erlang, why not tell ... about it"

Later you say, just after figure 4

	"We have already told you the reasons for the relatively high
	numbers for Erlang in section 2, so please treat them with caution"

I beg to differ, you have not "told you the reasons" - you have advanced
ONE hypothesis.

Now back to my three hypotheses (above)

	1) - I suspect not
	3) - I reject 

	2) - YES - we are more evangelical - programming languages are best
	   likened to religions - we believe in them - they represent belief
	   systems.

So why are Erlang users more evangelical than other language/system
users?

Now this is pure guesswork - I think it's because Erlang is both a
language and a parallel programming system. Erlang lowers the "barrier
for entry" into parallel programming - it's really easy - and so it
becomes possible to easily experiment with concurrent and parallel
systems. This is (I think) the real reason why the Erlang figures were
unexpectedly high.

   

> I have made it very clear in my paper, that the results of 
> this survey are not statistically relevant. They cannot be, 
> because I know of no way to sample a proportional sample of 
> the parallel programming community. If you have any idea on 
> how to do that and have the money for it, feel free to 
> contact me. I have not been able to find any comparable data 
> anywhere at all. I did the best I could to provide at least 
> some clues. These data are not statistics, they are 
> indications at best. I have made it very clear in my paper, 
> that the data are not statistically relevant. I have been 
> very careful with my observations and I have been as open as 
> I could regarding the process of gathering the data. More I cannot do.
> 
> > I can't remember who it was who posted this to erlang mailing list. I
> > was just informed that there was a survey and of course it made
> > perfect sense to participate as I am a user of a concurrent language.
> > That's what it is good for.
> 
> Sure. Thanks again for voting and writing your opinion.
> 
> > There are obviously at least two camps of concurrent problems - the
> > massive number crunching ones and the ones where you have lots of
> > parallel transactions in a possibly distributed system - that's where
> > Erlang is strong.

But you missed the biggest use of parallelism - in distributed
systems where the problem itself is intrinsically distributed.

The world's biggest and first parallel machine was the
telephone exchange.

The internet itself is the biggest example of a concurrent and
parallel system.

This is where Erlang shines - in writing applications where the
concurrency and distribution are an intrinsic part of the problem.

This might also account for the high response rate from Erlang fans -
Erlang is predominantly used for networking applications - and (I am
speculating here) traditional parallel systems are used for
number-crunching, cluster-like applications.

Most hobby hackers like to build web-things rather than numerical
cluster applications, so I guess yet another reason why you got an
unexpectedly positive response from the Erlang list is that there is a
higher ratio of fun projects being performed by members of the list than
in other groups. Also, I suspect we attract more hobby hackers than the
traditional cluster programming techniques.


/Joe


