[erlang-questions] Printed list to Erlang function

Lloyd R. Prentice lloyd@REDACTED
Wed Mar 30 20:33:00 CEST 2016


Hi Pichi,

Since I haven't yet learned how to design and conduct performance tests, results like these are both interesting and comforting.

The long stop words list at http://www.ranks.nl/stopwords has just under 700 words. So from these results it looks like either method would do the job in most applications, unless you are filtering stop words out of a huge archive of long documents.

Many thanks, Pichi.

Best wishes,

LRP

> On Mar 30, 2016, at 2:12 PM, Hynek Vychodil <vychodil.hynek@REDACTED> wrote:
> 
> Every time I read a claim about how fast something will be, I have the urge to test it. I had an idea that a constant map in a module could be faster than function clauses, so I tested it.
> 
> I was wrong and RAO is right as usual. The function using function clauses seems to be three times faster than the one using a map.
> 
> x clause
> + map
> +--------------------------------------------------------------------------+
> |xxxxx                                                     +++++          +|
> |xxxx                                                      ++++            |
> |xxxx                                                      +++             |
> |xxxx                                                       ++             |
> |xxx                                                        ++             |
> |xxx                                                        ++             |
> |xx                                                         ++             |
> |xx                                                         ++             |
> |xx                                                         ++             |
> |xx                                                         +              |
> |xx                                                         +              |
> |xx                                                         +              |
> |xx                                                         +              |
> |xx                                                         +              |
> | x                                                         +              |
> | x                                                         +              |
> | x                                                         +              |
> | x                                                         +              |
> | x                                                         +              |
> | x                                                         +              |
> | x                                                         +              |
> | x                                                         +              |
> | x                                                         +              |
> | x                                                         +              |
> | x                                                         +              |
> |                                                           +              |
> |                                                           +              |
> |                                                           +              |
> |                                                           +              |
> |                                                           +              |
> |                                                           +              |
> |                                                           +              |
> |                                                           +              |
> |                                                           +              |
> ||A|                                                                       |
> |                                                         |_MA_|           |
> +--------------------------------------------------------------------------+
> Dataset: x N=50 CI=95.0000
> Statistic     Value     [         Bias] (Bootstrapped LB‥UB)
> Min:            3490.00
> 1st Qu.         3551.00
> Median:         3591.00
> 3rd Qu.         3679.00
> Max:            3945.00
> Average:        3630.16 [     0.137534] (      3602.82 ‥       3664.56)
> Std. Dev:       113.400 [     -1.81311] (      90.8425 ‥       141.539)
> 
> Outliers: 0/4 = 4 (μ=3630.30, σ=111.587)
>         Outlier variance:      0.151802 (moderate)
> 
> ------
> 
> Dataset: + N=50 CI=95.0000
> Statistic     Value     [         Bias] (Bootstrapped LB‥UB)
> Min:         1.09500e+4
> 1st Qu.      1.10160e+4
> Median:      1.10400e+4
> 3rd Qu.      1.11270e+4
> Max:         1.28270e+4
> Average:     1.11055e+4 [     0.297998] (   1.10611e+4 ‥    1.12491e+4)
> Std. Dev:       264.914 [     -31.0673] (      84.7956 ‥       582.629)
> 
> Outliers: 0/2 = 2 (μ=1.11058e+4, σ=233.847)
>         Outlier variance:    9.45082e-2 (slight)
> 
> Difference at 95.0% confidence
>         7475.36 ± 80.8533
>         205.924% ± 2.22726%
>         (Student's t, pooled s = 203.763)
> ------
> 
> That's about 31 million stopwords_clause:is_stopword/1 calls per second and 10 million stopwords_map:is_stopword/1 calls per second.
> 
> You can find the code in this gist: https://gist.github.com/pichi/2d10c93242d5057913d026a607f07dd4
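
For readers who want to try a comparison like this themselves, here is a minimal sketch of how it might be timed with timer:tc/1. It is not the code from the gist: the module name stopword_bench, the three-word list, and the time/2 helper are all assumptions made for illustration, and a real test would use the full word list and many repeated runs.

```erlang
-module(stopword_bench).
-export([is_stopword_clause/1, is_stopword_map/1, time/2]).

%% Hypothetical three-word list standing in for the full stop word list.
is_stopword_clause("a") -> true;
is_stopword_clause("about") -> true;
is_stopword_clause("yourselves") -> true;
is_stopword_clause(_) -> false.

%% The same words as keys of a literal map.
is_stopword_map(Word) ->
    maps:is_key(Word, #{"a" => true, "about" => true, "yourselves" => true}).

%% Call Fun on each test word, N times over, and return the
%% elapsed wall-clock time in microseconds.
time(Fun, N) ->
    Words = ["a", "about", "yourselves", "erlang"],
    {Micros, ok} = timer:tc(fun() -> loop(Fun, Words, N) end),
    Micros.

loop(_Fun, _Words, 0) -> ok;
loop(Fun, Words, N) ->
    lists:foreach(Fun, Words),
    loop(Fun, Words, N - 1).
```

From the shell, something like stopword_bench:time(fun stopword_bench:is_stopword_clause/1, 1000000) can then be compared with the same call for is_stopword_map/1. Calling through a fun adds overhead to both variants, so the ratio is only a rough indication.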
> 
> Pichi
> 
>> On Wed, Mar 30, 2016 at 4:05 AM, Lloyd R. Prentice <lloyd@REDACTED> wrote:
>> Wow! What a cool idea.
>> 
>> Thanks, Richard.
>> 
>> Best wishes,
>> 
>> LRP
>> 
>> > On Mar 29, 2016, at 8:47 PM, "Richard A. O'Keefe" <ok@REDACTED> wrote:
>> >
>> >
>> >> On 30/03/16 5:59 am, lloyd@REDACTED wrote:
>> >> So, I have a printed list of stop words:
>> >>
>> >> http://www.ranks.nl/stopwords
>> >>
>> >> I'd like to turn this list into an Erlang function that I can query---
>> >>
>> >> stopwords() ->
>> >>    ["word1", "word2" ... "wordN"].
>> >>
>> >> is_stopword(Word) ->
>> >>    List = stopwords(),
>> >>    lists:member(Word, List).
>> > Even if there is some arcane reason why you want the collection of words
>> > as a list, I strongly suggest generating
>> >
>> > is_stopword("a") -> true;
>> > is_stopword("about") -> true;
>> > ...
>> > is_stopword("yourselves") -> true;
>> > is_stopword(_) -> false.
>> >
>> > Open the list of stopwords in vi.
>> > :1,$s/^.*$/is_stopword("&") -> true;/
>> > :$a
>> > is_stopword(_) -> false.
>> > <ESC>
>> >
>> > The Erlang compiler will turn this into a trie, roughly speaking.
>> > This will be *dizzyingly* faster than the code you outlined.
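
The same clauses could also be generated from Erlang rather than vi. The following is only a sketch under stated assumptions: the generator module gen_stopwords, its functions source/1 and write/2, and the one-word-per-line input file are hypothetical names and formats, not anything from the thread.

```erlang
-module(gen_stopwords).
-export([source/1, write/2]).

%% Build the source text of a stopwords module from a list of words.
%% ~p prints each word as a quoted Erlang string literal.
source(Words) ->
    Clauses = [io_lib:format("is_stopword(~p) -> true;~n", [W]) || W <- Words],
    lists:flatten(["-module(stopwords).\n",
                   "-export([is_stopword/1]).\n\n",
                   Clauses,
                   "is_stopword(_) -> false.\n"]).

%% Read one word per line from InFile (assumed format) and write
%% the generated module source to OutFile.
write(InFile, OutFile) ->
    {ok, Bin} = file:read_file(InFile),
    Words = string:tokens(binary_to_list(Bin), "\r\n"),
    file:write_file(OutFile, source(Words)).
```

A call such as gen_stopwords:write("stopwords.txt", "stopwords.erl") would then produce a file that can be compiled like any other module; both file names here are placeholders.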
>> >
>> >>
>> >> All my efforts so far have evolved into ugly kludges. Seems to me there must be an elegant method that I'm overlooking.
>> >>
>> >> Could some kind soul point the way?
>> >>
>> >> Many thanks,
>> >>
>> >> LRP
>> >>
>> >> _______________________________________________
>> >> erlang-questions mailing list
>> >> erlang-questions@REDACTED
>> >> http://erlang.org/mailman/listinfo/erlang-questions
>> >
> 