[erlang-questions] Printed list to Erlang function

Hynek Vychodil vychodil.hynek@REDACTED
Wed Mar 30 22:04:52 CEST 2016


Here is the result for the long list (667 words):

x clause
+ map
+--------------------------------------------------------------------------+
| xxx x         x                              +++          +     +       +|
| xxx x                                        +++                         |
| xxx                                          +++                         |
| xxx                                          +++                         |
| xx                                           +++                         |
| xx                                            ++                         |
| xx                                            ++                         |
| xx                                            ++                         |
| xx                                            +                          |
| xx                                            +                          |
| xx                                            +                          |
| xx                                            +                          |
| xx                                            +                          |
| xx                                            +                          |
| xx                                            +                          |
| xx                                            +                          |
| xx                                            +                          |
| xx                                            +                          |
| xx                                            +                          |
|  x                                            +                          |
|  x                                            +                          |
|  x                                            +                          |
|  x                                            +                          |
|  x                                            +                          |
|                                               +                          |
|                                               +                          |
|                                               +                          |
|                                               +                          |
|                                               +                          |
|                                               +                          |
|                                               +                          |
|                                               +                          |
|                                               +                          |
|                                               +                          |
||_A_|                                                                     |
|                                           |___MA____|                    |
+--------------------------------------------------------------------------+
Dataset: x N=50 CI=95.0000
Statistic     Value     [         Bias] (Bootstrapped LB‥UB)
Min:            5087.00
1st Qu.         5113.00
Median:         5137.00
3rd Qu.         5188.00
Max:            7081.00
Average:        5205.64 [     0.729718] (      5157.08 ‥       5372.30)
Std. Dev:       287.752 [     -33.4550] (      81.6038 ‥       633.923)

Outliers: 0/4 = 4 (μ=5206.37, σ=254.297)
        Outlier variance:      0.365232 (moderate)

------

Dataset: + N=50 CI=95.0000
Statistic     Value     [         Bias] (Bootstrapped LB‥UB)
Min:         1.13720e+4
1st Qu.      1.14450e+4
Median:      1.14890e+4
3rd Qu.      1.15510e+4
Max:         1.51180e+4
Average:     1.16464e+4 [    -0.578036] (   1.15250e+4 ‥    1.19671e+4)
Std. Dev:       661.815 [     -48.2839] (      336.017 ‥       1217.81)

Outliers: 0/3 = 3 (μ=1.16458e+4, σ=613.531)
        Outlier variance:      0.384516 (moderate)

Difference at 95.0% confidence
        6440.78 ± 202.485
        123.727% ± 3.88972%
        (Student's t, pooled s = 510.294)
------

The function-clause version is still faster, and it manages a nice 22 million
calls per second.
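
For readers who want to get a calls-per-second figure of their own, here is a
minimal sketch using timer:tc/1. It is not the harness behind the numbers
above. The module name bench_sketch, the function calls_per_second/2 and the
repetition count are all hypothetical, and a single timed run like this is
much cruder than the 50-sample comparison shown in the statistics.

```erlang
-module(bench_sketch).
-export([calls_per_second/2]).

%% Time Reps passes of Fun over Words and return the call rate as a float.
%% Fun is any one-argument predicate, e.g. fun stopwords_clause:is_stopword/1.
calls_per_second(Fun, Words) when is_function(Fun, 1), Words =/= [] ->
    Reps = 1000,                              % hypothetical repetition count
    {Micros, ok} = timer:tc(fun() -> repeat(Reps, Fun, Words) end),
    Calls = Reps * length(Words),
    %% timer:tc/1 reports microseconds; guard against a zero reading.
    Calls * 1000000 / max(Micros, 1).

%% Apply Fun to every word, N times over, discarding the results.
repeat(0, _Fun, _Words) -> ok;
repeat(N, Fun, Words) ->
    lists:foreach(Fun, Words),
    repeat(N - 1, Fun, Words).
```

Calling bench_sketch:calls_per_second(fun stopwords_clause:is_stopword/1, Words)
with a realistic mix of stop words and ordinary words gives a rough rate; repeat
the run several times before comparing two implementations.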

Pichi

On Wed, Mar 30, 2016 at 8:33 PM, Lloyd R. Prentice <lloyd@REDACTED>
wrote:

> Hi Pichi,
>
> Since I haven't learned yet how to design and conduct performance tests,
> results like these are both interesting and comforting.
>
> The long stop words list in http://www.ranks.nl/stopwords has something
> less than 700 words. So from these results it looks like either method
> would do the job in most applications, unless you are filtering stop words
> out of a huge archive of long documents.
>
> Many thanks, Pichi.
>
> Best wishes,
>
> LRP
>
> Sent from my iPad
>
> On Mar 30, 2016, at 2:12 PM, Hynek Vychodil <vychodil.hynek@REDACTED>
> wrote:
>
> Every time I read a claim about how fast something will be, I have the urge
> to test it. I had an idea that a constant map in a module could be faster
> than function clauses, so I tested it.
>
> I was wrong and RAO is right, as usual. The function using function clauses
> seems to be three times faster than the one using a map.
>
> x clause
> + map
>
> [Distribution plot, line-wrapped beyond repair in the archive: the x (clause)
> samples cluster at the left edge, the + (map) samples cluster toward the
> right, and one + sample sits at the far right.]
> Dataset: x N=50 CI=95.0000
> Statistic     Value     [         Bias] (Bootstrapped LB‥UB)
> Min:            3490.00
> 1st Qu.         3551.00
> Median:         3591.00
> 3rd Qu.         3679.00
> Max:            3945.00
> Average:        3630.16 [     0.137534] (      3602.82 ‥       3664.56)
> Std. Dev:       113.400 [     -1.81311] (      90.8425 ‥       141.539)
>
> Outliers: 0/4 = 4 (μ=3630.30, σ=111.587)
>         Outlier variance:      0.151802 (moderate)
>
> ------
>
> Dataset: + N=50 CI=95.0000
> Statistic     Value     [         Bias] (Bootstrapped LB‥UB)
> Min:         1.09500e+4
> 1st Qu.      1.10160e+4
> Median:      1.10400e+4
> 3rd Qu.      1.11270e+4
> Max:         1.28270e+4
> Average:     1.11055e+4 [     0.297998] (   1.10611e+4 ‥    1.12491e+4)
> Std. Dev:       264.914 [     -31.0673] (      84.7956 ‥       582.629)
>
> Outliers: 0/2 = 2 (μ=1.11058e+4, σ=233.847)
>         Outlier variance:    9.45082e-2 (slight)
>
> Difference at 95.0% confidence
>         7475.36 ± 80.8533
>         205.924% ± 2.22726%
>         (Student's t, pooled s = 203.763)
> ------
>
> That's about 31 million stopwords_clause:is_stopword/1 calls per second and
> 10 million stopwords_map:is_stopword/1 calls per second.
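
To make the two approaches concrete, here is a small sketch of both lookups
side by side. It is an illustration, not the benchmarked code: the module name
stopwords_sketch and the function names are hypothetical, only three words are
listed, and the map variant shown (maps:is_key/2 on a literal map) is just one
plausible way to write a "constant map in a module".

```erlang
-module(stopwords_sketch).
-export([is_stopword_clause/1, is_stopword_map/1]).

%% Variant 1: one clause per stop word plus a catch-all clause.
%% The compiler is free to turn the clause heads into a decision tree.
is_stopword_clause("a") -> true;
is_stopword_clause("about") -> true;
is_stopword_clause("yourselves") -> true;
is_stopword_clause(_) -> false.

%% Variant 2: a literal map consulted at run time.
is_stopword_map(Word) ->
    maps:is_key(Word, stopwords()).

%% All keys and values are constants, so the map is a module literal.
stopwords() ->
    #{"a" => true, "about" => true, "yourselves" => true}.
```

Both functions answer the same question, so either can be dropped behind a
single is_stopword/1 and swapped after measuring.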
>
> You can find the code in this gist:
> https://gist.github.com/pichi/2d10c93242d5057913d026a607f07dd4
>
> Pichi
>
> On Wed, Mar 30, 2016 at 4:05 AM, Lloyd R. Prentice <lloyd@REDACTED>
> wrote:
>
>> Wow! What a cool idea.
>>
>> Thanks, Richard.
>>
>> Best wishes,
>>
>> LRP
>>
>> Sent from my iPad
>>
>> > On Mar 29, 2016, at 8:47 PM, "Richard A. O'Keefe" <ok@REDACTED> wrote:
>> >
>> >
>> >> On 30/03/16 5:59 am, lloyd@REDACTED wrote:
>> >> So, I have a printed list of stop words:
>> >>
>> >> http://www.ranks.nl/stopwords
>> >>
>> >> I'd like to turn this list into an Erlang function that I can query---
>> >>
>> >> stopwords() ->
>> >>    ["word1", "word2" ... "wordN"].
>> >>
>> >> is_stopword(Word) ->
>> >>    List = stopwords(),
>> >>    lists:member(Word, List).
>> > Even if there is some arcane reason why you want the collection of words
>> > as a list, I strongly suggest generating
>> >
>> > is_stopword("a") -> true;
>> > is_stopword("about") -> true;
>> > ...
>> > is_stopword("yourselves") -> true;
>> > is_stopword(_) -> false.
>> >
>> > Open the list of stopwords in vi.
>> > :1,$s/^.*$/is_stopword("&") -> true;/
>> > :$a
>> > is_stopword(_) -> false.
>> > <ESC>
>> >
>> > The Erlang compiler will turn this into a trie, roughly speaking.
>> > This will be *dizzyingly* faster than the code you outlined.
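
The vi substitution above can also be done from Erlang itself. The following is
a sketch under stated assumptions, not part of the original advice: the module
name stopword_gen and both function names are hypothetical, the word file is
assumed to hold one plain ASCII word per line, and the output file is assumed
to be named after the module it should define (e.g. stopwords.erl).

```erlang
-module(stopword_gen).
-export([clauses/1, write_module/2]).

%% Build the source text of is_stopword/1 from a list of words (strings).
%% ~p prints each word as a quoted Erlang string literal.
clauses(Words) ->
    Lines = [io_lib:format("is_stopword(~p) -> true;~n", [W]) || W <- Words],
    lists:flatten([Lines, "is_stopword(_) -> false.\n"]).

%% Read one word per line from WordFile and write a complete module to
%% OutFile. The module name is taken from OutFile's base name.
write_module(WordFile, OutFile) ->
    {ok, Bin} = file:read_file(WordFile),
    Words = string:tokens(binary_to_list(Bin), "\r\n"),
    Mod = filename:basename(OutFile, ".erl"),
    Header = ["-module(", Mod, ").\n-export([is_stopword/1]).\n\n"],
    file:write_file(OutFile, [Header, clauses(Words)]).
```

Running stopword_gen:write_module("stopwords.txt", "stopwords.erl") and then
compiling stopwords.erl gives the same clause-per-word function as the editor
recipe; both file names here are placeholders.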
>> >
>> >
>> >
>> >
>> >>
>> >> All my efforts so far have evolved into ugly kludges. Seems to me
>> >> there must be an elegant method that I'm overlooking.
>> >>
>> >> Some kind soul point the way?
>> >>
>> >> Many thanks,
>> >>
>> >> LRP
>> >>
>> >> _______________________________________________
>> >> erlang-questions mailing list
>> >> erlang-questions@REDACTED
>> >> http://erlang.org/mailman/listinfo/erlang-questions
>> >
>>
>
>