[erlang-questions] Printed list to Erlang function

Felix Gallo felixgallo@REDACTED
Wed Mar 30 22:07:16 CEST 2016


If Dr. O'Keefe were here, he would say that you should get it right first,
and then worry about perfor...

...oh.

:)

F.

On Wed, Mar 30, 2016 at 1:04 PM, Hynek Vychodil <vychodil.hynek@REDACTED>
wrote:

> Here is the result for the long list (667 words):
>
> x clause
> + map
>
> [ministat distribution plot: every clause (x) sample lies to the left of
> every map (+) sample]
> Dataset: x N=50 CI=95.0000
> Statistic     Value     [         Bias] (Bootstrapped LB‥UB)
> Min:            5087.00
> 1st Qu.         5113.00
> Median:         5137.00
> 3rd Qu.         5188.00
> Max:            7081.00
> Average:        5205.64 [     0.729718] (      5157.08 ‥       5372.30)
> Std. Dev:       287.752 [     -33.4550] (      81.6038 ‥       633.923)
>
> Outliers: 0/4 = 4 (μ=5206.37, σ=254.297)
>         Outlier variance:      0.365232 (moderate)
>
> ------
>
> Dataset: + N=50 CI=95.0000
> Statistic     Value     [         Bias] (Bootstrapped LB‥UB)
> Min:         1.13720e+4
> 1st Qu.      1.14450e+4
> Median:      1.14890e+4
> 3rd Qu.      1.15510e+4
> Max:         1.51180e+4
> Average:     1.16464e+4 [    -0.578036] (   1.15250e+4 ‥    1.19671e+4)
> Std. Dev:       661.815 [     -48.2839] (      336.017 ‥       1217.81)
>
> Outliers: 0/3 = 3 (μ=1.16458e+4, σ=613.531)
>         Outlier variance:      0.384516 (moderate)
>
> Difference at 95.0% confidence
>         6440.78 ± 202.485
>         123.727% ± 3.88972%
>         (Student's t, pooled s = 510.294)
> ------
>
> The function-clause version is still faster, at a respectable 22 million
> calls per second.
>
> Pichi
>
> On Wed, Mar 30, 2016 at 8:33 PM, Lloyd R. Prentice <lloyd@REDACTED>
> wrote:
>
>> Hi Pichi,
>>
>> Since I haven't learned yet how to design and conduct performance tests,
>> results like these are both interesting and comforting.
>>
>> The long stop words list in http://www.ranks.nl/stopwords has something
>> less than 700 words. So from these results it looks like either method
>> would do the job in most applications, unless you are filtering stop words
>> out of a huge archive of long documents.
>>
>> Many thanks, Pichi.
>>
>> Best wishes,
>>
>> LRP
>>
>> Sent from my iPad
>>
>> On Mar 30, 2016, at 2:12 PM, Hynek Vychodil <vychodil.hynek@REDACTED>
>> wrote:
>>
>> Every time I read a claim about how fast something will be, I have the urge
>> to test it. I had an idea that a constant map in a module could be faster
>> than function clauses, so I tested it.
>>
>> I was wrong and RAO is right, as usual. The function-clause version seems to
>> be three times faster than the map version.
>>
>> x clause
>> + map
>>
>> [ministat distribution plot: every clause (x) sample lies to the left of
>> every map (+) sample]
>> Dataset: x N=50 CI=95.0000
>> Statistic     Value     [         Bias] (Bootstrapped LB‥UB)
>> Min:            3490.00
>> 1st Qu.         3551.00
>> Median:         3591.00
>> 3rd Qu.         3679.00
>> Max:            3945.00
>> Average:        3630.16 [     0.137534] (      3602.82 ‥       3664.56)
>> Std. Dev:       113.400 [     -1.81311] (      90.8425 ‥       141.539)
>>
>> Outliers: 0/4 = 4 (μ=3630.30, σ=111.587)
>>         Outlier variance:      0.151802 (moderate)
>>
>> ------
>>
>> Dataset: + N=50 CI=95.0000
>> Statistic     Value     [         Bias] (Bootstrapped LB‥UB)
>> Min:         1.09500e+4
>> 1st Qu.      1.10160e+4
>> Median:      1.10400e+4
>> 3rd Qu.      1.11270e+4
>> Max:         1.28270e+4
>> Average:     1.11055e+4 [     0.297998] (   1.10611e+4 ‥    1.12491e+4)
>> Std. Dev:       264.914 [     -31.0673] (      84.7956 ‥       582.629)
>>
>> Outliers: 0/2 = 2 (μ=1.11058e+4, σ=233.847)
>>         Outlier variance:    9.45082e-2 (slight)
>>
>> Difference at 95.0% confidence
>>         7475.36 ± 80.8533
>>         205.924% ± 2.22726%
>>         (Student's t, pooled s = 203.763)
>> ------
>>
>> That is about 31 million stopwords_clause:is_stopword/1 calls per second and
>> 10 million stopwords_map:is_stopword/1 calls per second.
>>
>> You can find code in gist
>> https://gist.github.com/pichi/2d10c93242d5057913d026a607f07dd4
>>
>> Pichi
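
To make the comparison concrete, here is a minimal sketch of the two shapes
being measured, with a crude timer:tc/1 loop. It is not the code from the
gist: the module name stopword_bench, the three-word list and time/3 are
made up for illustration, and a single timer:tc run is far rougher than the
50-sample statistics quoted above.

```erlang
-module(stopword_bench).
-export([is_stopword_clause/1, is_stopword_map/1, time/3]).

%% Clause variant: one function clause per word, plus a catch-all.
%% (Hypothetical three-word list; the real lists are much longer.)
is_stopword_clause("a") -> true;
is_stopword_clause("about") -> true;
is_stopword_clause("yourselves") -> true;
is_stopword_clause(_) -> false.

%% Map variant: membership test against a literal map.
is_stopword_map(Word) ->
    maps:is_key(Word, #{"a" => true, "about" => true, "yourselves" => true}).

%% Apply Fun to every word in Words, Repeat times over.
%% Returns {ElapsedMicroseconds, CallsPerSecond}.
time(Fun, Words, Repeat) ->
    {Micros, ok} = timer:tc(fun() -> loop(Fun, Words, Repeat) end),
    Calls = Repeat * length(Words),
    {Micros, Calls * 1000000 / max(Micros, 1)}.

loop(_Fun, _Words, 0) -> ok;
loop(Fun, Words, N) ->
    lists:foreach(Fun, Words),
    loop(Fun, Words, N - 1).
```

In the shell, something like
stopword_bench:time(fun stopword_bench:is_stopword_map/1, ["a", "erlang"], 1000000)
gives a rough calls-per-second figure for one variant; swap in
is_stopword_clause/1 for the other.
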
>>
>> On Wed, Mar 30, 2016 at 4:05 AM, Lloyd R. Prentice <lloyd@REDACTED>
>> wrote:
>>
>>> Wow! What a cool idea.
>>>
>>> Thanks, Richard.
>>>
>>> Best wishes,
>>>
>>> LRP
>>>
>>> Sent from my iPad
>>>
>>> > On Mar 29, 2016, at 8:47 PM, "Richard A. O'Keefe" <ok@REDACTED> wrote:
>>> >
>>> >
>>> >> On 30/03/16 5:59 am, lloyd@REDACTED wrote:
>>> >> So, I have a printed list of stop words:
>>> >>
>>> >> http://www.ranks.nl/stopwords
>>> >>
>>> >> I'd like to turn this list into an Erlang function that I can query---
>>> >>
>>> >> stopwords() ->
>>> >>    ["word1", "word2" ... "wordN"].
>>> >>
>>> >> is_stopword(Word) ->
>>> >>    List = stopwords(),
>>> >>    lists:member(Word, List).
>>> > Even if there is some arcane reason why you want the collection of words
>>> > as a list, I strongly suggest generating
>>> >
>>> > is_stopword("a") -> true;
>>> > is_stopword("about") -> true;
>>> > ...
>>> > is_stopword("yourselves") -> true;
>>> > is_stopword(_) -> false.
>>> >
>>> > Open the list of stopwords in vi.
>>> > :1,$s/^.*$/is_stopword("&") -> true;/
>>> > :$a
>>> > is_stopword(_) -> false.
>>> > <ESC>
>>> >
>>> > The Erlang compiler will turn this into a trie, roughly speaking.
>>> > This will be *dizzyingly* faster than the code you outlined.
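
For anyone who would rather not do the edit by hand in vi, the same clause
list can be generated from Erlang itself. This is only a sketch: the module
name gen_stopwords, the functions generate/2 and render/1, and the input
format (one plain ASCII word per line) are assumptions, not anything posted
in this thread.

```erlang
-module(gen_stopwords).
-export([generate/2, render/1]).

%% Read one word per line from InFile and write the source of a
%% stopwords module to OutFile. Assumes plain ASCII words.
generate(InFile, OutFile) ->
    {ok, Bin} = file:read_file(InFile),
    %% string:tokens/2 splits on any newline character and drops empty lines.
    Words = string:tokens(binary_to_list(Bin), "\r\n"),
    file:write_file(OutFile, render(Words)).

%% Build the module source as an iolist: one clause per word,
%% followed by the catch-all clause.
render(Words) ->
    ["-module(stopwords).\n",
     "-export([is_stopword/1]).\n\n",
     %% ~p prints each word as a quoted Erlang string literal.
     [io_lib:format("is_stopword(~p) -> true;~n", [W]) || W <- Words],
     "is_stopword(_) -> false.\n"].
```

With the word list saved as, say, stopwords.txt (a hypothetical file name),
gen_stopwords:generate("stopwords.txt", "stopwords.erl") writes the module,
which can then be compiled as usual.
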
>>> >
>>> >
>>> >
>>> >
>>> >>
>>> >> All my efforts so far have evolved into ugly kludges. Seems to me
>>> >> there must be an elegant method that I'm overlooking.
>>> >>
>>> >> Some kind soul point the way?
>>> >>
>>> >> Many thanks,
>>> >>
>>> >> LRP
>>> >>
>>> >> *********************************************
>>> >> My books:
>>> >>
>>> >> THE GOSPEL OF ASHES
>>> >> http://thegospelofashes.com
>>> >>
>>> >> Strength is not enough. Do they have the courage
>>> >> and the cunning? Can they survive long enough to
>>> >> save the lives of millions?
>>> >>
>>> >> FREEIN' PANCHO
>>> >> http://freeinpancho.com
>>> >>
>>> >> A community of misfits help a troubled boy find his way
>>> >>
>>> >> AYA TAKEO
>>> >> http://ayatakeo.com
>>> >>
>>> >> Star-crossed love, war and power in an alternative
>>> >> universe
>>> >>
>>> >> Available through Amazon or by request from your
>>> >> favorite bookstore
>>> >>
>>> >>
>>> >> **********************************************
>>> >>
>>> >> _______________________________________________
>>> >> erlang-questions mailing list
>>> >> erlang-questions@REDACTED
>>> >> http://erlang.org/mailman/listinfo/erlang-questions
>>> >