[erlang-questions] Printed list to Erlang function
Felix Gallo
felixgallo@REDACTED
Wed Mar 30 22:07:16 CEST 2016
If Dr. O'Keefe were here, he would say that you should get it right first,
and then worry about perfor...
...oh.
:)
F.
On Wed, Mar 30, 2016 at 1:04 PM, Hynek Vychodil <vychodil.hynek@REDACTED>
wrote:
> There is result for long list (667 words):
>
> x clause
> + map
>
> +--------------------------------------------------------------------------+
> | xxx x x +++ + +
> +|
> | xxx x +++
> |
> | xxx +++
> |
> | xxx +++
> |
> | xx +++
> |
> | xx ++
> |
> | xx ++
> |
> | xx ++
> |
> | xx +
> |
> | xx +
> |
> | xx +
> |
> | xx +
> |
> | xx +
> |
> | xx +
> |
> | xx +
> |
> | xx +
> |
> | xx +
> |
> | xx +
> |
> | xx +
> |
> | x +
> |
> | x +
> |
> | x +
> |
> | x +
> |
> | x +
> |
> | +
> |
> | +
> |
> | +
> |
> | +
> |
> | +
> |
> | +
> |
> | +
> |
> | +
> |
> | +
> |
> | +
> |
> ||_A_|
> |
> | |___MA____|
> |
>
> +--------------------------------------------------------------------------+
> Dataset: x N=50 CI=95.0000
> Statistic Value [ Bias] (Bootstrapped LB‥UB)
> Min: 5087.00
> 1st Qu. 5113.00
> Median: 5137.00
> 3rd Qu. 5188.00
> Max: 7081.00
> Average: 5205.64 [ 0.729718] ( 5157.08 ‥ 5372.30)
> Std. Dev: 287.752 [ -33.4550] ( 81.6038 ‥ 633.923)
>
> Outliers: 0/4 = 4 (μ=5206.37, σ=254.297)
> Outlier variance: 0.365232 (moderate)
>
> ------
>
> Dataset: + N=50 CI=95.0000
> Statistic Value [ Bias] (Bootstrapped LB‥UB)
> Min: 1.13720e+4
> 1st Qu. 1.14450e+4
> Median: 1.14890e+4
> 3rd Qu. 1.15510e+4
> Max: 1.51180e+4
> Average: 1.16464e+4 [ -0.578036] ( 1.15250e+4 ‥ 1.19671e+4)
> Std. Dev: 661.815 [ -48.2839] ( 336.017 ‥ 1217.81)
>
> Outliers: 0/3 = 3 (μ=1.16458e+4, σ=613.531)
> Outlier variance: 0.384516 (moderate)
>
> Difference at 95.0% confidence
> 6440.78 ± 202.485
> 123.727% ± 3.88972%
> (Student's t, pooled s = 510.294)
> ------
>
> It is still faster when using function clause and performs nice 22 million
> calls per second.
>
> Pichi
>
> On Wed, Mar 30, 2016 at 8:33 PM, Lloyd R. Prentice <lloyd@REDACTED>
> wrote:
>
>> Hi Pichi,
>>
>> Since I haven't learned yet how to design and conduct performance tests,
>> results like these are both interesting and comforting.
>>
>> The long stop words list in http://www.ranks.nl/stopwords has something
>> less than 700 words. So from these results it looks like either method
>> would do the job in most applications, unless you are filtering stop words
>> out of a huge archive of long documents.
>>
>> Many thanks, Pichi.
>>
>> Best wishes,
>>
>> LRP
>>
>> Sent from my iPad
>>
>> On Mar 30, 2016, at 2:12 PM, Hynek Vychodil <vychodil.hynek@REDACTED>
>> wrote:
>>
>> Every time I read a claim about how fast it will be I have urge test it.
>> I had an idea that constant map in a module could be faster than function
>> clause co I test it.
>>
>> I was wrong and RAO is right as usual. Function using function clause
>> seems to be three times faster than using map.
>>
>> x clause
>> + map
>>
>> +--------------------------------------------------------------------------+
>> |xxxxx +++++
>> +|
>> |xxxx ++++
>> |
>> |xxxx +++
>> |
>> |xxxx ++
>> |
>> |xxx ++
>> |
>> |xxx ++
>> |
>> |xx ++
>> |
>> |xx ++
>> |
>> |xx ++
>> |
>> |xx +
>> |
>> |xx +
>> |
>> |xx +
>> |
>> |xx +
>> |
>> |xx +
>> |
>> | x +
>> |
>> | x +
>> |
>> | x +
>> |
>> | x +
>> |
>> | x +
>> |
>> | x +
>> |
>> | x +
>> |
>> | x +
>> |
>> | x +
>> |
>> | x +
>> |
>> | x +
>> |
>> | +
>> |
>> | +
>> |
>> | +
>> |
>> | +
>> |
>> | +
>> |
>> | +
>> |
>> | +
>> |
>> | +
>> |
>> | +
>> |
>> ||A|
>> |
>> | |_MA_|
>> |
>>
>> +--------------------------------------------------------------------------+
>> Dataset: x N=50 CI=95.0000
>> Statistic Value [ Bias] (Bootstrapped LB‥UB)
>> Min: 3490.00
>> 1st Qu. 3551.00
>> Median: 3591.00
>> 3rd Qu. 3679.00
>> Max: 3945.00
>> Average: 3630.16 [ 0.137534] ( 3602.82 ‥ 3664.56)
>> Std. Dev: 113.400 [ -1.81311] ( 90.8425 ‥ 141.539)
>>
>> Outliers: 0/4 = 4 (μ=3630.30, σ=111.587)
>> Outlier variance: 0.151802 (moderate)
>>
>> ------
>>
>> Dataset: + N=50 CI=95.0000
>> Statistic Value [ Bias] (Bootstrapped LB‥UB)
>> Min: 1.09500e+4
>> 1st Qu. 1.10160e+4
>> Median: 1.10400e+4
>> 3rd Qu. 1.11270e+4
>> Max: 1.28270e+4
>> Average: 1.11055e+4 [ 0.297998] ( 1.10611e+4 ‥ 1.12491e+4)
>> Std. Dev: 264.914 [ -31.0673] ( 84.7956 ‥ 582.629)
>>
>> Outliers: 0/2 = 2 (μ=1.11058e+4, σ=233.847)
>> Outlier variance: 9.45082e-2 (slight)
>>
>> Difference at 95.0% confidence
>> 7475.36 ± 80.8533
>> 205.924% ± 2.22726%
>> (Student's t, pooled s = 203.763)
>> ------
>>
>> It's about 31 million stopwords_clause:is_stopword/1 per second and 10
>> million stopwords_map:is_stopword/1 per second.
>>
>> You can find code in gist
>> https://gist.github.com/pichi/2d10c93242d5057913d026a607f07dd4
>>
>> Pichi
>>
>> On Wed, Mar 30, 2016 at 4:05 AM, Lloyd R. Prentice <lloyd@REDACTED
>> > wrote:
>>
>>> Wow! What a cool idea.
>>>
>>> Thanks, Richard.
>>>
>>> Best wishes,
>>>
>>> LRP
>>>
>>> Sent from my iPad
>>>
>>> > On Mar 29, 2016, at 8:47 PM, "Richard A. O'Keefe" <ok@REDACTED>
>>> wrote:
>>> >
>>> >
>>> >> On 30/03/16 5:59 am, lloyd@REDACTED wrote:
>>> >> So, I have a printed list of stop words:
>>> >>
>>> >> http://www.ranks.nl/stopwords
>>> >>
>>> >> I'd like to turn this list into an Erlang function that I can query---
>>> >>
>>> >> stopwords() ->
>>> >> ["word1", "word2" ... "wordN"].
>>> >>
>>> >> is_stopword(Word) ->
>>> >> List = stopwords(),
>>> >> lists_member(Word, List).
>>> > Even if there is some arcane reason why you want the collection of
>>> words
>>> > as a list, I strongly suggest generating
>>> >
>>> > is_stopword("a") -> true;
>>> > is_stopword("about") -> true;
>>> > ...
>>> > is_stopword("yourselves") -> true;
>>> > is_stopword(_) -> false.
>>> >
>>> > Open the list of stopwords in vi.
>>> > :1,$s/^.*$/is_stopword("&") -> true;/
>>> > :$a
>>> > is_stopword(_) -> false.
>>> > <ESC>
>>> >
>>> > The Erlang compiler will turn this into a trie, roughly speaking.
>>> > This will be *dizzyingly* faster than the code you outlined.
>>> >
>>> >
>>> >
>>> >
>>> >>
>>> >> All my efforts so far have evolved into ugly kludges. Seems to me
>>> there must be an elegant method that I'm overlooking.
>>> >>
>>> >> Some kind soul point the way?
>>> >>
>>> >> Many thanks,
>>> >>
>>> >> LRP
>>> >>
>>> >> *********************************************
>>> >> My books:
>>> >>
>>> >> THE GOSPEL OF ASHES
>>> >> http://thegospelofashes.com
>>> >>
>>> >> Strength is not enough. Do they have the courage
>>> >> and the cunning? Can they survive long enough to
>>> >> save the lives of millions?
>>> >>
>>> >> FREEIN' PANCHO
>>> >> http://freeinpancho.com
>>> >>
>>> >> A community of misfits help a troubled boy find his way
>>> >>
>>> >> AYA TAKEO
>>> >> http://ayatakeo.com
>>> >>
>>> >> Star-crossed love, war and power in an alternative
>>> >> universe
>>> >>
>>> >> Available through Amazon or by request from your
>>> >> favorite bookstore
>>> >>
>>> >>
>>> >> **********************************************
>>> >>
>>> >> _______________________________________________
>>> >> erlang-questions mailing list
>>> >> erlang-questions@REDACTED
>>> >> http://erlang.org/mailman/listinfo/erlang-questions
>>> >
>>> _______________________________________________
>>> erlang-questions mailing list
>>> erlang-questions@REDACTED
>>> http://erlang.org/mailman/listinfo/erlang-questions
>>>
>>
>>
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20160330/f3ec7b61/attachment.htm>
More information about the erlang-questions
mailing list