[erlang-questions] Printed list to Erlang function

Hynek Vychodil vychodil.hynek@REDACTED
Wed Mar 30 22:11:41 CEST 2016


I had same intuition so why I tried maps first. If you know how it is
implemented (You can look at compiled bytecode) and know how BEAM works it
shows how well is BEAM written or there is something fishy in maps
implementation. (It's 18.3.)

Pichi

On Wed, Mar 30, 2016 at 8:23 PM, Jason Douglas <jasond@REDACTED> wrote:

> At very large sizes though, wouldn’t the map (with O(1) access) become
> faster than the trie built by the compiler (which will be proportional to
> lookup term length, when sufficiently dense)? I used to deal a lot with
> pre-compiled tries and hash tables in C++, and while I loved tries for
> space efficiency, a hash table always had the upper hand in performance.
>
> I can see though why for a small set like this, a carefully tuned
> implementation like Erlang’s would beat out the map.
>
> Jason
>
> On Mar 30, 2016, at 11:12 AM, Hynek Vychodil <vychodil.hynek@REDACTED>
> wrote:
>
> Every time I read a claim about how fast it will be I have urge test it. I
> had an idea that constant map in a module could be faster than function
> clause co I test it.
>
> I was wrong and RAO is right as usual. Function using function clause
> seems to be three times faster than using map.
>
> x clause
> + map
>
> +--------------------------------------------------------------------------+
> |xxxxx                                                     +++++
>  +|
> |xxxx                                                      ++++
>  |
> |xxxx                                                      +++
> |
> |xxxx                                                       ++
> |
> |xxx                                                        ++
> |
> |xxx                                                        ++
> |
> |xx                                                         ++
> |
> |xx                                                         ++
> |
> |xx                                                         ++
> |
> |xx                                                         +
>  |
> |xx                                                         +
>  |
> |xx                                                         +
>  |
> |xx                                                         +
>  |
> |xx                                                         +
>  |
> | x                                                         +
>  |
> | x                                                         +
>  |
> | x                                                         +
>  |
> | x                                                         +
>  |
> | x                                                         +
>  |
> | x                                                         +
>  |
> | x                                                         +
>  |
> | x                                                         +
>  |
> | x                                                         +
>  |
> | x                                                         +
>  |
> | x                                                         +
>  |
> |                                                           +
>  |
> |                                                           +
>  |
> |                                                           +
>  |
> |                                                           +
>  |
> |                                                           +
>  |
> |                                                           +
>  |
> |                                                           +
>  |
> |                                                           +
>  |
> |                                                           +
>  |
> ||A|
> |
> |                                                         |_MA_|
> |
>
> +--------------------------------------------------------------------------+
> Dataset: x N=50 CI=95.0000
> Statistic     Value     [         Bias] (Bootstrapped LB‥UB)
> Min:            3490.00
> 1st Qu.         3551.00
> Median:         3591.00
> 3rd Qu.         3679.00
> Max:            3945.00
> Average:        3630.16 [     0.137534] (      3602.82 ‥       3664.56)
> Std. Dev:       113.400 [     -1.81311] (      90.8425 ‥       141.539)
>
> Outliers: 0/4 = 4 (μ=3630.30, σ=111.587)
>         Outlier variance:      0.151802 (moderate)
>
> ------
>
> Dataset: + N=50 CI=95.0000
> Statistic     Value     [         Bias] (Bootstrapped LB‥UB)
> Min:         1.09500e+4
> 1st Qu.      1.10160e+4
> Median:      1.10400e+4
> 3rd Qu.      1.11270e+4
> Max:         1.28270e+4
> Average:     1.11055e+4 [     0.297998] (   1.10611e+4 ‥    1.12491e+4)
> Std. Dev:       264.914 [     -31.0673] (      84.7956 ‥       582.629)
>
> Outliers: 0/2 = 2 (μ=1.11058e+4, σ=233.847)
>         Outlier variance:    9.45082e-2 (slight)
>
> Difference at 95.0% confidence
>         7475.36 ± 80.8533
>         205.924% ± 2.22726%
>         (Student's t, pooled s = 203.763)
> ------
>
> It's about 31 million stopwords_clause:is_stopword/1 per second and 10
> million stopwords_map:is_stopword/1 per second.
>
> You can find code in gist
> https://gist.github.com/pichi/2d10c93242d5057913d026a607f07dd4
>
> Pichi
>
> On Wed, Mar 30, 2016 at 4:05 AM, Lloyd R. Prentice <lloyd@REDACTED>
> wrote:
>
>> Wow! What a cool idea.
>>
>> Thanks, Richard.
>>
>> Best wishes,
>>
>> LRP
>>
>> Sent from my iPad
>>
>> > On Mar 29, 2016, at 8:47 PM, "Richard A. O'Keefe" <ok@REDACTED>
>> wrote:
>> >
>> >
>> >> On 30/03/16 5:59 am, lloyd@REDACTED wrote:
>> >> So, I have a printed list of stop words:
>> >>
>> >> http://www.ranks.nl/stopwords
>> >>
>> >> I'd like to turn this list into an Erlang function that I can query---
>> >>
>> >> stopwords() ->
>> >>    ["word1", "word2" ... "wordN"].
>> >>
>> >> is_stopword(Word) ->
>> >>    List = stopwords(),
>> >>    lists_member(Word, List).
>> > Even if there is some arcane reason why you want the collection of words
>> > as a list, I strongly suggest generating
>> >
>> > is_stopword("a") -> true;
>> > is_stopword("about") -> true;
>> > ...
>> > is_stopword("yourselves") -> true;
>> > is_stopword(_) -> false.
>> >
>> > Open the list of stopwords in vi.
>> > :1,$s/^.*$/is_stopword("&") -> true;/
>> > :$a
>> > is_stopword(_) -> false.
>> > <ESC>
>> >
>> > The Erlang compiler will turn this into a trie, roughly speaking.
>> > This will be *dizzyingly* faster than the code you outlined.
>> >
>> >
>> >
>> >
>> >>
>> >> All my efforts so far have evolved into ugly kludges. Seems to me
>> there must be an elegant method that I'm overlooking.
>> >>
>> >> Some kind soul point the way?
>> >>
>> >> Many thanks,
>> >>
>> >> LRP
>> >>
>> >> *********************************************
>> >> My books:
>> >>
>> >> THE GOSPEL OF ASHES
>> >> http://thegospelofashes.com
>> >>
>> >> Strength is not enough. Do they have the courage
>> >> and the cunning? Can they survive long enough to
>> >> save the lives of millions?
>> >>
>> >> FREEIN' PANCHO
>> >> http://freeinpancho.com
>> >>
>> >> A community of misfits help a troubled boy find his way
>> >>
>> >> AYA TAKEO
>> >> http://ayatakeo.com
>> >>
>> >> Star-crossed love, war and power in an alternative
>> >> universe
>> >>
>> >> Available through Amazon or by request from your
>> >> favorite bookstore
>> >>
>> >>
>> >> **********************************************
>> >>
>> >> _______________________________________________
>> >> erlang-questions mailing list
>> >> erlang-questions@REDACTED
>> >> http://erlang.org/mailman/listinfo/erlang-questions
>> >
>> _______________________________________________
>> erlang-questions mailing list
>> erlang-questions@REDACTED
>> http://erlang.org/mailman/listinfo/erlang-questions
>>
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20160330/3f991444/attachment.htm>


More information about the erlang-questions mailing list