[erlang-questions] Printed list to Erlang function

Sverker Eriksson <>
Thu Mar 31 21:31:44 CEST 2016


Small maps uses an array with linear search for key.

Large maps (> 32 keys) uses a HAMT tree with a maximum branching factor 
of 16.

https://en.wikipedia.org/wiki/Hash_array_mapped_trie


/Sverker, Erlang/OTP


On 03/30/2016 10:11 PM, Hynek Vychodil wrote:
> I had same intuition so why I tried maps first. If you know how it is 
> implemented (You can look at compiled bytecode) and know how BEAM 
> works it shows how well is BEAM written or there is something fishy in 
> maps implementation. (It's 18.3.)
>
> Pichi
>
> On Wed, Mar 30, 2016 at 8:23 PM, Jason Douglas < 
> <mailto:>> wrote:
>
>     At very large sizes though, wouldn’t the map (with O(1) access)
>     become faster than the trie built by the compiler (which will be
>     proportional to lookup term length, when sufficiently dense)? I
>     used to deal a lot with pre-compiled tries and hash tables in C++,
>     and while I loved tries for space efficiency, a hash table always
>     had the upper hand in performance.
>
>     I can see though why for a small set like this, a carefully tuned
>     implementation like Erlang’s would beat out the map.
>
>     Jason
>
>>     On Mar 30, 2016, at 11:12 AM, Hynek Vychodil
>>     < <mailto:>> wrote:
>>
>>     Every time I read a claim about how fast it will be I have urge
>>     test it. I had an idea that constant map in a module could be
>>     faster than function clause co I test it.
>>
>>     I was wrong and RAO is right as usual. Function using function
>>     clause seems to be three times faster than using map.
>>
>>     x clause
>>     + map
>>     +--------------------------------------------------------------------------+
>>     |xxxxx                     +++++          +|
>>     |xxxx                    ++++            |
>>     |xxxx                    +++             |
>>     |xxxx                     ++             |
>>     |xxx                      ++             |
>>     |xxx                      ++             |
>>     |xx                     ++             |
>>     |xx                     ++             |
>>     |xx                     ++             |
>>     |xx                     +              |
>>     |xx                     +              |
>>     |xx                     +              |
>>     |xx                     +              |
>>     |xx                     +              |
>>     | x                     +              |
>>     | x                     +              |
>>     | x                     +              |
>>     | x                     +              |
>>     | x                     +              |
>>     | x                     +              |
>>     | x                     +              |
>>     | x                     +              |
>>     | x                     +              |
>>     | x                     +              |
>>     | x                     +              |
>>     |                     +              |
>>     |                     +              |
>>     |                     +              |
>>     |                     +              |
>>     |                     +              |
>>     |                     +              |
>>     |                     +              |
>>     |                     +              |
>>     |                     +              |
>>     ||A|                                     |
>>     |                   |_MA_|           |
>>     +--------------------------------------------------------------------------+
>>     Dataset: x N=50 CI=95.0000
>>     Statistic     Value     [         Bias] (Bootstrapped LB‥UB)
>>     Min:            3490.00
>>     1st Qu.         3551.00
>>     Median:         3591.00
>>     3rd Qu.         3679.00
>>     Max:            3945.00
>>     Average:        3630.16 [     0.137534] (      3602.82 ‥      
>>     3664.56)
>>     Std. Dev:       113.400 [     -1.81311] (      90.8425 ‥      
>>     141.539)
>>
>>     Outliers: 0/4 = 4 (μ=3630.30, σ=111.587)
>>             Outlier variance:      0.151802 (moderate)
>>
>>     ------
>>
>>     Dataset: + N=50 CI=95.0000
>>     Statistic     Value     [         Bias] (Bootstrapped LB‥UB)
>>     Min:         1.09500e+4
>>     1st Qu.      1.10160e+4
>>     Median:      1.10400e+4
>>     3rd Qu.      1.11270e+4
>>     Max:         1.28270e+4
>>     Average:     1.11055e+4 [     0.297998] (   1.10611e+4 ‥  
>>      1.12491e+4)
>>     Std. Dev:       264.914 [     -31.0673] (      84.7956 ‥      
>>     582.629)
>>
>>     Outliers: 0/2 = 2 (μ=1.11058e+4, σ=233.847)
>>             Outlier variance:    9.45082e-2 (slight)
>>
>>     Difference at 95.0% confidence
>>             7475.36 ± 80.8533
>>             205.924% ± 2.22726%
>>             (Student's t, pooled s = 203.763)
>>     ------
>>
>>     It's about 31 million stopwords_clause:is_stopword/1 per second
>>     and 10 million stopwords_map:is_stopword/1 per second.
>>
>>     You can find code in gist
>>     https://gist.github.com/pichi/2d10c93242d5057913d026a607f07dd4
>>
>>     Pichi
>>
>>     On Wed, Mar 30, 2016 at 4:05 AM, Lloyd R. Prentice
>>     < <mailto:>> wrote:
>>
>>         Wow! What a cool idea.
>>
>>         Thanks, Richard.
>>
>>         Best wishes,
>>
>>         LRP
>>
>>         Sent from my iPad
>>
>>         > On Mar 29, 2016, at 8:47 PM, "Richard A. O'Keefe"
>>         < <mailto:>> wrote:
>>         >
>>         >
>>         >> On 30/03/16 5:59 am, 
>>         <mailto:> wrote:
>>         >> So, I have a printed list of stop words:
>>         >>
>>         >> http://www.ranks.nl/stopwords
>>         >>
>>         >> I'd like to turn this list into an Erlang function that I
>>         can query---
>>         >>
>>         >> stopwords() ->
>>         >>    ["word1", "word2" ... "wordN"].
>>         >>
>>         >> is_stopword(Word) ->
>>         >>    List = stopwords(),
>>         >>    lists_member(Word, List).
>>         > Even if there is some arcane reason why you want the
>>         collection of words
>>         > as a list, I strongly suggest generating
>>         >
>>         > is_stopword("a") -> true;
>>         > is_stopword("about") -> true;
>>         > ...
>>         > is_stopword("yourselves") -> true;
>>         > is_stopword(_) -> false.
>>         >
>>         > Open the list of stopwords in vi.
>>         > :1,$s/^.*$/is_stopword("&") -> true;/
>>         > :$a
>>         > is_stopword(_) -> false.
>>         > <ESC>
>>         >
>>         > The Erlang compiler will turn this into a trie, roughly
>>         speaking.
>>         > This will be *dizzyingly* faster than the code you outlined.
>>         >
>>         >
>>         >
>>         >
>>         >>
>>         >> All my efforts so far have evolved into ugly kludges.
>>         Seems to me there must be an elegant method that I'm overlooking.
>>         >>
>>         >> Some kind soul point the way?
>>         >>
>>         >> Many thanks,
>>         >>
>>         >> LRP
>>         >>
>>         >> *********************************************
>>         >> My books:
>>         >>
>>         >> THE GOSPEL OF ASHES
>>         >> http://thegospelofashes.com <http://thegospelofashes.com/>
>>         >>
>>         >> Strength is not enough. Do they have the courage
>>         >> and the cunning? Can they survive long enough to
>>         >> save the lives of millions?
>>         >>
>>         >> FREEIN' PANCHO
>>         >> http://freeinpancho.com <http://freeinpancho.com/>
>>         >>
>>         >> A community of misfits help a troubled boy find his way
>>         >>
>>         >> AYA TAKEO
>>         >> http://ayatakeo.com <http://ayatakeo.com/>
>>         >>
>>         >> Star-crossed love, war and power in an alternative
>>         >> universe
>>         >>
>>         >> Available through Amazon or by request from your
>>         >> favorite bookstore
>>         >>
>>         >>
>>         >> **********************************************
>>         >>
>>         >> _______________________________________________
>>         >> erlang-questions mailing list
>>         >> 
>>         <mailto:>
>>         >> http://erlang.org/mailman/listinfo/erlang-questions
>>         >
>>         _______________________________________________
>>         erlang-questions mailing list
>>          <mailto:>
>>         http://erlang.org/mailman/listinfo/erlang-questions
>>
>>
>>     _______________________________________________
>>     erlang-questions mailing list
>>      <mailto:>
>>     http://erlang.org/mailman/listinfo/erlang-questions
>
>
>
>
> _______________________________________________
> erlang-questions mailing list
> 
> http://erlang.org/mailman/listinfo/erlang-questions

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-questions/attachments/20160331/a4d1c592/attachment.html>


More information about the erlang-questions mailing list