[erlang-questions] any way to speed up regex.split?

Peer Stritzinger peerst@REDACTED
Sun Dec 22 15:07:44 CET 2013


It looks like you ran these benchmarks in the shell.

Code run in the shell is interpreted (by erl_eval) and is much slower 
than compiled code.
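One way to see how much of the gap is interpretation overhead is to move the quoted shell session into a compiled module and change nothing else: the pattern is still compiled once and the loop is still a comprehension over lists:seq/2. This is a minimal sketch; the module name split_compiled is hypothetical, and the text is shortened here, so substitute the full Lorem ipsum binary from the thread before comparing numbers.

```erlang
%% split_compiled.erl (hypothetical module name)
%% Same shape as the shell benchmark below, but compiled instead of
%% being evaluated by erl_eval in the shell.
-module(split_compiled).
-export([run/1]).

%% Shortened sample input; replace with the full text from the thread.
text() ->
    <<"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do "
      "eiusmod tempor incididunt ut labore et dolore magna aliqua.">>.

%% run(N): split text() on \s+ N times; returns elapsed seconds.
run(N) ->
    {ok, Pattern} = re:compile("\\s+"),   %% compiled once, outside the loop
    Text = text(),
    Seq = lists:seq(1, N),                %% still builds an N-element list
    F = fun() -> _ = re:split(Text, Pattern), ok end,
    F1 = fun() -> [F() || _ <- Seq] end,  %% still builds a result list
    {T, _} = timer:tc(F1),
    T / 1000000.
```

From the shell, c(split_compiled). followed by split_compiled:run(50000). makes a single call into compiled code, so the loop body is no longer interpreted.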

Besides, looping with a list comprehension and building a large list 
might also add more overhead than necessary.
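The list overhead can be avoided by replacing lists:seq/2 and the comprehension with a tail-recursive counter, similar to the times/2 function in the original post quoted below. Again a sketch under the same assumptions: split_loop is a hypothetical module name and the input is shortened.

```erlang
%% split_loop.erl (hypothetical module name)
-module(split_loop).
-export([run/1]).

%% Shortened sample input; replace with the full text from the thread.
text() ->
    <<"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do "
      "eiusmod tempor incididunt ut labore et dolore magna aliqua.">>.

%% run(N): split text() on \s+ N times; returns elapsed seconds.
run(N) when is_integer(N), N >= 0 ->
    {ok, Pattern} = re:compile("\\s+"),   %% compile the regex once
    Text = text(),
    {Micros, ok} = timer:tc(fun() -> loop(N, Text, Pattern) end),
    Micros / 1000000.

%% Tail-recursive counter: neither a lists:seq/2 list nor a
%% comprehension result list is built.
loop(0, _Text, _Pattern) ->
    ok;
loop(N, Text, Pattern) ->
    _ = re:split(Text, Pattern),          %% result is discarded
    loop(N - 1, Text, Pattern).
```

The counter keeps memory use constant regardless of N, so the timing reflects re:split/2 rather than list allocation and garbage collection.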

On 2013-12-22 12:29:43 +0000, Alexander Petrovsky said:

> Hi!
> 
> I performed the same test on my machine:
> 
> For Ruby:
> 
> # irb
> 
> require 'benchmark'
> 
> n = 50000
> 
> text = "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed 
> do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim 
> ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut 
> aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit 
> in voluptate velit esse cillum dolore eu fugiat nulla pariatur. 
> Excepteur sint occaecat cupidatat non proident, sunt in culpa qui 
> officia deserunt mollit anim id est laborum."
> => "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do 
> eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad 
> minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip 
> ex ea commodo consequat. Duis aute irure dolor in reprehenderit in 
> voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur 
> sint occaecat cupidatat non proident, sunt in culpa qui officia 
> deserunt mollit anim id est laborum."
> 
> puts Benchmark.measure { n.times { text.split /\s+/ } }
> 
> 3.850000   0.010000   3.860000 (  3.955543)
> 
> For Erlang:
> 
> # erl
> 
> N = lists:seq(1, 50000).
> 
> Text = <<"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed 
> do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim 
> ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut 
> aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit 
> in voluptate velit esse cillum dolore eu fugiat nulla pariatur. 
> Excepteur sint occaecat cupidatat non proident, sunt in culpa qui 
> officia deserunt mollit anim id est laborum.">>.
> 
> F = fun() -> {ok, Pattern} = re:compile("\\s+"), re:split(Text, 
> Pattern), ok end.
> 
> F1 = fun() -> [F() || _ <- N] end.
> 
> {T, _} = timer:tc(F1).
> 
> T / 1000000.
> 
> 13.7556
> 
> # erl
> 
> N = lists:seq(1, 50000).
> 
> Text = <<"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed 
> do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim 
> ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut 
> aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit 
> in voluptate velit esse cillum dolore eu fugiat nulla pariatur. 
> Excepteur sint occaecat cupidatat non proident, sunt in culpa qui 
> officia deserunt mollit anim id est laborum.">>.
> 
> {ok, Pattern} = re:compile("\\s+").
> 
> F = fun() -> re:split(Text, Pattern), ok end.
> 
> F1 = fun() -> [F() || _ <- N] end.
> 
> {T, _} = timer:tc(F1).
> 
> T / 1000000.
> 
> 12.8033
> 
> # erl -smp disable
> 
> N = lists:seq(1, 50000).
> 
> Text = <<"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed 
> do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim 
> ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut 
> aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit 
> in voluptate velit esse cillum dolore eu fugiat nulla pariatur. 
> Excepteur sint occaecat cupidatat non proident, sunt in culpa qui 
> officia deserunt mollit anim id est laborum.">>.
> 
> F = fun() -> {ok, Pattern} = re:compile("\\s+"), re:split(Text, 
> Pattern), ok end.
> 
> F1 = fun() -> [F() || _ <- N] end.
> 
> {T, _} = timer:tc(F1).
> 
> T / 1000000.
> 
> 8.927621
> 
> # erl -smp disable
> 
> N = lists:seq(1, 50000).
> 
> Text = <<"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed 
> do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim 
> ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut 
> aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit 
> in voluptate velit esse cillum dolore eu fugiat nulla pariatur. 
> Excepteur sint occaecat cupidatat non proident, sunt in culpa qui 
> officia deserunt mollit anim id est laborum.">>.
> 
> {ok, Pattern} = re:compile("\\s+").
> 
> F = fun() -> re:split(Text, Pattern), ok end.
> 
> F1 = fun() -> [F() || _ <- N] end.
> 
> {T, _} = timer:tc(F1).
> 
> T / 1000000.
> 
> 8.657157
> 
> 
> Both Ruby and Erlang utilize my CPU at 100%. As you can see, I tried 
> a few tricks; they make Erlang a little bit faster, but it is still 
> not enough.
> 
> If you need to process a large amount of text, you should use 
> erlang:spawn for each text object; it will be very fast.
> 
> 
> 
> 2013/12/18 akonsu <akonsu@REDACTED>
> I have two benchmarks that perform a simple text split on a regular 
> expression. One is in Ruby and another is in Erlang. The Erlang version 
> is 6 times slower on my machine for some reason. I have read all the 
> documentation I could find on how to use binaries in Erlang, but I 
> cannot make it faster. I am looking for help.
> 
> 
> 
> Here is the code:
> 
> Ruby:
> 
> require 'benchmark'
> 
> n = 50000
> text = "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed 
> do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim 
> ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut 
> aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit 
> in voluptate velit esse cillum dolore eu fugiat nulla pariatur. 
> Excepteur sint occaecat cupidatat non proident, sunt in culpa qui 
> officia deserunt mollit anim id est laborum."
> 
> puts Benchmark.measure {
>   n.times { text.split /\s+/ }
> }
> 
> 
> Erlang:
> 
> -module(test).
> -export([measure/1, times/2]).
> 
> text() ->
>     <<"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do 
> eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad 
> minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip 
> ex ea commodo consequat. Duis aute irure dolor in reprehenderit in 
> voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur 
> sint occaecat cupidatat non proident, sunt in culpa qui officia 
> deserunt mollit anim id est laborum.">>.
> 
> times(0, _) ->
>     ok;
> times(N, F) ->
>     F(),
>     times(N - 1, F).
> 
> measure(N) ->
>     {ok, Pattern} = re:compile("\\s+"),
>     B = text(),
>     F = fun() -> re:split(B, Pattern) end,
>     {T, ok} = timer:tc(?MODULE, times, [N, F]),
>     T / 1000000.
> 
> Ruby outputs
> 
>   3.180000   0.000000   3.180000 (  3.182452)
> 
> Erlang outputs
> 
> Erlang R16B03 (erts-5.10.4) [source] [smp:2:2] [async-threads:10] 
> [hipe] [kernel-poll:false]
> 
> Eshell V5.10.4  (abort with ^G)
> 1> test:measure(50000).
> 18.261952
> 
> 
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
> 
> 
> 
> 
> -- 
> Петровский Александр / Alexander Petrovsky,
> 
> Skype: askjuise
> Jabber: juise@REDACTED
> Phone: +7 914 8 820 815 (irkutsk)

