[erlang-questions] any way to speed up regex.split?
Peer Stritzinger
peerst@REDACTED
Sun Dec 22 15:07:44 CET 2013
It looks like you ran these benchmarks in the shell.
Code run in the shell is interpreted by Erlang and is much slower than
compiled code.
Besides, looping with a list comprehension builds a large list, which
may also add more overhead than necessary.
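Here is a minimal sketch of what this suggests. It assumes a hypothetical module `split_bench` and makes three changes to the benchmark:

- the benchmark is compiled instead of typed into the shell;
- the pattern is compiled once;
- a tail-recursive loop discards each result instead of collecting 50000 of them in a list comprehension.

It also passes `{return, binary}` to `re:split/3` so the parts are always binaries. That option is an addition and was not part of the benchmarks in this thread.

```erlang
-module(split_bench).
-export([run/1, split_words/2]).

%% Benchmark input from the thread, written as adjacent string literals
%% (concatenated by the compiler) so it contains no embedded newlines.
text() ->
    <<"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do "
      "eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad "
      "minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip "
      "ex ea commodo consequat. Duis aute irure dolor in reprehenderit in "
      "voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur "
      "sint occaecat cupidatat non proident, sunt in culpa qui officia "
      "deserunt mollit anim id est laborum.">>.

%% One split using an already-compiled pattern MP.
split_words(Text, MP) ->
    re:split(Text, MP, [{return, binary}]).

%% Tail-recursive loop: performs the split N times and throws each
%% result away, so no N-element result list is accumulated.
loop(0, _Text, _MP) ->
    ok;
loop(N, Text, MP) ->
    _ = split_words(Text, MP),
    loop(N - 1, Text, MP).

%% Compile the pattern once, then time N iterations.
%% Returns the elapsed time in seconds as a float.
run(N) ->
    {ok, MP} = re:compile("\\s+"),
    {Micros, ok} = timer:tc(fun() -> loop(N, text(), MP) end),
    Micros / 1000000.
```

To try it, run c(split_bench). and then split_bench:run(50000). in the shell. The timing will depend on your machine and OTP release.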
On 2013-12-22 12:29:43 +0000, Alexander Petrovsky said:
> Hi!
>
> I perform the same test on my machine:
>
> For Ruby:
>
> # irb
>
> require 'benchmark'
>
> n = 50000
>
> text = "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed
> do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim
> ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
> aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit
> in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
> Excepteur sint occaecat cupidatat non proident, sunt in culpa qui
> officia deserunt mollit anim id est laborum."
> => "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
> eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
> minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip
> ex ea commodo consequat. Duis aute irure dolor in reprehenderit in
> voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur
> sint occaecat cupidatat non proident, sunt in culpa qui officia
> deserunt mollit anim id est laborum."
>
> puts Benchmark.measure { n.times { text.split /\s+/ } }
>
> 3.850000 0.010000 3.860000 ( 3.955543)
>
> For Erlang:
>
> # erl
>
> N = lists:seq(1, 50000).
>
> Text = <<"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed
> do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim
> ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
> aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit
> in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
> Excepteur sint occaecat cupidatat non proident, sunt in culpa qui
> officia deserunt mollit anim id est laborum.">>.
>
> F = fun() -> {ok, Pattern} = re:compile("\\s+"), re:split(Text,
> Pattern), ok end.
>
> F1 = fun() -> [F() || _ <- N] end.
>
> {T, _} = timer:tc(F1).
>
> T / 1000000.
>
> 13.7556
>
> # erl
>
> N = lists:seq(1, 50000).
>
> Text = <<"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed
> do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim
> ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
> aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit
> in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
> Excepteur sint occaecat cupidatat non proident, sunt in culpa qui
> officia deserunt mollit anim id est laborum.">>.
>
> {ok, Pattern} = re:compile("\\s+").
>
> F = fun() -> re:split(Text, Pattern), ok end.
>
> F1 = fun() -> [F() || _ <- N] end.
>
> {T, _} = timer:tc(F1).
>
> T / 1000000.
>
> 12.8033
>
> # erl -smp disable
>
> N = lists:seq(1, 50000).
>
> Text = <<"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed
> do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim
> ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
> aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit
> in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
> Excepteur sint occaecat cupidatat non proident, sunt in culpa qui
> officia deserunt mollit anim id est laborum.">>.
>
> F = fun() -> {ok, Pattern} = re:compile("\\s+"), re:split(Text,
> Pattern), ok end.
>
> F1 = fun() -> [F() || _ <- N] end.
>
> {T, _} = timer:tc(F1).
>
> T / 1000000.
>
> 8.927621
>
> # erl -smp disable
>
> N = lists:seq(1, 50000).
>
> Text = <<"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed
> do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim
> ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
> aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit
> in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
> Excepteur sint occaecat cupidatat non proident, sunt in culpa qui
> officia deserunt mollit anim id est laborum.">>.
>
> {ok, Pattern} = re:compile("\\s+").
>
> F = fun() -> re:split(Text, Pattern), ok end.
>
> F1 = fun() -> [F() || _ <- N] end.
>
> {T, _} = timer:tc(F1).
>
> T / 1000000.
>
> 8.657157
>
>
> Both Ruby and Erlang use 100% of my CPU. As you can see, I tried some
> tricks (compiling the pattern once outside the fun, running with
> -smp disable). They make Erlang a little faster, but it is still not
> enough.
>
> If you need to process a large amount of text, you should use
> erlang:spawn for each text object; it will be very fast.
>
>
>
> 2013/12/18 akonsu <akonsu@REDACTED>
> I have two benchmarks that perform a simple text split on a regular
> expression. One is in Ruby and the other is in Erlang. The Erlang version
> is 6 times slower on my machine for some reason. I have read all the
> documentation I could find on how to use binaries in Erlang, but I
> cannot make it faster. I am looking for help.
>
>
>
> Here is the code:
>
> Ruby:
>
> require 'benchmark'
>
> n = 50000
> text = "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed
> do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim
> ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
> aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit
> in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
> Excepteur sint occaecat cupidatat non proident, sunt in culpa qui
> officia deserunt mollit anim id est laborum."
>
> puts Benchmark.measure {
>   n.times { text.split /\s+/ }
> }
>
>
> Erlang:
>
> -module(test).
> %% times/2 is exported because timer:tc/3 calls it via apply/3.
> -export([measure/1, times/2]).
>
> text() ->
>     <<"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
> eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
> minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip
> ex ea commodo consequat. Duis aute irure dolor in reprehenderit in
> voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur
> sint occaecat cupidatat non proident, sunt in culpa qui officia
> deserunt mollit anim id est laborum.">>.
>
> %% Call F() N times, discarding the results.
> times(0, _) ->
>     ok;
> times(N, F) ->
>     F(),
>     times(N - 1, F).
>
> %% Compile the pattern once, then time N splits; returns seconds.
> measure(N) ->
>     {ok, Pattern} = re:compile("\\s+"),
>     B = text(),
>     F = fun() -> re:split(B, Pattern) end,
>     {T, ok} = timer:tc(?MODULE, times, [N, F]),
>     T / 1000000.
>
> Ruby outputs
>
> 3.180000 0.000000 3.180000 ( 3.182452)
>
> Erlang outputs
>
> Erlang R16B03 (erts-5.10.4) [source] [smp:2:2] [async-threads:10]
> [hipe] [kernel-poll:false]
>
> Eshell V5.10.4 (abort with ^G)
> 1> test:measure(50000).
> 18.261952
>
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED
> http://erlang.org/mailman/listinfo/erlang-questions
>
>
>
>
> --
> Петровский Александр / Alexander Petrovsky,
>
> Skype: askjuise
> Jabber: juise@REDACTED
> Phone: +7 914 8 820 815 (irkutsk)
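Alexander's suggestion to spawn a process for each text object could look roughly like the sketch below. It assumes a hypothetical module `split_par` that takes a list of binaries as input.

The sketch works in three steps:

- it compiles the pattern once;
- it spawns one worker per text;
- it collects the results in the original order by receiving on a unique reference per worker.

Any speedup depends on the number of schedulers available. Under erl -smp disable, all workers share a single scheduler. For very short texts, the cost of spawning may outweigh the split itself.

```erlang
-module(split_par).
-export([split_all/1]).

%% Split every binary in Texts on runs of whitespace, one process per
%% text, and return the list of results in the same order as Texts.
split_all(Texts) ->
    {ok, MP} = re:compile("\\s+"),
    Parent = self(),
    %% Spawn one worker per text; each replies with {Ref, Parts}.
    Refs = [begin
                Ref = make_ref(),
                spawn(fun() ->
                          Parent ! {Ref, re:split(T, MP, [{return, binary}])}
                      end),
                Ref
            end || T <- Texts],
    %% Selective receive on each Ref keeps results in spawn order,
    %% regardless of which worker finishes first.
    [receive {Ref, Parts} -> Parts end || Ref <- Refs].
```

For example, split_par:split_all([<<"a b">>, <<"c  d">>]) returns one list of parts per input binary.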