[erlang-questions] any way to speed up regex.split?

Alexander Petrovsky askjuise@REDACTED
Sun Dec 22 13:29:43 CET 2013


Hi!

I ran the same test on my machine:

*For Ruby:*

*# irb*

require 'benchmark'

n = 50000

text = "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex
ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate
velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat
cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id
est laborum."
=> "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex
ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate
velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat
cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id
est laborum."

puts Benchmark.measure { n.times { text.split /\s+/ } }

3.850000   0.010000   3.860000 (  *3.955543*)

*For Erlang:*

*# erl*

N = lists:seq(1, 50000).

Text = <<"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex
ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate
velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat
cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id
est laborum.">>.

F = fun() ->
        {ok, Pattern} = re:compile("\\s+"),
        re:split(Text, Pattern),
        ok
    end.

F1 = fun() -> [F() || _ <- N] end.

{T, _} = timer:tc(F1).

T / 1000000.


*13.7556*

*# erl*

N = lists:seq(1, 50000).

Text = <<"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex
ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate
velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat
cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id
est laborum.">>.

{ok, Pattern} = re:compile("\\s+").

F = fun() -> re:split(Text, Pattern), ok end.

F1 = fun() -> [F() || _ <- N] end.

{T, _} = timer:tc(F1).

T / 1000000.


*12.8033*

*# erl -smp disable*

N = lists:seq(1, 50000).

Text = <<"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex
ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate
velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat
cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id
est laborum.">>.

F = fun() ->
        {ok, Pattern} = re:compile("\\s+"),
        re:split(Text, Pattern),
        ok
    end.

F1 = fun() -> [F() || _ <- N] end.

{T, _} = timer:tc(F1).

T / 1000000.


*8.927621*

*# erl -smp disable*

N = lists:seq(1, 50000).

Text = <<"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex
ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate
velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat
cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id
est laborum.">>.

{ok, Pattern} = re:compile("\\s+").

F = fun() -> re:split(Text, Pattern), ok end.

F1 = fun() -> [F() || _ <- N] end.

{T, _} = timer:tc(F1).

T / 1000000.

*8.657157*


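Both Ruby and Erlang load my CPU to 100%. As you can see, I tried a couple
of tricks (compiling the pattern once outside the loop, and running with
-smp disable). They make Erlang somewhat faster, but it is still more than
twice as slow as Ruby.

If you need to process a large amount of text, you should use erlang:spawn
for each text object so that the splits run in parallel; that should be
much faster overall.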
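
A minimal sketch of the spawn-per-text idea, assuming the texts are held in
a list of binaries. The module name psplit and the function split_all/1 are
made up for illustration and were not part of the benchmark above, so I have
no timings for it. It has no error handling: if a worker crashes, the caller
waits forever.

```erlang
-module(psplit).
-export([split_all/1]).

%% Split every binary in Texts on runs of whitespace, using one process
%% per text. Results are returned in the same order as the input list.
split_all(Texts) ->
    %% Compile the pattern once and share it with all workers.
    {ok, Pattern} = re:compile("\\s+"),
    Parent = self(),
    %% Spawn one worker per text. Each worker sends its result back
    %% tagged with a unique reference so replies cannot be mixed up.
    Refs = [begin
                Ref = make_ref(),
                spawn(fun() -> Parent ! {Ref, re:split(Text, Pattern)} end),
                Ref
            end || Text <- Texts],
    %% Collect the replies in spawn order (selective receive on each Ref).
    [receive {Ref, Words} -> Words end || Ref <- Refs].
```

For example, psplit:split_all([Text, Text, Text]) returns a list with three
word lists. Any gain comes from spreading the texts over several cores, so
it needs an SMP emulator (not -smp disable) and more than one text; a single
split is not made any faster by this.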



2013/12/18 akonsu <akonsu@REDACTED>

> I have two benchmarks that perform a simple text split on a regular
> expression. One is in Ruby and another is in Erlang. The Erlang version is
> 6 times slower on my machine for some reason. I have read all documentation
> I could find on how to use binaries in Erlang, but I cannot make it faster.
> I am looking for help.
>
>
>
> Here is the code:
>
> Ruby:
>
> require 'benchmark'
>
> n = 50000
> text = "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
> eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
> minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex
> ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate
> velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat
> cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id
> est laborum."
>
> puts Benchmark.measure {
>   n.times { text.split /\s+/ }
> }
>
>
> Erlang:
>
> text() ->
>     <<"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
> eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
> minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex
> ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate
> velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat
> cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id
> est laborum.">>.
>
> times(0, _) ->
>     ok;
> times(N, F) ->
>     F(),
>     times(N - 1, F).
>
> measure(N) ->
>     {ok, Pattern} = re:compile("\\s+"),
>     B = text(),
>     F = fun() -> re:split(B, Pattern) end,
>     {T, ok} = timer:tc(?MODULE, times, [N, F]),
>     T / 1000000.
>
> Ruby outputs
>
>   3.180000   0.000000   3.180000 (  3.182452)
>
> Erlang outputs
>
> Erlang R16B03 (erts-5.10.4) [source] [smp:2:2] [async-threads:10] [hipe]
> [kernel-poll:false]
>
> Eshell V5.10.4  (abort with ^G)
> 1> test:measure(50000).
> 18.261952
>
>


-- 
Петровский Александр / Alexander Petrovsky,

Skype: askjuise
Jabber: juise@REDACTED
Phone: +7 914 8 820 815 (irkutsk)