[erlang-questions] any way to speed up regex.split?

Alexander Petrovsky <>
Sun Dec 22 15:31:16 CET 2013


No, I really do compile my code to a .beam file and then run it from the shell.
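
For reference, here is a minimal sketch of how the benchmark might look as a compiled module that also avoids building a result list, which covers both points raised in the quoted reply below. The module name split_bench and the function names are hypothetical, and the text is shortened to a stand-in for the full paragraph; only re:compile/1, re:split/2 and timer:tc/1 are taken from the thread.

```erlang
%% split_bench.erl: hypothetical module name, not from the thread.
-module(split_bench).
-export([run/1]).

%% Short stand-in for the Lorem ipsum paragraph used in the thread.
text() ->
    <<"Lorem ipsum dolor sit amet, consectetur adipisicing elit">>.

%% Call re:split/2 N times and discard each result, so no list of
%% 50000 results is ever built.
loop(0, _Text, _Pattern) ->
    ok;
loop(N, Text, Pattern) when N > 0 ->
    _ = re:split(Text, Pattern),
    loop(N - 1, Text, Pattern).

%% Compile the pattern once, time the whole loop, and return the
%% elapsed wall-clock time in seconds.
run(N) ->
    {ok, Pattern} = re:compile("\\s+"),
    Text = text(),
    {Micros, ok} = timer:tc(fun() -> loop(N, Text, Pattern) end),
    Micros / 1000000.
```

It would be compiled with c(split_bench). in the shell (or erlc split_bench.erl) and called as split_bench:run(50000). Only the call itself goes through the shell; the loop runs as compiled code.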


2013/12/22 Peer Stritzinger <>

>  It looks like you run these benchmarks in the shell.
>
>
> Code run in the shell is interpreted in Erlang and much slower than
> compiled code.
>
>
> Besides, looping with a comprehension and building a large list might also
> add more overhead than necessary.
>
>
> On 2013-12-22 12:29:43 +0000, Alexander Petrovsky said:
>
>
> Hi!
>
>
> I ran the same test on my machine:
>
>
> For Ruby:
>
>
> # irb
>
>
> require 'benchmark'
>
>
> n = 50000
>
>
> text = "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
> eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
> minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex
> ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate
> velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat
> cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id
> est laborum."
>
> => "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
> eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
> minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex
> ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate
> velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat
> cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id
> est laborum."
>
>
> puts Benchmark.measure { n.times { text.split /\s+/ } }
>
>
> 3.850000   0.010000   3.860000 (  3.955543)
>
>
> For Erlang:
>
>
> # erl
>
>
> N = lists:seq(1, 50000).
>
>
> Text = <<"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
> eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
> minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex
> ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate
> velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat
> cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id
> est laborum.">>.
>
>
> F = fun() -> {ok, Pattern} = re:compile("\\s+"), re:split(Text, Pattern),
> ok end.
>
>
> F1 = fun() -> [F() || _ <- N] end.
>
>
> {T, _} = timer:tc(F1).
>
>
> T / 1000000.
>
>
> 13.7556
>
>
> # erl
>
>
> N = lists:seq(1, 50000).
>
>
> Text = <<"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
> eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
> minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex
> ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate
> velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat
> cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id
> est laborum.">>.
>
>
> {ok, Pattern} = re:compile("\\s+").
>
>
> F = fun() -> re:split(Text, Pattern), ok end.
>
>
> F1 = fun() -> [F() || _ <- N] end.
>
>
> {T, _} = timer:tc(F1).
>
>
> T / 1000000.
>
>
> 12.8033
>
>
> # erl -smp disable
>
>
> N = lists:seq(1, 50000).
>
>
> Text = <<"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
> eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
> minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex
> ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate
> velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat
> cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id
> est laborum.">>.
>
>
> F = fun() -> {ok, Pattern} = re:compile("\\s+"), re:split(Text, Pattern),
> ok end.
>
>
> F1 = fun() -> [F() || _ <- N] end.
>
>
> {T, _} = timer:tc(F1).
>
>
> T / 1000000.
>
>
> 8.927621
>
>
> # erl -smp disable
>
>
> N = lists:seq(1, 50000).
>
>
> Text = <<"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
> eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
> minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex
> ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate
> velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat
> cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id
> est laborum.">>.
>
>
> {ok, Pattern} = re:compile("\\s+").
>
>
> F = fun() -> re:split(Text, Pattern), ok end.
>
>
> F1 = fun() -> [F() || _ <- N] end.
>
>
> {T, _} = timer:tc(F1).
>
>
> T / 1000000.
>
>
> 8.657157
>
>
>
> Both Ruby and Erlang use 100% of my CPU. As you can see, I tried a couple of
> tricks (compiling the pattern once, running with -smp disable) that make
> Erlang a little faster, but it is still not fast enough.
>
>
> If you need to process a large amount of text, you should use erlang:spawn
> for each text object; it will be very fast.
>
>
>
>
> 2013/12/18 akonsu <>
>
> I have two benchmarks that perform a simple text split on a regular
> expression. One is in Ruby and the other is in Erlang. The Erlang version is
> 6 times slower on my machine for some reason. I have read all the
> documentation I could find on how to use binaries in Erlang, but I cannot
> make it faster. I am looking for help.
>
>
>
>
> Here is the code:
>
>
> Ruby:
>
>
> require 'benchmark'
>
>
> n = 50000
>
> text = "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
> eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
> minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex
> ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate
> velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat
> cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id
> est laborum."
>
>
> puts Benchmark.measure {
>
>   n.times { text.split /\s+/ }
>
> }
>
>
>
> Erlang:
>
>
> -module(test).
>
> -export([measure/1, times/2]).
>
>
> text() ->
>
>     <<"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
> eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
> minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex
> ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate
> velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat
> cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id
> est laborum.">>.
>
>
> times(0, _) ->
>
>     ok;
>
> times(N, F) ->
>
>     F(),
>
>     times(N - 1, F).
>
>
> measure(N) ->
>
>     {ok, Pattern} = re:compile("\\s+"),
>
>     B = text(),
>
>     F = fun() -> re:split(B, Pattern) end,
>
>     {T, ok} = timer:tc(?MODULE, times, [N, F]),
>
>     T / 1000000.
>
>
> Ruby outputs
>
>
>   3.180000   0.000000   3.180000 (  3.182452)
>
>
> Erlang outputs
>
>
> Erlang R16B03 (erts-5.10.4) [source] [smp:2:2] [async-threads:10] [hipe]
> [kernel-poll:false]
>
>
> Eshell V5.10.4  (abort with ^G)
>
> 1> test:measure(50000).
>
> 18.261952
>
>
>
> _______________________________________________
>
> erlang-questions mailing list
>
> 
>
> http://erlang.org/mailman/listinfo/erlang-questions
>
>
>
>
>
> --
>
> Петровский Александр / Alexander Petrovsky,
>
>
> Skype: askjuise
>
> Jabber: 
>
> Phone: +7 914 8 820 815 (irkutsk)
>
>
>
>
>
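
The erlang:spawn suggestion quoted above could be sketched roughly as follows. This is only an illustration under my own assumptions: the module name par_split and both function names are hypothetical, and it assumes the input is a list of separate binaries. It does not make a single re:split call any faster; it only lets several texts be split at the same time when the VM runs with more than one scheduler.

```erlang
%% par_split.erl: hypothetical module name, not from the thread.
-module(par_split).
-export([split_all/1]).

%% Split every binary in Texts on runs of whitespace, one process per
%% binary. Results are returned in the same order as the input list.
split_all(Texts) ->
    {ok, Pattern} = re:compile("\\s+"),
    Parent = self(),
    Refs = [spawn_worker(Parent, Text, Pattern) || Text <- Texts],
    %% Ref is already bound here, so each receive waits for the reply
    %% tagged with that particular reference.
    [receive {Ref, Words} -> Words end || Ref <- Refs].

%% Start one worker and return the unique reference that tags its reply.
spawn_worker(Parent, Text, Pattern) ->
    Ref = make_ref(),
    erlang:spawn(fun() -> Parent ! {Ref, re:split(Text, Pattern)} end),
    Ref.
```

Note that this sketch has no error handling: if a worker crashes, the matching receive waits forever, so real code would want spawn_monitor or a timeout.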

