[erlang-questions] regexp is slow

Sun Nov 5 23:41:08 CET 2006

I had a little look at this and did some very basic bench-marking to see 
where the time went.

Basically the problem is that you have a large file, ~100k, of DNA data 
and you run some regexps on this to inspect and process the data. There 
are two types of operations:

1. global substitutions, regexp:gsub
2. counting occurring patterns, regexp:matches
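
The two operations above can be sketched roughly as follows. This is only an illustration, not the benchmark code: it assumes the newer `re` module (`re:run/3`, `re:replace/4`) in place of `regexp:matches` and `regexp:gsub`, and the module and function names (`dna_sketch`, `count/2`, `substitute/3`) are made up for the example.

```erlang
-module(dna_sketch).
-export([count/2, substitute/3]).

%% Hypothetical helper: count the non-overlapping occurrences of
%% Pattern in Seq. With the global option, re:run/3 returns one
%% entry per match, so the length of that list is the count.
count(Seq, Pattern) ->
    case re:run(Seq, Pattern, [global]) of
        {match, Matches} -> length(Matches);
        nomatch -> 0
    end.

%% Hypothetical helper: replace every occurrence of Pattern in Seq
%% with Replacement and return the result as a flat string (list).
substitute(Seq, Pattern, Replacement) ->
    re:replace(Seq, Pattern, Replacement, [global, {return, list}]).
```

For example, `dna_sketch:count(Seq, "agggtaaa|tttaccct")` would cover the counting step and `dna_sketch:substitute(Seq, "B", "(c|g|t)")` the substitution step.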

Counting patterns was no problem; that went fast. It was the 
substitution part that was taking most of the time: not finding the 
matching parts of the data, but doing the actual substitutions. This is 
one part of the code which must be improved; it was never considered 
that it would process such large amounts of data. Having the data in a 
binary would definitely NOT help here, as it would result in an enormous 
amount of copying.
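
A very basic measurement of this kind could look something like the sketch below. It is an assumption-laden illustration, not the code actually used: it times the two steps with `timer:tc/1` and uses the newer `re` module, so the numbers it gives will not reproduce the behaviour of `regexp:gsub` described here. The module name `regex_timing` and the function `compare/3` are invented for the example.

```erlang
-module(regex_timing).
-export([compare/3]).

%% Hypothetical helper: time counting and substitution separately
%% on the same input. timer:tc/1 returns {Microseconds, Result};
%% only the elapsed times are kept.
%% Returns {CountMicros, SubstMicros}.
compare(Seq, Pattern, Replacement) ->
    {CountT, _} = timer:tc(fun() ->
        re:run(Seq, Pattern, [global])
    end),
    {SubstT, _} = timer:tc(fun() ->
        re:replace(Seq, Pattern, Replacement, [global, {return, list}])
    end),
    {CountT, SubstT}.
```

Timing each step on its own is what makes it possible to say whether the matching or the substitution dominates; a single run is noisy, so repeated runs on the full input would be needed before drawing conclusions.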

I did not measure the I/O bit. One thing at a time.

Robert

Mats Cronqvist wrote:
>    somebody (virding?) was asking for a regexp benchmark.
> 
>    here's one where erlang is a 1000 or so times slower than java... much of 
> that might be from the io of course.
> 
> http://shootout.alioth.debian.org/gp4/benchmark.php?test=regexdna&lang=hipe&id=0
> 
>    mats