[erlang-questions] regexp is slow
Sun Nov 5 23:41:08 CET 2006
I had a little look at this and did some very basic benchmarking to see
where the time went.
Basically, the problem is that you have a large file (~100k) of DNA data
and run some regexps over it to inspect and process the data. There
are two types of operations:
1. global substitutions, regexp:gsub
2. counting occurrences of patterns, regexp:matches
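As a rough illustration of the two kinds of operation, here they are expressed with Python's standard `re` module rather than Erlang's `regexp` module. The helper names `count_matches` and `global_substitute` and the short DNA-like string are made up for this sketch and are not taken from the benchmark. The point is only that counting needs nothing but the match positions, while a global substitution has to produce a whole new string.

```python
import re

def count_matches(pattern: str, data: str) -> int:
    # Like a "matches" operation: only the match positions are needed,
    # so no new copy of the data is built.
    return sum(1 for _ in re.finditer(pattern, data))

def global_substitute(pattern: str, replacement: str, data: str) -> str:
    # Like a "gsub" operation: every match is replaced, so a new string
    # of roughly the input's size has to be produced.
    return re.sub(pattern, replacement, data)

dna = "agggtaaacattttaccctgg"
print(count_matches("agggtaaa|tttaccct", dna))  # 2
print(global_substitute("a", "(a)", "gata"))    # g(a)t(a)
```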
Counting patterns was no problem; that went fast. It was the
substitution part that was taking most of the time: not finding the
matching parts of the data, but doing the actual substitutions. This is
one part of the code which must be improved; it was never expected
that it would process such large amounts of data. Having the data in a
binary would definitely NOT help here; it would result in an enormous
amount of copying.
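The thread does not show how regexp:gsub builds its result, so the following is only a language-neutral sketch, written in Python, of why assembling the output rather than finding the matches can dominate the cost. The two functions, `gsub_naive` and `gsub_single_pass`, are hypothetical; both find matches the same way and differ only in how the output is built. The naive one rebuilds the entire string after every match, which is the kind of repeated copying that immutable data (such as a binary) invites; the single-pass one collects the untouched slices and replacements and joins them once. For simplicity the replacement is inserted literally (no backreferences).

```python
import re

def gsub_naive(pattern: str, replacement: str, data: str) -> str:
    # Rebuilds the whole string after every match, so each step copies
    # the entire current result: total work grows with matches * length.
    result = data
    offset = 0  # shift between positions in `data` and in `result`
    for m in re.finditer(pattern, data):
        start, end = m.start() + offset, m.end() + offset
        result = result[:start] + replacement + result[end:]
        offset += len(replacement) - (end - start)
    return result

def gsub_single_pass(pattern: str, replacement: str, data: str) -> str:
    # Collects untouched slices and replacements in a list and joins once,
    # so each output character is copied a constant number of times.
    pieces = []
    last = 0
    for m in re.finditer(pattern, data):
        pieces.append(data[last:m.start()])
        pieces.append(replacement)
        last = m.end()
    pieces.append(data[last:])
    return "".join(pieces)

dna = "agggtaaacattttaccctgg"
pattern = "agggtaaa|tttaccct"
print(gsub_single_pass(pattern, "X", dna))  # XcatXgg
```

In Erlang, a comparable approach would be to accumulate the pieces in a (possibly deep) list and flatten or convert it once at the end, rather than rebuilding a string or binary after every match.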
I did not measure the I/O part. One thing at a time.
Mats Cronqvist wrote:
> somebody (virding?) was asking for a regexp benchmark.
> here's one where erlang is a 1000 or so times slower than java... much of
> that might be from the io of course.