[erlang-questions] regexp is slow

Thomas Lindgren <>
Mon Nov 6 10:47:02 CET 2006



--- Robert Virding <> wrote:

> 2. counting occurring patterns regexp:matches
> 
> Counting patterns was no problem, that went fast. It was the
> substitution part that was taking most of the time: not finding the
> matching parts of the data, but doing the actual substitutions. This is
> one part of the code which must be improved; it was never considered
> that it would process such large amounts of data. Having the data in a
> binary would definitely NOT help here; it would result in an enormous
> amount of copying.

I haven't looked at the precise problem, but would it
help to return a list of binaries instead? Cut out each
match and put the substitute in its place:

  [...,
  <<"before match">>, 
  <<"substitution">>, 
  <<"between matches">>, 
  <<"next subst">>,
  ...
  ]
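
A minimal sketch of that idea, under some loud assumptions: it uses the
`binary` module from later OTP releases rather than the `regexp` module
discussed in this thread, it matches a literal (non-empty) pattern instead
of a regular expression, and the module and function names
(`subst_sketch`, `replace_all/3`) are hypothetical. The point is only that
the result is a list of sub-binaries and substitutes, so the untouched
parts of the subject are referenced rather than rebuilt:

```erlang
-module(subst_sketch).
-export([replace_all/3]).

%% Replace every occurrence of the literal binary Pattern in Subject
%% with Subst. Returns a list of binaries (an iolist), not a flat binary.
%% Assumption: Pattern is a non-empty binary.
replace_all(Subject, Pattern, Subst) when is_binary(Subject) ->
    %% binary:matches/2 returns non-overlapping {Start, Length} pairs
    %% in order of position.
    Matches = binary:matches(Subject, Pattern),
    build(Subject, 0, Matches, Subst).

%% No matches left: emit the tail of the subject.
build(Subject, Pos, [], _Subst) ->
    [binary:part(Subject, Pos, byte_size(Subject) - Pos)];
%% Emit the text before the match, then the substitute, then recurse.
build(Subject, Pos, [{Start, Len} | Rest], Subst) ->
    [binary:part(Subject, Pos, Start - Pos), Subst
     | build(Subject, Start + Len, Rest, Subst)].
```

Such a list can be passed directly to a port or socket, or flattened
once with `iolist_to_binary/1` if a single binary is really needed.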

In the larger scheme of things, it might be nice to
have some way to stream over large binaries (map, fold,
...) more transparently. Maybe Jay Nelson's paper at
the 2005 workshop could be a starting point?
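
To make the "fold over a large binary" wish concrete, here is one
possible shape for such a function. This is only an illustration, not
the approach from the paper mentioned above; the module name
`bin_fold_sketch`, the function `fold_chunks/4`, and the fixed chunk
size are all assumptions made for the example:

```erlang
-module(bin_fold_sketch).
-export([fold_chunks/4]).

%% Fold Fun over Bin in chunks of Size bytes. Fun is called as
%% Fun(Chunk, Acc) and returns the new accumulator. The last chunk
%% may be shorter than Size; an empty binary yields Acc unchanged.
fold_chunks(Fun, Acc, Bin, Size)
  when is_function(Fun, 2), is_binary(Bin), is_integer(Size), Size > 0 ->
    case Bin of
        <<Chunk:Size/binary, Rest/binary>> ->
            %% A full chunk is available: process it and continue.
            fold_chunks(Fun, Fun(Chunk, Acc), Rest, Size);
        <<>> ->
            Acc;
        Last ->
            %% Fewer than Size bytes remain.
            Fun(Last, Acc)
    end.
```

For example, `fold_chunks(fun(C, N) -> N + byte_size(C) end, 0, Bin, 4096)`
walks the binary one chunk at a time and sums the chunk sizes.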

Best,
Thomas




 



