[erlang-questions] regexp is slow

Mon Nov 6 23:29:02 CET 2006

Ulf Wiger wrote:
> On 2006-11-05 23:41:08, Robert Virding <robert.virding@REDACTED> wrote:
> 
>> Counting patterns was no problem, that went fast. It was the
>> substitution part that was taking most of the time. Not finding the
>> matching parts of the data but doing the actual substitutions. This is
>> one part of the code which must be improved; it was never considered
>> that it would process such large amounts of data. Having the data in a
>> binary would definitely NOT help here, it would result in an enormous
>> amount of copying.
> 
> 
> Well, the HiPE team's byte_array to the rescue then, eh?  ;-)

The problem is that no array representation is right for this: every
substitution causes a large amount of byte shifting, which is costly. For
example, the first pass is a substitution which removes all newlines and
some specific lines.

The problem is that the current implementation is overly naive and
results in excessive creation of lists.
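
To make the point concrete, here is a minimal sketch of the difference. It
is not the regexp module's actual code: the module name strip_nl and both
functions are made up for illustration, and it only handles the simplest
case of removing newlines from a string held as a list. The naive version
rebuilds the list on every hit, while the second version walks the input
once and builds a single result list.

```erlang
-module(strip_nl).
-export([naive/1, single_pass/1]).

%% Naive approach (illustrative only): find the first newline, glue the
%% text before and after it together, then start again from the top.
%% Every hit copies the prefix with ++, so the work grows roughly with
%% (input length) * (number of newlines) and many short-lived lists
%% are created along the way.
naive(Str) ->
    case lists:splitwith(fun(C) -> C =/= $\n end, Str) of
        {_, []} ->
            %% No newline left, nothing more to remove.
            Str;
        {Before, [$\n | After]} ->
            naive(Before ++ After)
    end.

%% Single pass: each character is visited once and kept or dropped.
%% The result is accumulated in reverse and reversed once at the end.
single_pass(Str) ->
    single_pass(Str, []).

single_pass([$\n | T], Acc) -> single_pass(T, Acc);
single_pass([C | T], Acc)   -> single_pass(T, [C | Acc]);
single_pass([], Acc)        -> lists:reverse(Acc).
```

Both functions return the same string; they differ only in how much
intermediate list structure gets built, which is the cost that starts to
dominate once the input gets large.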

Robert

P.S. Yes, the first pass could be done in a better way, but it is actually
part of the problem, a pre-defined must be used to do this.