pcre, bifs, drivers and ports 
    Scott Lystig Fritchie 
    fritchie@REDACTED
       
    Mon Jul 31 22:00:55 CEST 2006
    
    
  
>>>>> "sh" == Sean Hinde <sean.hinde@REDACTED> writes:
sh> Mats' comment about limiting length of REs does
sh> not really cut it IMO. Blocking the whole emulator during a long
sh> regexp calculation rarely sounds like the right solution for
sh> typical Erlang apps.
One more thing to consider.  A *really* useful regexp library (or
any library that deals with strings) would be one that worked on:
    1. lists of byte values (the traditional Erlang "string")
    2. single binary terms
    3. "I/O lists", an arbitrarily deep list of #1 and/or #2.
       (Or #2 alone :-)
I would guess that this would come at a high implementation cost,
since most C/C++ regexp packages operate on buffers of contiguous
bytes, not on a string of bytes scattered across perhaps thousands of
non-contiguous places.
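To make that cost concrete, here is a minimal C sketch. It assumes a made-up `iolist` struct, which is not the emulator's real term layout, and it uses the POSIX `<regex.h>` API instead of PCRE. The names `iolist`, `iolist_size()`, `iolist_copy()` and `iolist_regex_match()` are hypothetical. Before `regexec()` can run, the scattered segments have to be gathered into one contiguous buffer, which means an extra allocation and a copy of every byte:

```c
#include <regex.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of an Erlang I/O list, for illustration only;
 * this is NOT the emulator's real term representation. */
typedef enum { SEG_BYTES, SEG_LIST } seg_kind;

typedef struct iolist {
    seg_kind kind;
    const unsigned char *bytes; /* SEG_BYTES: byte-list or binary contents */
    size_t len;                 /* SEG_BYTES: number of bytes */
    const struct iolist *items; /* SEG_LIST: nested elements */
    size_t count;               /* SEG_LIST: number of elements */
} iolist;

/* Total number of bytes in a (possibly deeply nested) I/O list. */
static size_t iolist_size(const iolist *io) {
    if (io->kind == SEG_BYTES)
        return io->len;
    size_t total = 0;
    for (size_t i = 0; i < io->count; i++)
        total += iolist_size(&io->items[i]);
    return total;
}

/* Copy every segment, in order, into dst; returns the next write position. */
static unsigned char *iolist_copy(const iolist *io, unsigned char *dst) {
    if (io->kind == SEG_BYTES) {
        if (io->len > 0)
            memcpy(dst, io->bytes, io->len);
        return dst + io->len;
    }
    for (size_t i = 0; i < io->count; i++)
        dst = iolist_copy(&io->items[i], dst);
    return dst;
}

/* Match an extended POSIX regex against an I/O list.  The regex engine
 * wants one contiguous NUL-terminated buffer, so the segments are first
 * flattened into a fresh heap allocation: an extra O(n) copy per match. */
static bool iolist_regex_match(const iolist *io, const char *pattern) {
    size_t n = iolist_size(io);
    unsigned char *buf = malloc(n + 1);
    if (buf == NULL)
        return false;
    *iolist_copy(io, buf) = '\0';

    regex_t re;
    bool matched = false;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) == 0) {
        matched = regexec(&re, (const char *)buf, 0, NULL, 0) == 0;
        regfree(&re);
    }
    free(buf);
    return matched;
}
```

Built from the equivalent of `["ab", <<"cd">>, ["e"]]`, a pattern such as `bcd` matches across segment boundaries only because of the flattening copy. Since `regexec()` stops at the first NUL, this sketch also mishandles binaries containing zero bytes. A library that walked the segments in place would avoid both the copy and that limitation.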
Oops, I forgot one:
    4. A possibly UNICODE/whatever internationalized "string" thingie
       stored in an I/O list.
As discussed on this list a few weeks ago, there is no agreement on
how to represent such a thing ... in Erlang or most other languages.
sh> But. It would be most fascinating to compare real world
sh> characteristics of:
sh> 1. BIF pcre, 2. Driver pcre, 3. BIF pcre in SMP erlang.
Yup.
A linked-in driver can cheat even more if it can get an (internal C)
pointer to the term(s) it's operating on.  It's quite easy to create a
new BIF that takes the internal pointer/address of its argument and
returns it as an integer.(*)  Once it turns that integer back into a
pointer, the driver has full access to the term.  Use the power only
for good.  :-)
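The integer/pointer round trip itself is ordinary C. What follows is a minimal sketch. The `fake_term` struct and the `term_address()`/`term_from_address()` helpers are hypothetical stand-ins for the BIF and driver sides, and the emulator's real BIF and driver interfaces are not shown. It uses `uintptr_t` from `<stdint.h>` and converts through `void *`, which is the standard way to turn a pointer into an integer and back without losing information:

```c
#include <stdint.h>

/* Hypothetical stand-in for an emulator term; real Erlang terms have an
 * internal, version-specific layout that this does not model. */
typedef struct fake_term {
    const char *data;
    unsigned long len;
} fake_term;

/* "BIF" side (hypothetical): expose the term's address as a plain integer. */
static uintptr_t term_address(const fake_term *t) {
    return (uintptr_t)(const void *)t;
}

/* "Driver" side (hypothetical): turn the integer back into a pointer.
 * Nothing here checks that the address is still valid. */
static const fake_term *term_from_address(uintptr_t addr) {
    return (const fake_term *)(const void *)addr;
}
```

The sketch only shows the cast pattern. In a real emulator, a term on a process heap can be moved by the garbage collector, so an address captured this way can go stale. That is part of why this is a trick for experiments rather than production code.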
-Scott
(*) It is a good, simple experiment if you've never tried writing a
BIF before.
    
    
More information about the erlang-questions
mailing list