pcre, bifs, drivers and ports

Scott Lystig Fritchie fritchie@REDACTED
Mon Jul 31 22:00:55 CEST 2006


>>>>> "sh" == Sean Hinde <sean.hinde@REDACTED> writes:

sh> Mats' comment about limiting length of REs does
sh> not really cut it IMO. Blocking the whole emulator during a long
sh> regexp calculation rarely sounds like the right solution for
sh> typical Erlang apps.

One more thing to consider.  A *really* useful regexp library (or
any library that deals with strings) would be one that worked on:

    1. lists of byte values (the traditional Erlang "string")
    2. single binary terms
    3. "I/O lists", an arbitrarily deep list of #1 and/or #2.
       (Or #2 alone :-)

I would guess that this would come at a high implementation cost, since
most C/C++ regexp packages operate on buffers of contiguous bytes, not
a string of bytes located in perhaps thousands of non-contiguous
places.
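
To make that cost concrete, here is a minimal sketch in C of the
obvious workaround: copy every chunk into one contiguous buffer, then
hand that buffer to the regexp engine.  It is only an illustration
under stated assumptions.  POSIX regcomp()/regexec() stand in for
PCRE, struct iovec stands in for however the emulator would expose an
I/O list, and the names flatten_chunks() and match_chunks() are made
up for this example.

```c
#include <regex.h>     /* POSIX regcomp/regexec, a stand-in for PCRE */
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>   /* struct iovec, a stand-in for an I/O list */

/* Hypothetical helper: copy all chunks into one freshly allocated,
 * NUL-terminated buffer.  Returns NULL if allocation fails; the
 * caller frees the result.  This copy is the extra cost. */
static char *flatten_chunks(const struct iovec *chunks, int n,
                            size_t *len_out)
{
    size_t total = 0;
    for (int i = 0; i < n; i++)
        total += chunks[i].iov_len;

    char *buf = malloc(total + 1);
    if (buf == NULL)
        return NULL;

    size_t off = 0;
    for (int i = 0; i < n; i++) {
        if (chunks[i].iov_len == 0)
            continue;                 /* nothing to copy */
        memcpy(buf + off, chunks[i].iov_base, chunks[i].iov_len);
        off += chunks[i].iov_len;
    }
    buf[total] = '\0';
    if (len_out != NULL)
        *len_out = total;
    return buf;
}

/* Hypothetical helper: 1 on match, 0 on no match, -1 on error.
 * regexec() stops at the first NUL byte, so this sketch is only
 * suitable for data without embedded zero bytes. */
static int match_chunks(const char *pattern,
                        const struct iovec *chunks, int n)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    char *buf = flatten_chunks(chunks, n, NULL);
    if (buf == NULL) {
        regfree(&re);
        return -1;
    }

    int rc = regexec(&re, buf, 0, NULL, 0);
    free(buf);
    regfree(&re);

    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}
```

Given the three chunks "hel", "lo w" and "orld", match_chunks() with
the pattern "lo+ wor" finds a match only because the chunks were
glued together first; no single chunk contains it.  Avoiding the copy
would mean a matcher that can walk from one chunk to the next, which
is the part most existing packages don't offer.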

Oops, I forgot one:

    4. A possibly UNICODE/whatever internationalized "string" thingie
       stored in an I/O list.

As discussed on this list a few weeks ago, there is no agreement on
how to represent such a thing ... in Erlang or most other languages.

sh> But. It would be most fascinating to compare real world
sh> characteristics of:
sh> 1. BIF pcre 2. Driver pcre, 3. BIF pcre in SMT erlang.

Yup.

A linked-in driver can cheat even more if it can get an (internal C)
pointer to the term(s) it's operating on.  It's quite easy to create a
new BIF that returns the internal pointer/address of its argument as
an integer.(*)  Turning that integer back into a pointer, the
driver has full access to the term.  Use the power only for good.  :-)
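
The round trip itself is nothing more than a pair of casts.  A rough
sketch in C, with the caveat that struct fake_term and both function
names are invented for illustration and have nothing to do with the
emulator's real term layout:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an emulator term; NOT the real layout. */
struct fake_term {
    size_t len;
    const unsigned char *bytes;
};

/* BIF side (hypothetical): hand the term's address out as an integer. */
static uintptr_t term_address(const struct fake_term *t)
{
    return (uintptr_t)t;
}

/* Driver side (hypothetical): turn the integer back into a pointer.
 * Nothing here checks that the address is still valid. */
static const struct fake_term *term_from_address(uintptr_t addr)
{
    return (const struct fake_term *)addr;
}
```

The address is only as good as the term behind it: if the emulator
moves or frees the term in the meantime, the driver is left reading
garbage.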

-Scott

(*) It is a good, simple experiment if you've never tried writing a
BIF before.


