pcre, bifs, drivers and ports
Wed Aug 2 01:54:38 CEST 2006
If you write it in Erlang then this is not that difficult to do.
Scott Lystig Fritchie wrote:
>>>>>>"sh" == Sean Hinde <sean.hinde@REDACTED> writes:
> sh> Mats' comment about limiting length of REs does
> sh> not really cut it IMO. Blocking the whole emulator during a long
> sh> regexp calculation rarely sounds like the right solution for
> sh> typical Erlang apps.
> One more thing to consider. A *really* useful regexp library (or
> any library that deals with strings) would be one that worked on:
> 1. lists of byte values (the traditional Erlang "string")
> 2. single binary terms
> 3. "I/O lists", an arbitrarily deep list of #1 and/or #2.
> (Or #2 alone :-)
> I would guess that that would come at a high implementation cost, since
> most C/C++ regexp packages operate on buffers of contiguous bytes, not
> a string of bytes located in perhaps thousands of non-contiguous
> Oops, I forgot one:
> 4. A possibly UNICODE/whatever internationalized "string" thingie
> stored in an I/O list.
> As discussed on this list a few weeks ago, there is no agreement on
> how to represent such a thing ... in Erlang or most other languages.
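
To make the contiguous-buffer point concrete, here is a minimal sketch in C of what a driver would probably have to do before handing an I/O list to a typical regexp package: copy every fragment into one buffer, then match. It assumes POSIX <regex.h> rather than pcre, and `struct chunk`, `flatten_chunks` and `chunks_match` are hypothetical names made up for illustration, not anything from ERTS.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one fragment of an I/O list. */
struct chunk {
    const char *data;
    size_t len;
};

/* Copy all fragments into one NUL-terminated heap buffer.
 * The caller frees the result. Returns NULL if allocation fails.
 * Note: POSIX regexec() stops at the first zero byte, so this
 * sketch only suits data without embedded NULs. */
static char *flatten_chunks(const struct chunk *chunks, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += chunks[i].len;

    char *buf = malloc(total + 1);
    if (buf == NULL)
        return NULL;

    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        memcpy(buf + off, chunks[i].data, chunks[i].len);
        off += chunks[i].len;
    }
    buf[off] = '\0';
    return buf;
}

/* Returns 1 on match, 0 on no match, -1 on any error. */
static int chunks_match(const struct chunk *chunks, size_t n,
                        const char *pattern)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    char *buf = flatten_chunks(chunks, n);
    if (buf == NULL) {
        regfree(&re);
        return -1;
    }

    int rc = regexec(&re, buf, 0, NULL, 0);
    free(buf);
    regfree(&re);

    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}
```

The extra pass over the data plus the allocation is the cost being discussed: it grows with the total size of the I/O list, and it is paid before the regexp engine has looked at a single byte.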
> sh> But. It would be most fascinating to compare real world
> sh> characteristics of:
> sh> 1. BIF pcre 2. Driver pcre, 3. BIF pcre in SMT erlang.
> A linked-in driver can cheat even more if it can get a (internal C)
> pointer to the term(s) it's operating on. It's quite easy to create a
> new BIF that takes the internal pointer/address of its argument and
> returns it as an integer. (*) Turning that integer into a pointer, the
> driver has full access to the term. Use the power only for good. :-)
> (*) It is a good, simple experiment if you've never tried writing a
> BIF before.
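
The integer-to-pointer round trip itself is plain C. A rough sketch, assuming nothing about the real emulator: `struct fake_term`, `address_of` and `term_at` are hypothetical names, and the actual ERTS term layout is different.

```c
#include <stdint.h>

/* Hypothetical stand-in for an emulator term; the real ERTS
 * representation is not shown here. */
struct fake_term {
    int tag;
    const char *bytes;
};

/* Roughly what the experimental BIF would do: expose the address
 * of its argument as an integer. */
static uintptr_t address_of(const struct fake_term *t)
{
    return (uintptr_t)(const void *)t;
}

/* Roughly what the driver would do: turn that integer back into
 * a pointer it can dereference. */
static const struct fake_term *term_at(uintptr_t addr)
{
    return (const struct fake_term *)(const void *)addr;
}
```

Nothing in this sketch keeps the address valid. In the real emulator a garbage collection can move the term, so an integer obtained this way may be stale by the time the driver uses it.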