pcre, bifs, drivers and ports
Scott Lystig Fritchie
fritchie@REDACTED
Mon Jul 31 22:00:55 CEST 2006
>>>>> "sh" == Sean Hinde <sean.hinde@REDACTED> writes:
sh> Mats' comment about limiting length of REs does
sh> not really cut it IMO. Blocking the whole emulator during a long
sh> regexp calculation rarely sounds like the right solution for
sh> typical Erlang apps.
One more thing to consider. A *really* useful regexp library (or
any library that deals with strings) would be one that worked on:
1. lists of byte values (the traditional Erlang "string")
2. single binary terms
3. "I/O lists", an arbitrarily deep list of #1 and/or #2.
(Or #2 alone :-)
I would guess that such a library would come at a high implementation cost, since
most C/C++ regexp packages operate on buffers of contiguous bytes, not
a string of bytes located in perhaps thousands of non-contiguous
places.
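To make that cost concrete, here is a minimal C sketch. It assumes a hypothetical `IoNode` tree as a stand-in for an I/O list; this is not the emulator's real term layout, and `io_size`, `io_copy` and `io_match` are invented names. The I/O list is walked once to size it, copied into one contiguous, NUL-terminated buffer, and only then handed to POSIX `regcomp()`/`regexec()`, which is what "operating on buffers of contiguous bytes" forces you to do.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C model of an Erlang I/O list: a node is either a leaf
 * holding bytes (standing in for a byte list or a binary) or a list of
 * child nodes. NOT the emulator's real term representation. */
typedef struct IoNode {
    const char *bytes;            /* leaf data, or NULL for a list node */
    size_t len;                   /* leaf length in bytes */
    const struct IoNode **kids;   /* children of a list node */
    size_t nkids;
} IoNode;

/* Total number of bytes reachable from n (cf. erlang:iolist_size/1). */
static size_t io_size(const IoNode *n) {
    if (n->bytes) return n->len;
    size_t total = 0;
    for (size_t i = 0; i < n->nkids; i++) total += io_size(n->kids[i]);
    return total;
}

/* Copy all bytes depth-first into dst; returns the number written. */
static size_t io_copy(const IoNode *n, char *dst) {
    if (n->bytes) { memcpy(dst, n->bytes, n->len); return n->len; }
    size_t off = 0;
    for (size_t i = 0; i < n->nkids; i++) off += io_copy(n->kids[i], dst + off);
    return off;
}

/* Flatten the I/O list into one buffer, then run POSIX regexec() on it.
 * Returns 1 on match, 0 on no match, -1 on allocation/compile error. */
static int io_match(const IoNode *n, const char *pattern) {
    size_t size = io_size(n);
    char *buf = malloc(size + 1);        /* the extra copy is the cost */
    if (!buf) return -1;
    size_t written = io_copy(n, buf);
    assert(written == size);             /* sizing and copying must agree */
    buf[size] = '\0';

    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
        free(buf);
        return -1;
    }
    int rc = regexec(&re, buf, 0, NULL, 0);
    regfree(&re);
    free(buf);
    return rc == 0 ? 1 : 0;
}
```

For a tree shaped roughly like `["foo", [<<"ba">>, "r"], <<"baz">>]`, `io_match(&top, "o+bar")` returns 1 even though "bar" straddles two chunks, because the match runs on the flattened copy. The price is an allocation plus a copy of every byte per match, and since `regexec()` stops at the first NUL, binaries containing zero bytes would need a length-aware matcher instead.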
Oops, I forgot one:
4. A possibly UNICODE/whatever internationalized "string" thingie
stored in an I/O list.
As discussed on this list a few weeks ago, there is no agreement on
how to represent such a thing ... in Erlang or most other languages.
sh> But. It would be most fascinating to compare real world
sh> characteristics of:
sh> 1. BIF pcre 2. Driver pcre, 3. BIF pcre in SMP erlang.
Yup.
A linked-in driver can cheat even more if it can get an (internal C)
pointer to the term(s) it's operating on. It's quite easy to create a
new BIF that takes the internal pointer/address of its argument and
returns it as an integer.(*) Once the driver turns that integer back
into a pointer, it has full access to the term. Use the power only for good. :-)
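The trick boils down to a pointer/integer round trip, sketched here in C. This is a hypothetical illustration only: `FakeTerm`, `term_address()` and `term_from_address()` are made-up names, and real Erlang terms are tagged `Eterm` words rather than a struct like this. Routing the cast through `uintptr_t` is the usual way to park a pointer in an integer and get the same pointer back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an Erlang term; NOT the real Eterm layout. */
typedef struct {
    const char *data;
    size_t len;
} FakeTerm;

/* What the experimental "address-of" BIF would do: hand back the
 * term's address as a plain unsigned integer. */
static uint64_t term_address(const FakeTerm *t) {
    return (uint64_t)(uintptr_t)t;
}

/* What the driver would do: turn that integer back into a pointer.
 * Only valid while the term is still alive and has not been moved;
 * nothing here can check that. */
static const FakeTerm *term_from_address(uint64_t addr) {
    assert(addr != 0);  /* a live term never sits at address 0 */
    return (const FakeTerm *)(uintptr_t)addr;
}
```

Nothing in this sketch keeps the term alive or in place: if the garbage collector moves or frees the data after the address was taken, the saved integer points at garbage. That is why this is strictly an experiment, to be used only for good.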
-Scott
(*) It is a good, simple experiment if you've never tried writing a
BIF before.