pcre, bifs, drivers and ports
Robert Virding
robert.virding@REDACTED
Wed Aug 2 01:53:07 CEST 2006
That's interesting. During the summer I have been working on my Erlog
(Prolog in and for Erlang) interpreter. Instead of writing the tokeniser
by hand I use leex. Unfortunately you can't write the parser in yacc.
Now I intend to make leex fit for human consumption: the generated
scanner is fine, but I need to clean up the code. I was also thinking that
it would not be difficult to add a compiler to regexp to make it faster,
either by generating a DFA which you interpret or a function which is
compiled.
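To make the "DFA which you interpret" idea concrete, here is a minimal sketch. It is not what leex or regexp actually generate: the module name dfa_sketch, the example expression a(b|c)*d and the table layout are all assumptions chosen for illustration, and the table is written by hand where a real regexp compiler would produce it.

```erlang
-module(dfa_sketch).
-export([match/1]).

%% Hypothetical transition table for the regular expression a(b|c)*d,
%% stored as {{State, Char}, NextState} tuples. A regexp compiler would
%% generate this; here it is hand-written for illustration only.
%% States: 0 = start, 1 = seen "a" plus any number of b/c, 2 = accepting.
table() ->
    [{{0, $a}, 1},
     {{1, $b}, 1},
     {{1, $c}, 1},
     {{1, $d}, 2}].

%% The set of accepting states for the table above.
accepting() -> [2].

%% match(String) -> true | false
%% Succeeds only if the whole string matches the expression.
match(String) -> run(0, String, table()).

%% Interpret the DFA: one table lookup per input character.
run(State, [], _Table) ->
    lists:member(State, accepting());
run(State, [C | Cs], Table) ->
    case lists:keyfind({State, C}, 1, Table) of
        {_, Next} -> run(Next, Cs, Table);
        false     -> false   % no transition, so no match
    end.
```

With this table, dfa_sketch:match("abcbd") and dfa_sketch:match("ad") return true, while dfa_sketch:match("abx") returns false. The "function which is compiled" alternative would replace the table lookup with one function clause per transition, leaving the dispatch to the Erlang compiler.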
I personally don't think that a bif is the way to go. A bif tends to
imply something which is part of the language, not just a "normal"
library function. This is quite apart from the problem of bifs possibly
blocking the emulator. A port would be much cleaner.
A final point is that grep is not really a good name for the function.
It is said to mean "global regular expression print", and you are not
printing, just trying to find a match. I hope. :-)
Robert
Mats Cronqvist wrote:
> Ernie Makris wrote:
>
>> Hello Erlangers,
>>
>> One thing I wanted to start a discussion on is getting pcre style
>> regexps in erlang. The question
>> I pose to the list is: What would be the best way to integrate the pcre
>> library into erlang.
>> The possible approaches I've seen so far are:
>> - create a linked in driver
>> - create a port program
>> - create new bifs (I'd really like this)
>
>
> a bif is the way to go, imo.
>
>> My obvious concerns are:
>> - How stable is the C pcre library for long running servers
>> - Stability implies:
>>   - Memory leaks
>>   - SIGSEGVs
>>
>> I would love the library calls in erlang to be bifs. Are there any
>> external examples, aside from just looking at the source that
>> demonstrate how to cleanly add a new bif?
>
>
> two weeks ago me and a colleague implemented two new bifs; re:grep/2
> and re:compile/1.
>
> documentation is a bit scarce;
>
> re:compile([RegExp]) -> [RegExpC]
> re:grep(Str,[RegExp]) -> [MatchItem]
>   RegExp = string()
>   RegExpC = term()
>   MatchItem = no_match | {int(Beg),int(End),[string(SubMatch)]} |
>               {error,{string(ErrorStr),int(ErrorChar)}}
>
> we have not yet observed any problems with stability.
>
> OTP has indicated that they will not introduce any bif that neither
> executes in bounded time nor yields. i believe this can be met by limiting
> the length of the string and the regexp.
>
> if there is interest, we can probably make the code available.
>
> mats
>