pcre, bifs, drivers and ports

Robert Virding
Wed Aug 2 01:53:07 CEST 2006


That's interesting. During the summer I have been working on my Erlog 
(Prolog in and for Erlang) interpreter. Instead of writing the tokeniser 
by hand I use leex. Unfortunately you can't write the parser in yacc.

Now I intend to make leex fit for human consumption. The generated
scanner is fine, but I need to clean up the code. I was also thinking
that it would not be difficult to add a compiler to regexp to make it
faster, either by generating a DFA which you interpret or a function
which is compiled.
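
To make the two options concrete, here is a minimal sketch of both for the
regexp "a+b". The DFA is built by hand for illustration; the module name,
function names and state numbering are all assumptions and are not taken
from leex or regexp. Both variants match the whole string rather than
searching inside it.

```erlang
-module(dfa_sketch).
-export([match_interp/1, match_compiled/1]).

%% Hypothetical hand-built DFA for the regexp "a+b".
%% States: 0 = start, 1 = seen one or more $a, 2 = accepting.

%% Variant 1: the DFA is plain data, walked by a small interpreter.
transitions() ->
    #{{0, $a} => 1,
      {1, $a} => 1,
      {1, $b} => 2}.

accepting() -> [2].

match_interp(Str) ->
    interp(Str, 0, transitions()).

%% End of input: succeed only if we stopped in an accepting state.
interp([], State, _Trans) ->
    lists:member(State, accepting());
%% Otherwise look up the next state; a missing entry means no match.
interp([C | Rest], State, Trans) ->
    case maps:find({State, C}, Trans) of
        {ok, Next} -> interp(Rest, Next, Trans);
        error -> false
    end.

%% Variant 2: the same DFA written as function clauses, one function
%% per state. This is the shape a regexp compiler could generate.
match_compiled(Str) -> s0(Str).

s0([$a | Rest]) -> s1(Rest);
s0(_) -> false.

s1([$a | Rest]) -> s1(Rest);
s1([$b | Rest]) -> s2(Rest);
s1(_) -> false.

s2([]) -> true;
s2(_) -> false.
```

The interpreted variant keeps the automaton as a term that can be built at
run time, while the compiled variant lets the Erlang compiler turn each
state into ordinary clause matching.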

I personally don't think that a bif is the way to go. A bif tends to
imply something which is part of the language, not just a "normal"
library function. This is quite apart from the problem of bifs possibly
blocking the emulator. A port would be much cleaner.
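
As a rough sketch of the port approach, the code below talks to an external
OS process over a port using 2-byte length-prefixed packets. The Unix "cat"
program is only a stand-in that echoes each packet back; a real solution
would spawn a small (hypothetical) C program wrapping pcre instead. The
module name, function name and timeout are assumptions.

```erlang
-module(port_sketch).
-export([echo/1]).

%% Send one request to an external program and wait for one reply.
%% "cat" is a placeholder for a hypothetical pcre wrapper program.
echo(Data) ->
    Port = open_port({spawn, "cat"},
                     [{packet, 2}, binary, exit_status]),
    port_command(Port, Data),
    Reply = receive
                {Port, {data, Bin}} ->
                    {ok, Bin};
                {Port, {exit_status, Status}} ->
                    {error, {exit_status, Status}}
            after 5000 ->
                    %% The external program is slow or stuck; the
                    %% emulator itself is not blocked by it.
                    {error, timeout}
            end,
    %% The port may already be closed if the program exited.
    catch port_close(Port),
    Reply.
```

Because the matching would run in a separate OS process, a crash or a
long-running match there shows up as an exit status or a timeout in the
calling Erlang process rather than stalling the emulator.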

A final point is that grep is not really a good name for the function.
It is said to mean "global regular expression print", and you are not
printing, just trying to find a match. I hope. :-)

Robert

Mats Cronqvist wrote:
> Ernie Makris wrote:
> 
>> Hello Erlangers,
>>
>> One thing I wanted to start a discussion on is getting pcre style
>> regexps in erlang. The question
>> I pose to the list is: What would be the best way to integrate the pcre
>> library into erlang.
>> The possible approaches I've seen so far are:
>> - create a linked in driver
>> - create a port program
>> - create new bifs (I'd really like this)
> 
> 
>   a bif is the way to go, imo.
> 
>> My obvious concerns are:
>> - How stable is the C pcre library for long running servers
>> - Stability implies:
>>     - Memory leaks
>>     - SIGSEGVs
>>
>> I would love the library calls in erlang to be bifs. Are there any
>> external examples, aside from just looking at the source that
>> demonstrate how to cleanly add a new bif?
> 
> 
>   two weeks ago me and a colleague implemented two new bifs; re:grep/2 
> and re:compile/1.
> 
>   documentation is a bit scarce;
> 
> re:compile([RegExp]) -> [RegExpC]
> re:grep(Str,[RegExp]) -> [MatchItem]
> RegExp = string()
> RegExpC = term()
> MatchItem = no_match | {int(Beg),int(End),[string(SubMatch)]} | 
> {error,{string(ErrorStr),int(ErrorChar)}}
> 
>   we have not yet observed any problems with stability.
> 
>   OTP has indicated that they will not introduce any bif that does not 
> execute in bounded time, or yield. i believe this can be met by limiting 
> the length of the string and the regexp.
> 
>   if there is interest, we can probably make the code available.
> 
>   mats
> 


