Substring look-up

zxq9 zxq9@REDACTED
Wed Apr 7 12:36:06 CEST 2021


On 2021/04/07 6:29, Olivier Boudeville wrote:
> Hi,
> 
> It must be a silly question, but, since the Latin1 -> Unicode switch in 
> OTP 20.0, is there a (non-obsolete) way in the string module to look up 
> the index of a string within another one, i.e. to find the location of a 
> given substring?
> 
> rstr/2 is supposed to be replaced with find/3, yet the former returns an 
> index whereas the latter returns a part of the original string. I could 
> not find a way to obtain a relevant index with any of the newer string 
> functions - whereas I would guess it is a fairly common need?

The re module's run/2,3 does what you are asking for with its default options.

   1> {ok, MP} = re:compile("foo", [unicode]).
   {ok,{re_pattern,0,1,0,<<69,82,...>>}}
   2> re:run("barfoobar", MP).
   {match,[{3,3}]}
   3> re:run("barfoobarfoo", MP).
   {match,[{3,3}]}
   4> re:run("barfoobarfoo", MP, [global]).
   {match,[[{3,3}],[{9,3}]]}

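Note that each match is reported as {Offset, Length} with a 0-based 
offset (string:rstr/2 counted from 1), and that the [global] option makes 
run/3 continue past the first match and report every occurrence.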
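If you only want the positions, a small wrapper can strip the lengths out of the global result. This is a minimal sketch; the module and function names (re_offsets, offsets/2) are hypothetical, not part of OTP. It assumes the pattern really is meant as a regular expression, so a literal substring containing metacharacters such as . or ( would need escaping first. The offsets are 0-based byte positions in the UTF-8 encoding of the subject, which coincide with character positions only for ASCII text.

```erlang
%% re_offsets: hypothetical helper module (not part of OTP) returning the
%% 0-based byte offsets of every match of Pattern in Subject.
-module(re_offsets).
-export([offsets/2]).

-spec offsets(unicode:chardata(), unicode:chardata()) -> [non_neg_integer()].
offsets(Subject, Pattern) ->
    {ok, MP} = re:compile(Pattern, [unicode]),
    %% global: keep matching after the first hit;
    %% {capture, first, index}: return only {Offset, Length} of the whole match.
    case re:run(Subject, MP, [global, {capture, first, index}]) of
        {match, Matches} ->
            %% Each element of Matches is [{Offset, Length}] for one match.
            [Offset || [{Offset, _Len}] <- Matches];
        nomatch ->
            []
    end.
```

For example:

   1> re_offsets:offsets("barfoobarfoo", "foo").
   [3,9]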

Strings are in a bit of flux at the moment. We finally have good Unicode 
support, and on a broader set of representations than just 
strings-as-lists, but the string module itself is still being converted 
and revamped, so a few rough edges and obsolete warnings still linger.

When all else fails, writing a custom function to cover the gap works 
well; luckily, none of these functions are particularly difficult to 
implement!
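As one example of such a custom function, here is a minimal sketch that recovers the index-returning behaviour of the obsolete string:str/2 and string:rstr/2 on top of string:find/3. The module and function names (substr_index, index/2, rindex/2) are hypothetical. It assumes positions counted in grapheme clusters (via string:length/1) are what you want, and it mimics str/rstr by returning 1-based positions and 0 when there is no match.

```erlang
%% substr_index: hypothetical helper module (not part of OTP) giving
%% 1-based positions like the obsolete string:str/2 and string:rstr/2.
-module(substr_index).
-export([index/2, rindex/2]).

%% Position of the first occurrence of Sub in Str, or 0 if none.
-spec index(unicode:chardata(), unicode:chardata()) -> non_neg_integer().
index(Str, Sub) ->
    position(Str, string:find(Str, Sub, leading)).

%% Position of the last occurrence of Sub in Str, or 0 if none.
-spec rindex(unicode:chardata(), unicode:chardata()) -> non_neg_integer().
rindex(Str, Sub) ->
    position(Str, string:find(Str, Sub, trailing)).

%% string:find/3 returns the tail of Str starting at the match, so the
%% number of graphemes before that tail is the 0-based offset; add 1
%% to get the 1-based position str/rstr used to return.
position(_Str, nomatch) ->
    0;
position(Str, Rest) ->
    string:length(Str) - string:length(Rest) + 1.
```

For example:

   1> substr_index:index("barfoobarfoo", "foo").
   4
   2> substr_index:rindex("barfoobarfoo", "foo").
   10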

-Craig


More information about the erlang-questions mailing list