# PCRE2 Migration

Starting with Erlang/OTP 28, the underlying library for handling regular expressions via the `re` module changes from PCRE to PCRE2. This upgrade brings Erlang's regular expression capabilities more in line with modern standards, particularly Perl, but also introduces several breaking changes and differences in behavior compared to PCRE.

A key philosophical difference is that **PCRE2 is much stricter about pattern syntax**. Invalid constructs that PCRE might have ignored or treated as literal characters will now typically raise compilation errors, similar to using Perl in strict mode.

Below is a summary of notable incompatibilities and behavioral changes:

## Stricter Error Handling & Syntax Validation

* **Invalid Escapes:** Undocumented escape sequences (for example, `\M`, `\i`) are now treated as errors. PCRE often treated these as literal characters (for example, `\M` became `M`).
* **Invalid Escapes in Character Classes:** Using sequences that are not valid *within* a character class (for example, anchors like `\B`, sequences like `\R` or `\X`) is now an error (for example, `[\B]`). PCRE might have treated these literally.
* **Invalid Character Ranges:** Ranges where the start point is logically after the end point, or involving incompatible types (for example, `[\d-a]`), are now errors. PCRE might have interpreted this as matching a digit, a hyphen, or 'a' literally.
* **Invalid Backreferences:** Using a backreference to a non-existent capturing group (for example, `\8` when only 7 groups exist) is now treated as an error.
* **Invalid Control Characters (`\cx`):** The `\c` escape must be followed by a character that maps to a valid control character (typically ASCII characters `@` through `_`, corresponding to `\x00` through `\x1F`, and `?` for `\x7F`). Using characters in the range 127-255 will result in an error.

## Syntax Changes & Requirements

* **`\x` Requires Hex Digits:** The `\x` escape now *must* be followed by hexadecimal digits.
  Use `\xNN` (one or two digits) or `\x{HHHH}` (variable number of digits in braces). Using `\x` alone is an error.
* **`\N` (Match Non-Newline) in Character Classes:** The shorthand `\N` is not allowed directly within a character class (for example, `[\N]` is invalid). However, the named Unicode sequence `\N{U+...}` *is* allowed (for example, `[\N{U+0041}]` to match 'A').
* **Empty Group Names:** Defining capturing groups with empty names using `(?''...)` syntax is no longer supported and will cause an error.

## Option Handling

* **Compile-Time vs. Run-Time Options:** Options affecting newline conventions (`{newline, _}`) or backslash R behavior (`bsr_anycrlf`, `bsr_unicode`) only control pattern *compilation*. If a pattern is pre-compiled using `re:compile/2`, passing incompatible options for these settings later to `re:run/3`, `re:replace/4`, or `re:split/3` will result in an error. If present, the options must match those used at compile time.

## Behavioral Changes & Feature Restrictions

* **`\K` in Lookarounds:** The `\K` escape (reset match start) cannot be used inside lookahead `(?=...)`, `(?!...)` or lookbehind `(?<=...)`, `(?