\p and \P are only supported in Unicode-aware mode (with the u or v flag). In Unicode-unaware mode, they are identity escapes for the p and P characters, respectively.
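A minimal sketch of the difference between the two modes. The variable names are illustrative only, and the Unicode-unaware behavior shown relies on the legacy web-compatible regex syntax found in browsers and Node.js.

```javascript
// Without the u or v flag, \p is an identity escape for "p",
// so the pattern matches the literal text "p{Lowercase_Letter}".
const unaware = /\p{Lowercase_Letter}/;
console.log(unaware.test("a")); // false
console.log(unaware.test("p{Lowercase_Letter}")); // true

// With the u flag, \p{...} is a Unicode property escape.
const aware = /\p{Lowercase_Letter}/u;
console.log(aware.test("a")); // true
```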
Every Unicode character has a set of properties that describe it. For example, the character a has the General_Category property with value Lowercase_Letter, and the Script property with value Latn. The \p and \P escape sequences allow you to match a character based on its properties. For example, a can be matched by \p{Lowercase_Letter} (the General_Category property name is optional) as well as \p{Script=Latn}. \P creates a complement class that consists of code points without the specified property.
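The following sketch illustrates these escapes on a few single characters. The variable names and the Greek sample character are chosen here for illustration and are not part of the reference text.

```javascript
// Anchored patterns so that each test checks exactly one character.
const lower = /^\p{Lowercase_Letter}$/u;
const lowerLong = /^\p{General_Category=Lowercase_Letter}$/u; // same property, name spelled out
const latin = /^\p{Script=Latn}$/u;
const notLower = /^\P{Lowercase_Letter}$/u; // complement class

console.log(lower.test("a")); // true
console.log(lowerLong.test("a")); // true
console.log(latin.test("a")); // true
console.log(notLower.test("a")); // false
console.log(notLower.test("A")); // true

// U+03B1 GREEK SMALL LETTER ALPHA is a lowercase letter, but not Latin script.
console.log(lower.test("\u03B1")); // true
console.log(latin.test("\u03B1")); // false
```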
To compose multiple properties, use the character set intersection syntax enabled with the v flag, or see pattern subtraction and intersection.
In v mode, \p may match a sequence of code points, defined in Unicode as "properties of strings". This is most useful for emojis, which are often composed of multiple code points. However, \P can only complement character properties.
Note: There are plans to port the properties of strings feature to u mode as well.