Parle pattern matching
Parle supports regex matching similar to flex. Also supported are the following POSIX character sets: [:alnum:], [:alpha:], [:blank:], [:cntrl:], [:digit:], [:graph:], [:lower:], [:print:], [:punct:], [:space:], [:upper:], [:xdigit:].
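For ASCII input, each POSIX class stands for a fixed set of characters. The sketch below spells those sets out with Python's `string` module. It assumes the C/POSIX locale and is not taken from Parle's implementation. `POSIX_CLASSES` and `in_posix_class` are illustrative names.

```python
import string

# ASCII expansion of each POSIX bracket expression (C/POSIX locale assumed;
# these are illustrative Python sets, not Parle's internal tables).
POSIX_CLASSES = {
    "[:alnum:]": set(string.ascii_letters + string.digits),
    "[:alpha:]": set(string.ascii_letters),
    "[:blank:]": set(" \t"),
    "[:cntrl:]": {chr(c) for c in range(0x20)} | {"\x7f"},
    "[:digit:]": set(string.digits),
    "[:graph:]": {chr(c) for c in range(0x21, 0x7f)},
    "[:lower:]": set(string.ascii_lowercase),
    "[:print:]": {chr(c) for c in range(0x20, 0x7f)},
    "[:punct:]": set(string.punctuation),
    "[:space:]": set(" \t\n\r\f\v"),
    "[:upper:]": set(string.ascii_uppercase),
    "[:xdigit:]": set(string.hexdigits),
}

def in_posix_class(name, ch):
    """Return True if ch belongs to the named POSIX class (ASCII only)."""
    return ch in POSIX_CLASSES[name]
```

Two useful identities follow from these definitions. `[:graph:]` is `[:print:]` without the space character. `[:punct:]` is `[:graph:]` without `[:alnum:]`.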
The Unicode character classes are currently not enabled by default; to make them available, compile Parle with the --enable-parle-utf32 option. A particular encoding can also be matched with a correctly constructed regex. For example, to match the EURO sign encoded in UTF-8, the regular expression [\xe2][\x82][\xac] can be used. A pattern for a UTF-8 encoded string could be [ -\x7f]{+}[\x80-\xbf]{+}[\xc2-\xdf]{+}[\xe0-\xef]{+}[\xf0-\xff]+.
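The byte-level approach can be tried out with a Python bytes regex as an analogue. Python has no `{+}` operator, so the union of ranges is written as a single character class.

The UTF-8 pattern only restricts which byte values may appear. It does not check that lead bytes and continuation bytes occur in a valid order. The names `EURO_UTF8` and `UTF8_BYTES` are illustrative.

```python
import re

# EURO SIGN (U+20AC) is encoded in UTF-8 as the three bytes E2 82 AC.
EURO_UTF8 = re.compile(rb"[\xe2][\x82][\xac]")

# Same byte ranges as the UTF-8 pattern above, merged into one class:
# ASCII from space to DEL, continuation bytes, and lead bytes of 2-, 3- and
# 4-byte sequences. Bytes below 0x20 and 0xC0/0xC1 are excluded.
UTF8_BYTES = re.compile(rb"[ -\x7f\x80-\xbf\xc2-\xdf\xe0-\xef\xf0-\xff]+")

data = "Price: 5 €".encode("utf-8")
euro_span = EURO_UTF8.search(data).span()   # byte offsets of the euro sign
is_utf8ish = UTF8_BYTES.fullmatch(data) is not None
```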
Character representations
| Sequence | Description |
| \a | Alert (bell). |
| \b | Backspace. |
| \e | ESC character, \x1b. |
| \n | Newline. |
| \r | Carriage return. |
| \f | Form feed, \x0c. |
| \t | Horizontal tab, \x09. |
| \v | Vertical tab, \x0b. |
| \oct | Character specified by a three-digit octal code. |
| \xhex | Character specified by a hex code. |
| \cchar | Named control character. |
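The escapes above denote ordinary character codes. The hypothetical helper below decodes a single escape into its numeric code, which makes the table's values concrete.

The handling of `\cchar` is an assumption. It follows the Perl/PCRE convention of masking the character with 0x1F, so `\cA` becomes 1. Parle's exact rules for `\c` may differ.

```python
# Hypothetical decoder for the escapes in the table (not part of Parle).
SIMPLE_ESCAPES = {
    "a": 0x07,  # alert (bell)
    "b": 0x08,  # backspace
    "e": 0x1B,  # ESC
    "n": 0x0A,  # newline
    "r": 0x0D,  # carriage return
    "f": 0x0C,  # form feed
    "t": 0x09,  # horizontal tab
    "v": 0x0B,  # vertical tab
}

def decode_escape(seq):
    """Return the character code for an escape such as '\\t', '\\101', '\\x41' or '\\cA'."""
    if not seq.startswith("\\") or len(seq) < 2:
        raise ValueError(f"not an escape sequence: {seq!r}")
    body = seq[1:]
    if body in SIMPLE_ESCAPES:
        return SIMPLE_ESCAPES[body]
    if len(body) == 3 and all(c in "01234567" for c in body):
        return int(body, 8)            # three-digit octal code
    if body[0] == "x" and len(body) > 1:
        return int(body[1:], 16)       # hex code
    if body[0] == "c" and len(body) == 2:
        return ord(body[1]) & 0x1F     # control character (assumed Perl-style convention)
    raise ValueError(f"unsupported escape: {seq!r}")
```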
Character classes
| Sequence | Description |
| [...] | A single character listed or contained within a listed range. Ranges can be combined with the {+} and {-} operators. For example [a-z]{+}[0-9] is the same as [0-9a-z] and [a-z]{-}[aeiou] is the same as [b-df-hj-np-tv-z]. |
| [^...] | A single character not listed and not contained within a listed range. |
| . | Any character; by default [^\n]. |
| \d | Digit character, [0-9]. |
| \D | Non-digit character, [^0-9]. |
| \s | White space character, [ \t\n\r\f\v]. |
| \S | Non-white space character, [^ \t\n\r\f\v]. |
| \w | Word character, [a-zA-Z0-9_]. |
| \W | Non-word character, [^a-zA-Z0-9_]. |
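The {+} and {-} operators behave like set union and set difference on the characters each class contains. The sketch below models classes as Python sets to check the two identities stated in the table. It also compares the table's definition of \w with Python's ASCII-mode `\w`. `char_range` is an illustrative helper, not a Parle function.

```python
import re

def char_range(lo, hi):
    """Return the set of characters from lo to hi inclusive, like [lo-hi]."""
    return {chr(c) for c in range(ord(lo), ord(hi) + 1)}

lower = char_range("a", "z")
upper = char_range("A", "Z")
digits = char_range("0", "9")

# [a-z]{+}[0-9]: union of the two classes
union = lower | digits
# [a-z]{-}[aeiou]: difference of the two classes
difference = lower - set("aeiou")
# [b-df-hj-np-tv-z], the equivalent class given in the table
consonants = (char_range("b", "d") | char_range("f", "h") | char_range("j", "n")
              | char_range("p", "t") | char_range("v", "z"))

# \w as defined in the table, [a-zA-Z0-9_], versus Python's \w with re.ASCII
word = lower | upper | digits | {"_"}
python_word = {chr(c) for c in range(128) if re.fullmatch(r"\w", chr(c), re.ASCII)}
```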
Unicode character classes
| Sequence | Description |
| \p{C} | Other. |
| \p{Cc} | Other, control. |
| \p{Cf} | Other, format. |
| \p{Co} | Other, private use. |
| \p{Cs} | Other, surrogate. |
| \p{L} | Letter. |
| \p{LC} | Letter, cased. |
| \p{Ll} | Letter, lowercase. |
| \p{Lm} | Letter, modifier. |
| \p{Lo} | Letter, other. |
| \p{Lt} | Letter, titlecase. |
| \p{Lu} | Letter, uppercase. |
| \p{M} | Mark. |
| \p{Mc} | Mark, spacing combining. |
| \p{Me} | Mark, enclosing. |
| \p{Mn} | Mark, nonspacing. |
| \p{N} | Number. |
| \p{Nd} | Number, decimal digit. |
| \p{Nl} | Number, letter. |
| \p{No} | Number, other. |
| \p{P} | Punctuation. |
| \p{Pc} | Punctuation, connector. |
| \p{Pd} | Punctuation, dash. |
| \p{Pe} | Punctuation, close. |
| \p{Pf} | Punctuation, final quote. |
| \p{Pi} | Punctuation, initial quote. |
| \p{Po} | Punctuation, other. |
| \p{Ps} | Punctuation, open. |
| \p{S} | Symbol. |
| \p{Sc} | Symbol, currency. |
| \p{Sk} | Symbol, modifier. |
| \p{Sm} | Symbol, math. |
| \p{So} | Symbol, other. |
| \p{Z} | Separator. |
| \p{Zl} | Separator, line. |
| \p{Zp} | Separator, paragraph. |
| \p{Zs} | Separator, space. |
These character classes are only available if Parle was compiled with the --enable-parle-utf32 option.
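The \p{...} names are Unicode General Category values. Python's `unicodedata.category` returns the same two-letter codes, so it offers a convenient way to check which class a character falls into.

The helper below is a rough analogue, not Parle's matcher. It assumes three rules:

- One-letter names such as \p{L} match every subcategory starting with that letter.
- \p{LC} matches Lu, Ll or Lt.
- Two-letter names match exactly.

```python
import unicodedata

def in_category(ch, prop):
    """Rough analogue of \\p{prop} using Python's Unicode database."""
    cat = unicodedata.category(ch)       # e.g. "Lu", "Sc", "Zs"
    if prop == "LC":
        return cat in ("Lu", "Ll", "Lt")  # cased letters only
    if len(prop) == 1:
        return cat[0] == prop             # major category: L, M, N, P, S, Z, C
    return cat == prop                    # exact subcategory

# Usage: the euro sign is a currency symbol, so both \p{S} and \p{Sc} apply.
euro_is_symbol = in_category("€", "S") and in_category("€", "Sc")
```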
Alternation and repetition
| Sequence | Greedy | Description |
| ...|... | - | Try sub-patterns in alternation. |
| * | yes | Match 0 or more times. |
| + | yes | Match 1 or more times. |
| ? | yes | Match 0 or 1 times. |
| {n} | no | Match exactly n times. |
| {n,} | yes | Match at least n times. |
| {n,m} | yes | Match at least n times but no more than m times. |
| *? | no | Match 0 or more times. |
| +? | no | Match 1 or more times. |
| ?? | no | Match 0 or 1 times. |
| {n,}? | no | Match at least n times. |
| {n,m}? | no | Match at least n times but no more than m times. |
| {MACRO} | - | Include the regex MACRO in the current regex. |
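The Greedy column can be illustrated with Python's `re` module as an analogue. A greedy quantifier takes as much input as it can, and its `?` counterpart takes as little as it can. Python's backtracking engine differs from Parle's lexer, so this only shows what the column means, not how Parle resolves competing rules.

The second half is a hypothetical sketch of {MACRO} inclusion done as plain text substitution before compiling. `MACROS` and `expand_macros` are invented names, not Parle API.

```python
import re

text = "<a><b>"
greedy = re.match(r"<.+>", text).group()    # + is greedy: extends to the last '>'
lazy = re.match(r"<.+?>", text).group()     # +? is non-greedy: stops at the first '>'

greedy_bounded = re.match(r"x{2,4}", "xxxxx").group()   # as many as allowed: 4
lazy_bounded = re.match(r"x{2,4}?", "xxxxx").group()    # as few as allowed: 2

# Hypothetical macro table and expander: each {NAME} is replaced by its regex,
# wrapped in a non-capturing group so a following quantifier applies to the
# whole macro. Numeric {n,m} quantifiers are untouched because a macro name
# must start with a letter or underscore.
MACROS = {"DIGIT": r"[0-9]", "ID": r"[a-z_][a-z0-9_]*"}

def expand_macros(pattern, macros):
    return re.sub(r"\{([A-Za-z_][A-Za-z0-9_]*)\}",
                  lambda m: "(?:" + macros[m.group(1)] + ")", pattern)

assignment = re.compile(expand_macros(r"{ID}={DIGIT}+", MACROS))
```

Wrapping each expansion in a group matters. Without it, `{DIGIT}+` would only be safe for single-atom macros; a multi-atom macro such as `{ID}+` would apply the `+` to its last atom only.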
Anchors
| Sequence | Description |
| ^ | Start of string or after a newline. |
| $ | End of string or before a newline. |
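These anchor semantics correspond to Python's `re.MULTILINE` mode, used here as an analogue. In that mode, ^ also matches after each newline and $ also matches before each newline. Without the flag, Python anchors only at the string boundaries.

```python
import re

text = "alpha\nbeta\ngamma"

# With re.MULTILINE, ^ and $ also match around each newline, so every line
# is matched as a whole word.
per_line = re.findall(r"^\w+$", text, re.MULTILINE)

# Without the flag, ^ and $ only match at the string boundaries, so the
# whole string (which contains newlines) cannot match ^\w+$.
whole_string = re.findall(r"^\w+$", text)
```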
Grouping
| Sequence | Description |
| (...) | Group a regular expression to override default operator precedence. |
| (?r-s:pattern) | Apply option r and omit option s while interpreting pattern. Options may be zero or more of the characters i, s, or x. i means case-insensitive; -i means case-sensitive. s alters the meaning of . to match any character whatsoever; -s alters the meaning of . to match any character except \n. x ignores comments and whitespace in patterns; whitespace is ignored unless it is backslash-escaped, enclosed in double quotes, or appears inside a character range. These options can also be applied globally at the rules level by passing a combination of the corresponding bit flags to the lexer. |
| (?# comment ) | Omit everything within (). The first ) character encountered ends the comment, so the comment cannot contain a ) character. The comment may span lines. |
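Python 3.6 and later support the same scoped-option syntax for the i and s options, as well as (?#...) comments. That makes Python's `re` a convenient analogue for checking how these groups behave. The x option is left out of the sketch. The pattern strings are illustrative only.

```python
import re

# (?i:...) enables case-insensitive matching inside the group only.
scoped_i = re.compile(r"(?i:select)_id")

# (?-i:...) disables it inside the group of an otherwise case-insensitive regex
# (the global option is passed as a flag, like the lexer-level bit flags).
scoped_not_i = re.compile(r"key(?-i:ID)", re.IGNORECASE)

# (?s:...) lets . match a newline; a plain . behaves like [^\n].
dotall = re.compile(r"(?s:a.b)")
plain_dot = re.compile(r"a.b")

# (?#...) is a comment and contributes nothing to the match.
commented = re.compile(r"ab(?# the letter c follows )c")
```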