The various forms in rx regexps are described below. The shorthand rx represents any rx form, and rx… means zero or more rx forms. Where the corresponding string regexp syntax is given, A, B, … are string regexp subexpressions.
"some-string"
Match the string ‘some-string’ literally. There are no characters with special meaning, unlike in string regexps.
?C
Match the character ‘C’ literally.
(seq rx…)
(sequence rx…)
(: rx…)
(and rx…)
Match the rxs in sequence. Without arguments, the expression matches the empty string. Corresponding string regexp: ‘AB…’ (subexpressions in sequence).
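As a rough illustration of the literal and sequencing forms above, the following sketch combines two strings and a character with seq. The variable name my-version-rx is made up for this example.

```elisp
(require 'rx)

;; `my-version-rx' is a hypothetical name used only for this example.
;; Strings and characters are matched verbatim, so the dot is not a wildcard.
(defvar my-version-rx (rx (seq "v1" ?. "0")))

(string-match-p my-version-rx "release v1.0")  ; non-nil: the literal text is found
(string-match-p my-version-rx "release v1x0")  ; nil: ?. matches only a dot
```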
(or rx…)
(| rx…)
Match exactly one of the rxs. If all arguments are strings, characters, or nested or forms so constrained, the longest possible match will always be used. Otherwise, either the longest match or the first (in left-to-right order) will be used. Without arguments, the expression will not match anything at all. Corresponding string regexp: ‘A\|B\|…’.
unmatchable
Refuse any match. Equivalent to (or). See regexp-unmatchable.
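The longest-match rule for or and the behavior of unmatchable can be checked with a short sketch such as the one below. The helper my-match-text is hypothetical and is defined here only to make the matched text visible.

```elisp
(require 'rx)

;; `my-match-text' is a hypothetical helper: return the text matched by
;; REGEXP in STRING, or nil when there is no match.
(defun my-match-text (regexp string)
  (when (string-match regexp string)
    (match-string 0 string)))

;; All arguments are strings, so the longest alternative is used
;; even though "foo" is listed first.
(my-match-text (rx (or "foo" "foobar")) "foobar")

;; `unmatchable' and the empty `or' never match.
(my-match-text (rx unmatchable) "anything")
(my-match-text (rx (or)) "anything")
```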
Normally, repetition forms are greedy, in that they attempt to match as many times as possible. Some forms are non-greedy; they try to match as few times as possible (see Non-greedy repetition).
(zero-or-more rx…)
(0+ rx…)
Match the rxs zero or more times. Greedy by default. Corresponding string regexp: ‘A*’ (greedy), ‘A*?’ (non-greedy)
(one-or-more rx…)
(1+ rx…)
Match the rxs one or more times. Greedy by default. Corresponding string regexp: ‘A+’ (greedy), ‘A+?’ (non-greedy)
(zero-or-one rx…)
(optional rx…)
(opt rx…)
Match the rxs once or an empty string. Greedy by default. Corresponding string regexp: ‘A?’ (greedy), ‘A??’ (non-greedy).
(* rx…)
Match the rxs zero or more times. Greedy. Corresponding string regexp: ‘A*’
(+ rx…)
Match the rxs one or more times. Greedy. Corresponding string regexp: ‘A+’
(? rx…)
Match the rxs once or an empty string. Greedy. Corresponding string regexp: ‘A?’
(*? rx…)
Match the rxs zero or more times. Non-greedy. Corresponding string regexp: ‘A*?’
(+? rx…)
Match the rxs one or more times. Non-greedy. Corresponding string regexp: ‘A+?’
(?? rx…)
Match the rxs or an empty string. Non-greedy. Corresponding string regexp: ‘A??’
(= n rx…)
(repeat n rx)
Match the rxs exactly n times. Corresponding string regexp: ‘A\{n\}’
(>= n rx…)
Match the rxs n or more times. Greedy. Corresponding string regexp: ‘A\{n,\}’
(** n m rx…)
(repeat n m rx…)
Match the rxs at least n but no more than m times. Greedy. Corresponding string regexp: ‘A\{n,m\}’
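The difference between the greedy and non-greedy forms, and the counted forms, can be seen in a small sketch like this one. The helper my-match-text is an assumption of the example, not part of rx.

```elisp
(require 'rx)

;; `my-match-text' is a hypothetical helper: return the text matched by
;; REGEXP in STRING, or nil when there is no match.
(defun my-match-text (regexp string)
  (when (string-match regexp string)
    (match-string 0 string)))

;; Greedy `*' runs to the last ">", non-greedy `*?' stops at the first.
(my-match-text (rx "<" (* nonl) ">") "<a><b>")    ; "<a><b>"
(my-match-text (rx "<" (*? nonl) ">") "<a><b>")   ; "<a>"

;; Counted repetition.
(my-match-text (rx (= 3 digit)) "12345")          ; exactly three digits
(my-match-text (rx (** 2 3 "ab")) "ababababab")   ; at most three copies
(my-match-text (rx (>= 2 "ab")) "ab")             ; nil: fewer than two copies
```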
The greediness of some repetition forms can be controlled using the following constructs. However, it is usually better to use the explicit non-greedy forms above when such matching is required.
(minimal-match rx)
Match rx, with zero-or-more, 0+, one-or-more, 1+, zero-or-one, opt and optional using non-greedy matching.
(maximal-match rx)
Match rx, with zero-or-more, 0+, one-or-more, 1+, zero-or-one, opt and optional using greedy matching. This is the default.
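A minimal sketch of how minimal-match affects only the named repetition forms, while * stays greedy. The helper my-match-text is hypothetical.

```elisp
(require 'rx)

;; `my-match-text' is a hypothetical helper: return the text matched by
;; REGEXP in STRING, or nil when there is no match.
(defun my-match-text (regexp string)
  (when (string-match regexp string)
    (match-string 0 string)))

;; Inside `minimal-match', `zero-or-more' becomes non-greedy.
(my-match-text (rx (minimal-match (seq "<" (zero-or-more nonl) ">")))
               "<a><b>")   ; "<a>"

;; `*' is always greedy and is not affected by `minimal-match'.
(my-match-text (rx (minimal-match (seq "<" (* nonl) ">")))
               "<a><b>")   ; "<a><b>"
```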
(any set…)
(char set…)
(in set…)
Match a single character from one of the sets. Each set is a character, a string representing the set of its characters, a range or a character class (see below). A range is either a hyphen-separated string like "A-Z", or a cons of characters like (?A . ?Z).
Note that hyphen (-) is special in strings in this construct, since it acts as a range separator. To include a hyphen, add it as a separate character or single-character string. Corresponding string regexp: ‘[…]’
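The following sketch mixes the three kinds of set described above and shows one way to include a literal hyphen. The variable name my-hexish-rx is made up for the example.

```elisp
(require 'rx)

;; `my-hexish-rx' is a hypothetical name.  The set combines a string
;; range, a single character and a cons range.
(defvar my-hexish-rx (rx (any "a-f" ?_ (?0 . ?9))))

(string-match-p my-hexish-rx "_")   ; non-nil
(string-match-p my-hexish-rx "7")   ; non-nil
(string-match-p my-hexish-rx "g")   ; nil: outside every set

;; In "a-f" the hyphen is a range separator, not a member of the set.
(string-match-p (rx (any "a-f")) "-")       ; nil
;; A separate single-character string adds the hyphen itself.
(string-match-p (rx (any "a-f" "-")) "-")   ; non-nil
```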
(not charspec)
Match a character not included in charspec. charspec can be a character, a single-character string, an any, not, or, intersection, syntax or category form, or a character class. If charspec is an or form, its arguments have the same restrictions as those of intersection; see below. Corresponding string regexp: ‘[^…]’, ‘\Scode’, ‘\Ccode’
(intersection charset…)
Match a character included in all of the charsets. Each charset can be a character, a single-character string, an any form without character classes, or an intersection, or, or not form whose arguments are also charsets.
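As a sketch of how not and intersection combine, the example below builds a set of lowercase ASCII consonants by intersecting a range with a negated set. Both variable names are hypothetical.

```elisp
(require 'rx)

;; `my-non-digit-rx' and `my-consonant-rx' are hypothetical names.

;; Any character that is not an ASCII digit.
(defvar my-non-digit-rx (rx (not (any "0-9"))))

;; Lowercase ASCII consonants: in a-z and, at the same time, not a vowel.
(defvar my-consonant-rx
  (rx (intersection (any "a-z") (not (any "aeiou")))))

(string-match-p my-non-digit-rx "42a")     ; index of "a"
(string-match-p my-consonant-rx "aeb")     ; index of "b"
(string-match-p my-consonant-rx "aeiou")   ; nil: vowels only
```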
not-newline, nonl
Match any character except a newline. Corresponding string regexp: ‘.’ (dot)
anychar, anything
Match any character. Corresponding string regexp: ‘.\|\n’ (for example)
Match a character from a named character class:
alpha, alphabetic, letter
Match alphabetic characters. More precisely, match characters whose Unicode ‘general-category’ property indicates that they are alphabetic.
alnum, alphanumeric
Match alphabetic characters and digits. More precisely, match characters whose Unicode ‘general-category’ property indicates that they are alphabetic or decimal digits.
digit, numeric, num
Match the digits ‘0’–‘9’.
xdigit, hex-digit, hex
Match the hexadecimal digits ‘0’–‘9’, ‘A’–‘F’ and ‘a’–‘f’.
cntrl, control
Match any character whose code is in the range 0–31.
blank
Match horizontal whitespace. More precisely, match characters whose Unicode ‘general-category’ property indicates that they are spacing separators.
space, whitespace, white
Match any character that has whitespace syntax (see Syntax Class Table).
lower, lower-case
Match anything lower-case, as determined by the current case table. If case-fold-search is non-nil, this also matches any upper-case letter.
upper, upper-case
Match anything upper-case, as determined by the current case table. If case-fold-search is non-nil, this also matches any lower-case letter.
graph, graphic
Match any character except whitespace, ASCII and non-ASCII control characters, surrogates, and codepoints unassigned by Unicode, as indicated by the Unicode ‘general-category’ property.
print, printing
Match whitespace or a character matched by graph.
punct, punctuation
Match any punctuation character. (At present, for multibyte characters, anything that has non-word syntax.)
word, wordchar
Match any character that has word syntax (see Syntax Class Table).
ascii
Match any ASCII character (codes 0–127).
nonascii
Match any non-ASCII character (but not raw bytes).
Corresponding string regexp: ‘[[:class:]]’
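Character classes can be used on their own or inside any. The sketch below shows both uses. The helper my-match-text is hypothetical.

```elisp
(require 'rx)

;; `my-match-text' is a hypothetical helper: return the text matched by
;; REGEXP in STRING, or nil when there is no match.
(defun my-match-text (regexp string)
  (when (string-match regexp string)
    (match-string 0 string)))

;; A class used directly as an rx form.
(my-match-text (rx (1+ digit)) "abc123def")                  ; "123"

;; A class combined with an extra character inside `any'.
(my-match-text (rx (1+ (any alnum "_"))) "  foo_bar1 = 2")   ; "foo_bar1"

;; Whole-string check for hexadecimal digits.
(string-match-p (rx bos (1+ xdigit) eos) "DEADbeef")         ; non-nil
(string-match-p (rx bos (1+ xdigit) eos) "xyz")              ; nil
```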
(syntax syntax)
Match a character with syntax syntax, being one of the following names:
Syntax name | Syntax character |
---|---|
whitespace | - |
punctuation | . |
word | w |
symbol | _ |
open-parenthesis | ( |
close-parenthesis | ) |
expression-prefix | ' |
string-quote | " |
paired-delimiter | $ |
escape | \ |
character-quote | / |
comment-start | < |
comment-end | > |
string-delimiter | \| |
comment-delimiter | ! |
For details, see Syntax Class Table. Please note that (syntax punctuation) is not equivalent to the character class punctuation. Corresponding string regexp: ‘\scode’
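Because syntax matching depends on the syntax table in effect, the sketch below runs the search in a temporary buffer in emacs-lisp-mode, where the hyphen has symbol syntax. The helper my-first-symbol-text is hypothetical.

```elisp
(require 'rx)

;; `my-first-symbol-text' is a hypothetical helper.  It returns the first
;; run of word or symbol constituents in TEXT, using the syntax table of
;; `emacs-lisp-mode'.
(defun my-first-symbol-text (text)
  (with-temp-buffer
    (emacs-lisp-mode)
    (insert text)
    (goto-char (point-min))
    (when (re-search-forward
           (rx (1+ (or (syntax word) (syntax symbol))))
           nil t)
      (match-string 0))))

(my-first-symbol-text "(foo-bar 1)")   ; "foo-bar"
```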
(category category)
Match a character in category category, which is either one of the names below or its category character.
Category name | Category character |
---|---|
space-for-indent | space |
base | . |
consonant | 0 |
base-vowel | 1 |
upper-diacritical-mark | 2 |
lower-diacritical-mark | 3 |
tone-mark | 4 |
symbol | 5 |
digit | 6 |
vowel-modifying-diacritical-mark | 7 |
vowel-sign | 8 |
semivowel-lower | 9 |
not-at-end-of-line | < |
not-at-beginning-of-line | > |
alpha-numeric-two-byte | A |
chinese-two-byte | C |
greek-two-byte | G |
japanese-hiragana-two-byte | H |
indian-two-byte | I |
japanese-katakana-two-byte | K |
strong-left-to-right | L |
korean-hangul-two-byte | N |
strong-right-to-left | R |
cyrillic-two-byte | Y |
combining-diacritic | ^ |
ascii | a |
arabic | b |
chinese | c |
ethiopic | e |
greek | g |
korean | h |
indian | i |
japanese | j |
japanese-katakana | k |
latin | l |
lao | o |
tibetan | q |
japanese-roman | r |
thai | t |
vietnamese | v |
hebrew | w |
cyrillic | y |
can-break | \| |
For more information about currently defined categories, run the command M-x describe-categories RET. For how to define new categories, see Categories. Corresponding string regexp: ‘\ccode’
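A minimal sketch of the category form, given once by name and once by category character. It assumes the default category table, in which Greek letters such as alpha (U+03B1) belong to category g. The variable names are hypothetical.

```elisp
(require 'rx)

;; `my-greek-rx' and `my-greek-char-rx' are hypothetical names.  The
;; category may be written as a name or as its category character.
(defvar my-greek-rx (rx (1+ (category greek))))
(defvar my-greek-char-rx (rx (1+ (category ?g))))

;; "\u03b1\u03b2\u03b3" is the Greek text alpha beta gamma.
;; Assumes the default category table assigns these letters to category g.
(string-match-p my-greek-rx "abc \u03b1\u03b2\u03b3")        ; index of alpha
(string-match-p my-greek-char-rx "abc \u03b1\u03b2\u03b3")   ; same position
(string-match-p my-greek-rx "abc")                           ; nil
```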
These all match the empty string, but only in specific places.
line-start, bol
Match at the beginning of a line. Corresponding string regexp: ‘^’
line-end, eol
Match at the end of a line. Corresponding string regexp: ‘$’
string-start, bos, buffer-start, bot
Match at the start of the string or buffer being matched against. Corresponding string regexp: ‘\`’
string-end, eos, buffer-end, eot
Match at the end of the string or buffer being matched against. Corresponding string regexp: ‘\'’
point
Match at point. Corresponding string regexp: ‘\=’
word-start, bow
Match at the beginning of a word. Corresponding string regexp: ‘\<’
word-end, eow
Match at the end of a word. Corresponding string regexp: ‘\>’
word-boundary
Match at the beginning or end of a word. Corresponding string regexp: ‘\b’
not-word-boundary
Match anywhere but at the beginning or end of a word. Corresponding string regexp: ‘\B’
symbol-start
Match at the beginning of a symbol. Corresponding string regexp: ‘\_<’
symbol-end
Match at the end of a symbol. Corresponding string regexp: ‘\_>’
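The zero-width forms above constrain where a match may occur without consuming any text. A short sketch, assuming a syntax table in which ASCII letters have word syntax (true of the standard one):

```elisp
(require 'rx)

;; Line start versus string start.
(string-match-p (rx bol "b") "a\nb")   ; matches just after the newline
(string-match-p (rx bos "b") "a\nb")   ; nil: "b" is not at the string start

;; Word boundaries: the "cat" inside "concat" is skipped.
(string-match-p (rx word-start "cat" word-end) "concat cat")

;; `not-word-boundary' requires the opposite: only the inner "cat" qualifies.
(string-match-p (rx not-word-boundary "cat") "concat cat")
```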
(group rx…)
(submatch rx…)
Match the rxs, making the matched text and position accessible in the match data. The first group in a regexp is numbered 1; subsequent groups will be numbered one higher than the previous group. Corresponding string regexp: ‘\(…\)’
(group-n n rx…)
(submatch-n n rx…)
Like group, but explicitly assign the group number n. n must be positive. Corresponding string regexp: ‘\(?n:…\)’
(backref n)
Match the text previously matched by group number n. n must be in the range 1–9. Corresponding string regexp: ‘\n’
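The grouping forms can be sketched as follows: implicit numbering with group, explicit numbering with group-n, and a backref that detects a doubled word. The names my-date-rx and my-parse-date are hypothetical.

```elisp
(require 'rx)

;; `my-date-rx' and `my-parse-date' are hypothetical names.
;; Groups are numbered 1, 2, 3 in order of appearance.
(defvar my-date-rx
  (rx (group (= 4 digit)) "-" (group (= 2 digit)) "-" (group (= 2 digit))))

(defun my-parse-date (string)
  "Return a list (YEAR MONTH DAY) of strings found in STRING, or nil."
  (when (string-match my-date-rx string)
    (list (match-string 1 string)
          (match-string 2 string)
          (match-string 3 string))))

(my-parse-date "released 2024-05-17")   ; ("2024" "05" "17")

;; `group-n' assigns the number explicitly.
(let ((s "ab12"))
  (when (string-match (rx (group-n 3 (1+ digit))) s)
    (match-string 3 s)))                ; "12"

;; `backref' matches the same text as group 1 again: a doubled word.
(string-match-p (rx (group (1+ alpha)) " " (backref 1)) "say the the word")
```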
(literal expr)
Match the literal string that is the result from evaluating the Lisp expression expr. The evaluation takes place at call time, in the current lexical environment.
(regexp expr)
(regex expr)
Match the string regexp that is the result from evaluating the Lisp expression expr. The evaluation takes place at call time, in the current lexical environment.
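A minimal sketch of literal and regexp with values that are known only at call time. Both function names are hypothetical.

```elisp
(require 'rx)

;; `my-line-with-prefix-p' is a hypothetical helper.  PREFIX is only known
;; at call time, and `literal' matches it verbatim, special characters
;; included.
(defun my-line-with-prefix-p (prefix line)
  (and (string-match-p (rx bos (literal prefix) (1+ nonl)) line) t))

;; `my-all-match-p' is a hypothetical helper.  `regexp' splices an
;; existing string regexp into the surrounding rx form.
(defun my-all-match-p (item-regexp string)
  (and (string-match-p (rx bos (1+ (regexp item-regexp)) eos) string) t))

(my-line-with-prefix-p "a.b" "a.bzz")   ; t
(my-line-with-prefix-p "a.b" "axbzz")   ; nil: the dot is literal
(my-all-match-p "[0-9]" "2024")         ; t
(my-all-match-p "[0-9]" "20x4")         ; nil
```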
(eval expr)
Match the rx form that is the result from evaluating the Lisp expression expr. The evaluation takes place at macro-expansion time for rx, at call time for rx-to-string, in the current global environment.
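The sketch below uses eval through rx-to-string, so that the form is evaluated when the function is called. The variable my-keyword-form and the function my-keyword-regexp are hypothetical, and the symbol boundaries assume a syntax table in which ASCII letters are word constituents.

```elisp
(require 'rx)

;; `my-keyword-form' is a hypothetical global variable holding an rx form.
(defvar my-keyword-form '(or "if" "else" "while"))

;; `my-keyword-regexp' is a hypothetical helper.  `rx-to-string' evaluates
;; the `eval' argument when it is called, so the variable only needs to be
;; defined by then.
(defun my-keyword-regexp ()
  (rx-to-string '(seq symbol-start (eval my-keyword-form) symbol-end)))

(string-match-p (my-keyword-regexp) "x while y")   ; index of "while"
(string-match-p (my-keyword-regexp) "meanwhile")   ; nil: not a whole symbol
```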
Copyright © 1990-1996, 1998-2019 Free Software Foundation, Inc.
Licensed under the GNU GPL license.
https://www.gnu.org/software/emacs/manual/html_node/elisp/Rx-Constructs.html