
Constructs in rx regexps

The various forms in rx regexps are described below. The shorthand rx represents any rx form, and rx… means zero or more rx forms. Where the corresponding string regexp syntax is given, A, B, … are string regexp subexpressions.

Literals

"some-string"

Match the string ‘some-string’ literally. There are no characters with special meaning, unlike in string regexps.

?C

Match the character ‘C’ literally.
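The following minimal sketch, assuming Emacs 27 or later, shows that ‘.’ and ‘*’ have no special meaning inside rx string and character literals. The name my-literal-re is hypothetical.

```elisp
;; Sketch, assuming Emacs 27 or later.  my-literal-re is a hypothetical name.
(require 'rx)

;; "a.b" and ?* are matched literally: neither "." nor "*" is special here.
(defconst my-literal-re (rx "a.b" ?*))

(string-match-p my-literal-re "a.b*")   ; => 0
(string-match-p my-literal-re "axb*")   ; => nil, "." is not a wildcard
```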

Sequence and alternative

(seq rx…)
(sequence rx…)
(: rx…)
(and rx…)

Match the rxs in sequence. Without arguments, the expression matches the empty string. Corresponding string regexp: ‘AB’ (subexpressions in sequence).

(or rx…)
(| rx…)

Match exactly one of the rxs. If all arguments are strings, characters, or (or …) forms whose arguments are likewise restricted, the longest possible match is always used. Otherwise, either the longest match or the first (in left-to-right order) is used. Without arguments, the expression does not match anything at all. Corresponding string regexp: ‘A\|B\|…’.

unmatchable

Refuse any match. Equivalent to (or). See regexp-unmatchable.
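A minimal sketch of sequence and alternative, assuming Emacs 27 or later; my-greeting-re is a hypothetical name. The middle example relies on the longest-match rule for all-string or forms stated above.

```elisp
;; Sketch, assuming Emacs 27 or later.  my-greeting-re is a hypothetical name.
(require 'rx)

;; seq: "hello" followed by exactly one of the two alternatives.
(defconst my-greeting-re
  (rx (seq "hello" (or " world" " there"))))

(string-match-p my-greeting-re "hello there")   ; => 0
(string-match-p my-greeting-re "hello you")     ; => nil

;; All arguments are strings, so the longest alternative is used
;; even though "ab" is listed first.
(let ((s "abcd"))
  (when (string-match (rx (or "ab" "abc")) s)
    (match-string 0 s)))                        ; => "abc"

;; unmatchable never matches, not even the empty string.
(string-match-p (rx unmatchable) "")            ; => nil
```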

Repetition

Normally, repetition forms are greedy, in that they attempt to match as many times as possible. Some forms are non-greedy; they try to match as few times as possible (see Non-greedy repetition).

(zero-or-more rx…)
(0+ rx…)

Match the rxs zero or more times. Greedy by default. Corresponding string regexp: ‘A*’ (greedy), ‘A*?’ (non-greedy).

(one-or-more rx…)
(1+ rx…)

Match the rxs one or more times. Greedy by default. Corresponding string regexp: ‘A+’ (greedy), ‘A+?’ (non-greedy).

(zero-or-one rx…)
(optional rx…)
(opt rx…)

Match the rxs once or an empty string. Greedy by default. Corresponding string regexp: ‘A?’ (greedy), ‘A??’ (non-greedy).

(* rx…)

Match the rxs zero or more times. Greedy. Corresponding string regexp: ‘A*’

(+ rx…)

Match the rxs one or more times. Greedy. Corresponding string regexp: ‘A+’

(? rx…)

Match the rxs once or an empty string. Greedy. Corresponding string regexp: ‘A?’

(*? rx…)

Match the rxs zero or more times. Non-greedy. Corresponding string regexp: ‘A*?’

(+? rx…)

Match the rxs one or more times. Non-greedy. Corresponding string regexp: ‘A+?’

(?? rx…)

Match the rxs once or an empty string. Non-greedy. Corresponding string regexp: ‘A??’
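To illustrate the difference between greedy and non-greedy repetition, here is a minimal sketch assuming Emacs 27 or later; my-tags is a hypothetical sample string.

```elisp
;; Sketch, assuming Emacs 27 or later.  my-tags is a hypothetical sample string.
(require 'rx)

(defconst my-tags "<a><b>")

;; Greedy: (* nonl) consumes as much as possible, so the match runs
;; to the last ">".
(when (string-match (rx "<" (* nonl) ">") my-tags)
  (match-string 0 my-tags))     ; => "<a><b>"

;; Non-greedy: (*? nonl) stops at the first ">" that completes a match.
(when (string-match (rx "<" (*? nonl) ">") my-tags)
  (match-string 0 my-tags))     ; => "<a>"
```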

(= n rx…)
(repeat n rx)

Match the rxs exactly n times. Corresponding string regexp: ‘A\{n\}’

(>= n rx…)

Match the rxs n or more times. Greedy. Corresponding string regexp: ‘A\{n,\}’

(** n m rx…)
(repeat n m rx…)

Match the rxs at least n but no more than m times. Greedy. Corresponding string regexp: ‘A\{n,m\}’
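Counted repetition can be combined with opt, as in the following minimal sketch (assuming Emacs 27 or later). The pattern, five digits optionally followed by a hyphen and four more digits, and the name my-zip-re are hypothetical. It uses opt rather than ?, which is easier to read in Lisp source.

```elisp
;; Sketch, assuming Emacs 27 or later.  my-zip-re is a hypothetical name.
(require 'rx)

;; Exactly five digits, optionally followed by "-" and exactly four digits.
(defconst my-zip-re
  (rx string-start (= 5 digit) (opt "-" (= 4 digit)) string-end))

(string-match-p my-zip-re "12345")        ; => 0
(string-match-p my-zip-re "12345-6789")   ; => 0
(string-match-p my-zip-re "1234")         ; => nil

;; Between two and three repetitions; being greedy, it takes three here.
(let ((s "aaaa"))
  (when (string-match (rx (** 2 3 "a")) s)
    (match-string 0 s)))                  ; => "aaa"
```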

The greediness of some repetition forms can be controlled using the following constructs. However, it is usually better to use the explicit non-greedy forms above when such matching is required.

(minimal-match rx)

Match rx, making any zero-or-more, 0+, one-or-more, 1+, zero-or-one, opt and optional forms within it use non-greedy matching.

(maximal-match rx)

Match rx, making any zero-or-more, 0+, one-or-more, 1+, zero-or-one, opt and optional forms within it use greedy matching. This is the default.
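A minimal sketch of minimal-match, assuming Emacs 27 or later; my-quoted is a hypothetical sample string. Because minimal-match takes a single rx, the pieces are wrapped in seq.

```elisp
;; Sketch, assuming Emacs 27 or later.  my-quoted is a hypothetical sample string.
(require 'rx)

(defconst my-quoted "\"x\" and \"y\"")

;; Inside minimal-match, the 0+ form is non-greedy, so the match stops at
;; the second double quote; (*? nonl) would give the same result here.
(when (string-match (rx (minimal-match (seq "\"" (0+ nonl) "\""))) my-quoted)
  (match-string 0 my-quoted))   ; => "\"x\""
```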

Matching single characters

(any set…)
(char set…)
(in set…)

Match a single character from one of the sets. Each set is a character, a string representing the set of its characters, a range or a character class (see below). A range is either a hyphen-separated string like "A-Z", or a cons of characters like (?A . ?Z).

Note that hyphen (-) is special in strings in this construct, since it acts as a range separator. To include a hyphen, add it as a separate character or single-character string. Corresponding string regexp: ‘[…]’

(not charspec)

Match a character not included in charspec. charspec can be a character, a single-character string, an any, not, or, intersection, syntax or category form, or a character class. If charspec is an or form, its arguments have the same restrictions as those of intersection; see below. Corresponding string regexp: ‘[^…]’, ‘\Scode’, ‘\Ccode’

(intersection charset…)

Match a character included in all of the charsets. Each charset can be a character, a single-character string, an any form without character classes, or an (intersection …), (or …) or (not …) form whose arguments are also charsets.
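The following minimal sketch, assuming Emacs 27 or later, exercises any (including the hyphen rule above), not and intersection. All my- names are hypothetical.

```elisp
;; Sketch, assuming Emacs 27 or later.  The my- names are hypothetical.
(require 'rx)

;; A digit, a letter from "a" to "f", or a hyphen.  The hyphen is a
;; separate character argument so that it is not read as a range.
(defconst my-hexish-re (rx (any "0-9" "a-f" ?-)))

;; Any single character that is not a digit.
(defconst my-non-digit-re (rx (not (any "0-9"))))

;; Letters "a" to "z" that are not vowels.
(defconst my-consonant-re
  (rx (intersection (any "a-z") (not (any "aeiou")))))

(string-match-p my-hexish-re "-")      ; => 0
(string-match-p my-hexish-re "g")      ; => nil
(string-match-p my-non-digit-re "5")   ; => nil
(string-match-p my-consonant-re "e")   ; => nil
(string-match-p my-consonant-re "t")   ; => 0
```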

not-newline, nonl

Match any character except a newline. Corresponding string regexp: ‘.’ (dot)

anychar, anything

Match any character. Corresponding string regexp: ‘.\|\n’ (for example)
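The only difference between nonl and anychar is the newline, as this minimal sketch (assuming Emacs 27 or later) shows:

```elisp
;; Sketch, assuming Emacs 27 or later.
(require 'rx)

;; nonl refuses to match the newline; anychar accepts it.
(string-match-p (rx "a" nonl "b") "a\nb")      ; => nil
(string-match-p (rx "a" anychar "b") "a\nb")   ; => 0
```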

character class

Match a character from a named character class:

alpha, alphabetic, letter

Match alphabetic characters. More precisely, match characters whose Unicode ‘general-category’ property indicates that they are alphabetic.

alnum, alphanumeric

Match alphabetic characters and digits. More precisely, match characters whose Unicode ‘general-category’ property indicates that they are alphabetic or decimal digits.

digit, numeric, num

Match the digits ‘0’–‘9’.

xdigit, hex-digit, hex

Match the hexadecimal digits ‘0’–‘9’, ‘A’–‘F’ and ‘a’–‘f’.

cntrl, control

Match any character whose code is in the range 0–31.

blank

Match horizontal whitespace. More precisely, match characters whose Unicode ‘general-category’ property indicates that they are spacing separators.

space, whitespace, white

Match any character that has whitespace syntax (see Syntax Class Table).

lower, lower-case

Match anything lower-case, as determined by the current case table. If case-fold-search is non-nil, this also matches any upper-case letter.

upper, upper-case

Match anything upper-case, as determined by the current case table. If case-fold-search is non-nil, this also matches any lower-case letter.

graph, graphic

Match any character except whitespace, ASCII and non-ASCII control characters, surrogates, and codepoints unassigned by Unicode, as indicated by the Unicode ‘general-category’ property.

print, printing

Match whitespace or a character matched by graph.

punct, punctuation

Match any punctuation character. (At present, for multibyte characters, anything that has non-word syntax.)

word, wordchar

Match any character that has word syntax (see Syntax Class Table).

ascii

Match any ASCII character (codes 0–127).

nonascii

Match any non-ASCII character (but not raw bytes).

Corresponding string regexp: ‘[[:class:]]’
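A minimal sketch using character classes, assuming Emacs 27 or later; the identifier syntax and the name my-ident-re are hypothetical. The last two forms illustrate the case-fold-search behavior described for lower.

```elisp
;; Sketch, assuming Emacs 27 or later.  my-ident-re is a hypothetical name.
(require 'rx)

;; A letter followed by any number of letters, digits or "_".
(defconst my-ident-re
  (rx string-start alpha (0+ (any alnum ?_)) string-end))

(string-match-p my-ident-re "x1_y")   ; => 0
(string-match-p my-ident-re "1x")     ; => nil

;; What lower matches depends on case-fold-search.
(let ((case-fold-search nil))
  (string-match-p (rx lower) "A"))    ; => nil
(let ((case-fold-search t))
  (string-match-p (rx lower) "A"))    ; => 0
```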

(syntax syntax)

Match a character with syntax syntax, being one of the following names:

Syntax name          Syntax character
whitespace           -
punctuation          .
word                 w
symbol               _
open-parenthesis     (
close-parenthesis    )
expression-prefix    '
string-quote         "
paired-delimiter     $
escape               \
character-quote      /
comment-start        <
comment-end          >
string-delimiter     |
comment-delimiter    !

For details, see Syntax Class Table. Please note that (syntax punctuation) is not equivalent to the character class punctuation. Corresponding string regexp: ‘\scode’
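The following minimal sketch, assuming Emacs 27 or later, demonstrates the note above. Syntax-based matching consults the syntax table of the current buffer, so the sketch assumes a temporary buffer in Fundamental mode, which uses the standard syntax table; there ‘_’ has symbol syntax, yet it still belongs to the punct character class.

```elisp
;; Sketch, assuming Emacs 27 or later.  A temporary buffer is used so that
;; the standard syntax table is in effect.
(require 'rx)

(with-temp-buffer
  (list (string-match-p (rx (syntax whitespace)) " ")    ; => 0
        (string-match-p (rx (syntax punctuation)) ",")   ; => 0
        ;; "_" has symbol syntax in the standard syntax table, so the
        ;; syntax form rejects it while the punct class accepts it.
        (string-match-p (rx (syntax punctuation)) "_")   ; => nil
        (string-match-p (rx punct) "_")))                ; => 0
```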

(category category)

Match a character in category category, which is either one of the names below or its category character.

Category name                       Category character
space-for-indent                    space
base                                .
consonant                           0
base-vowel                          1
upper-diacritical-mark              2
lower-diacritical-mark              3
tone-mark                           4
symbol                              5
digit                               6
vowel-modifying-diacritical-mark    7
vowel-sign                          8
semivowel-lower                     9
not-at-end-of-line                  <
not-at-beginning-of-line            >
alpha-numeric-two-byte              A
chinese-two-byte                    C
greek-two-byte                      G
japanese-hiragana-two-byte          H
indian-two-byte                     I
japanese-katakana-two-byte          K
strong-left-to-right                L
korean-hangul-two-byte              N
strong-right-to-left                R
cyrillic-two-byte                   Y
combining-diacritic                 ^
ascii                               a
arabic                              b
chinese                             c
ethiopic                            e
greek                               g
korean                              h
indian                              i
japanese                            j
japanese-katakana                   k
latin                               l
lao                                 o
tibetan                             q
japanese-roman                      r
thai                                t
vietnamese                          v
hebrew                              w
cyrillic                            y
can-break                           |

For more information about currently defined categories, run the command M-x describe-categories RET. For how to define new categories, see Categories. Corresponding string regexp: ‘\ccode’
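A minimal sketch of category matching, assuming Emacs 27 or later and the default category table. It shows both a category name and a category character.

```elisp
;; Sketch, assuming Emacs 27 or later and the default category table.
(require 'rx)

(string-match-p (rx (category greek)) "\u03b1")   ; => 0, GREEK SMALL LETTER ALPHA
(string-match-p (rx (category greek)) "a")        ; => nil
;; The category character can be used instead of the name: ?a is ascii.
(string-match-p (rx (category ?a)) "a")           ; => 0
```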

Zero-width assertions

These all match the empty string, but only in specific places.

line-start, bol

Match at the beginning of a line. Corresponding string regexp: ‘^’

line-end, eol

Match at the end of a line. Corresponding string regexp: ‘$’

string-start, bos, buffer-start, bot

Match at the start of the string or buffer being matched against. Corresponding string regexp: ‘\`’

string-end, eos, buffer-end, eot

Match at the end of the string or buffer being matched against. Corresponding string regexp: ‘\'’

point

Match at point. Corresponding string regexp: ‘\=’

word-start, bow

Match at the beginning of a word. Corresponding string regexp: ‘\<’

word-end, eow

Match at the end of a word. Corresponding string regexp: ‘\>’

word-boundary

Match at the beginning or end of a word. Corresponding string regexp: ‘\b’

not-word-boundary

Match anywhere but at the beginning or end of a word. Corresponding string regexp: ‘\B’

symbol-start

Match at the beginning of a symbol. Corresponding string regexp: ‘\_<’

symbol-end

Match at the end of a symbol. Corresponding string regexp: ‘\_>’
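A minimal sketch of some zero-width assertions, assuming Emacs 27 or later; my-cat-re is a hypothetical name. Word boundaries depend on the current syntax table, in which letters are assumed to have word syntax.

```elisp
;; Sketch, assuming Emacs 27 or later.  my-cat-re is a hypothetical name.
(require 'rx)

;; "cat" as a whole word: the assertions reject it inside "concatenate".
(defconst my-cat-re (rx word-start "cat" word-end))

(string-match-p my-cat-re "the cat sat")   ; => 4
(string-match-p my-cat-re "concatenate")   ; => nil

;; line-start also matches right after a newline; string-start does not.
(string-match-p (rx line-start "b") "a\nb")     ; => 2
(string-match-p (rx string-start "b") "a\nb")   ; => nil
```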

Capture groups

(group rx…)
(submatch rx…)

Match the rxs, making the matched text and position accessible in the match data. The first group in a regexp is numbered 1; subsequent groups will be numbered one higher than the previous group. Corresponding string regexp: ‘\(…\)’

(group-n n rx…)
(submatch-n n rx…)

Like group, but explicitly assign the group number n. n must be positive. Corresponding string regexp: ‘\(?n:…\)’

(backref n)

Match the text previously matched by group number n. n must be in the range 1–9. Corresponding string regexp: ‘\n’
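A minimal sketch of capture groups and back-references, assuming Emacs 27 or later. The "key=value" format and the my- names are hypothetical.

```elisp
;; Sketch, assuming Emacs 27 or later.  The my- names are hypothetical.
(require 'rx)

;; Group 1 captures the key and group 2 the value of a "key=value" string.
(defconst my-kv-re
  (rx string-start (group (1+ (any "a-z"))) "=" (group (0+ nonl)) string-end))

(let ((s "name=emacs"))
  (when (string-match my-kv-re s)
    (list (match-string 1 s) (match-string 2 s))))   ; => ("name" "emacs")

;; group-n assigns number 1 explicitly; (backref 1) then requires the
;; same text to occur again, which finds the doubled "is".
(defconst my-doubled-word-re
  (rx (group-n 1 (1+ word)) " " (backref 1)))

(string-match-p my-doubled-word-re "it is is here")  ; => 3
```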

Dynamic inclusion

(literal expr)

Match the literal string that is the result of evaluating the Lisp expression expr. The evaluation takes place at call time, in the current lexical environment.

(regexp expr)
(regex expr)

Match the string regexp that is the result of evaluating the Lisp expression expr. The evaluation takes place at call time, in the current lexical environment.

(eval expr)

Match the rx form that is the result from evaluating the Lisp expression expr. The evaluation takes place at macro-expansion time for rx, at call time for rx-to-string, in the current global environment.
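A minimal sketch of the three dynamic-inclusion forms, assuming Emacs 27 or later. The function my-quoted-re and the variable my-number-re are hypothetical.

```elisp
;; Sketch, assuming Emacs 27 or later.  The my- names are hypothetical.
(require 'rx)

;; literal: TEXT is evaluated at call time and regexp-quoted.
(defun my-quoted-re (text)
  "Return a regexp matching TEXT, taken literally, between double quotes."
  (rx "\"" (literal text) "\""))

(string-match-p (my-quoted-re "a.b") "say \"a.b\"")   ; => 4
(string-match-p (my-quoted-re "a.b") "say \"axb\"")   ; => nil

;; regexp: splices in a string regexp computed at call time.
(defvar my-number-re "[0-9]+")
(string-match-p (rx "id-" (regexp my-number-re)) "id-42")   ; => 0

;; eval: the argument is evaluated when the rx form is macro-expanded,
;; and the result, here (any "xyz"), is treated as an rx form.
(string-match-p (rx (eval (list 'any "xyz"))) "y")          ; => 0
```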

Copyright © 1990-1996, 1998-2019 Free Software Foundation, Inc.
Licensed under the GNU GPL license.
https://www.gnu.org/software/emacs/manual/html_node/elisp/Rx-Constructs.html