Regular expressions are an important concept in formal language theory. They are a way to describe a possibly infinite set of character strings (called a language). A regular expression, at its core, needs the following features:
- A set of characters that can be used in the language, called the alphabet.
- Concatenation: `ab` means "the character `a` followed by the character `b`".
- Union: `a|b` means "either `a` or `b`".
- Kleene star: `a*` means "zero or more `a` characters".
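The three core operations map directly onto JavaScript regex syntax. In the sketch below, the anchors `^` and `$` are an addition that is not part of the core feature list. They force a match against the whole string, so each `test()` call asks "is this string in the language?" and not "does the string contain a match?". The variable names are illustrative only.

```javascript
// Concatenation: the language { "ab" }
const concatenation = /^ab$/;
// Union: the language { "a", "b" }
const union = /^(a|b)$/;
// Kleene star: the language { "", "a", "aa", "aaa", ... }
const kleeneStar = /^a*$/;
// The operators compose: any run of a's and b's, followed by a single c.
const combined = /^(a|b)*c$/;

for (const s of ["", "a", "ab", "aaa", "abac"]) {
  // Print whether each string belongs to each language.
  console.log(
    JSON.stringify(s),
    concatenation.test(s),
    union.test(s),
    kleeneStar.test(s),
    combined.test(s),
  );
}
```

None of these patterns uses the `g` flag, so `test()` keeps no `lastIndex` state between calls.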
Assuming a finite alphabet (such as the 26 letters of the English alphabet, or the entire Unicode character set), all regular languages can be generated by the features above. Of course, many patterns are very tedious to express this way (such as "10 digits" or "a character that's not a space"), so JavaScript regular expressions include many shorthands, introduced below.
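To show how much a shorthand saves, the sketch below writes "10 digits" twice. The first version uses only union and concatenation. The second uses the `\d` class and the `{10}` quantifier. The variable names are hypothetical, and the character class `[^ ]` is one assumed reading of "a character that's not a space" (it excludes only the space character, while `\S` would also exclude tabs, newlines, and other whitespace).

```javascript
// Core features only: a digit is the union 0|1|...|9,
// and "10 digits" concatenates that union with itself 10 times.
const digit = "(0|1|2|3|4|5|6|7|8|9)";
const tenDigitsCore = new RegExp("^" + digit.repeat(10) + "$");

// The same language with shorthands: \d is [0-9], {10} means exactly 10 times.
const tenDigitsShort = /^\d{10}$/;

// A single character that is not a space, as a negated character class.
const notASpace = /^[^ ]$/;

for (const s of ["0123456789", "012345678", "01234a6789"]) {
  // Both patterns should agree on every input.
  console.log(s, tenDigitsCore.test(s), tenDigitsShort.test(s));
}
```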
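Note: JavaScript regular expressions are in fact not regular, because of backreferences. A regular language must be recognizable by a machine with finitely many states, but matching a backreference requires remembering an arbitrarily long captured substring. However, they are still a very useful feature.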
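The following sketch shows a backreference going beyond regular languages. The pattern matches exactly the strings formed by writing some string twice. Over an alphabet with at least two characters, no finite automaton recognizes this language, because it would need unbounded memory to remember the first half. The name `doubled` is illustrative. Note that `.` does not match line terminators, so this pattern is limited to single-line input.

```javascript
// (.*) captures some prefix w; \1 then demands the exact same text again,
// so the whole string must have the form ww.
const doubled = /^(.*)\1$/;

for (const s of ["abcabc", "abab", "", "abcab", "abba"]) {
  console.log(JSON.stringify(s), doubled.test(s));
}
```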