A regular expression generally matches from left to right. This is how lookahead and lookbehind assertions get their names: lookahead asserts what's on the right, and lookbehind asserts what's on the left.
In order for a (?<=pattern) assertion to succeed, the pattern must match the input immediately to the left of the current position, but the current position is not changed before matching the subsequent input. The (?<!pattern) form negates the assertion: it succeeds if the pattern does not match the input immediately to the left of the current position.
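The two forms can be sketched as follows. The sample string and variable names are made up for illustration; only the (?<=…) and (?<!…) syntax comes from the text above. Note that the "$" itself is never part of the match, because the assertion does not move the current position.

```js
// Hypothetical sample input, chosen only for this sketch.
const text = "price $42, item #17";

// Positive lookbehind: digits that have "$" immediately to their left.
const afterDollar = text.match(/(?<=\$)\d+/g);

// Negative lookbehind: whole numbers that do NOT have "$" to their left.
// The \b keeps the match from starting in the middle of "42".
const notAfterDollar = text.match(/(?<!\$)\b\d+/g);

console.log(afterDollar); // ["42"]
console.log(notAfterDollar); // ["17"]
```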
Lookbehind generally has the same semantics as lookahead — however, within a lookbehind assertion, the regular expression matches backwards. For example,
/(?<=([ab]+)([bc]+))$/.exec("abc"); // ['', 'a', 'bc'], not ['', 'ab', 'c']
If the lookbehind matched from left to right, it would first greedily match [ab]+, which would make the first group capture "ab", and the remaining "c" would be captured by [bc]+. However, because [bc]+ is matched first, it greedily grabs "bc", leaving only "a" for [ab]+.
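One way to see the difference is to put the same two groups inside a lookahead, which does match left to right. This is only an illustrative comparison; the variable names are not from the original.

```js
const input = "abc";

// Lookahead matches forwards: [ab]+ goes first and greedily takes "ab".
const ahead = /(?=([ab]+)([bc]+))/.exec(input);

// Lookbehind matches backwards: [bc]+ goes first and greedily takes "bc".
const behind = /(?<=([ab]+)([bc]+))$/.exec(input);

console.log(ahead.slice(1)); // ["ab", "c"]
console.log(behind.slice(1)); // ["a", "bc"]
```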
This behavior is reasonable — the matcher does not know where to start the match (because the lookbehind may not be fixed-length), but it does know where to end (at the current position). Therefore, it starts from the current position and works backwards. (Regexes in some other languages forbid non-fixed-length lookbehind to avoid this issue.)
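As a small sketch of a lookbehind that is not fixed-length, the pattern below (a hypothetical example, not from the original) looks for digits preceded by a dollar amount of any length followed by a dot. The matcher only knows that the lookbehind must end at the current position, so it works backwards from there.

```js
// Hypothetical pattern: the lookbehind \$\d+\. can span any number of characters.
const cents = /(?<=\$\d+\.)\d+/;

const short = cents.exec("$10.53");
const long = cents.exec("$1234.99");
const noDollar = cents.exec("10.53");

console.log(short[0]); // "53"
console.log(long[0]); // "99"
console.log(noDollar); // null, because no "$" precedes the integer part
```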
For quantified capturing groups inside the lookbehind, the match furthest to the left of the input string, rather than the one furthest to the right, is captured because of backward matching. See the capturing groups page for more information. A backreference inside the lookbehind must appear to the left of the group it refers to, also due to backward matching. However, disjunctions are still attempted left-to-right.
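The three points above can be sketched with small, made-up patterns; the inputs and variable names are illustrative only.

```js
// Quantified group: iterations run backwards ("b" first, then "a"),
// so the last iteration, and therefore the capture, is the leftmost character.
const quantified = /(?<=([ab])+)c/.exec("abc");
console.log(quantified[1]); // "a"

// Backreference: \1 is written to the LEFT of group 1, because the group
// (on the right) is matched first when going backwards.
const backref = /(?<=\1(a))b/;
const doubled = backref.test("aab"); // "a" captured, then another "a" to its left
const single = backref.test("ab"); // nothing to the left of the captured "a"
console.log(doubled, single); // true false

// Disjunction: alternatives are still tried in the order they are written.
const firstAlt = /(?<=(b|ab))c/.exec("abc");
const swapped = /(?<=(ab|b))c/.exec("abc");
console.log(firstAlt[1]); // "b"
console.log(swapped[1]); // "ab"
```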