Class StringScanner supports processing a stored string as a stream; this code creates a new StringScanner object with string 'foobarbaz':
require 'strscan'
scanner = StringScanner.new('foobarbaz')
All examples here assume that StringScanner has been required:
require 'strscan'
Some examples here assume that these constants are defined:
MULTILINE_TEXT = <<~EOT
  Go placidly amid the noise and haste,
  and remember what peace there may be in silence.
EOT
HIRAGANA_TEXT = 'こんにちは'
ENGLISH_TEXT = 'Hello'
Some examples here assume that certain helper methods are defined:
put_situation(scanner): Displays the values of the scanner’s methods pos, charpos, rest, and rest_size.
put_match_values(scanner): Displays the scanner’s match values.
match_values_cleared?(scanner): Returns whether the scanner’s match values are cleared.
See examples at helper methods.
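The helper definitions themselves are not reproduced in this section. As a rough sketch only, two of them might look like the following; the output layout and the exact set of checks are assumptions based on the descriptions above, not the documentation's own definitions:

```ruby
require 'strscan'

# Hypothetical sketch: prints the four values named in the description above.
def put_situation(scanner)
  puts '# Situation:'
  puts "#   pos:       #{scanner.pos}"
  puts "#   charpos:   #{scanner.charpos}"
  puts "#   rest:      #{scanner.rest.inspect}"
  puts "#   rest_size: #{scanner.rest_size}"
end

# Hypothetical sketch: true when the scanner records no successful match.
def match_values_cleared?(scanner)
  scanner.matched? == false &&
    scanner.matched.nil? &&
    scanner.matched_size.nil? &&
    scanner.pre_match.nil? &&
    scanner.post_match.nil? &&
    scanner.captures.nil? &&
    scanner.named_captures.empty?
end

scanner = StringScanner.new('foobarbaz')
put_situation(scanner)
match_values_cleared?(scanner) # => true
scanner.exist?(/bar/)
match_values_cleared?(scanner) # => false
```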
StringScanner Object
This code creates a StringScanner object (we’ll call it simply a scanner), and shows some of its basic properties:
scanner = StringScanner.new('foobarbaz')
scanner.string # => "foobarbaz"
put_situation(scanner)
# Situation:
# pos: 0
# charpos: 0
# rest: "foobarbaz"
# rest_size: 9
The scanner has:
A stored string, which is:
Initially set by StringScanner.new(string) to the given string ('foobarbaz' in the example above).
Modifiable by methods string=(new_string) and concat(more_string).
Returned by method string.
More at Stored String below.
A position; a zero-based index into the bytes of the stored string (not into its characters):
Initially set by StringScanner.new to 0.
Returned by method pos.
Modifiable explicitly by methods reset, terminate, and pos=(new_pos).
Modifiable implicitly (various traversing methods, among others).
More at Byte Position below.
A target substring, which is a trailing substring of the stored string; it extends from the current position to the end of the stored string:
Initially set by StringScanner.new(string) to the given string ('foobarbaz' in the example above).
Returned by method rest.
Modified by any modification to either the stored string or the position.
Most importantly: the searching and traversing methods operate on the target substring, which may be (and often is) less than the entire stored string.
More at Target Substring below.
Stored String
The stored string is the string stored in the StringScanner object.
Each of these methods sets, modifies, or returns the stored string:
| Method | Effect |
|---|---|
| ::new(string) | Creates a new scanner for the given string. |
| string=(new_string) | Replaces the existing stored string. |
| concat(more_string) | Appends a string to the existing stored string. |
| string | Returns the stored string. |
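The methods in the table can be exercised together. This is a minimal sketch; the `+'foo'` form is an assumption made here to obtain a mutable string, because concat modifies the stored string in place:

```ruby
require 'strscan'

# +'foo' yields a mutable string; concat appends to the stored string in place.
scanner = StringScanner.new(+'foo')
scanner.scan(/fo/)      # => "fo"
scanner.concat('bar')
scanner.string          # => "foobar"
scanner.pos             # => 2 (concat leaves the position alone)
scanner.rest            # => "obar"
scanner.string = 'baz'
scanner.pos             # => 0 (string= resets the position)
scanner.rest            # => "baz"
```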
A StringScanner object maintains a zero-based byte position and a zero-based character position.
Each of these methods explicitly sets positions:
| Method | Effect |
|---|---|
| reset | Sets both positions to zero (beginning of stored string). |
| terminate | Sets both positions to the end of the stored string. |
| pos=(new_byte_position) | Sets byte position; adjusts character position. |
Byte Position
The byte position (or simply position) is a zero-based index into the bytes in the scanner’s stored string; for a new StringScanner object, the byte position is zero.
When the byte position is:
Zero (at the beginning), the target substring is the entire stored string.
Equal to the size of the stored string (at the end), the target substring is the empty string ''.
To get or set the byte position:
pos: returns the byte position.
pos=(new_pos): sets the byte position.
Many methods use the byte position as the basis for finding matches; many others set, increment, or decrement the byte position:
scanner = StringScanner.new('foobar')
scanner.pos # => 0
scanner.scan(/foo/) # => "foo" # Match found.
scanner.pos # => 3 # Byte position incremented.
scanner.scan(/foo/) # => nil # Match not found.
scanner.pos # => 3 # Byte position not changed.
Some methods implicitly modify the byte position: the traversal methods advance it on success, and string=(new_string) resets it to zero.
The values of these methods are derived directly from the values of pos and string:
charpos: the character position.
rest: the target substring.
rest_size: the size in bytes of the target substring (rest.bytesize).
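These relationships can be checked directly. The sketch below uses only String#byteslice, String#bytesize, and String#size to recompute each derived value from pos and string; the sample text is the same mix of 1-byte and 3-byte characters used elsewhere in this document:

```ruby
require 'strscan'

scanner = StringScanner.new('Helloこんにちは')
scanner.scan(/Hello/)
scanner.getch                        # Consumes the 3-byte character "こ".

pos    = scanner.pos                 # => 8
string = scanner.string

# rest is the bytes from pos to the end of the stored string.
scanner.rest == string.byteslice(pos, string.bytesize - pos) # => true
# rest_size counts bytes, not characters.
scanner.rest_size == scanner.rest.bytesize                   # => true
# charpos is the number of characters in the first pos bytes.
scanner.charpos == string.byteslice(0, pos).size             # => true
```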
The character position is a zero-based index into the characters in the stored string; for a new StringScanner object, the character position is zero.
Method charpos returns the character position; its value may not be reset explicitly.
Some methods change (increment or reset) the character position: any method that changes the byte position changes the character position accordingly.
Example (string includes multi-byte characters):
scanner = StringScanner.new(ENGLISH_TEXT) # Five 1-byte characters.
scanner.concat(HIRAGANA_TEXT) # Five 3-byte characters.
scanner.string # => "Helloこんにちは" # Twenty bytes in all.
put_situation(scanner)
# Situation:
# pos: 0
# charpos: 0
# rest: "Helloこんにちは"
# rest_size: 20
scanner.scan(/Hello/) # => "Hello" # Five 1-byte characters.
put_situation(scanner)
# Situation:
# pos: 5
# charpos: 5
# rest: "こんにちは"
# rest_size: 15
scanner.getch # => "こ" # One 3-byte character.
put_situation(scanner)
# Situation:
# pos: 8
# charpos: 6
# rest: "んにちは"
# rest_size: 12
Target Substring
The target substring is the part of the stored string that extends from the current byte position to the end of the stored string; it is always either:
The entire stored string (byte position is zero).
A trailing substring of the stored string (byte position positive).
The target substring is returned by method rest, and its size is returned by method rest_size.
Examples:
scanner = StringScanner.new('foobarbaz')
put_situation(scanner)
# Situation:
# pos: 0
# charpos: 0
# rest: "foobarbaz"
# rest_size: 9
scanner.pos = 3
put_situation(scanner)
# Situation:
# pos: 3
# charpos: 3
# rest: "barbaz"
# rest_size: 6
scanner.pos = 9
put_situation(scanner)
# Situation:
# pos: 9
# charpos: 9
# rest: ""
# rest_size: 0
The target substring is set whenever:
The stored string is set (position reset to zero; target substring set to stored string).
The byte position is set (target substring adjusted accordingly).
This table summarizes (details and examples at the links):
| Method | Returns |
|---|---|
| rest | Target substring. |
| rest_size | Size (bytes) of target substring. |
A search method examines the target substring, but does not advance the positions or (by implication) shorten the target substring.
This table summarizes (details and examples at the links):
| Method | Returns | Sets Match Values? |
|---|---|---|
| check(pattern) | Matched leading substring or nil. | Yes. |
| check_until(pattern) | Matched substring (anywhere) or nil. | Yes. |
| exist?(pattern) | Byte offset from the position to the end of the matched substring (anywhere), or nil. | Yes. |
| match?(pattern) | Size of matched leading substring or nil. | Yes. |
| peek(size) | Leading substring of given length (bytes). | No. |
| peek_byte | Integer leading byte or nil. | No. |
| rest | Target substring (from byte position to end). | No. |
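As an illustration of the table, this short sketch calls several search methods in a row and confirms that the byte position and target substring are unchanged afterward:

```ruby
require 'strscan'

scanner = StringScanner.new('foobarbaz')
scanner.pos = 3
scanner.check(/bar/)        # => "bar"
scanner.check_until(/baz/)  # => "barbaz"
scanner.exist?(/baz/)       # => 6
scanner.match?(/bar/)       # => 3
scanner.peek(2)             # => "ba"
scanner.pos                 # => 3 (no search method moved it)
scanner.rest                # => "barbaz"
```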
A traversal method examines the target substring, and, if successful:
Advances the positions.
Shortens the target substring.
This table summarizes (details and examples at links):
| Method | Returns | Sets Match Values? |
|---|---|---|
| get_byte | Leading byte or nil. | No. |
| getch | Leading character or nil. | No. |
| scan(pattern) | Matched leading substring or nil. | Yes. |
| scan_byte | Integer leading byte or nil. | No. |
| scan_until(pattern) | Matched substring (anywhere) or nil. | Yes. |
| skip(pattern) | Matched leading substring size or nil. | Yes. |
| skip_until(pattern) | Position delta to end-of-matched-substring or nil. | Yes. |
| unscan | self. | No. |
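By contrast with the search methods, each successful traversal call moves the position. A minimal sketch, including unscan, which restores the position held before the most recent successful match:

```ruby
require 'strscan'

scanner = StringScanner.new('foobarbaz')
scanner.scan(/foo/)         # => "foo"
scanner.pos                 # => 3
scanner.skip(/bar/)         # => 3
scanner.pos                 # => 6
scanner.unscan              # Undoes the skip.
scanner.pos                 # => 3
scanner.scan_until(/baz/)   # => "barbaz"
scanner.pos                 # => 9
scanner.skip_until(/x/)     # => nil
scanner.pos                 # => 9 (a failed traversal does not move the position)
```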
Each of these methods queries the scanner object without modifying it (details and examples at links):
| Method | Returns |
|---|---|
| beginning_of_line? | true or false. |
| charpos | Character position. |
| eos? | true or false. |
| fixed_anchor? | true or false. |
| inspect | String representation of self. |
| pos | Byte position. |
| rest | Target substring. |
| rest_size | Size of target substring. |
| string | Stored string. |
StringScanner implements pattern matching via Ruby class Regexp, and its matching behaviors are the same as Ruby’s except for the fixed-anchor property.
Each matcher method takes a single argument pattern, and attempts to find a matching substring in the target substring.
| Method | Pattern Type | Matches Target Substring | Success Return | May Update Positions? |
|---|---|---|---|---|
| check | Regexp or String. | At beginning. | Matched substring. | No. |
| check_until | Regexp or String. | Anywhere. | Substring. | No. |
| match? | Regexp or String. | At beginning. | Match size. | No. |
| exist? | Regexp or String. | Anywhere. | Substring size. | No. |
| scan | Regexp or String. | At beginning. | Matched substring. | Yes. |
| scan_until | Regexp or String. | Anywhere. | Substring. | Yes. |
| skip | Regexp or String. | At beginning. | Match size. | Yes. |
| skip_until | Regexp or String. | Anywhere. | Substring size. | Yes. |
Which matcher you choose will depend on:
Where you want to find a match:
Only at the beginning of the target substring: check, match?, scan, skip.
Anywhere in the target substring: check_until, exist?, scan_until, skip_until.
Whether you want to:
Traverse, by advancing the positions: scan, scan_until, skip, skip_until.
Keep the positions unchanged: check, check_until, match?, exist?.
What you want for the return value:
The substring: check_until, scan_until.
The substring size: exist?, skip_until.
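To show how these choices combine in practice, here is a small tokenizer sketch. The method name tokenize and the token types are hypothetical, chosen only for illustration; it uses skip where the matched text is not needed, scan where it is, and getch as a fallback:

```ruby
require 'strscan'

# Hypothetical helper: splits text into [type, value] pairs.
def tokenize(text)
  scanner = StringScanner.new(text)
  tokens = []
  until scanner.eos?
    if scanner.skip(/\s+/)
      next                                   # Whitespace: advance, keep nothing.
    elsif (number = scanner.scan(/\d+/))
      tokens << [:number, number.to_i]
    elsif (word = scanner.scan(/[A-Za-z]+/))
      tokens << [:word, word]
    else
      tokens << [:other, scanner.getch]      # Any other single character.
    end
  end
  tokens
end

tokenize('Fri Dec 12 1975 14:39')
# => [[:word, "Fri"], [:word, "Dec"], [:number, 12], [:number, 1975],
#     [:number, 14], [:other, ":"], [:number, 39]]
```

Every branch either advances the position or ends the loop, so the loop terminates for any input.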
The match values in a StringScanner object generally contain the results of the most recent attempted match.
Each match value may be thought of as:
Clear: Initially, or after an unsuccessful match attempt: usually, false, nil, or {}.
Set: After a successful match attempt: true, string, array, or hash.
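Each of these methods clears match values: ::new(string), reset, terminate.
Each of the pattern-based search and traversal methods (check, check_until, exist?, match?, scan, scan_until, skip, skip_until) attempts a match based on a pattern, and either sets match values (if successful) or clears them (if not).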
Basic match values are those not related to captures.
Each of these methods returns a basic match value:
| Method | Return After Match | Return After No Match |
|---|---|---|
| matched? | true. | false. |
| matched_size | Size of matched substring. | nil. |
| matched | Matched substring. | nil. |
| pre_match | Substring preceding matched substring. | nil. |
| post_match | Substring following matched substring. | nil. |
See examples below.
Captured match values are those related to captures.
Each of these methods returns a captured match value:
| Method | Return After Match | Return After No Match |
|---|---|---|
| size | Count of captured substrings. | nil. |
| [](n) | nth captured substring. | nil. |
| captures | Array of all captured substrings. | nil. |
| values_at(*n) | Array of specified captured substrings. | nil. |
| named_captures | Hash of named captures. | {}. |
See examples below.
Successful basic match attempt (no captures):
scanner = StringScanner.new('foobarbaz')
scanner.exist?(/bar/)
put_match_values(scanner)
# Basic match values:
# matched?: true
# matched_size: 3
# pre_match: "foo"
# matched : "bar"
# post_match: "baz"
# Captured match values:
# size: 1
# captures: []
# named_captures: {}
# values_at: ["bar", nil]
# []:
# [0]: "bar"
# [1]: nil
Failed basic match attempt (no captures):
scanner = StringScanner.new('foobarbaz')
scanner.exist?(/nope/)
match_values_cleared?(scanner) # => true
Successful unnamed capture match attempt:
scanner = StringScanner.new('foobarbazbatbam')
scanner.exist?(/(foo)bar(baz)bat(bam)/)
put_match_values(scanner)
# Basic match values:
# matched?: true
# matched_size: 15
# pre_match: ""
# matched : "foobarbazbatbam"
# post_match: ""
# Captured match values:
# size: 4
# captures: ["foo", "baz", "bam"]
# named_captures: {}
# values_at: ["foobarbazbatbam", "foo", "baz", "bam", nil]
# []:
# [0]: "foobarbazbatbam"
# [1]: "foo"
# [2]: "baz"
# [3]: "bam"
# [4]: nil
Successful named capture match attempt; same as unnamed above, except for named_captures:
scanner = StringScanner.new('foobarbazbatbam')
scanner.exist?(/(?<x>foo)bar(?<y>baz)bat(?<z>bam)/)
scanner.named_captures # => {"x"=>"foo", "y"=>"baz", "z"=>"bam"}
Failed unnamed capture match attempt:
scanner = StringScanner.new('somestring')
scanner.exist?(/(foo)bar(baz)bat(bam)/)
match_values_cleared?(scanner) # => true
Failed named capture match attempt; same as unnamed above, except for named_captures:
scanner = StringScanner.new('somestring')
scanner.exist?(/(?<x>foo)bar(?<y>baz)bat(?<z>bam)/)
match_values_cleared?(scanner) # => false
scanner.named_captures # => {"x"=>nil, "y"=>nil, "z"=>nil}
Pattern matching in StringScanner is the same as Ruby’s, except for its fixed-anchor property, which determines the meaning of '\A':
false (the default): matches the current byte position.
scanner = StringScanner.new('foobar')
scanner.scan(/\A./) # => "f"
scanner.scan(/\A./) # => "o"
scanner.scan(/\A./) # => "o"
scanner.scan(/\A./) # => "b"
true: matches the beginning of the stored string; never matches unless the byte position is zero:
scanner = StringScanner.new('foobar', fixed_anchor: true)
scanner.scan(/\A./) # => "f"
scanner.scan(/\A./) # => nil
scanner.reset
scanner.scan(/\A./) # => "f"
The fixed-anchor property is set when the StringScanner object is created, and may not be modified (see StringScanner.new); method fixed_anchor? returns the setting.
static VALUE
strscan_initialize(int argc, VALUE *argv, VALUE self)
{
struct strscanner *p;
VALUE str, options;
p = check_strscan(self);
rb_scan_args(argc, argv, "11", &str, &options);
options = rb_check_hash_type(options);
if (!NIL_P(options)) {
VALUE fixed_anchor;
ID keyword_ids[1];
keyword_ids[0] = rb_intern("fixed_anchor");
rb_get_kwargs(options, keyword_ids, 0, 1, &fixed_anchor);
if (fixed_anchor == Qundef) {
p->fixed_anchor_p = false;
}
else {
p->fixed_anchor_p = RTEST(fixed_anchor);
}
}
else {
p->fixed_anchor_p = false;
}
StringValue(str);
RB_OBJ_WRITE(self, &p->str, str);
return self;
}
Returns a new StringScanner object whose stored string is the given string; sets the fixed-anchor property:
scanner = StringScanner.new('foobarbaz')
scanner.string # => "foobarbaz"
scanner.fixed_anchor? # => false
put_situation(scanner)
# Situation:
# pos: 0
# charpos: 0
# rest: "foobarbaz"
# rest_size: 9
static VALUE
strscan_aref(VALUE self, VALUE idx)
{
const char *name;
struct strscanner *p;
long i;
GET_SCANNER(self, p);
if (! MATCHED_P(p)) return Qnil;
switch (TYPE(idx)) {
case T_SYMBOL:
idx = rb_sym2str(idx);
/* fall through */
case T_STRING:
RSTRING_GETMEM(idx, name, i);
i = name_to_backref_number(&(p->regs), p->regex, name, name + i, rb_enc_get(idx));
break;
default:
i = NUM2LONG(idx);
}
if (i < 0)
i += p->regs.num_regs;
if (i < 0) return Qnil;
if (i >= p->regs.num_regs) return Qnil;
if (p->regs.beg[i] == -1) return Qnil;
return extract_range(p,
adjust_register_position(p, p->regs.beg[i]),
adjust_register_position(p, p->regs.end[i]));
}
Returns a captured substring or nil; see Captured Match Values.
When there are captures:
scanner = StringScanner.new('Fri Dec 12 1975 14:39')
scanner.scan(/(?<wday>\w+) (?<month>\w+) (?<day>\d+) /)
specifier zero: returns the entire matched substring:
scanner[0] # => "Fri Dec 12 "
scanner.pre_match # => ""
scanner.post_match # => "1975 14:39"
specifier positive integer: returns the nth capture, or nil if out of range:
scanner[1] # => "Fri"
scanner[2] # => "Dec"
scanner[3] # => "12"
scanner[4] # => nil
specifier negative integer: counts backward from the last subgroup:
scanner[-1] # => "12"
scanner[-4] # => "Fri Dec 12 "
scanner[-5] # => nil
specifier symbol or string: returns the named subgroup, or nil if no such:
scanner[:wday] # => "Fri"
scanner['wday'] # => "Fri"
scanner[:month] # => "Dec"
scanner[:day] # => "12"
scanner[:nope] # => nil
When there are no captures, only [0] returns non-nil:
scanner = StringScanner.new('foobarbaz')
scanner.exist?(/bar/)
scanner[0] # => "bar"
scanner[1] # => nil
For a failed match, even [0] returns nil:
scanner.scan(/nope/) # => nil
scanner[0] # => nil
scanner[1] # => nil
static VALUE
strscan_bol_p(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
if (CURPTR(p) > S_PEND(p)) return Qnil;
if (p->curr == 0) return Qtrue;
return (*(CURPTR(p) - 1) == '\n') ? Qtrue : Qfalse;
}
Returns whether the position is at the beginning of a line; that is, at the beginning of the stored string or immediately after a newline:
scanner = StringScanner.new(MULTILINE_TEXT)
scanner.string
# => "Go placidly amid the noise and haste,\nand remember what peace there may be in silence.\n"
scanner.pos # => 0
scanner.beginning_of_line? # => true
scanner.scan_until(/,/) # => "Go placidly amid the noise and haste,"
scanner.beginning_of_line? # => false
scanner.scan(/\n/) # => "\n"
scanner.beginning_of_line? # => true
scanner.terminate
scanner.beginning_of_line? # => true
scanner.concat('x')
scanner.terminate
scanner.beginning_of_line? # => false
StringScanner#bol? is an alias for StringScanner#beginning_of_line?.
static VALUE
strscan_captures(VALUE self)
{
struct strscanner *p;
int i, num_regs;
VALUE new_ary;
GET_SCANNER(self, p);
if (! MATCHED_P(p)) return Qnil;
num_regs = p->regs.num_regs;
new_ary = rb_ary_new2(num_regs);
for (i = 1; i < num_regs; i++) {
VALUE str;
if (p->regs.beg[i] == -1)
str = Qnil;
else
str = extract_range(p,
adjust_register_position(p, p->regs.beg[i]),
adjust_register_position(p, p->regs.end[i]));
rb_ary_push(new_ary, str);
}
return new_ary;
}
Returns the array of captured match values at indexes (1..) if the most recent match attempt succeeded, or nil otherwise:
scanner = StringScanner.new('Fri Dec 12 1975 14:39')
scanner.captures # => nil
scanner.exist?(/(?<wday>\w+) (?<month>\w+) (?<day>\d+) /)
scanner.captures # => ["Fri", "Dec", "12"]
scanner.values_at(*0..4) # => ["Fri Dec 12 ", "Fri", "Dec", "12", nil]
scanner.exist?(/Fri/)
scanner.captures # => []
scanner.scan(/nope/)
scanner.captures # => nil
static VALUE
strscan_get_charpos(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
return LONG2NUM(rb_enc_strlen(S_PBEG(p), CURPTR(p), rb_enc_get(p->str)));
}
call-seq: charpos -> character_position
Returns the character position (initially zero), which may be different from the byte position given by method pos:
scanner = StringScanner.new(HIRAGANA_TEXT)
scanner.string # => "こんにちは"
scanner.getch # => "こ" # 3-byte character.
scanner.getch # => "ん" # 3-byte character.
put_situation(scanner)
# Situation:
# pos: 6
# charpos: 2
# rest: "にちは"
# rest_size: 9
static VALUE
strscan_check(VALUE self, VALUE re)
{
return strscan_do_scan(self, re, 0, 1, 1);
}
Attempts to match the given pattern at the beginning of the target substring; does not modify the positions.
If the match succeeds:
Returns the matched substring.
Sets all match values.
scanner = StringScanner.new('foobarbaz')
scanner.pos = 3
scanner.check('bar') # => "bar"
put_match_values(scanner)
# Basic match values:
# matched?: true
# matched_size: 3
# pre_match: "foo"
# matched : "bar"
# post_match: "baz"
# Captured match values:
# size: 1
# captures: []
# named_captures: {}
# values_at: ["bar", nil]
# []:
# [0]: "bar"
# [1]: nil
# => 0..1
put_situation(scanner)
# Situation:
# pos: 3
# charpos: 3
# rest: "barbaz"
# rest_size: 6
If the match fails:
Returns nil.
Clears all match values.
scanner.check(/nope/) # => nil
match_values_cleared?(scanner) # => true
static VALUE
strscan_check_until(VALUE self, VALUE re)
{
return strscan_do_scan(self, re, 0, 1, 0);
}
Attempts to match the given pattern anywhere (at any position) in the target substring; does not modify the positions.
If the match succeeds:
Sets all match values.
Returns the matched substring, which extends from the current position to the end of the matched substring.
scanner = StringScanner.new('foobarbazbatbam')
scanner.pos = 6
scanner.check_until(/bat/) # => "bazbat"
put_match_values(scanner)
# Basic match values:
# matched?: true
# matched_size: 3
# pre_match: "foobarbaz"
# matched : "bat"
# post_match: "bam"
# Captured match values:
# size: 1
# captures: []
# named_captures: {}
# values_at: ["bat", nil]
# []:
# [0]: "bat"
# [1]: nil
put_situation(scanner)
# Situation:
# pos: 6
# charpos: 6
# rest: "bazbatbam"
# rest_size: 9
If the match fails:
Clears all match values.
Returns nil.
scanner.check_until(/nope/) # => nil
match_values_cleared?(scanner) # => true
static VALUE
strscan_concat(VALUE self, VALUE str)
{
struct strscanner *p;
GET_SCANNER(self, p);
StringValue(str);
rb_str_append(p->str, str);
return self;
}
Appends the given more_string to the stored string.
Returns self.
Does not affect the positions or match values.
scanner = StringScanner.new('foo')
scanner.string # => "foo"
scanner.terminate
scanner.concat('barbaz') # => #<StringScanner 3/9 "foo" @ "barba...">
scanner.string # => "foobarbaz"
put_situation(scanner)
# Situation:
# pos: 3
# charpos: 3
# rest: "barbaz"
# rest_size: 6
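Because concat leaves the position alone, it suits input that arrives in pieces. The sketch below is one possible pattern, not part of the library: the method name complete_lines and the chunked input are assumptions for illustration. It appends each chunk and extracts only the lines that are already complete:

```ruby
require 'strscan'

# Hypothetical helper: returns [complete_lines, leftover] for chunked input.
def complete_lines(chunks)
  scanner = StringScanner.new(+'')      # Mutable stored string, grown by concat.
  lines = []
  chunks.each do |chunk|
    scanner.concat(chunk)
    # A failed scan_until leaves the position where it was,
    # so a partial line simply waits for the next chunk.
    while (line = scanner.scan_until(/\n/))
      lines << line.chomp
    end
  end
  [lines, scanner.rest]
end

complete_lines(["ab", "c\nde", "f\ngh"]) # => [["abc", "def"], "gh"]
```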
static VALUE
strscan_eos_p(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
return EOS_P(p) ? Qtrue : Qfalse;
}
Returns whether the position is at the end of the stored string:
scanner = StringScanner.new('foobarbaz')
scanner.eos? # => false
scanner.pos = 3
scanner.eos? # => false
scanner.terminate
scanner.eos? # => true
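A common use of eos? is as the loop condition for a traversal. In this sketch (the method name chars_with_byte_offsets is hypothetical), each character is collected together with the byte position at which it starts:

```ruby
require 'strscan'

# Hypothetical helper: pairs each character with its starting byte position.
def chars_with_byte_offsets(text)
  scanner = StringScanner.new(text)
  pairs = []
  until scanner.eos?
    start = scanner.pos
    pairs << [scanner.getch, start]
  end
  pairs
end

chars_with_byte_offsets('aこb') # => [["a", 0], ["こ", 1], ["b", 4]]
```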
static VALUE
strscan_exist_p(VALUE self, VALUE re)
{
return strscan_do_scan(self, re, 0, 0, 0);
}
Attempts to match the given pattern anywhere (at any position) in the target substring; does not modify the positions.
If the match succeeds:
Returns a byte offset: the distance in bytes between the current position and the end of the matched substring.
Sets all match values.
scanner = StringScanner.new('foobarbazbatbam')
scanner.pos = 6
scanner.exist?(/bat/) # => 6
put_match_values(scanner)
# Basic match values:
# matched?: true
# matched_size: 3
# pre_match: "foobarbaz"
# matched : "bat"
# post_match: "bam"
# Captured match values:
# size: 1
# captures: []
# named_captures: {}
# values_at: ["bat", nil]
# []:
# [0]: "bat"
# [1]: nil
put_situation(scanner)
# Situation:
# pos: 6
# charpos: 6
# rest: "bazbatbam"
# rest_size: 9
If the match fails:
Returns nil.
Clears all match values.
scanner.exist?(/nope/) # => nil
match_values_cleared?(scanner) # => true
static VALUE
strscan_fixed_anchor_p(VALUE self)
{
struct strscanner *p;
p = check_strscan(self);
return p->fixed_anchor_p ? Qtrue : Qfalse;
}
Returns whether the fixed-anchor property is set.
static VALUE
strscan_get_byte(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
CLEAR_MATCH_STATUS(p);
if (EOS_P(p))
return Qnil;
p->prev = p->curr;
p->curr++;
MATCHED(p);
adjust_registers_to_matched(p);
return extract_range(p,
adjust_register_position(p, p->regs.beg[0]),
adjust_register_position(p, p->regs.end[0]));
}
call-seq: get_byte -> byte_as_character or nil
Returns the next byte, if available:
If the position is not at the end of the stored string:
Returns the next byte.
Increments the byte position.
Adjusts the character position.
scanner = StringScanner.new(HIRAGANA_TEXT)
# => #<StringScanner 0/15 @ "\xE3\x81\x93\xE3\x82...">
scanner.string # => "こんにちは"
[scanner.get_byte, scanner.pos, scanner.charpos] # => ["\xE3", 1, 1]
[scanner.get_byte, scanner.pos, scanner.charpos] # => ["\x81", 2, 2]
[scanner.get_byte, scanner.pos, scanner.charpos] # => ["\x93", 3, 1]
[scanner.get_byte, scanner.pos, scanner.charpos] # => ["\xE3", 4, 2]
[scanner.get_byte, scanner.pos, scanner.charpos] # => ["\x82", 5, 3]
[scanner.get_byte, scanner.pos, scanner.charpos] # => ["\x93", 6, 2]
Otherwise, returns nil, and does not change the positions.
scanner.terminate
[scanner.get_byte, scanner.pos, scanner.charpos] # => [nil, 15, 5]
static VALUE
strscan_getch(VALUE self)
{
struct strscanner *p;
long len;
GET_SCANNER(self, p);
CLEAR_MATCH_STATUS(p);
if (EOS_P(p))
return Qnil;
len = rb_enc_mbclen(CURPTR(p), S_PEND(p), rb_enc_get(p->str));
len = minl(len, S_RESTLEN(p));
p->prev = p->curr;
p->curr += len;
MATCHED(p);
adjust_registers_to_matched(p);
return extract_range(p,
adjust_register_position(p, p->regs.beg[0]),
adjust_register_position(p, p->regs.end[0]));
}
call-seq: getch -> character or nil
Returns the next (possibly multibyte) character, if available:
If the position is at the beginning of a character:
Returns the character.
Increments the character position by 1.
Increments the byte position by the size (in bytes) of the character.
scanner = StringScanner.new(HIRAGANA_TEXT)
scanner.string # => "こんにちは"
[scanner.getch, scanner.pos, scanner.charpos] # => ["こ", 3, 1]
[scanner.getch, scanner.pos, scanner.charpos] # => ["ん", 6, 2]
[scanner.getch, scanner.pos, scanner.charpos] # => ["に", 9, 3]
[scanner.getch, scanner.pos, scanner.charpos] # => ["ち", 12, 4]
[scanner.getch, scanner.pos, scanner.charpos] # => ["は", 15, 5]
[scanner.getch, scanner.pos, scanner.charpos] # => [nil, 15, 5]
If the position is within a multi-byte character (that is, not at its beginning), behaves like get_byte (returns a 1-byte character):
scanner.pos = 1
[scanner.getch, scanner.pos, scanner.charpos] # => ["\x81", 2, 2]
[scanner.getch, scanner.pos, scanner.charpos] # => ["\x93", 3, 1]
[scanner.getch, scanner.pos, scanner.charpos] # => ["ん", 6, 2]
If the position is at the end of the stored string, returns nil and does not modify the positions:
scanner.terminate
[scanner.getch, scanner.pos, scanner.charpos] # => [nil, 15, 5]
static VALUE
strscan_inspect(VALUE self)
{
struct strscanner *p;
VALUE a, b;
p = check_strscan(self);
if (NIL_P(p->str)) {
a = rb_sprintf("#<%"PRIsVALUE" (uninitialized)>", rb_obj_class(self));
return a;
}
if (EOS_P(p)) {
a = rb_sprintf("#<%"PRIsVALUE" fin>", rb_obj_class(self));
return a;
}
if (p->curr == 0) {
b = inspect2(p);
a = rb_sprintf("#<%"PRIsVALUE" %ld/%ld @ %"PRIsVALUE">",
rb_obj_class(self),
p->curr, S_LEN(p),
b);
return a;
}
a = inspect1(p);
b = inspect2(p);
a = rb_sprintf("#<%"PRIsVALUE" %ld/%ld %"PRIsVALUE" @ %"PRIsVALUE">",
rb_obj_class(self),
p->curr, S_LEN(p),
a, b);
return a;
}
Returns a string representation of self that may show:
The current position.
The size (in bytes) of the stored string.
The substring preceding the current position.
The substring following the current position (which is also the target substring).
scanner = StringScanner.new("Fri Dec 12 1975 14:39")
scanner.pos = 11
scanner.inspect # => "#<StringScanner 11/21 \"...c 12 \" @ \"1975 ...\">"
If at beginning-of-string, the third item above (the preceding substring) is omitted:
scanner.reset
scanner.inspect # => "#<StringScanner 0/21 @ \"Fri D...\">"
If at end-of-string, all items above are omitted:
scanner.terminate
scanner.inspect # => "#<StringScanner fin>"
static VALUE
strscan_match_p(VALUE self, VALUE re)
{
return strscan_do_scan(self, re, 0, 0, 1);
}
Attempts to match the given pattern at the beginning of the target substring; does not modify the positions.
If the match succeeds:
Sets match values.
Returns the size in bytes of the matched substring.
scanner = StringScanner.new('foobarbaz')
scanner.pos = 3
scanner.match?(/bar/) # => 3
put_match_values(scanner)
# Basic match values:
# matched?: true
# matched_size: 3
# pre_match: "foo"
# matched : "bar"
# post_match: "baz"
# Captured match values:
# size: 1
# captures: []
# named_captures: {}
# values_at: ["bar", nil]
# []:
# [0]: "bar"
# [1]: nil
put_situation(scanner)
# Situation:
# pos: 3
# charpos: 3
# rest: "barbaz"
# rest_size: 6
If the match fails:
Clears match values.
Returns nil.
Does not increment positions.
scanner.match?(/nope/) # => nil
match_values_cleared?(scanner) # => true
static VALUE
strscan_matched(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
if (! MATCHED_P(p)) return Qnil;
return extract_range(p,
adjust_register_position(p, p->regs.beg[0]),
adjust_register_position(p, p->regs.end[0]));
}
Returns the matched substring from the most recent match attempt if it was successful, or nil otherwise; see Basic Match Values:
scanner = StringScanner.new('foobarbaz')
scanner.matched # => nil
scanner.pos = 3
scanner.match?(/bar/) # => 3
scanner.matched # => "bar"
scanner.match?(/nope/) # => nil
scanner.matched # => nil
static VALUE
strscan_matched_p(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
return MATCHED_P(p) ? Qtrue : Qfalse;
}
Returns true if the most recent match attempt was successful, false otherwise; see Basic Match Values:
scanner = StringScanner.new('foobarbaz')
scanner.matched? # => false
scanner.pos = 3
scanner.exist?(/baz/) # => 6
scanner.matched? # => true
scanner.exist?(/nope/) # => nil
scanner.matched? # => false
static VALUE
strscan_matched_size(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
if (! MATCHED_P(p)) return Qnil;
return LONG2NUM(p->regs.end[0] - p->regs.beg[0]);
}
Returns the size (in bytes) of the matched substring from the most recent match attempt if it was successful, or nil otherwise; see Basic Match Values:
scanner = StringScanner.new('foobarbaz')
scanner.matched_size # => nil
scanner.pos = 3
scanner.exist?(/baz/) # => 6
scanner.matched_size # => 3
scanner.exist?(/nope/) # => nil
scanner.matched_size # => nil
static VALUE
strscan_named_captures(VALUE self)
{
struct strscanner *p;
named_captures_data data;
GET_SCANNER(self, p);
data.self = self;
data.captures = rb_hash_new();
if (!RB_NIL_P(p->regex)) {
onig_foreach_name(RREGEXP_PTR(p->regex), named_captures_iter, &data);
}
return data.captures;
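}
Returns a hash of the named captures for the most recent match attempt: the captured substrings if the attempt succeeded, nil values if it failed, or an empty hash if the pattern has no named groups or no pattern match has been attempted; see Captured Match Values: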
scanner = StringScanner.new('Fri Dec 12 1975 14:39')
scanner.named_captures # => {}
pattern = /(?<wday>\w+) (?<month>\w+) (?<day>\d+) /
scanner.match?(pattern)
scanner.named_captures # => {"wday"=>"Fri", "month"=>"Dec", "day"=>"12"}
scanner.string = 'nope'
scanner.match?(pattern)
scanner.named_captures # => {"wday"=>nil, "month"=>nil, "day"=>nil}
scanner.match?(/nosuch/)
scanner.named_captures # => {}
static VALUE
strscan_peek(VALUE self, VALUE vlen)
{
struct strscanner *p;
long len;
GET_SCANNER(self, p);
len = NUM2LONG(vlen);
if (EOS_P(p))
return str_new(p, "", 0);
len = minl(len, S_RESTLEN(p));
return extract_beg_len(p, p->curr, len);
}
Returns the leading substring of the target substring, up to length bytes (that is, string.byteslice(pos, length)); does not update match values or positions:
scanner = StringScanner.new('foobarbaz')
scanner.pos = 3
scanner.peek(3) # => "bar"
scanner.terminate
scanner.peek(3) # => ""
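Because peek counts bytes rather than characters, it can split a multi-byte character. A brief sketch with 3-byte characters:

```ruby
require 'strscan'

scanner = StringScanner.new('こんにちは')   # Five 3-byte characters.
scanner.peek(3)              # => "こ" (one whole character)
scanner.peek(1).bytes        # => [227] (only the first byte of "こ")
scanner.peek(100).bytesize   # => 15 (clamped to the remaining bytes)
scanner.pos                  # => 0 (peek never moves the position)
```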
static VALUE
strscan_peek_byte(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
if (EOS_P(p))
return Qnil;
return INT2FIX((unsigned char)*CURPTR(p));
}
Peeks at the current byte and returns it as an integer, or returns nil if the position is at the end of the stored string.
s = StringScanner.new('ab')
s.peek_byte # => 97
static VALUE
strscan_get_pos(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
return LONG2NUM(p->curr);
}
call-seq: pos -> byte_position
Returns the integer byte position, which may be different from the character position:
scanner = StringScanner.new(HIRAGANA_TEXT)
scanner.string # => "こんにちは"
scanner.pos # => 0
scanner.getch # => "こ" # 3-byte character.
scanner.charpos # => 1
scanner.pos # => 3
static VALUE
strscan_set_pos(VALUE self, VALUE v)
{
struct strscanner *p;
long i;
GET_SCANNER(self, p);
i = NUM2LONG(v);
if (i < 0) i += S_LEN(p);
if (i < 0) rb_raise(rb_eRangeError, "index out of range");
if (i > S_LEN(p)) rb_raise(rb_eRangeError, "index out of range");
p->curr = i;
return LONG2NUM(i);
}
call-seq:
pos = n -> n
pointer = n -> n
Sets the byte position and the character position; returns n.
Does not affect match values.
For non-negative n, sets the position to n:
scanner = StringScanner.new(HIRAGANA_TEXT)
scanner.string  # => "こんにちは"
scanner.pos = 3 # => 3
scanner.rest    # => "んにちは"
scanner.charpos # => 1
For negative n, counts from the end of the stored string:
scanner.pos = -9 # => -9
scanner.pos      # => 6
scanner.rest     # => "にちは"
scanner.charpos  # => 2
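The C source for pos= shown above raises RangeError when the resulting index falls outside the stored string. The following is a minimal sketch of that boundary behavior, assuming a UTF-8 source file so that the literal is 15 bytes long; the variable names are illustrative only.

```ruby
require 'strscan'

scanner = StringScanner.new('こんにちは') # 15 bytes, 5 characters

scanner.pos = 15      # The end of the string is a valid position.
at_end = scanner.eos? # => true

errors = []
[16, -16].each do |n| # One byte past either end of the string.
  begin
    scanner.pos = n
  rescue RangeError => e
    errors << e.message # "index out of range", per the C source above.
  end
end

# A rejected assignment leaves the position where it was.
position_after = scanner.pos # => 15
```

The position is only stored after both range checks pass, which is why the failed assignments do not move the scanner.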
static VALUE
strscan_post_match(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
if (! MATCHED_P(p)) return Qnil;
return extract_range(p,
adjust_register_position(p, p->regs.end[0]),
S_LEN(p));
}
Returns the substring that follows the matched substring from the most recent match attempt if it was successful, or nil otherwise; see Basic Match Values:
scanner = StringScanner.new('foobarbaz')
scanner.post_match # => nil
scanner.pos = 3
scanner.match?(/bar/) # => 3
scanner.post_match # => "baz"
scanner.match?(/nope/) # => nil
scanner.post_match # => nil
static VALUE
strscan_pre_match(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
if (! MATCHED_P(p)) return Qnil;
return extract_range(p,
0,
adjust_register_position(p, p->regs.beg[0]));
}
Returns the substring that precedes the matched substring from the most recent match attempt if it was successful, or nil otherwise; see Basic Match Values:
scanner = StringScanner.new('foobarbaz')
scanner.pre_match # => nil
scanner.pos = 3
scanner.exist?(/baz/) # => 6
scanner.pre_match # => "foobar" # Substring of entire string, not just target string.
scanner.exist?(/nope/) # => nil
scanner.pre_match # => nil
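As a quick illustration of how pre_match, matched, and post_match relate, the sketch below checks that the three pieces rejoin into the whole stored string after a successful match. It uses scan_until only as a convenient way to produce a match that does not start at the current position.

```ruby
require 'strscan'

scanner = StringScanner.new('foobarbaz')
scanner.pos = 3
scanner.scan_until(/baz/) # => "barbaz"

# pre_match is measured from the start of the stored string,
# not from the position where the scan began.
parts = [scanner.pre_match, scanner.matched, scanner.post_match]
# => ["foobar", "baz", ""]

rejoined = parts.join == scanner.string # => true
```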
static VALUE
strscan_reset(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
p->curr = 0;
CLEAR_MATCH_STATUS(p);
return self;
}
Sets both byte position and character position to zero, and clears match values; returns self:
scanner = StringScanner.new('foobarbaz')
scanner.exist?(/bar/) # => 6
scanner.reset # => #<StringScanner 0/9 @ "fooba...">
put_situation(scanner)
# Situation:
# pos: 0
# charpos: 0
# rest: "foobarbaz"
# rest_size: 9
# => nil
match_values_cleared?(scanner) # => true
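Because pos= is documented as not affecting match values, rewinding with pos = 0 and rewinding with reset are not interchangeable. A small sketch of the difference, using only methods described on this page plus matched:

```ruby
require 'strscan'

scanner = StringScanner.new('foobarbaz')

scanner.scan(/foo/)
scanner.pos = 0                    # Rewinds, but keeps the match values.
kept = scanner.matched             # => "foo"

scanner.scan(/foo/)
scanner.reset                      # Rewinds and clears the match values.
cleared = scanner.matched          # => nil
position_after_reset = scanner.pos # => 0
```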
static VALUE
strscan_rest(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
if (EOS_P(p)) {
return str_new(p, "", 0);
}
return extract_range(p, p->curr, S_LEN(p));
}
Returns the ‘rest’ of the stored string (all after the current position), which is the target substring:
scanner = StringScanner.new('foobarbaz')
scanner.rest # => "foobarbaz"
scanner.pos = 3
scanner.rest # => "barbaz"
scanner.terminate
scanner.rest # => ""
static VALUE
strscan_rest_size(VALUE self)
{
struct strscanner *p;
long i;
GET_SCANNER(self, p);
if (EOS_P(p)) {
return INT2FIX(0);
}
i = S_RESTLEN(p);
return INT2FIX(i);
}
Returns the size (in bytes) of the rest of the stored string:
scanner = StringScanner.new('foobarbaz')
scanner.rest # => "foobarbaz"
scanner.rest_size # => 9
scanner.pos = 3
scanner.rest # => "barbaz"
scanner.rest_size # => 6
scanner.terminate
scanner.rest # => ""
scanner.rest_size # => 0
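The ASCII example above cannot show the difference between bytes and characters. As a sketch with multibyte text (assuming a UTF-8 source file), rest_size agrees with the byte size of rest rather than with its character count:

```ruby
require 'strscan'

scanner = StringScanner.new('こんにちは')
scanner.getch                     # Consume one 3-byte character.

rest = scanner.rest               # => "んにちは"
size_in_bytes = scanner.rest_size # => 12
byte_count = rest.bytesize        # => 12
char_count = rest.size            # => 4
```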
static VALUE
strscan_scan(VALUE self, VALUE re)
{
return strscan_do_scan(self, re, 1, 1, 1);
}
call-seq: scan(pattern) -> substring or nil
Attempts to match the given pattern at the beginning of the target substring.
If the match succeeds:
Returns the matched substring.
Increments the byte position by substring.bytesize, and may increment the character position.
Sets match values.
scanner = StringScanner.new(HIRAGANA_TEXT)
scanner.string # => "こんにちは"
scanner.pos = 6
scanner.scan(/に/) # => "に"
put_match_values(scanner)
# Basic match values:
# matched?: true
# matched_size: 3
# pre_match: "こん"
# matched : "に"
# post_match: "ちは"
# Captured match values:
# size: 1
# captures: []
# named_captures: {}
# values_at: ["に", nil]
# []:
# [0]: "に"
# [1]: nil
put_situation(scanner)
# Situation:
# pos: 9
# charpos: 3
# rest: "ちは"
# rest_size: 6
If the match fails:
Returns nil.
Does not increment byte and character positions.
Clears match values.
scanner.scan(/nope/) # => nil
match_values_cleared?(scanner) # => true
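Since scan only matches at the current position and advances on success, it lends itself to simple tokenizing loops. The following is a minimal sketch, not a pattern taken from the library; the tokenize method and its token names are hypothetical.

```ruby
require 'strscan'

# Hypothetical helper: splits a string into [:type, text] pairs.
def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    if scanner.scan(/\s+/)
      next                          # Whitespace is consumed and dropped.
    elsif (text = scanner.scan(/\d+/))
      tokens << [:int, text]
    elsif (text = scanner.scan(/[a-z]+/))
      tokens << [:word, text]
    else
      # Nothing matched here; report the byte position and stop.
      raise ArgumentError, "unexpected input at byte #{scanner.pos}"
    end
  end
  tokens
end

tokens = tokenize('foo 42 bar')
# => [[:word, "foo"], [:int, "42"], [:word, "bar"]]
```

Each failed scan leaves the position untouched, so the branches can be tried in order without any explicit backtracking.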
static VALUE
strscan_scan_byte(VALUE self)
{
struct strscanner *p;
VALUE byte;
GET_SCANNER(self, p);
CLEAR_MATCH_STATUS(p);
if (EOS_P(p))
return Qnil;
byte = INT2FIX((unsigned char)*CURPTR(p));
p->prev = p->curr;
p->curr++;
MATCHED(p);
adjust_registers_to_matched(p);
return byte;
}
Scans one byte, advances the byte position by one, and returns the byte as an integer; returns nil at end-of-string. This method is not multibyte character sensitive. See also: getch.
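A short sketch of reading a multibyte character one byte at a time. It assumes a UTF-8 source file, and it guards the call with respond_to? because scan_byte may be missing from older strscan releases; in that case the array simply stays empty.

```ruby
require 'strscan'

scanner = StringScanner.new('こ') # One character, three bytes in UTF-8.
bytes = []

if scanner.respond_to?(:scan_byte) # May be absent in older strscan releases.
  # scan_byte returns nil at end-of-string, which ends the loop.
  while (byte = scanner.scan_byte)
    bytes << byte
  end
end

# Where scan_byte is available, bytes is [227, 129, 147],
# the same values as 'こ'.bytes.
```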
# File ext/strscan/lib/strscan/strscan.rb, line 15
def scan_integer(base: 10)
case base
when 10
scan_base10_integer
when 16
scan_base16_integer
else
raise ArgumentError, "Unsupported integer base: #{base.inspect}, expected 10 or 16"
end
end
If `base` isn't provided or is `10`, then it is equivalent to calling `#scan` with a `[+-]?\d+` pattern, and returns an Integer or nil.
If `base` is `16`, then it is equivalent to calling `#scan` with a `[+-]?(0x)?[0-9a-fA-F]+` pattern, and returns an Integer or nil.
The scanned string must be encoded with an ASCII compatible encoding, otherwise Encoding::CompatibilityError will be raised.
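The described equivalence can be approximated with plain scan followed by a conversion. This is only a sketch under the assumption that the decimal pattern is `[+-]?\d+` and the hexadecimal pattern is `[+-]?(0x)?[0-9a-fA-F]+`; the helper names scan_decimal_like and scan_hex_like are hypothetical and are not part of StringScanner.

```ruby
require 'strscan'

# Hypothetical helper: roughly what a base-10 integer scan does.
def scan_decimal_like(scanner)
  text = scanner.scan(/[+-]?\d+/)
  text && Integer(text, 10)
end

# Hypothetical helper: roughly what a base-16 integer scan does.
def scan_hex_like(scanner)
  text = scanner.scan(/[+-]?(0x)?[0-9a-fA-F]+/)
  text && Integer(text, 16)
end

scanner = StringScanner.new('-42abc')
decimal = scan_decimal_like(scanner)        # => -42
leftover = scanner.rest                     # => "abc"
no_number = scan_decimal_like(scanner)      # => nil ("abc" has no leading digits)

hex = scan_hex_like(StringScanner.new('0xff;')) # => 255
```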
static VALUE
strscan_scan_until(VALUE self, VALUE re)
{
return strscan_do_scan(self, re, 1, 1, 0);
}
call-seq: scan_until(pattern) -> substring or nil
Attempts to match the given pattern anywhere (at any position) in the target substring.
If the match attempt succeeds:
Sets match values.
Sets the byte position to the end of the matched substring; may adjust the character position.
Returns the matched substring.
scanner = StringScanner.new(HIRAGANA_TEXT)
scanner.string # => "こんにちは"
scanner.pos = 6
scanner.scan_until(/ち/) # => "にち"
put_match_values(scanner)
# Basic match values:
# matched?: true
# matched_size: 3
# pre_match: "こんに"
# matched : "ち"
# post_match: "は"
# Captured match values:
# size: 1
# captures: []
# named_captures: {}
# values_at: ["ち", nil]
# []:
# [0]: "ち"
# [1]: nil
put_situation(scanner)
# Situation:
# pos: 12
# charpos: 4
# rest: "は"
# rest_size: 3
If the match attempt fails:
Clears match values.
Returns nil.
Does not update positions.
scanner.scan_until(/nope/) # => nil
match_values_cleared?(scanner) # => true
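Because scan_until returns everything from the old position through the end of the match, it is a natural fit for reading delimiter-terminated fields. A minimal sketch with made-up input data:

```ruby
require 'strscan'

scanner = StringScanner.new('name=Ruby;year=1995;')
fields = {}

# Each call returns one field including its trailing ';'.
while (chunk = scanner.scan_until(/;/))
  key, value = chunk.chomp(';').split('=', 2)
  fields[key] = value
end

fields                  # => {"name"=>"Ruby", "year"=>"1995"}
finished = scanner.eos? # => true
```

The loop ends when no further delimiter can be found, at which point scan_until returns nil and leaves the position unchanged.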
static VALUE
strscan_size(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
if (! MATCHED_P(p)) return Qnil;
return INT2FIX(p->regs.num_regs);
}
Returns the count of match values (the entire match plus each capture group) if the most recent match attempt succeeded, nil otherwise; see Captured Match Values:
scanner = StringScanner.new('Fri Dec 12 1975 14:39')
scanner.size # => nil
pattern = /(?<wday>\w+) (?<month>\w+) (?<day>\d+) /
scanner.match?(pattern)
scanner.values_at(*0..scanner.size) # => ["Fri Dec 12 ", "Fri", "Dec", "12", nil]
scanner.size # => 4
scanner.match?(/nope/) # => nil
scanner.size # => nil
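In the example above, size is 4 for a pattern with three groups, because index 0 (the entire match) is counted as well. A short sketch of using size to walk every valid index, with captures shown for comparison:

```ruby
require 'strscan'

scanner = StringScanner.new('Fri Dec 12 1975 14:39')
scanner.scan(/(\w+) (\w+) (\d+) /)

count = scanner.size      # => 4
groups = scanner.captures # => ["Fri", "Dec", "12"]

# Indexes 0...size cover the whole match followed by each group.
all_values = (0...count).map { |i| scanner[i] }
# => ["Fri Dec 12 ", "Fri", "Dec", "12"]
```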
static VALUE
strscan_skip(VALUE self, VALUE re)
{
return strscan_do_scan(self, re, 1, 0, 1);
}
call-seq: skip(pattern) -> match_size or nil
Attempts to match the given pattern at the beginning of the target substring.
If the match succeeds:
Increments the byte position by substring.bytesize, and may increment the character position.
Sets match values.
Returns the size (bytes) of the matched substring.
scanner = StringScanner.new(HIRAGANA_TEXT)
scanner.string # => "こんにちは"
scanner.pos = 6
scanner.skip(/に/) # => 3
put_match_values(scanner)
# Basic match values:
# matched?: true
# matched_size: 3
# pre_match: "こん"
# matched : "に"
# post_match: "ちは"
# Captured match values:
# size: 1
# captures: []
# named_captures: {}
# values_at: ["に", nil]
# []:
# [0]: "に"
# [1]: nil
put_situation(scanner)
# Situation:
# pos: 9
# charpos: 3
# rest: "ちは"
# rest_size: 6
If the match fails, returns nil, leaves the positions unchanged, and clears match values:
scanner.skip(/nope/) # => nil
match_values_cleared?(scanner) # => true
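A typical use of skip is to step over text whose content is not needed, keeping only the byte count. The sketch below (assuming a UTF-8 source file) also shows that the returned size is in bytes, not characters:

```ruby
require 'strscan'

scanner = StringScanner.new('こんにちは world')

greeting_bytes = scanner.skip(/こんにちは/) # => 15 (five 3-byte characters)
space_bytes = scanner.skip(/\s+/)           # => 1
remaining = scanner.rest                    # => "world"

# A failed skip returns nil and does not move the scanner.
no_digits = scanner.skip(/\d+/)             # => nil
position = scanner.pos                      # => 16
```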
static VALUE
strscan_skip_until(VALUE self, VALUE re)
{
return strscan_do_scan(self, re, 1, 0, 0);
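}
call-seq: skip_until(pattern) -> matched_substring_size or nil
Attempts to match the given pattern anywhere (at any position) in the target substring.
If the match attempt succeeds:
Sets match values.
Sets the byte position to the end of the matched substring; may adjust the character position.
Returns the number of bytes advanced, that is, the size in bytes from the old position through the end of the matched substring.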
scanner = StringScanner.new(HIRAGANA_TEXT)
scanner.string # => "こんにちは"
scanner.pos = 6
scanner.skip_until(/ち/) # => 6
put_match_values(scanner)
# Basic match values:
# matched?: true
# matched_size: 3
# pre_match: "こんに"
# matched : "ち"
# post_match: "は"
# Captured match values:
# size: 1
# captures: []
# named_captures: {}
# values_at: ["ち", nil]
# []:
# [0]: "ち"
# [1]: nil
put_situation(scanner)
# Situation:
# pos: 12
# charpos: 4
# rest: "は"
# rest_size: 3
If the match attempt fails:
Clears match values.
Returns nil.
scanner.skip_until(/nope/) # => nil
match_values_cleared?(scanner) # => true
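The return value of skip_until is easy to confuse with matched_size. As a sketch (assuming a UTF-8 source file), the returned number equals the distance the position moved, while matched_size covers only the matched substring:

```ruby
require 'strscan'

scanner = StringScanner.new('こんにちは')
scanner.pos = 6
before = scanner.pos

advanced = scanner.skip_until(/ち/) # => 6
moved = scanner.pos - before        # => 6
match_bytes = scanner.matched_size  # => 3 (only "ち")
```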
static VALUE
strscan_get_string(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
return p->str;
}
Returns the stored string:
scanner = StringScanner.new('foobar')
scanner.string # => "foobar"
scanner.concat('baz')
scanner.string # => "foobarbaz"
static VALUE
strscan_set_string(VALUE self, VALUE str)
{
struct strscanner *p = check_strscan(self);
StringValue(str);
RB_OBJ_WRITE(self, &p->str, str);
p->curr = 0;
CLEAR_MATCH_STATUS(p);
return str;
}
Replaces the stored string with the given other_string:
Sets both positions to zero.
Clears match values.
Returns other_string.
scanner = StringScanner.new('foobar')
scanner.scan(/foo/)
put_situation(scanner)
# Situation:
# pos: 3
# charpos: 3
# rest: "bar"
# rest_size: 3
match_values_cleared?(scanner) # => false
scanner.string = 'baz' # => "baz"
put_situation(scanner)
# Situation:
# pos: 0
# charpos: 0
# rest: "baz"
# rest_size: 3
match_values_cleared?(scanner) # => true
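Since string= resets the positions and clears match values, one scanner can be reused across several inputs instead of allocating a new one each time. A minimal sketch that counts whitespace-separated words per line; the input lines are made up for illustration.

```ruby
require 'strscan'

scanner = StringScanner.new('')

word_counts = ['a b', 'c d e'].map do |line|
  scanner.string = line # Starts over at position 0 with the new string.
  count = 0
  count += 1 while scanner.scan(/\s*\S+/)
  count
end
# => [2, 3]
```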
static VALUE
strscan_terminate(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
p->curr = S_LEN(p);
CLEAR_MATCH_STATUS(p);
return self;
}
call-seq: terminate -> self
Sets the scanner to end-of-string; returns self:
Sets both positions to end-of-string.
Clears match values.
scanner = StringScanner.new(HIRAGANA_TEXT)
scanner.string # => "こんにちは"
scanner.scan_until(/に/)
put_situation(scanner)
# Situation:
# pos: 9
# charpos: 3
# rest: "ちは"
# rest_size: 6
match_values_cleared?(scanner) # => false
scanner.terminate # => #<StringScanner fin>
put_situation(scanner)
# Situation:
# pos: 15
# charpos: 5
# rest: ""
# rest_size: 0
match_values_cleared?(scanner) # => true
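A brief sketch of what end-of-string means for later calls: after terminate, eos? is true, the byte position equals the byte size of the stored string, and a pattern that needs at least one character can no longer match.

```ruby
require 'strscan'

scanner = StringScanner.new('foobarbaz')
scanner.scan(/foo/)
scanner.terminate

at_end = scanner.eos?                              # => true
at_last_byte = scanner.pos == scanner.string.bytesize # => true
nothing_left = scanner.scan(/\w/)                  # => nil
```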
static VALUE
strscan_unscan(VALUE self)
{
struct strscanner *p;
GET_SCANNER(self, p);
if (! MATCHED_P(p))
rb_raise(ScanError, "unscan failed: previous match record not exist");
p->curr = p->prev;
CLEAR_MATCH_STATUS(p);
return self;
}
Sets the position to its value before the most recent successful match attempt, and clears match values; returns self:
scanner = StringScanner.new('foobarbaz')
scanner.scan(/foo/)
put_situation(scanner)
# Situation:
# pos: 3
# charpos: 3
# rest: "barbaz"
# rest_size: 6
scanner.unscan
# => #<StringScanner 0/9 @ "fooba...">
put_situation(scanner)
# Situation:
# pos: 0
# charpos: 0
# rest: "foobarbaz"
# rest_size: 9
Raises an exception if match values are clear:
scanner.scan(/nope/) # => nil
match_values_cleared?(scanner) # => true
scanner.unscan # Raises StringScanner::Error.
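The C source above clears the match status as part of unscan, so it can undo only one step. The following sketch shows that a second unscan in a row raises, because no match record is left to undo:

```ruby
require 'strscan'

scanner = StringScanner.new('foobarbaz')
scanner.scan(/foo/)
scanner.scan(/bar/)

scanner.unscan                    # Undoes only the scan of "bar".
position = scanner.pos            # => 3
still_matched = scanner.matched?  # => false

second_failed = begin
  scanner.unscan                  # No match record remains.
  false
rescue StringScanner::Error
  true
end
# second_failed => true
```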
static VALUE
strscan_values_at(int argc, VALUE *argv, VALUE self)
{
struct strscanner *p;
long i;
VALUE new_ary;
GET_SCANNER(self, p);
if (! MATCHED_P(p)) return Qnil;
new_ary = rb_ary_new2(argc);
for (i = 0; i<argc; i++) {
rb_ary_push(new_ary, strscan_aref(self, argv[i]));
}
return new_ary;
}
Returns an array of captured substrings, or nil if the most recent match attempt failed.
For each specifier, the returned substring is [specifier]; see [].
scanner = StringScanner.new('Fri Dec 12 1975 14:39')
pattern = /(?<wday>\w+) (?<month>\w+) (?<day>\d+) /
scanner.match?(pattern)
scanner.values_at(*0..3) # => ["Fri Dec 12 ", "Fri", "Dec", "12"]
scanner.values_at(*%i[wday month day]) # => ["Fri", "Dec", "12"]
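One way to use values_at with named groups is to pair the names with their values. This sketch builds a hash from the same pattern as above and also shows the nil result after a failed match; the variable names are illustrative.

```ruby
require 'strscan'

scanner = StringScanner.new('Fri Dec 12 1975 14:39')
names = %i[wday month day]

scanner.match?(/(?<wday>\w+) (?<month>\w+) (?<day>\d+) /)
date = names.zip(scanner.values_at(*names)).to_h
# => {wday: "Fri", month: "Dec", day: "12"}

# After a failed match attempt there are no values to return.
scanner.match?(/nope/)
after_failure = scanner.values_at(0) # => nil
```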
static VALUE
strscan_init_copy(VALUE vself, VALUE vorig)
{
struct strscanner *self, *orig;
self = check_strscan(vself);
orig = check_strscan(vorig);
if (self != orig) {
self->flags = orig->flags;
RB_OBJ_WRITE(vself, &self->str, orig->str);
self->prev = orig->prev;
self->curr = orig->curr;
if (rb_reg_region_copy(&self->regs, &orig->regs))
rb_memerror();
RB_GC_GUARD(vorig);
}
return vself;
}
Returns a shallow copy of self; the stored string in the copy is the same string as in self.
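A sketch of what the shallow copy means in practice, assuming the copy is made with dup: the copy starts at the original's position and then advances independently, while both scanners refer to the same stored string object.

```ruby
require 'strscan'

original = StringScanner.new('foobarbaz')
original.scan(/foo/)

copy = original.dup
copy.scan(/bar/)

original_pos = original.pos # => 3 (unaffected by the copy)
copy_pos = copy.pos         # => 6
shared = copy.string.equal?(original.string) # => true (same String object)
```

A copy like this can serve as a cheap way to try a speculative parse without disturbing the original scanner's position.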
Ruby Core © 1993–2025 Yukihiro Matsumoto
Licensed under the Ruby License.
Ruby Standard Library © contributors
Licensed under their own licenses.