class String

Parent: Object
Included modules: Comparable

A String object holds an arbitrary sequence of bytes, typically representing text or binary data. A String object may be created using String::new or with a literal.

String objects differ from Symbol objects in that Symbol objects are designed to be used as identifiers, instead of text or data.

You can create a String object explicitly with a string literal or a heredoc literal.

You can convert certain objects to Strings with the method Kernel#String.

Some String methods modify self. Typically, a method whose name ends with ! modifies self and returns self; often, a similarly named method (without the !) returns a new string.

In general, if both bang and non-bang versions of a method exist, the bang method mutates and the non-bang method does not. However, a method without a bang can also mutate, such as String#replace.
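A minimal sketch of the bang/non-bang distinction described above. The variable names are illustrative only, and the string is created with dup so the example does not depend on whether string literals are frozen:

```ruby
original = 'hello'.dup

copy = original.upcase          # non-bang: returns a new string
unchanged = original.dup        # snapshot: original is still "hello" here

bang_result = original.upcase!  # bang: modifies original and returns it
second_bang = original.upcase!  # nothing left to change, so this returns nil

original.replace('bye')         # no bang in the name, but self is mutated
```

Because a bang method may return nil, chaining calls such as s.upcase!.strip! can fail when the first call changes nothing.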

Substitution Methods

These methods perform substitutions:

  • String#sub: One substitution (or none); returns a new string.

  • String#sub!: One substitution (or none); returns self if any changes, nil otherwise.

  • String#gsub: Zero or more substitutions; returns a new string.

  • String#gsub!: Zero or more substitutions; returns self if any changes, nil otherwise.

Each of these methods takes:

  • A first argument, pattern (String or Regexp), that specifies the substring(s) to be replaced.

  • Either of the following:

    • A second argument, replacement (String or Hash), that determines the replacing string.

    • A block that will determine the replacing string.

The examples in this section mostly use the String#sub and String#gsub methods; the principles illustrated apply to all four substitution methods.
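As a short illustration of how the four methods differ in what they return (variable names are made up for this sketch):

```ruby
s = 'hello'.dup

first_only = s.sub(/l/, 'L')    # => "heLlo"; s is not modified
all        = s.gsub(/l/, 'L')   # => "heLLo"; s is not modified
before     = s.dup              # snapshot: still "hello"

no_match = s.sub!(/z/, 'Z')     # => nil, because nothing matched
changed  = s.gsub!(/l/, 'L')    # => s itself, which is now "heLLo"
```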

Argument pattern

Argument pattern is commonly a regular expression:

s = 'hello'
s.sub(/[aeiou]/, '*') # => "h*llo"
s.gsub(/[aeiou]/, '*') # => "h*ll*"
s.gsub(/[aeiou]/, '')  # => "hll"
s.sub(/ell/, 'al')     # => "halo"
s.gsub(/xyzzy/, '*')   # => "hello"
'THX1138'.gsub(/\d+/, '00') # => "THX00"

When pattern is a string, all its characters are treated as ordinary characters (not as Regexp special characters):

'THX1138'.gsub('\d+', '00') # => "THX1138"

String replacement

If replacement is a string, that string determines the replacing string that is substituted for the matched text.

Each of the examples above uses a simple string as the replacing string.

String replacement may contain back-references to the pattern’s captures:

  • \n (n is a non-negative integer) refers to $n.

  • \k<name> refers to the named capture name.

See Regexp for details.
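A small sketch of numbered and named back-references in a replacement string. The sample data is invented; single-quoted literals are used so that the backslashes reach the method unchanged:

```ruby
date = '2025-12-25'

# \1, \2, \3 refer to the first, second, and third capture groups.
us_style = date.sub(/(\d+)-(\d+)-(\d+)/, '\2/\3/\1')    # => "12/25/2025"

# \k<name> refers to the capture group with that name.
swapped = 'John Smith'.sub(/(?<first>\w+) (?<last>\w+)/, '\k<last>, \k<first>')
# => "Smith, John"
```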

Note that within the string replacement, a character combination such as $& is treated as ordinary text, not as a special match variable. However, you may refer to some special match variables using these combinations:

  • \& and \0 correspond to $&, which contains the complete matched text.

  • \' corresponds to $', which contains the string after the match.

  • \` corresponds to $`, which contains the string before the match.

  • \+ corresponds to $+, which contains the last capture group.

See Regexp for details.
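The following sketch illustrates some of these combinations on a sample string; variable names are illustrative:

```ruby
s = 'hello'

whole  = s.sub(/l+/, '<\0>')   # => "he<ll>o"  (\0 is the complete match)
amp    = s.sub(/l+/, '[\&]')   # => "he[ll]o"  (\& is the same as \0)
pre    = s.sub(/l+/, '\`')     # => "heheo"    (text before the match)
post   = s.sub(/l+/, "\\'")    # => "heoo"     (text after the match)
dollar = s.sub(/l+/, '$&')     # => "he$&o"    ($& is ordinary text here)
```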

Note that \\ is interpreted as an escape, i.e., a single backslash.

Note also that a string literal consumes backslashes. See String Literals for details about string literals.

A back-reference is typically preceded by an additional backslash. For example, if you want to write a back-reference \& in replacement with a double-quoted string literal, you need to write "..\\&..".

If you want to write a non-back-reference string \& in replacement, you need to first escape the backslash to prevent this method from interpreting it as a back-reference, and then you need to escape the backslashes again to prevent a string literal from consuming them: "..\\\\&..".

You may want to use the block form to avoid excessive backslashes.
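A sketch of the escaping rules above, using an invented sample string. The block form at the end produces the same result as the four-backslash form:

```ruby
s = 'a&b'

# Back-reference \& in a double-quoted literal: two backslashes in the source.
wrapped = s.sub(/&/, "<\\&>")     # => "a<&>b"

# Literal backslash followed by &: four backslashes in the source.
literal = s.sub(/&/, "<\\\\&>")   # prints as a<\&>b

# Block form: the block's return value is used as-is, with no back-reference processing.
block = s.sub(/&/) { '<\&>' }     # prints as a<\&>b
```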

Hash replacement

If the argument replacement is a hash, and pattern matches one of its keys, the replacing string is the value for that key:

h = {'foo' => 'bar', 'baz' => 'bat'}
'food'.sub('foo', h) # => "bard"

Note that a symbol key does not match:

h = {foo: 'bar', baz: 'bat'}
'food'.sub('foo', h) # => "d"
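A hash replacement is often combined with a regexp pattern and gsub. In this sketch (the hash and strings are made up), matched text that has no key in the hash is replaced with an empty string:

```ruby
units = { 'km' => 'kilometers', 'kg' => 'kilograms' }

expanded = '5 km and 3 kg'.gsub(/k[mg]/, units)    # => "5 kilometers and 3 kilograms"
missing  = '5 km and 2 kb'.gsub(/k[mgb]/, units)   # => "5 kilometers and 2 "
```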

Block

In the block form, the current match string is passed to the block; the block’s return value becomes the replacing string:

s = '@'
'1234'.gsub(/\d/) { |match| s.succ! } # => "ABCD"

Special match variables such as $1, $2, $`, $&, and $' are set appropriately.
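For instance, a block can compute the replacement from the match or from the capture variables. The data here is invented for illustration:

```ruby
prices = 'apple: 3, pear: 12'

# The block parameter is the matched text.
doubled = prices.gsub(/\d+/) { |m| (m.to_i * 2).to_s }      # => "apple: 6, pear: 24"

# $1 and $2 are set for each match while the block runs.
tagged = prices.gsub(/(\w+): (\d+)/) { "#{$2} x #{$1}" }    # => "3 x apple, 12 x pear"
```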

Whitespace in Strings

In the class String, whitespace is defined as a contiguous sequence of characters consisting of any mixture of the following:

  • NUL (null): "\x00", "\u0000".

  • HT (horizontal tab): "\x09", "\t".

  • LF (line feed): "\x0a", "\n".

  • VT (vertical tab): "\x0b", "\v".

  • FF (form feed): "\x0c", "\f".

  • CR (carriage return): "\x0d", "\r".

  • SP (space): "\x20", " ".

Whitespace is relevant for the following methods: lstrip, lstrip!, rstrip, rstrip!, strip, and strip!.
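A sketch of how this definition of whitespace affects the strip family of methods, assuming a reasonably recent Ruby; note that only leading and trailing whitespace is affected:

```ruby
whitespace = "\x00\t\n\v\f\r "
s = whitespace + 'abc' + whitespace

left  = s.lstrip          # leading whitespace removed, trailing kept
right = s.rstrip          # trailing whitespace removed, leading kept
both  = s.strip           # => "abc"
inner = ' a b '.strip     # => "a b"; interior whitespace is kept
```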

What’s Here

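First, what’s elsewhere. Class String inherits from class Object and includes module Comparable.

Here, class String provides methods that are useful for:

  • Creating a String

  • Freezing/Unfreezing

  • Querying

  • Comparing

  • Modifying

  • Converting to New String

  • Converting to Non-String

  • Iterating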

Creating a String

  • ::new: Returns a new string.

  • ::try_convert: Returns a new string created from a given object.
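A brief sketch of both methods. The object named labelled is hypothetical and exists only to show conversion through to_str:

```ruby
empty = String.new            # => ""
copy  = String.new('abc')     # => "abc", a new object

source = 'abc'
same = String.try_convert(source)   # a String argument is returned as-is
none = String.try_convert(123)      # => nil; Integer does not respond to to_str

labelled = Object.new
def labelled.to_str
  'label'
end
converted = String.try_convert(labelled)   # => "label"
```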

Freezing/Unfreezing

  • +@: Returns a string that is not frozen: self if not frozen; self.dup otherwise.

  • -@ (aliased as dedup): Returns a string that is frozen: self if already frozen; self.freeze otherwise.

  • freeze: Freezes self if not already frozen; returns self.
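A sketch of these three methods. The strings are built with String.new so the result does not depend on whether string literals are frozen:

```ruby
mutable = String.new('abc')
same_object = (+mutable).equal?(mutable)   # => true; already unfrozen, so self is returned

frozen = mutable.dup.freeze
thawed = +frozen          # an unfrozen copy of the frozen string
thawed << 'd'             # modifies only the copy

deduped = -mutable        # a frozen string with the same content
```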

Querying

Counts

  • bytesize: Returns the count of bytes.

  • count: Returns the count of characters in self that belong to the character sets specified by given strings.

  • empty?: Returns whether the length of self is zero.

  • length (aliased as size): Returns the count of characters (not bytes).
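The difference between bytes and characters shows up with non-ASCII text. A small sketch, with sample strings chosen for illustration:

```ruby
word = "caf\u00e9"                    # "café"; the last character takes two bytes in UTF-8

chars = word.length                   # => 4
bytes = word.bytesize                 # => 5
blank = ''.empty?                     # => true
hits  = 'hello world'.count('lo')     # => 5 (three "l" plus two "o")
```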

Substrings

  • =~: Returns the index of the first substring that matches a given Regexp or other object; returns nil if no match is found.

  • byteindex: Returns the byte index of the first occurrence of a given substring.

  • byterindex: Returns the byte index of the last occurrence of a given substring.

  • index: Returns the index of the first occurrence of a given substring; returns nil if none found.

  • rindex: Returns the index of the last occurrence of a given substring; returns nil if none found.

  • include?: Returns true if the string contains a given substring; false otherwise.

  • match: Returns a MatchData object if the string matches a given Regexp; nil otherwise.

  • match?: Returns true if the string matches a given Regexp; false otherwise.

  • start_with?: Returns true if the string begins with any of the given substrings.

  • end_with?: Returns true if the string ends with any of the given substrings.
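A few of these queries applied to one sample string (the names are illustrative):

```ruby
s = 'hello world'

first_o  = s.index('o')                 # => 4
last_o   = s.rindex('o')                # => 7
no_z     = s.index('z')                 # => nil
position = s =~ /wor/                   # => 6
contains = s.include?('lo w')           # => true
anchored = s.match?(/\Ahello/)          # => true
prefix   = s.start_with?('hi', 'he')    # => true; any one of the prefixes suffices
suffix   = s.end_with?('ld')            # => true
second   = s.match(/(\w+) (\w+)/)[2]    # => "world"
```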

Encodings

  • encoding: Returns the Encoding object that represents the encoding of the string.

  • unicode_normalized?: Returns true if the string is in Unicode normalized form; false otherwise.

  • valid_encoding?: Returns true if the string contains only characters that are valid for its encoding.

  • ascii_only?: Returns true if the string has only ASCII characters; false otherwise.
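A sketch of the encoding queries. The invalid string is constructed deliberately by appending a byte that is not valid UTF-8:

```ruby
text = "caf\u00e9"

enc       = text.encoding          # the UTF-8 Encoding object
ascii     = 'abc'.ascii_only?      # => true
non_ascii = text.ascii_only?       # => false

broken = "abc\xff".dup.force_encoding(Encoding::UTF_8)
valid  = broken.valid_encoding?    # => false; \xff is not valid UTF-8
```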

Other

  • sum: Returns a basic checksum for the string: the sum of each byte.

  • hash: Returns the integer hash code.

Comparing

  • == (aliased as ===): Returns true if a given other string has the same content as self.

  • eql?: Returns true if the content is the same as the given other string.

  • <=>: Returns -1, 0, or 1 as self is smaller than, equal to, or larger than a given other string.

  • casecmp: Ignoring case, returns -1, 0, or 1 as self is smaller than, equal to, or larger than a given other string.

  • casecmp?: Ignoring case, returns whether a given other string is equal to self.
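A sketch of the comparison methods, with self as the receiver on the left-hand side:

```ruby
less         = 'a' <=> 'b'             # => -1
equal        = 'a' <=> 'a'             # => 0
greater      = 'b' <=> 'a'             # => 1
incomparable = 'a' <=> 1               # => nil; not comparable with an Integer

folded      = 'abc'.casecmp('ABD')     # => -1
folded_same = 'abc'.casecmp?('ABC')    # => true
same        = 'abc' == 'abc'           # => true
strict      = 'abc'.eql?('abc')        # => true
```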

Modifying

Each of these methods modifies self.

Insertion

  • insert: Returns self with a given string inserted at a specified offset.

  • <<: Returns self concatenated with a given string or integer.

  • append_as_bytes: Returns self concatenated with strings without performing any encoding validation or conversion.

  • prepend: Prefixes to self the concatenation of given other strings.
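A sketch of the insertion methods applied in sequence to one mutable string; each call modifies s and returns it:

```ruby
s = String.new('world')

s.insert(0, 'hello ')     # s is now "hello world"
s << '!'                  # s is now "hello world!"
s << 33                   # an integer is appended as a character: "hello world!!"
result = s.prepend('>> ') # s is now ">> hello world!!"
```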

Substitution

  • bytesplice: Replaces bytes of self with bytes from a given string; returns self.

  • sub!: Replaces the first substring that matches a given pattern with a given replacement string; returns self if any changes, nil otherwise.

  • gsub!: Replaces each substring that matches a given pattern with a given replacement string; returns self if any changes, nil otherwise.

  • succ! (aliased as next!): Returns self modified to become its own successor.

  • replace: Returns self with its entire content replaced by a given string.

  • reverse!: Returns self with its characters in reverse order.

  • setbyte: Sets the byte at a given integer offset to a given value; returns the argument.

  • tr!: Replaces specified characters in self with specified replacement characters; returns self if any changes, nil otherwise.

  • tr_s!: Replaces specified characters in self with specified replacement characters, removing duplicates from the substrings that were modified; returns self if any changes, nil otherwise.

Casing

  • capitalize!: Upcases the initial character and downcases all others; returns self if any changes, nil otherwise.

  • downcase!: Downcases all characters; returns self if any changes, nil otherwise.

  • upcase!: Upcases all characters; returns self if any changes, nil otherwise.

  • swapcase!: Upcases each downcase character and downcases each upcase character; returns self if any changes, nil otherwise.

Encoding

  • encode!: Returns self with all characters transcoded from one encoding to another.

  • unicode_normalize!: Unicode-normalizes self; returns self.

  • scrub!: Replaces each invalid byte with a given character; returns self.

  • force_encoding: Changes the encoding to a given encoding; returns self.

Deletion

  • clear: Removes all content, so that self is empty; returns self.

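  • slice!: Removes, and returns, a substring determined by a given index, start/length, range, regexp, or substring.

  • []=: Replaces a substring determined by a given index, start/length, range, regexp, or substring with a given string.

  • squeeze!: Removes contiguous duplicate characters; returns self if any changes, nil otherwise.

  • delete!: Removes characters as determined by the intersection of substring arguments; returns self if any changes, nil otherwise.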

  • delete_prefix!: Removes leading prefix; returns self if any changes, nil otherwise.

  • delete_suffix!: Removes trailing suffix; returns self if any changes, nil otherwise.

  • lstrip!: Removes leading whitespace; returns self if any changes, nil otherwise.

  • rstrip!: Removes trailing whitespace; returns self if any changes, nil otherwise.

  • strip!: Removes leading and trailing whitespace; returns self if any changes, nil otherwise.

  • chomp!: Removes the trailing record separator, if found; returns self if any changes, nil otherwise.

  • chop!: Removes trailing newline characters if found; otherwise removes the last character; returns self if any changes, nil otherwise.
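A sketch of several deletion methods, paying attention to the nil return value when nothing changes (the variable names are illustrative):

```ruby
s = String.new("  hello\n")

first_strip  = s.strip!       # => s, which is now "hello"
second_strip = s.strip!       # => nil; there is nothing left to strip
no_newline   = s.chomp!       # => nil; no trailing record separator

s.delete_prefix!('he')        # s is now "llo"
s.chop!                       # s is now "ll"
before_clear = s.dup          # snapshot: "ll"
s.clear                       # s is now ""
```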

Converting to New String

Each of these methods returns a new String based on self, often just a modified copy of self.

Extension

  • *: Returns the concatenation of multiple copies of self.

  • +: Returns the concatenation of self and a given other string.

  • center: Returns a copy of self, centered by specified padding.

  • concat: Returns the concatenation of self with given other strings; note that, unlike the other methods listed here, concat modifies self and returns it.

  • ljust: Returns a copy of self of a given length, right-padded with a given other string.

  • rjust: Returns a copy of self of a given length, left-padded with a given other string.
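A sketch of the non-mutating extension methods, using short sample strings:

```ruby
repeated = 'ab' * 3               # => "ababab"
joined   = 'foo' + 'bar'          # => "foobar"
centered = 'hi'.center(6, '*')    # => "**hi**"
left     = 'hi'.ljust(5, '.')     # => "hi..."
right    = 'hi'.rjust(5, '0')     # => "000hi"
```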

Encoding

  • b: Returns a copy of self with ASCII-8BIT encoding.

  • scrub: Returns a copy of self with each invalid byte replaced with a given character.

  • unicode_normalize: Returns a copy of self with each character Unicode-normalized.

  • encode: Returns a copy of self with all characters transcoded from one encoding to another.

Substitution

  • dump: Returns a printable version of self, enclosed in double-quotes.

  • undump: Inverse of dump; returns a copy of self with changes of the kinds made by dump “undone.”

  • sub: Returns a copy of self with the first substring matching a given pattern replaced with a given replacement string.

  • gsub: Returns a copy of self with each substring that matches a given pattern replaced with a given replacement string.

  • succ (aliased as next): Returns the string that is the successor to self.

  • reverse: Returns a copy of self with its characters in reverse order.

  • tr: Returns a copy of self with specified characters replaced with specified replacement characters.

  • tr_s: Returns a copy of self with specified characters replaced with specified replacement characters, removing duplicates from the substrings that were modified.

  • %: Returns the string resulting from formatting a given object into self.

Casing

  • capitalize: Returns a copy of self with the first character upcased and all other characters downcased.

  • downcase: Returns a copy of self with all characters downcased.

  • upcase: Returns a copy of self with all characters upcased.

  • swapcase: Returns a copy of self with all upcase characters downcased and all downcase characters upcased.

Deletion

  • delete: Returns a copy of self with characters removed.

  • delete_prefix: Returns a copy of self with a given prefix removed.

  • delete_suffix: Returns a copy of self with a given suffix removed.

  • lstrip: Returns a copy of self with leading whitespace removed.

  • rstrip: Returns a copy of self with trailing whitespace removed.

  • strip: Returns a copy of self with leading and trailing whitespace removed.

  • chomp: Returns a copy of self with a trailing record separator removed, if found.

  • chop: Returns a copy of self with trailing newline characters or the last character removed.

  • squeeze: Returns a copy of self with contiguous duplicate characters removed.

  • [] (aliased as slice): Returns a substring determined by a given index, start/length, range, regexp, or string.

  • byteslice: Returns a substring determined by a given index, start/length, or range.

  • chr: Returns the first character.
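The forms of [] mentioned above can be sketched as follows, on one sample string:

```ruby
s = 'hello world'

by_index   = s[0]              # => "h"
from_end   = s[-1]             # => "d"
start_len  = s[0, 5]           # => "hello"
by_range   = s[6..]            # => "world"
by_regexp  = s[/w\w+/]         # => "world"
by_string  = s['lo']           # => "lo"
not_found  = s['xyz']          # => nil
first_two  = s.byteslice(0, 2) # => "he"
first_char = s.chr             # => "h"
```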

Duplication

  • to_s (aliased as to_str): If self is an instance of a subclass of String, returns self copied into a String; otherwise, returns self.

Converting to Non-String

Each of these methods converts the contents of self to a non-String.

Characters, Bytes, and Clusters

  • bytes: Returns an array of the bytes in self.

  • chars: Returns an array of the characters in self.

  • codepoints: Returns an array of the integer ordinals in self.

  • getbyte: Returns the integer byte at the given index in self.

  • grapheme_clusters: Returns an array of the grapheme clusters in self.

Splitting

  • lines: Returns an array of the lines in self, as determined by a given record separator.

  • partition: Returns a 3-element array determined by the first substring that matches a given substring or regexp.

  • rpartition: Returns a 3-element array determined by the last substring that matches a given substring or regexp.

  • split: Returns an array of substrings determined by a given delimiter – regexp or string – or, if a block is given, passes those substrings to the block.
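A sketch of the splitting methods with invented sample input:

```ruby
rows   = "a\nb\n".lines                   # => ["a\n", "b\n"]
first  = 'key=value=x'.partition('=')     # => ["key", "=", "value=x"]
last   = 'key=value=x'.rpartition('=')    # => ["key=value", "=", "x"]
fields = 'a,b,c'.split(',')               # => ["a", "b", "c"]
words  = ' a  b '.split                   # => ["a", "b"]; with no argument, splits on whitespace
```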

Matching

  • scan: Returns an array of substrings matching a given regexp or string, or, if a block is given, passes each matching substring to the block.

  • unpack: Returns an array of substrings extracted from self according to a given format.

  • unpack1: Returns the first substring extracted from self according to a given format.
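A sketch of scan and the unpack methods. The format 'C' used here means an 8-bit unsigned integer:

```ruby
numbers = 'a1b22c333'.scan(/\d+/)              # => ["1", "22", "333"]
pairs   = 'k1=v1;k2=v2'.scan(/(\w+)=(\w+)/)    # => [["k1", "v1"], ["k2", "v2"]]

all_bytes  = 'abc'.unpack('C*')                # => [97, 98, 99]
first_byte = 'abc'.unpack1('C')                # => 97
```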

Numerics

  • hex: Returns the integer value of the leading characters, interpreted as hexadecimal digits.

  • oct: Returns the integer value of the leading characters, interpreted as octal digits.

  • ord: Returns the integer ordinal of the first character in self.

  • to_c: Returns the complex value of leading characters, interpreted as a complex number.

  • to_i: Returns the integer value of leading characters, interpreted as an integer.

  • to_f: Returns the floating-point value of leading characters, interpreted as a floating-point number.

  • to_r: Returns the rational value of leading characters, interpreted as a rational.
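A sketch of the numeric conversions. Each one reads as many leading characters as it can interpret and ignores the rest:

```ruby
hex_value   = '0x1f'.hex       # => 31
oct_value   = '17'.oct         # => 15
ordinal     = 'a'.ord          # => 97
integer     = '42abc'.to_i     # => 42
no_digits   = 'abc'.to_i       # => 0
binary      = '101'.to_i(2)    # => 5
float_value = '3.14xyz'.to_f   # => 3.14
rational    = '1/3'.to_r       # => (1/3)
complex     = '1+2i'.to_c      # => (1+2i)
```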

Strings and Symbols

  • inspect: Returns a copy of self, enclosed in double quotes, with special characters escaped.

  • intern (aliased as to_sym): Returns the symbol corresponding to self.

Iterating

  • each_byte: Calls the given block with each successive byte in self.

  • each_char: Calls the given block with each successive character in self.

  • each_codepoint: Calls the given block with each successive integer codepoint in self.

  • each_grapheme_cluster: Calls the given block with each successive grapheme cluster in self.

  • each_line: Calls the given block with each successive line in self, as determined by a given record separator.

  • upto: Calls the given block with each string value returned by successive calls to succ.
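A sketch of the iterators, collecting what each block receives into arrays (the array names are illustrative):

```ruby
bytes = []
'abc'.each_byte { |b| bytes << b }          # bytes is [97, 98, 99]

chars = []
'abc'.each_char { |c| chars << c }          # chars is ["a", "b", "c"]

lines = []
"one\ntwo".each_line { |l| lines << l }     # lines is ["one\n", "two"]

range = []
'a8'.upto('b1') { |str| range << str }      # range is ["a8", "a9", "b0", "b1"]
```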

Ruby Core © 1993–2025 Yukihiro Matsumoto
Licensed under the Ruby License.
Ruby Standard Library © contributors
Licensed under their own licenses.