A list of all available normalization forms. See www.unicode.org/reports/tr15/tr15-29.html for more information about normalization.
The Unicode version that is supported by the implementation
The default normalization used for operations that require normalization. It can be set to any of the normalizations in NORMALIZATION_FORMS.
ActiveSupport::Multibyte::Unicode.default_normalization_form = :c
# File activesupport/lib/active_support/multibyte/unicode.rb, line 67
def compose(codepoints)
codepoints.pack("U*").unicode_normalize(:nfc).codepoints
end Compose decomposed characters to the composed form.
# File activesupport/lib/active_support/multibyte/unicode.rb, line 58
def decompose(type, codepoints)
if type == :compatibility
codepoints.pack("U*").unicode_normalize(:nfkd).codepoints
else
codepoints.pack("U*").unicode_normalize(:nfd).codepoints
end
end Decompose composed characters to the decomposed form.
# File activesupport/lib/active_support/multibyte/unicode.rb, line 118
def normalize(string, form = nil)
form ||= @default_normalization_form
# See https://www.unicode.org/reports/tr15, Table 1
if alias_form = NORMALIZATION_FORM_ALIASES[form]
ActiveSupport::Deprecation.warn(<<-MSG.squish)
ActiveSupport::Multibyte::Unicode#normalize is deprecated and will be
removed from Rails 6.1. Use String#unicode_normalize(:#{alias_form}) instead.
MSG
string.unicode_normalize(alias_form)
else
ActiveSupport::Deprecation.warn(<<-MSG.squish)
ActiveSupport::Multibyte::Unicode#normalize is deprecated and will be
removed from Rails 6.1. Use String#unicode_normalize instead.
MSG
raise ArgumentError, "#{form} is not a valid normalization variant", caller
end
end Returns the KC normalization of the string by default. NFKC is considered the best normalization form for passing strings to databases and validations.
string - The string to perform normalization on.
form - The form you want to normalize in. Should be one of the following: :c, :kc, :d, or :kd. Default is #default_normalization_form.
# File activesupport/lib/active_support/multibyte/unicode.rb, line 48
def pack_graphemes(unpacked)
ActiveSupport::Deprecation.warn(<<-MSG.squish)
ActiveSupport::Multibyte::Unicode#pack_graphemes is deprecated and will be
removed from Rails 6.1. Use array.flatten.pack("U*") instead.
MSG
unpacked.flatten.pack("U*")
end Reverse operation of unpack_graphemes.
Unicode.pack_graphemes(Unicode.unpack_graphemes('क्षि')) # => 'क्षि'
# File activesupport/lib/active_support/multibyte/unicode.rb, line 78
def tidy_bytes(string, force = false)
return string if string.empty?
return recode_windows1252_chars(string) if force
string.scrub { |bad| recode_windows1252_chars(bad) }
end Replaces all ISO-8859-1 or CP1252 characters by their UTF-8 equivalent resulting in a valid UTF-8 string.
Passing true will forcibly tidy all bytes, assuming that the string's encoding is entirely CP1252 or ISO-8859-1.
# File activesupport/lib/active_support/multibyte/unicode.rb, line 36
def unpack_graphemes(string)
ActiveSupport::Deprecation.warn(<<-MSG.squish)
ActiveSupport::Multibyte::Unicode#unpack_graphemes is deprecated and will be
removed from Rails 6.1. Use string.scan(/\X/).map(&:codepoints) instead.
MSG
string.scan(/\X/).map(&:codepoints)
end Unpack the string at grapheme boundaries. Returns a list of character lists.
Unicode.unpack_graphemes('क्षि') # => [[2325, 2381], [2359], [2367]]
Unicode.unpack_graphemes('Café') # => [[67], [97], [102], [233]]
© 2004–2019 David Heinemeier Hansson
Licensed under the MIT License.