The following example returns 65, the Unicode value for "A".
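"ABC".charCodeAt(0); // 65, the UTF-16 code unit for "A" (any string with "A" at index 0 gives the same result)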
charCodeAt() may return lone surrogates, which are not valid Unicode characters.
const str = "𠮷𠮾";
console.log(str.charCodeAt(0)); // 55362 (0xd842): a lone lead surrogate, not a valid Unicode character
console.log(str.charCodeAt(1)); // 57271 (0xdfb7): a lone trail surrogate, not a valid Unicode character
To get the full Unicode code point at the given index, use String.prototype.codePointAt().
const str = "𠮷𠮾";
console.log(str.codePointAt(0)); // 134071 (0x20bb7), the code point of "𠮷"
Note: Avoid re-implementing codePointAt() using charCodeAt(). The translation from UTF-16 surrogates to Unicode code points is complex, and codePointAt() may be more performant as it directly uses the internal representation of the string. Install a polyfill for codePointAt() if necessary.
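A minimal feature-detection sketch; the polyfill source itself is left as a placeholder (core-js, for example, provides one):
if (!String.prototype.codePointAt) {
  // Load or define a codePointAt() polyfill here before relying on it.
}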
Below is a possible algorithm to convert a pair of UTF-16 code units into a Unicode code point, adapted from the Unicode FAQ:
// Offsets derived from how UTF-16 encodes code points above U+FFFF.
const LEAD_OFFSET = 0xd800 - (0x10000 >> 10);
const SURROGATE_OFFSET = 0x10000 - (0xd800 << 10) - 0xdc00;

// Combines a lead (high) surrogate and a trail (low) surrogate into a code point.
function utf16ToUnicode(lead, trail) {
  return (lead << 10) + trail + SURROGATE_OFFSET;
}

// Splits a code point above U+FFFF into its lead and trail surrogates.
function unicodeToUTF16(codePoint) {
  const lead = LEAD_OFFSET + (codePoint >> 10);
  const trail = 0xdc00 + (codePoint & 0x3ff);
  return [lead, trail];
}
const str = "𠮷";
console.log(utf16ToUnicode(str.charCodeAt(0), str.charCodeAt(1))); // 134071
console.log(str.codePointAt(0)); // 134071
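For the reverse direction, a short usage sketch of unicodeToUTF16(); the expected output follows from the charCodeAt() values shown earlier:
const [lead, trail] = unicodeToUTF16(0x20bb7);
console.log(lead, trail); // 55362 57271 (0xd842 0xdfb7)
console.log(String.fromCharCode(lead, trail)); // "𠮷"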