' c-char ' | (1) | |
u8' c-char ' | (2) | (since C23) |
u' c-char ' | (3) | (since C11) |
U' c-char ' | (4) | (since C11) |
L' c-char ' | (5) | |
' c-char-sequence ' | (6) | |
L' c-char-sequence ' | (7) | |
u' c-char-sequence ' | (8) | (since C11)(removed in C23) |
U' c-char-sequence ' | (9) | (since C11)(removed in C23) |
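where
c-char is one of the following:
a character from the basic source character set other than the single quote ('), backslash (\), or the newline character;
an escape sequence: one of the special character escapes \' \" \? \\ \a \b \f \n \r \t \v, hex escapes \x... or octal escapes \..., as defined in escape sequences;
a universal character name, \u... or \U..., as defined in escape sequences | (since C99) |
c-char-sequence is a sequence of two or more c-chars.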
1) single-byte integer character constant, e.g. 'a' or '\n' or '\13'. Such a constant has type int and a value equal to the representation of c-char in the execution character set as a value of type char mapped to int. If c-char is not representable as a single byte in the execution character set, the value is implementation-defined.
2) UTF-8 character constant, e.g. u8'a'. Such a constant has type unsigned char and a value equal to the ISO 10646 code point value of c-char, provided that the code point value is representable with a single UTF-8 code unit (that is, c-char is in the range 0x0-0x7F, inclusive). If c-char is not representable with a single UTF-8 code unit, the program is ill-formed.
3) 16-bit wide character constant, e.g. u'貓', but not u'🍌' (u'\U0001f34c'). Such a constant has type char16_t and a value equal to the value of c-char in the 16-bit encoding produced by mbrtoc16 (normally UTF-16). If c-char is not representable or maps to more than one 16-bit character, the value is implementation-defined. | (until C23) |
4) 32-bit wide character constant, e.g. U'貓' or U'🍌'. Such a constant has type char32_t and a value equal to the value of c-char in the 32-bit encoding produced by mbrtoc32 (normally UTF-32). If c-char is not representable or maps to more than one 32-bit character, the value is implementation-defined. | (until C23) |
3) UTF-16 character constant, e.g. u'貓', but not u'🍌' (u'\U0001f34c'). Such a constant has type char16_t and a value equal to the ISO 10646 code point value of c-char, provided that the code point value is representable with a single UTF-16 code unit (that is, c-char is in the range 0x0-0xD7FF or 0xE000-0xFFFF, inclusive). If c-char is not representable with a single UTF-16 code unit, the program is ill-formed. | (since C23) |
4) UTF-32 character constant, e.g. U'貓' or U'🍌'. Such a constant has type char32_t and a value equal to the ISO 10646 code point value of c-char, provided that the code point value is representable with a single UTF-32 code unit (that is, c-char is in the range 0x0-0xD7FF or 0xE000-0x10FFFF, inclusive). If c-char is not representable with a single UTF-32 code unit, the program is ill-formed. | (since C23) |
5) wide character constant, e.g. L'β' or L'貓'. Such a constant has type wchar_t and a value equal to the value of c-char in the execution wide character set (that is, the value that would be produced by mbtowc). If c-char is not representable or maps to more than one wide character (e.g. a non-BMP value on Windows, where wchar_t is 16-bit), the value is implementation-defined.
6) multicharacter constant, e.g. 'AB', has type int and implementation-defined value.
7) wide multicharacter constant, e.g. L'AB', has type wchar_t and implementation-defined value.
8) 16-bit multicharacter constant, e.g. u'CD', has type char16_t and implementation-defined value.
9) 32-bit multicharacter constant, e.g. U'XY', has type char32_t and implementation-defined value.

Multicharacter constants were inherited by C from the B programming language. Although not specified by the C standard, most compilers (MSVC is a notable exception) implement multicharacter constants as specified in B: the values of the chars in the constant initialize successive bytes of the resulting integer, in big-endian zero-padded right-adjusted order, e.g. the value of '\1' is 0x00000001 and the value of '\1\2\3\4' is 0x01020304.
In C++, encodable ordinary character literals have type char, rather than int.
Unlike integer constants, a character constant may have a negative value if char is signed: on such implementations '\xFF' is an int with the value -1.
When used in a controlling expression of #if or #elif, character constants may be interpreted in terms of the source character set, the execution character set, or some other implementation-defined character set.
16/32-bit multicharacter constants are not widely supported and were removed in C23. Some common implementations (e.g. clang) do not accept them at all.
#include <stddef.h>
#include <stdio.h>
#include <uchar.h>

int main(void)
{
    printf("constant value \n");
    printf("-------- ----------\n");

    // integer character constants
    int c1 = 'a'; printf("'a':\t %#010x\n", (unsigned)c1);
    int c2 = '🍌'; printf("'🍌':\t %#010x\n\n", (unsigned)c2); // implementation-defined

    // multicharacter constant
    int c3 = 'ab'; printf("'ab':\t %#010x\n\n", (unsigned)c3); // implementation-defined

    // 16-bit wide character constants
    char16_t uc1 = u'a'; printf("'a':\t %#010x\n", (unsigned)uc1);
    char16_t uc2 = u'¢'; printf("'¢':\t %#010x\n", (unsigned)uc2);
    char16_t uc3 = u'猫'; printf("'猫':\t %#010x\n", (unsigned)uc3);
    // implementation-defined until C23 (🍌 maps to two 16-bit characters);
    // not allowed since C23
    char16_t uc4 = u'🍌'; printf("'🍌':\t %#010x\n\n", (unsigned)uc4);

    // 32-bit wide character constants
    char32_t Uc1 = U'a'; printf("'a':\t %#010x\n", (unsigned)Uc1);
    char32_t Uc2 = U'¢'; printf("'¢':\t %#010x\n", (unsigned)Uc2);
    char32_t Uc3 = U'猫'; printf("'猫':\t %#010x\n", (unsigned)Uc3);
    char32_t Uc4 = U'🍌'; printf("'🍌':\t %#010x\n\n", (unsigned)Uc4);

    // wide character constants
    wchar_t wc1 = L'a'; printf("'a':\t %#010x\n", (unsigned)wc1);
    wchar_t wc2 = L'¢'; printf("'¢':\t %#010x\n", (unsigned)wc2);
    wchar_t wc3 = L'猫'; printf("'猫':\t %#010x\n", (unsigned)wc3);
    wchar_t wc4 = L'🍌'; printf("'🍌':\t %#010x\n\n", (unsigned)wc4);
}
Possible output:
constant value
-------- ----------
'a':     0x00000061
'🍌':    0xf09f8d8c

'ab':    0x00006162

'a':     0x00000061
'¢':     0x000000a2
'猫':    0x0000732b
'🍌':    0x0000df4c

'a':     0x00000061
'¢':     0x000000a2
'猫':    0x0000732b
'🍌':    0x0001f34c

'a':     0x00000061
'¢':     0x000000a2
'猫':    0x0000732b
'🍌':    0x0001f34c
C++ documentation for Character literal
© cppreference.com
Licensed under the Creative Commons Attribution-ShareAlike Unported License v3.0.
https://en.cppreference.com/w/c/language/character_constant