[C] Format specifiers for wchar_t (VC, BCB, GCC, and the C99 standard)

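This article examines the format specifiers that printf and wprintf use for wchar_t strings in C. Based on the documentation for VC, BCB, and GCC and on the C99 standard, it shows how 'h', 'l', 's', and 'S' affect character and string output under different compilers. In short, 'hs' denotes a char string and 'ls' denotes a wchar_t string in both printf and wprintf. On Linux, GCC follows the C99 standard: 's' denotes a char string and 'ls' denotes a wchar_t string.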

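Since the wchar_t type was introduced into C, string handling has become more complicated. String output, for example, has two functions, printf and wprintf. When the arguments include both char strings and wchar_t strings, which format specifiers should be used? This article looks into that question.

1. Consulting the documentation

First, look through the documentation of each compiler and the C99 standard to see how they describe the format specifiers.

1.1 VC documentation

On the MSDN site, the format strings of printf and wprintf are described in "Format Specification Fields: printf and wprintf Functions" (http://msdn.microsoft.com/en-us/library/56e442dc(v=vs.110).aspx). Excerpt: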

A format specification, which consists of optional and required fields, has the following form:

% [flags] [width] [.precision] [{h | l | ll | I | I32 | I64}]type

Click "type" first to see the type characters, which leads to the "printf Type Field Characters" page (http://msdn.microsoft.com/en-us/library/hf4y5e3w(v=vs.110).aspx). Excerpt:

printf Type Field Characters

| Character | Type | Output format |
|---|---|---|
| c | int or wint_t | When used with printf functions, specifies a single-byte character; when used with wprintf functions, specifies a wide character. |
| C | int or wint_t | When used with printf functions, specifies a wide character; when used with wprintf functions, specifies a single-byte character. |
| s | String | When used with printf functions, specifies a single-byte–character string; when used with wprintf functions, specifies a wide-character string. Characters are displayed up to the first null character or until the precision value is reached. |
| S | String | When used with printf functions, specifies a wide-character string; when used with wprintf functions, specifies a single-byte–character string. Characters are displayed up to the first null character or until the precision value is reached. |

| To specify | Use prefix | With type specifier |
|---|---|---|
| Single-byte character with printf functions | h | c or C |
| Single-byte character with wprintf functions | h | c or C |
| Wide character with printf functions | l | c or C |
| Wide character with wprintf functions | l | c or C |
| Single-byte–character string with printf functions | h | s or S |
| Single-byte–character string with wprintf functions | h | s or S |
| Wide-character string with printf functions | l | s or S |
| Wide-character string with wprintf functions | l | s or S |
| Wide character | w | c |
| Wide-character string | w | s |

Thus to print single-byte or wide-characters with printf functions and wprintf functions, use format specifiers as follows.

| To print character as | Use function | With format specifier |
|---|---|---|
| single byte | printf | c, hc, or hC |
| single byte | wprintf | C, hc, or hC |
| wide | wprintf | c, lc, lC, or wc |
| wide | printf | C, lc, lC, or wc |

To print strings with printf functions and wprintf functions, use the prefixes h and l analogously with format type-specifiers s and S.

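The excerpt above introduces many specifiers. Sorting through them, the three most useful for strings are:

hs: a char string in both printf and wprintf.

ls: a wchar_t string in both printf and wprintf.

s: a char string in printf, but a wchar_t string in wprintf. This pairs conveniently with TCHAR.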

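1.2 BCB documentation

Open "C Runtime Library Reference" in the BCB6 help file and type "printf" into the index; the description of the format specifiers is easy to find:

[Two screenshots of the BCB6 help page on printf format specifiers]

On inspection, it is compatible with VC. hs, ls, and s can be used for char, wchar_t, and TCHAR strings respectively.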

1.3 GCC documentation

My machine runs Fedora 17 with GCC 4.7.0 installed.

Open a terminal and enter "man 3 wprintf" to view the documentation of the wprintf function. Excerpt:

c

If no l modifier is present, the int argument is converted to a wide character by a call to the btowc(3) function, and the resulting wide character is written. If an l modifier is present, the wint_t (wide character) argument is written.

s

If no l modifier is present: The const char * argument is expected to be a pointer to an array of character type (pointer to a string) containing a multibyte character sequence beginning in the initial shift state. Characters from the array are converted to wide characters (each by a call to the mbrtowc(3) function with a conversion state starting in the initial state before the first byte). The resulting wide characters are written up to (but not including) the terminating null wide character. If a precision is specified, no more wide characters than the number specified are written. Note that the precision determines the number of wide characters written, not the number of bytes or screen positions. The array must contain a terminating null byte, unless a precision is given and it is so small that the number of converted wide characters reaches it before the end of the array is reached.

If an l modifier is present: The const wchar_t * argument is expected to be a pointer to an array of wide characters. Wide characters from the array are written up to (but not including) a terminating null wide character. If a precision is specified, no more than the number specified are written. The array must contain a terminating null wide character, unless a precision is given and it is smaller than or equal to the number of wide characters in the array.

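According to the description above, GCC appears to support only these two format specifiers for strings:

s: a char string in both printf and wprintf.

ls: a wchar_t string in both printf and wprintf.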

1.4 C99 standard

Section "7.24.2.1 The fwprintf function" of the C99 standard describes the format specifiers of fwprintf and the other wide-character functions. Excerpt:

7 The length modifiers and their meanings are:

h

Specifies that a following d, i, o, u, x, or X conversion specifier applies to a short int or unsigned short int argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to short int or unsigned short int before printing); or that a following n conversion specifier applies to a pointer to a short int argument.

l (ell)

Specifies that a following d, i, o, u, x, or X conversion specifier applies to a long int or unsigned long int argument; that a following n conversion specifier applies to a pointer to a long int argument; that a following c conversion specifier applies to a wint_t argument; that a following s conversion specifier applies to a pointer to a wchar_t argument; or has no effect on a following a, A, e, E, f, F, g, or G conversion specifier.

……

8 The conversion specifiers and their meanings are:

c

If no l length modifier is present, the int argument is converted to a wide character as if by calling btowc and the resulting wide character is written.

If an l length modifier is present, the wint_t argument is converted to wchar_t and written.

s

If no l length modifier is present, the argument shall be a pointer to the initial element of a character array containing a multibyte character sequence beginning in the initial shift state. Characters from the array are converted as if by repeated calls to the mbrtowc function, with the conversion state described by an mbstate_t object initialized to zero before the first multibyte character is converted, and written up to (but not including) the terminating null wide character. If the precision is specified, no more than that many wide characters are written. If the precision is not specified or is greater than the size of the converted array, the converted array shall contain a null wide character.

If an l length modifier is present, the argument shall be a pointer to the initial element of an array of wchar_t type. Wide characters from the array are written up to (but not including) a terminating null wide character. If the precision is specified, no more than that many wide characters are written. If the precision is not specified or is greater than the size of the array, the array shall contain a null wide character.

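As the excerpt shows, in the C99 standard c and s accept only the "l" length modifier: without "l" the argument is a char (or char string), and with "l" it is a wide character (or wchar_t string).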

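1.5 Summary

The material above can be organized into a table:

| Specifier | VC and BCB: printf | VC and BCB: wprintf | GCC and C99 standard: printf | GCC and C99 standard: wprintf |
|---|---|---|---|---|
| s | char | wchar_t | char | char |
| S | wchar_t | char | * | * |
| hs | char | char | * | * |
| ls | wchar_t | wchar_t | wchar_t | wchar_t |

*: undefined.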

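2. Test program

With the documentation above in mind, I decided to write a test program to check how each compiler actually supports the wchar_t format specifiers.

The code of the test program is as follows: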

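#include <locale.h>
#include <stdio.h>
#include <wchar.h>

const char* psa = "CHAR"; // Single-byte string.
const wchar_t* psw = L"WCHAR"; // Wide string.
const wchar_t* pst = L"TCHAR"; // String whose type matches printf/wprintf (wide here, because wprintf is used).

int main(void)
{
    setlocale(LC_ALL, ""); // Use the system's current code page.

    wprintf(L"A:\t%hs\n", psa); // hs: char string.
    wprintf(L"W:\t%ls\n", psw); // ls: wchar_t string.
    wprintf(L"T:\t%s\n", pst);  // s: wchar_t string under VC rules only.
    return 0;
}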

If everything works, the program should print:

A: CHAR

W: WCHAR

T: TCHAR

3. Test results

3.1 VC6 and BCB6

As expected, both VC6 and BCB6 printed the correct output:

A: CHAR

W: WCHAR

T: TCHAR

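3.2 GCC on Fedora

Fedora 17, GCC 4.7.0:

[Screenshot: program output on Fedora 17 with GCC 4.7.0]

It is easy to see why the third item is printed incorrectly: both the GCC documentation and the C99 standard specify that s without "l" denotes a char string, whereas pst is actually a wchar_t string.

The correct output of the first item is the more puzzling one, since neither the GCC documentation nor the C99 standard lists an "h" length modifier for s. On reflection, the documentation says that s without "l" denotes a char string; "hs" contains no "l", so it is treated as a char string. C99 does not define the combination of h and s, so this behavior does not conflict with the standard, but the standard does not guarantee it either.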

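3.3 GCC on MinGW

MinGW (20120426), GCC 4.6.2:

[Screenshot: program output on MinGW with GCC 4.6.2]

Although MinGW also uses the GCC compiler, its format specifier rules are consistent with VC's, for compatibility with the Windows environment.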

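4. Conclusion

Based on the test results above, the earlier table is revised as follows:

| Specifier | VC, BCB, MinGW: printf | VC, BCB, MinGW: wprintf | GCC on Linux, C99 standard: printf | GCC on Linux, C99 standard: wprintf |
|---|---|---|---|---|
| s | char | wchar_t | char | char |
| S | wchar_t | char | * | * |
| hs | char | char | char | char |
| ls | wchar_t | wchar_t | wchar_t | wchar_t |

*: undefined.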

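In summary:

1) To output a char string, use "hs".

2) To output a wchar_t string, use "ls".

3) To output a TCHAR string, use "s"; this works only with compilers for the Windows platform, such as VC, BCB, and MinGW.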
