Character constants and string constants: the dangers of using multi-character constants

During code analysis, PVS-Studio tracks the data flow and operates on the values of variables. These values are taken from constants or derived from conditional expressions; we call them virtual values. Recently, we refined virtual values so that they handle multi-character constants, and that work prompted a new diagnostic rule.

Introduction

Multi-character literals are implementation-defined, so different compilers may encode them differently. For example, GCC and Clang compute the value from the order of the characters in the literal, while MSVC rearranges them depending on the kind of character (regular or escape sequence).

For example, the literal 'T\x65s\x74' is encoded differently depending on the compiler. We had to add similar logic to the analyzer. As a result, we created a new diagnostic rule, V1039, to detect such literals in code. They are dangerous in cross-platform projects that are built with several compilers.
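As a rough illustration of the order-based scheme, the sketch below follows the way GCC documents its evaluation of a multi-character constant: one character at a time, shifting the previous value left by the character width and OR-ing in the next character. The helper name `order_based_value` is hypothetical (it is not part of PVS-Studio or any compiler), the code assumes 8-bit characters, and it does not attempt to model MSVC's scheme.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: folds up to four characters into one value,
   first character ending up in the most significant position.
   Assumes 8-bit characters; intended for count <= 4. */
static uint32_t order_based_value(const char *chars, size_t count)
{
    uint32_t value = 0;
    for (size_t i = 0; i < count; ++i)
    {
        /* Shift what we have so far and append the next character. */
        value = (value << 8) | (unsigned char)chars[i];
    }
    return value;
}
```

Under this scheme it does not matter whether a character is written plainly or as an escape sequence: `order_based_value("Test", 4)` and `order_based_value("T\x65s\x74", 4)` both give 0x54657374.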

Diagnostic V1039

Let's look at an example. The code below behaves differently depending on the compiler used to build it:

#include <stdio.h>

void foo(int c)
{
  if (c == 'T\x65s\x74')                       // <= V1039
  {
    printf("Compiled with GCC or Clang.\n");
  }
  else
  {
    printf("It's another compiler (for example, MSVC).\n");
  }
}

int main(int argc, char** argv)
{
  foo('Test');
  return 0;
}

The program prints a different message depending on which compiler built it.

In a project that uses one specific compiler, this goes unnoticed. Problems may appear when porting, however, so such literals should be replaced with plain numeric constants: for example, 'Test' should be replaced with 0x54657374.
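If a bare hexadecimal number feels too opaque, one possible alternative is to build the value from single-character constants, which every compiler encodes the same way. The sketch below is only an illustration: the `MAKE_TAG` macro and the `is_test_tag` function are hypothetical names that do not come from the article, and the code assumes 8-bit characters in an ASCII-compatible execution character set.

```c
#include <stdint.h>

/* Hypothetical macro: packs four single characters into a 32-bit value,
   first character in the most significant byte. Only single-character
   constants are used, so no implementation-defined encoding is involved. */
#define MAKE_TAG(a, b, c, d)                     \
    (((uint32_t)(unsigned char)(a) << 24) |      \
     ((uint32_t)(unsigned char)(b) << 16) |      \
     ((uint32_t)(unsigned char)(c) << 8)  |      \
      (uint32_t)(unsigned char)(d))

/* Hypothetical portable counterpart of the comparison in foo(). */
static int is_test_tag(uint32_t c)
{
    return c == MAKE_TAG('T', 'e', 's', 't');
}
```

With ASCII character codes, `MAKE_TAG('T', 'e', 's', 't')` evaluates to 0x54657374, and writing some characters as escapes (`'\x65'`, `'\x74'`) does not change the result.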

To demonstrate the difference between compilers, we'll write a small utility that takes sequences of 3 and 4 characters, such as 'GHIJ' and 'GHI', and displays their representation in memory after compilation.

Utility code:

#include <stdio.h>

typedef int char_t;

void PrintBytes(const char* format, char_t lit)
{
  printf("%20s : ", format);

  const unsigned char *ptr = (const unsigned char*)&lit;
  for (int i = sizeof(lit); i--;)
  {
    printf("%c", *ptr++);
  }
  putchar('\n');
}

int main(int argc, char** argv)
{
  printf("Hex codes are: G(%02X) H(%02X) I(%02X) J(%02X)\n",'G','H','I','J');
  PrintBytes("'GHIJ'", 'GHIJ');
  PrintBytes("'\\x47\\x48\\x49\\x4A'", '\x47\x48\x49\x4A');
  PrintBytes("'G\\x48\\x49\\x4A'", 'G\x48\x49\x4A');
  PrintBytes("'GH\\x49\\x4A'", 'GH\x49\x4A');
  PrintBytes("'G\\x48I\\x4A'", 'G\x48I\x4A');
  PrintBytes("'GHI\\x4A'", 'GHI\x4A');
  PrintBytes("'GHI'", 'GHI');
  PrintBytes("'\\x47\\x48\\x49'", '\x47\x48\x49');
  PrintBytes("'GH\\x49'", 'GH\x49');
  PrintBytes("'\\x47H\\x49'", '\x47H\x49');
  PrintBytes("'\\x47HI'", '\x47HI');
  return 0;
}

Output of the utility compiled with Visual C++:

Hex codes are: G(47) H(48) I(49) J(4A)
              'GHIJ' : JIHG
  '\x47\x48\x49\x4A' : GHIJ
     'G\x48\x49\x4A' : HGIJ
        'GH\x49\x4A' : JIHG
        'G\x48I\x4A' : JIHG
           'GHI\x4A' : JIHG
               'GHI' : IHG
      '\x47\x48\x49' : GHI
            'GH\x49' : IHG
         '\x47H\x49' : HGI
            '\x47HI' : IHG

Output of the utility compiled with GCC or Clang:

Hex codes are: G(47) H(48) I(49) J(4A)
              'GHIJ' : JIHG
  '\x47\x48\x49\x4A' : JIHG
     'G\x48\x49\x4A' : JIHG
        'GH\x49\x4A' : JIHG
        'G\x48I\x4A' : JIHG
           'GHI\x4A' : JIHG
               'GHI' : IHG
      '\x47\x48\x49' : IHG
            'GH\x49' : IHG
         '\x47H\x49' : IHG
            '\x47HI' : IHG
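The reversed-looking strings such as JIHG are what one would expect if the utility ran on a little-endian machine: the utility prints the bytes of the int in memory order, lowest address first, so the least significant byte comes out first. To compare values without depending on byte order, the bytes can be extracted with shifts instead. The helper below is a hypothetical sketch (`value_to_text` is not part of the original utility) and assumes the value fits in 32 bits.

```c
#include <stdint.h>

/* Hypothetical helper: writes the four bytes of `value` into `out`,
   most significant byte first, and terminates the string.
   Uses shifts, so the result does not depend on host endianness. */
static void value_to_text(uint32_t value, char out[5])
{
    for (int i = 0; i < 4; ++i)
    {
        /* i = 0 selects bits 31..24, i = 3 selects bits 7..0. */
        out[i] = (char)((value >> (24 - 8 * i)) & 0xFFu);
    }
    out[4] = '\0';
}
```

For the value 0x4748494A this produces the text GHIJ, whereas dumping the same value byte by byte from memory on a little-endian host gives JIHG, as in the listings above.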

Conclusion

The V1039 diagnostic was added in PVS-Studio 7.03, which was released recently. You can download the latest version of the analyzer from the download page.

Translated from: https://habr.com/en/company/pvs-studio/blog/457694/
