Analyzing C++ Command-Line Arguments

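Preliminary note: the command line (commandline) is split by a command-line parser (hereafter cmdlineparser) into the arguments argv[0]…argv[n]. Here commandline is the input and argv is the output. For Microsoft C/C++ programs, this parser is the program's own C runtime startup code, not the operating system. The operating system hands the program its command line as a single string.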

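Microsoft C/C++ startup code uses the following rules when parsing the arguments given on the operating system command line:

  • cmdlineparser splits commandline into argv at white space, which is either a space (0x20) or a tab (0x09). White space does not always separate arguments, because it can also be part of an argument (inside a quoted string).
  • Unlike 0x20 and 0x09, the caret ^ (0x5E) is not recognized as an escape character or a delimiter. It is handled entirely by the operating system's command-line parser (for example, the cmd.exe shell) before the command line reaches the program and argv is built.
  • In commandline, a string enclosed in double quotes, "string", is interpreted as a single argument even if it contains spaces: "a string" is parsed as a string. A quoted string can be embedded in an argument: d"e f"g is parsed by cmdlineparser as de fg.
  • In commandline, a double quote preceded by a backslash (0x5C), \", is interpreted as a literal double-quote character (") in argv.
  • Following from the previous rule, backslashes are taken literally in argv unless they immediately precede a double quote.
  • In commandline, an even number of backslashes followed by a double quote is handled in two parts:
    – Each pair of backslashes becomes one backslash in argv.
    – The double quote acts as a string delimiter. It opens or closes a quoted section and is not copied into argv.
    The quote is not equivalent to white space. For example, a\\\\"b c" yields the single argument a\\b c.
  • In commandline, if an odd number of backslashes is followed by a double quote, each pair of backslashes becomes one backslash in argv. The remaining backslash escapes the double quote as in the \" rule, so a literal " is placed in argv.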
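The last three rules all depend on the length of a run of backslashes and on what follows the run. The minimal C++ sketch below decides what a single run produces. `decodeBackslashRun` and `BackslashRunResult` are hypothetical names for illustration, not part of any Microsoft API.

```cpp
#include <cstddef>
#include <string>

// Hypothetical type: the outcome of decoding one run of backslashes.
struct BackslashRunResult {
    std::string backslashes;  // backslashes that end up in argv
    bool literalQuote;        // true: the following " is copied into argv as-is
};

// Hypothetical helper (not a Microsoft API): decode a run of `m` backslashes.
// `followedByQuote` says whether the run is immediately followed by a ".
BackslashRunResult decodeBackslashRun(std::size_t m, bool followedByQuote) {
    if (!followedByQuote) {
        // Not before a double quote: every backslash is taken literally.
        return {std::string(m, '\\'), false};
    }
    // Before a double quote: one backslash per pair. An odd leftover backslash
    // escapes the quote (literal "); otherwise the quote is a string delimiter.
    return {std::string(m / 2, '\\'), m % 2 == 1};
}
```

Two examples:
- In a\\\\"b, the run of four backslashes before the quote gives two backslashes plus a delimiter quote.
- In a\\\"b, the run of three gives one backslash plus a literal quote.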

The rules above are translated from http://msdn.microsoft.com/en-us/library/17w5ykft.aspx. The wording mainly reflects my own understanding of their meaning. The original reads:

Microsoft C/C++ startup code uses the following rules when interpreting arguments given on the operating system command line:

  • Arguments are delimited by white space, which is either a space or a tab.
  • The caret character (^) is not recognized as an escape character or delimiter. The character is handled completely by the command-line parser in the operating system before being passed to the argv array in the program.
  • A string surrounded by double quotation marks ("string") is interpreted as a single argument, regardless of white space contained within. A quoted string can be embedded in an argument.
  • A double quotation mark preceded by a backslash (\") is interpreted as a literal double quotation mark character (").
  • Backslashes are interpreted literally, unless they immediately precede a double quotation mark.
  • If an even number of backslashes is followed by a double quotation mark, one backslash is placed in the argv array for every pair of backslashes, and the double quotation mark is interpreted as a string delimiter.
  • If an odd number of backslashes is followed by a double quotation mark, one backslash is placed in the argv array for every pair of backslashes, and the double quotation mark is "escaped" by the remaining backslash, causing a literal double quotation mark (") to be placed in argv.

Example

The following program demonstrates how command-line arguments are passed:


   
```cpp
// command_line_arguments.cpp
// compile with: /EHsc
#include <iostream>

using namespace std;

int main( int argc,      // Number of strings in array argv
          char *argv[],   // Array of command-line argument strings
          char *envp[] )  // Array of environment variable strings
{
    int count;

    // Display each command-line argument.
    cout << "\nCommand-line arguments:\n";
    for( count = 0; count < argc; count++ )
        cout << "  argv[" << count << "]   "
             << argv[count] << "\n";
}
```

The following table shows example input and the expected output, demonstrating the rules listed above:

Command line     |   argv[1]   |   argv[2]    |   argv[3]
-----------------|-------------|--------------|---------------
"abc" d e        |   abc       |   d          |   e
a\\b d"e f"g h   |   a\\b      |   de fg      |   h
a\\\"b c d       |   a\"b      |   c          |   d
a\\\\"b c" d e   |   a\\b c    |   d          |   e
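The table can be reproduced with a small C++ sketch that applies only the documented rules listed above. Some limits of this sketch:
- `splitCommandLine` is a hypothetical name.
- It takes only the part of the command line after the program name.
- It does not model the special handling of argv[0].
- It does not model the consecutive-quote behavior discussed below.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: split the argument part of a command line
// (everything after the program name) using the documented rules only.
std::vector<std::string> splitCommandLine(const std::string& cmd) {
    std::vector<std::string> args;
    const std::size_t n = cmd.size();
    std::size_t i = 0;
    while (true) {
        // Skip white space (space or tab) between arguments.
        while (i < n && (cmd[i] == ' ' || cmd[i] == '\t')) ++i;
        if (i >= n) break;

        std::string arg;
        bool inQuotes = false;
        while (i < n) {
            // Count a run of backslashes.
            std::size_t m = 0;
            while (i < n && cmd[i] == '\\') { ++m; ++i; }
            if (i < n && cmd[i] == '"') {
                arg.append(m / 2, '\\');   // one backslash per pair
                if (m % 2 == 1) {
                    arg += '"';            // odd count: escaped, literal quote
                } else {
                    inQuotes = !inQuotes;  // even count: quote opens/closes quoting
                }
                ++i;
                continue;
            }
            arg.append(m, '\\');           // backslashes not before " are literal
            if (i >= n) break;
            if (!inQuotes && (cmd[i] == ' ' || cmd[i] == '\t')) break;
            arg += cmd[i++];
        }
        args.push_back(arg);
    }
    return args;
}
```

For example, splitCommandLine on the last row of the table returns the three strings a\\b c, d and e.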


Addendum:

The parsing of several consecutive double quotes is thoroughly bizarre. For discussion, see in particular this supplementary note at http://www.daviddeley.com/autohotkey/parameters/parameters.htm :

  • And here's the missing undocumented rule:
    If a closing " is followed immediately by another ", the 2nd " is accepted literally and added to the parameter.

and the algorithm it gives:

5.10  The Microsoft C/C++ Command Line Parameter Parsing Algorithm
The following algorithm was reverse engineered by disassembling a small C program compiled using Microsoft Visual C++ and examining the disassembled code:

1. Parse off parameter 0 (the program filename)
    * The entire parameter may be enclosed in double quotes (it handles double quoted parts)
      (Double quotes are necessary if there are any spaces or tabs in the parameter)
    * There is no special processing of backslashes (\)

2. Parse off next parameter:
    a. Skip over multiple spaces/tabs between parameters
    LOOP
      b. Count the backslashes (\). Let m = number of backslashes. (m may be zero.)
      c. IF next character following m backslashes is a double quote:
             If m is even (or zero)
                 if currently in a double quoted part
                     IF next character is also a "
                         move to next character (the 2nd ". This character will be added to the parameter.)
                     ELSE
                         set flag to not add this " character to the parameter
                     ENDIF
                 else
                     set flag to not add this " character to the parameter
                 endif
                 toggle double quoted part flag
             Endif
             m = m/2 (floor divide e.g. 0/2=0, 1/2=0, 2/2=1, 3/2=1, 4/2=2, 5/2=2, etc.)
         ENDIF
      d. add m backslashes
      e. if at the end of the command line, or at a space/tab outside a double quoted part,
         the parameter is complete (exit LOOP); otherwise add this character to our parameter
         unless the flag says not to, then clear the flag and move to the next character
    ENDLOOP
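Below is a minimal C++ transcription of the algorithm above. It is a sketch under stated assumptions, not Microsoft's source code:
- `parseMsvcCommandLine` is a hypothetical name.
- argv[0] follows step 1: double quotes toggle a quoted part and are dropped, and backslashes are ordinary characters.
- In step c the quoting flag is toggled in both branches. This yields the undocumented rule that a closing " immediately followed by another " puts a literal " into the parameter.

Consecutive-quote handling may differ between CRT versions, so treat this as a model of the algorithm described here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch of the reverse-engineered algorithm above.
std::vector<std::string> parseMsvcCommandLine(const std::string& cmd) {
    std::vector<std::string> argv;
    const std::size_t n = cmd.size();
    std::size_t i = 0;

    // 1. Parameter 0: quotes toggle a quoted part and are not copied;
    //    backslashes get no special processing.
    std::string prog;
    bool inQuotes = false;
    while (i < n && (inQuotes || (cmd[i] != ' ' && cmd[i] != '\t'))) {
        if (cmd[i] == '"') inQuotes = !inQuotes;
        else prog += cmd[i];
        ++i;
    }
    argv.push_back(prog);

    // 2. Remaining parameters.
    while (true) {
        // a. Skip spaces/tabs between parameters.
        while (i < n && (cmd[i] == ' ' || cmd[i] == '\t')) ++i;
        if (i >= n) break;

        std::string arg;
        inQuotes = false;
        while (true) {                    // LOOP
            bool copyChar = true;         // the "add this character" flag
            // b. Count the backslashes.
            std::size_t m = 0;
            while (i < n && cmd[i] == '\\') { ++m; ++i; }
            // c. Backslashes followed by a double quote.
            if (i < n && cmd[i] == '"') {
                if (m % 2 == 0) {
                    if (inQuotes && i + 1 < n && cmd[i + 1] == '"') {
                        ++i;              // move to the 2nd "; it will be added
                    } else {
                        copyChar = false; // do not add this "
                    }
                    inQuotes = !inQuotes; // toggled in both branches
                }
                m /= 2;                   // floor divide
            }
            // d. Add m backslashes.
            arg.append(m, '\\');
            // e. End of parameter: end of input, or space/tab outside quotes.
            if (i >= n || (!inQuotes && (cmd[i] == ' ' || cmd[i] == '\t'))) break;
            if (copyChar) arg += cmd[i];
            ++i;
        }                                 // ENDLOOP
        argv.push_back(arg);
    }
    return argv;
}
```

For example, prog "a""b" gives argv[1] = a"b, in four steps:
- The first quote opens a quoted part.
- The closing quote is followed by another quote, which is kept literally and leaves the quoted part.
- b is copied.
- The final quote opens a new quoted part that runs to the end of the line.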

Reposted from: https://my.oschina.net/jacobin/blog/153257
