第一章字符串转换成整数

最新推荐文章于 2023-03-27 10:35:36 发布

和几米的约定

最新推荐文章于 2023-03-27 10:35:36 发布

阅读量966

点赞数

分类专栏： c语言文章标签： C 语言

c语言专栏收录该内容

2 篇文章 0 订阅

订阅专栏

第三十章、字符串转换成整数

先看题目：

输入一个表示整数的字符串，把该字符串转换成整数并输出，例如输入字符串"345"，则输出整数345。
给定函数原型int StrToInt(const char *str) ，完成函数StrToInt，实现字符串转换成整数的功能，不得用库函数atoi（即便准许使用，其对于溢出情况的处理也达不到题目的要求，详情请参看下文第7节末）。

我们来一步一步分析（共9小节，重点在下文第8小节及后续内容），直至写出第一份准确的代码：

1、本题考查的实际上就是字符串转换成整数的问题，或者说是要你自行实现atoi函数。那如何实现把表示整数的字符串正确地转换成整数呢？以"345"作为例子：

当我们扫描到字符串的第一个字符'3'时，由于我们知道这是第一位，所以得到数字3。
当扫描到第二个数字'4'时，而之前我们知道前面有一个3，所以便在后面加上一个数字4，那前面的3相当于30，因此得到数字：3*10+4=34。
继续扫描到字符'5'，'5'的前面已经有了34，由于前面的34相当于340，加上后面扫描到的5，最终得到的数是：34*10+5=345。

因此，此题的思路便是：每扫描到一个字符，我们便把在之前得到的数字乘以10，然后再加上当前字符表示的数字。

2、思路有了，有一些细节需要注意，如zhedahht所说：

“由于整数可能不仅仅之含有数字，还有可能以'+'或者'-'开头，表示整数的正负。因此我们需要把这个字符串的第一个字符做特殊处理。如果第一个字符是'+'号，则不需要做任何操作；如果第一个字符是'-'号，则表明这个整数是个负数，在最后的时候我们要把得到的数值变成负数。
接着我们试着处理非法输入。由于输入的是指针，在使用指针之前，我们要做的第一件是判断这个指针是不是为空。如果试着去访问空指针，将不可避免地导致程序崩溃。
另外，输入的字符串中可能含有不是数字的字符。每当碰到这些非法的字符，我们就没有必要再继续转换。
最后一个需要考虑的问题是溢出问题。由于输入的数字是以字符串的形式输入，因此有可能输入一个很大的数字转换之后会超过能够表示的最大的整数而溢出。”

比如，当给的字符串是如左边图片所示的时候，有考虑到么？当然，它们各自对应的正确输出如右边图片所示（假定你是在32位系统下，且编译环境是VS2008以上）：

3、很快，可能你就会写下如下代码：

[cpp] view plain copy print ?

//copyright@zhedahht 2007
enum Status {kValid = 0, kInvalid};
int g_nStatus = kValid;
// Convert a string into an integer
int StrToInt(const char* str)
{
g_nStatus = kInvalid;
long long num = 0;
if(str != NULL)
{
const char* digit = str;
// the first char in the string maybe '+' or '-'
bool minus = false;
if(*digit == '+')
digit ++;
else if(*digit == '-')
{
digit ++;
minus = true;
}
// the remaining chars in the string
while(*digit != '\0')
{
if(*digit >= '0' && *digit <= '9')
{
num = num * 10 + (*digit - '0');
// overflow
if(num > std::numeric_limits<int>::max())
{
num = 0;
break;
}
digit ++;
}
// if the char is not a digit, invalid input
else
{
num = 0;
break;
}
}
if(*digit == '\0')
{
g_nStatus = kValid;
if(minus)
num = 0 - num;
}
}
return static_cast<int>(num);
}

//copyright@zhedahht 2007  
enum Status {kValid = 0, kInvalid};
int g_nStatus = kValid;

// Convert a string into an integer
int StrToInt(const char* str)
{
	g_nStatus = kInvalid;
	long long num = 0;

	if(str != NULL)
	{
		const char* digit = str;

		// the first char in the string maybe '+' or '-'
		bool minus = false;
		if(*digit == '+')
			digit ++;
		else if(*digit == '-')
		{
			digit ++;
			minus = true;
		}

		// the remaining chars in the string
		while(*digit != '\0')
		{
			if(*digit >= '0' && *digit <= '9')
			{
				num = num * 10 + (*digit - '0');

				// overflow  
				if(num > std::numeric_limits<int>::max())
				{
					num = 0;
					break;
				}

				digit ++;
			}
			// if the char is not a digit, invalid input
			else
			{
				num = 0;
				break;
			}
		}

		if(*digit == '\0')
		{
			g_nStatus = kValid;
			if(minus)
				num = 0 - num;
		}
	}
	return static_cast<int>(num);
}

run下上述程序，会发现当输入字符串是下图中红叉叉部分所对应的时候，程序结果出错：

两个问题：

当输入的字符串不是数字，而是字符的时候，比如“1a”，上述程序直接返回了0（而正确的结果应该是得到1）：
[cpp] view plain copy print ?
1. // if the char is not a digit, invalid input
2. else
3. {
4. num = 0;
5. break;
6. }
```
// if the char is not a digit, invalid input
				  else
				  {
					  num = 0;
					  break;
				  }
```
处理溢出时，有问题。因为它遇到溢出情况时，直接返回了0：
[cpp] view plain copy print ?
1. // overflow
2. if(num > std::numeric_limits<int>::max())
3. {
4. num = 0;
5. break;
6. }
```
// overflow  
				if(num > std::numeric_limits<int>::max())
				{
					num = 0;
					break;
				}
```

4、把代码做下微调，如下（注：库函数atoi规定超过int值，按最大值maxint：2147483647来，超过-int按最小值minint：-2147483648来）：

[cpp] view plain copy print ?

//copyright@SP_daiyq 2013/5/29
int StrToInt(const char* str)
{
int res = 0; // result
int i = 0; // index of str
int signal = '+'; // signal '+' or '-'
int cur; // current digit
if (!str)
return 0;
// skip backspace
while (isspace(str[i]))
i++;
// skip signal
if (str[i] == '+' || str[i] == '-')
{
signal = str[i];
i++;
}
// get result
while (str[i] >= '0' && str[i] <= '9')
{
cur = str[i] - '0';
// judge overlap or not
if ( (signal == '+') && (cur > INT_MAX - res*10) )
{
res = INT_MAX;
break;
}
else if ( (signal == '-') && (cur -1 > INT_MAX - res*10) )
{
res = INT_MIN;
break;
}
res = res * 10 + cur;
i++;
}
return (signal == '-') ? -res : res;
}

//copyright@SP_daiyq 2013/5/29
int StrToInt(const char* str)
{
	int res = 0; // result
	int i = 0; // index of str
	int signal = '+'; // signal '+' or '-'
	int cur; // current digit

	if (!str)
		return 0;

	// skip backspace
	while (isspace(str[i]))
		i++;

	// skip signal
	if (str[i] == '+' || str[i] == '-')
	{
		signal = str[i];
		i++;
	}

	// get result
	while (str[i] >= '0' && str[i] <= '9')
	{
		cur = str[i] - '0';

		// judge overlap or not
		if ( (signal == '+') && (cur > INT_MAX - res*10) )
		{
			res = INT_MAX;
			break;
		}
		else if ( (signal == '-') && (cur -1 > INT_MAX - res*10) )
		{
			res = INT_MIN;
			break;
		}

		res = res * 10 + cur;
		i++;
	}

	return (signal == '-') ? -res : res;
}

此时会发现，上面第3小节末所述的第1个小问题（当输入的字符串不是数字，而是字符的时候）解决了：

但，上文第3小节末所述的第2个小问题：溢出问题却没有解决。即当给定下述测试数据的时候，问题就来了：

需要转换的字符串代码运行结果理应得到的正确结果

什么问题呢？比如说用上述代码转换这个字符串：" 10522545459"，它本应得到的正确结果应该是2147483647，但程序实际得到的结果却是：1932610867。故很明显，程序没有解决好上面的第2个小问题：溢出问题。原因是什么呢？咱们来分析下代码，看是如何具体处理溢出情况的：

[cpp] view plain copy print ?

// judge overlap or not
if ( (signal == '+') && (cur > INT_MAX - res*10) )
{
res = INT_MAX;
break;
}
else if ( (signal == '-') && (cur -1 > INT_MAX - res*10) )
{
res = INT_MIN;
break;
}

// judge overlap or not
		if ( (signal == '+') && (cur > INT_MAX - res*10) )
		{
			res = INT_MAX;
			break;
		}
		else if ( (signal == '-') && (cur -1 > INT_MAX - res*10) )
		{
			res = INT_MIN;
			break;
		}

接着上面的例子来，比如给定字符串" 10522545459"，除去空格有11位，而MAX_INT，即2147483647是10位数，当扫描到最后一个字符‘9’的时候，程序会比较 9 和 2147483647 - 1052254545*10的大小。

问题立马就暴露出来了，因为此时让res*10，即让1052254545*10 > MAX_INT，溢出无疑，程序已经出错，再执行下面这行代码已无意义：

[cpp] view plain copy print ?

cur > INT_MAX - res*10

cur > INT_MAX - res*10

也就是说，对于字符串"10522545459", 当扫描到最后一个字符‘9’时，根据上文第1小节的字符串转换成整数的思路：“每扫描到一个字符，我们便把在之前得到的数字乘以10，然后再加上当前字符表示的数字”，为了得到最终的整数，我们得如此计算：

1052254545*10 + 4，

然实际上当程序计算到1052254545*10时，

1052254545*10 >

2147483647

此时已经溢出了,若再执意计算，则程序逻辑将出错，故此后也就不能再判断字串的最后一位4是否大于2147483647%10了（耐不得烦想尽快看到最终正确代码的读者可以直接跳到下文第8节）。

5、上面说给的程序没有“ 很好的解决溢出问题。由于输入的数字是以字符串的形式输入，因此有可能输入一个很大的数字转换之后会超过能够表示的最大的整数而溢出”。那么，到底代码该如何写呢？

像下面这样？：

[cpp] view plain copy print ?

//copyright@fuwutu 2013/5/29
int StrToInt(const char* str)
{
bool negative = false;
long long result = 0;
while (*str == ' ' || *str == '\t')
{
++str;
}
if (*str == '-')
{
negative = true;
++str;
}
else if (*str == '+')
{
++str;
}
while (*str != '\0')
{
int n = *str - '0';
if (n < 0 || n > 9)
{
break;
}
if (negative)
{
result = result * 10 - n;
if (result < -2147483648LL)
{
result = -2147483648LL;
}
}
else
{
result = result * 10 + n;
if (result > 2147483647LL)
{
result = 2147483647LL;
}
}
++str;
}
return result;
}

//copyright@fuwutu 2013/5/29
int StrToInt(const char* str)
{
    bool negative = false;
    long long result = 0;
    while (*str == ' ' || *str == '\t')
    {
        ++str;
    }
    if (*str == '-')
    {
        negative = true;
        ++str;
    }
    else if (*str == '+')
    {
        ++str;
    }

    while (*str != '\0')
    {
        int n = *str - '0';
        if (n < 0 || n > 9)
        {
            break;
        }

        if (negative)
        {
            result = result * 10 - n;
            if (result < -2147483648LL)
            {
                result = -2147483648LL;
            }
        }
        else
        {
            result = result * 10 + n;
            if (result > 2147483647LL)
            {
                result = 2147483647LL;
            }
        }
        ++str;
    }

  return result;
}

run下程序，看看运行结果：

上图所示程序貌似通过了，然实际上它还是未能处理数据溢出的问题，因为它只是做了个取巧，即把返回的值esult定义成了long long，如下所示：

[cpp] view plain copy print ?

long long result = 0;

long long result = 0;

故严格说来，我们依然未写出准确的规范代码。

6 、那到底该如何解决这个数据溢出的问题呢？咱们先来看看Microsoft是如何实现atoi的吧：

[cpp] view plain copy print ?

//atol函数
//Copyright (c) 1989-1997, Microsoft Corporation. All rights reserved.
long __cdecl atol(
const char *nptr
)
{
int c; /* current char */
long total; /* current total */
int sign; /* if ''-'', then negative, otherwise positive */
/* skip whitespace */
while ( isspace((int)(unsigned char)*nptr) )
++nptr;
c = (int)(unsigned char)*nptr++;
sign = c; /* save sign indication */
if (c == ''-'' || c == ''+'')
c = (int)(unsigned char)*nptr++; /* skip sign */
total = 0;
while (isdigit(c)) {
total = 10 * total + (c - ''0''); /* accumulate digit */
c = (int)(unsigned char)*nptr++; /* get next char */
}
if (sign == ''-'')
return -total;
else
return total; /* return result, negated if necessary */
}

//atol函数
//Copyright (c) 1989-1997, Microsoft Corporation. All rights reserved.
long __cdecl atol(
	const char *nptr
	)
{
	int c; /* current char */
	long total; /* current total */
	int sign; /* if ''-'', then negative, otherwise positive */

	/* skip whitespace */
	while ( isspace((int)(unsigned char)*nptr) )
		++nptr;

	c = (int)(unsigned char)*nptr++;
	sign = c; /* save sign indication */
	if (c == ''-'' || c == ''+'')
		c = (int)(unsigned char)*nptr++; /* skip sign */

	total = 0;

	while (isdigit(c)) {
		total = 10 * total + (c - ''0''); /* accumulate digit */
		c = (int)(unsigned char)*nptr++; /* get next char */
	}

	if (sign == ''-'')
		return -total;
	else
		return total; /* return result, negated if necessary */
}

其中，isspace和isdigit函数的实现代码为：

[cpp] view plain copy print ?

isspace(int x)
{
if(x==' '||x=='/t'||x=='/n'||x=='/f'||x=='/b'||x=='/r')
return 1;
else
return 0;
}
isdigit(int x)
{
if(x<='9'&&x>='0')
return 1;
else
return 0;
}

isspace(int x)  
{  
	if(x==' '||x=='/t'||x=='/n'||x=='/f'||x=='/b'||x=='/r')  
		return 1;  
	else   
		return 0;  
}  

isdigit(int x)  
{  
	if(x<='9'&&x>='0')           
		return 1;   
	else   
		return 0;  
}

然后 atoi调用上面的atol函数，如下所示：

[cpp] view plain copy print ?

//atoi调用上述的atol
int __cdecl atoi(
const char *nptr
)
{
//Overflow is not detected. Because of this, we can just use
return (int)atol(nptr);
}

//atoi调用上述的atol
int __cdecl atoi(
	const char *nptr
	)
{
	//Overflow is not detected. Because of this, we can just use
	return (int)atol(nptr);
}

但很遗憾的是，上述atoi标准代码依然返回的是long：

[cpp] view plain copy print ?

long total; /* current total */
if (sign == ''-'')
return -total;
else
return total; /* return result, negated if necessary */

long total; /* current total */
if (sign == ''-'')
	return -total;
else
	return total; /* return result, negated if necessary */

再者，下面这里定义成long的total与10相乘，即total*10很容易溢出：

[cpp] view plain copy print ?

long total; /* current total */
total = 10 * total + (c - ''0''); /* accumulate digit */

long total; /* current total */
total = 10 * total + (c - ''0''); /* accumulate digit */

最后，根据本文评论下的读者meiyuli反应：“测试数据是字符串"-21474836480"，api算出来的是-2147483648，用上述代码算出来的结果是0”，如此，上述微软的这个atoi源码是有问题的。

7 、microsoft既然不行，读者想必很自然的想到linux。So，咱们接下来便看看 linux内核中是如何实现此字符串转换为整数的问题的。linux内核中提供了以下几个函数：

simple_strtol，把一个字符串转换为一个有符号长整数；
simple_strtoll，把一个字符串转换为一个有符号长长整数；
simple_strtoul，把一个字符串转换为一个无符号长整数；
simple_strtoull，把一个字符串转换为一个无符号长长整数

相关源码及分析如下。

首先，atoi调下面的strtol：

[cpp] view plain copy print ?

//linux/lib/vsprintf.c
//Copyright (C) 1991, 1992 Linus Torvalds
//simple_strtol - convert a string to a signed long
long simple_strtol(const char *cp, char **endp, unsigned int base)
{
if (*cp == '-')
return -simple_strtoul(cp + 1, endp, base);
return simple_strtoul(cp, endp, base);
}
EXPORT_SYMBOL(simple_strtol);

//linux/lib/vsprintf.c
//Copyright (C) 1991, 1992  Linus Torvalds
//simple_strtol - convert a string to a signed long
long simple_strtol(const char *cp, char **endp, unsigned int base)
{
	if (*cp == '-')
		return -simple_strtoul(cp + 1, endp, base);

	return simple_strtoul(cp, endp, base);
}
EXPORT_SYMBOL(simple_strtol);

然后，上面的strtol调下面的strtoul：

[cpp] view plain copy print ?

//simple_strtoul - convert a string to an unsigned long
unsigned long simple_strtoul(const char *cp, char **endp, unsigned int base)
{
return simple_strtoull(cp, endp, base);
}
EXPORT_SYMBOL(simple_strtoul);

//simple_strtoul - convert a string to an unsigned long
unsigned long simple_strtoul(const char *cp, char **endp, unsigned int base)
{
	return simple_strtoull(cp, endp, base);
}
EXPORT_SYMBOL(simple_strtoul);

接着，上面的strtoul调下面的strtoull：

[cpp] view plain copy print ?

//simple_strtoll - convert a string to a signed long long
long long simple_strtoll(const char *cp, char **endp, unsigned int base)
{
if (*cp == '-')
return -simple_strtoull(cp + 1, endp, base);
return simple_strtoull(cp, endp, base);
}
EXPORT_SYMBOL(simple_strtoll);

//simple_strtoll - convert a string to a signed long long
long long simple_strtoll(const char *cp, char **endp, unsigned int base)
{
	if (*cp == '-')
		return -simple_strtoull(cp + 1, endp, base);

	return simple_strtoull(cp, endp, base);
}
EXPORT_SYMBOL(simple_strtoll);

最后，strtoull调_parse_integer_fixup_radix和_parse_integer来处理相关逻辑：

[cpp] view plain copy print ?

//simple_strtoull - convert a string to an unsigned long long
unsigned long long simple_strtoull(const char *cp, char **endp, unsigned int base)
{
unsigned long long result;
unsigned int rv;
cp = _parse_integer_fixup_radix(cp, &base);
rv = _parse_integer(cp, base, &result);
/* FIXME */
cp += (rv & ~KSTRTOX_OVERFLOW);
if (endp)
*endp = (char *)cp;
return result;
}
EXPORT_SYMBOL(simple_strtoull);

//simple_strtoull - convert a string to an unsigned long long
unsigned long long simple_strtoull(const char *cp, char **endp, unsigned int base)
{
	unsigned long long result;
	unsigned int rv;

	cp = _parse_integer_fixup_radix(cp, &base);
	rv = _parse_integer(cp, base, &result);
	/* FIXME */
	cp += (rv & ~KSTRTOX_OVERFLOW);

	if (endp)
		*endp = (char *)cp;

	return result;
}
EXPORT_SYMBOL(simple_strtoull);

重头戏来了。接下来，我们来看上面strtoull函数中的parse_integer_fixup_radix和_parse_integer两段代码。如鲨鱼所说

“真正的处理逻辑主要是在_parse_integer里面，关于溢出的处理，_parse_integer处理的很优美，
而_parse_integer_fixup_radix是用来自动根据字符串判断进制的”。

先来看 _parse_integer函数：

[cpp] view plain copy print ?

//lib/kstrtox.c, line 39
//Convert non-negative integer string representation in explicitly given radix to an integer.
//Return number of characters consumed maybe or-ed with overflow bit.
//If overflow occurs, result integer (incorrect) is still returned.
unsigned int _parse_integer(const char *s, unsigned int base, unsigned long long *p)
{
unsigned long long res;
unsigned int rv;
int overflow;
res = 0;
rv = 0;
overflow = 0;
while (*s) {
unsigned int val;
if ('0' <= *s && *s <= '9')
val = *s - '0';
else if ('a' <= _tolower(*s) && _tolower(*s) <= 'f')
val = _tolower(*s) - 'a' + 10;
else
break;
if (val >= base)
break;
/*
* Check for overflow only if we are within range of
* it in the max base we support (16)
*/
if (unlikely(res & (~0ull << 60))) {
if (res > div_u64(ULLONG_MAX - val, base))
overflow = 1;
}
res = res * base + val;
rv++;
s++;
}
*p = res;
if (overflow)
rv |= KSTRTOX_OVERFLOW;
return rv;
}

//lib/kstrtox.c, line 39  
//Convert non-negative integer string representation in explicitly given radix to an integer.  
//Return number of characters consumed maybe or-ed with overflow bit.  
//If overflow occurs, result integer (incorrect) is still returned.  
unsigned int _parse_integer(const char *s, unsigned int base, unsigned long long *p)  
{  
    unsigned long long res;  
    unsigned int rv;  
    int overflow;  
  
    res = 0;  
    rv = 0;  
    overflow = 0;  
    while (*s) {  
        unsigned int val;  
  
        if ('0' <= *s && *s <= '9')  
            val = *s - '0';  
        else if ('a' <= _tolower(*s) && _tolower(*s) <= 'f')  
            val = _tolower(*s) - 'a' + 10;  
        else  
            break;  
  
        if (val >= base)  
            break;  
        /* 
         * Check for overflow only if we are within range of 
         * it in the max base we support (16) 
         */  
        if (unlikely(res & (~0ull << 60))) {  
            if (res > div_u64(ULLONG_MAX - val, base))  
                overflow = 1;  
        }  
        res = res * base + val;  
        rv++;  
        s++;  
    }  
    *p = res;  
    if (overflow)  
        rv |= KSTRTOX_OVERFLOW;  
    return rv;  
}

解释下两个小细节：

上头出现了个unlikely，其实unlikely和likely经常出现在linux相关内核源码中
[cpp] view plain copy print ?
1. if(likely(value)){
2. //等价于if(likely(value)) == if(value)
3. }
4. else{
5. }
```
if(likely(value)){
	//等价于if(likely(value)) == if(value)
}
else{
}
```
likely表示value为真的可能性更大，而unlikely表示value为假的可能性更大，这两个宏被定义成：
[cpp] view plain copy print ?
1. //include/linux/compiler.h
2. # ifndef likely
3. # define likely(x) (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 1))
4. # endif
5. # ifndef unlikely
6. # define unlikely(x) (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 0))
7. # endif
```
//include/linux/compiler.h
# ifndef likely
#  define likely(x)	(__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 1))
# endif
# ifndef unlikely
#  define unlikely(x)	(__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 0))
# endif
```
呈现下div_u64的代码：
[cpp] view plain copy print ?
1. //include/linux/math64.h
2. //div_u64
3. static inline u64 div_u64(u64 dividend, u32 divisor)
4. {
5. u32 remainder;
6. return div_u64_rem(dividend, divisor, &remainder);
7. }
9. //div_u64_rem
10. static inline u64 div_u64_rem(u64 dividend, u32 divisor, u32 *remainder)
11. {
12. *remainder = dividend % divisor;
13. return dividend / divisor;
14. }
```
//include/linux/math64.h
//div_u64
static inline u64 div_u64(u64 dividend, u32 divisor)
{
	u32 remainder;
	return div_u64_rem(dividend, divisor, &remainder);
}

//div_u64_rem
static inline u64 div_u64_rem(u64 dividend, u32 divisor, u32 *remainder)
{
	*remainder = dividend % divisor;
	return dividend / divisor;
}
```

最后看下_parse_integer_fixup_radix函数：

[cpp] view plain copy print ?

//lib/kstrtox.c, line 23
const char *_parse_integer_fixup_radix(const char *s, unsigned int *base)
{
if (*base == 0) {
if (s[0] == '0') {
if (_tolower(s[1]) == 'x' && isxdigit(s[2]))
*base = 16;
else
*base = 8;
} else
*base = 10;
}
if (*base == 16 && s[0] == '0' && _tolower(s[1]) == 'x')
s += 2;
return s;
}

//lib/kstrtox.c, line 23
const char *_parse_integer_fixup_radix(const char *s, unsigned int *base)
{
	if (*base == 0) {
		if (s[0] == '0') {
			if (_tolower(s[1]) == 'x' && isxdigit(s[2]))
				*base = 16;
			else
				*base = 8;
		} else
			*base = 10;
	}
	if (*base == 16 && s[0] == '0' && _tolower(s[1]) == 'x')
		s += 2;
	return s;
}

读者MJN君在我的建议下，对上述linux内核中的atoi函数进行了测试，咱们来看下测试结果如何。

2147483647 : 2147483647
2147483648 : -2147483648
10522545459 : 1932610867
-2147483648 : -2147483648
-2147483649 : -2147483647
-10522545459 : 1932610867

如上，根据程序的输出结果可以看出，对于某些溢出的情况，atoi程序的处理并不符合本题的要求。

也就是说，atoi程序对溢出的处理是一个标准，而本题要求对溢出的处理则是另外一个标准，所以说直接用atoi程序达不到本题的要求，但你不能因为本题的标准而否认atoi程序的正确性。

既然直接借用atoi的源码（原理是parseXXX，int i=Integer.parseInt(String str)，把str转换成int的方法），不符合题目要求，则咱们另寻他路。

路漫漫其修远兮，吾等将上下而求索，但与此同时，我们已渐入佳境。

8、根据我们第1小节达成一致的字符串转换成整数的思路：“每扫描到一个字符，我们便把在之前得到的数字乘以10，然后再加上当前字符表示的数字”，相信读者已经觉察到，在扫描到最后一个字符的时候，如果之前得到的数比较大，此时若再让其扩大10倍，相对来说是比较容易溢出的。

但车到山前必有路，既然让一个比较大的int整型数括大10倍，比较容易溢出，那么在不好判断是否溢出的情况下，可以尝试使用除法。即如MJN所说：

与其将n扩大10倍,，冒着溢出的风险, 再与MAX_INT进行比较（如果已经溢出, 则比较的结果没有意义），
不如未雨绸缪先用n与MAX_INT/10进行比较：若n>MAX_INT/10（当然同时还要考虑n=MAX_INT/10的情况），说明最终得到的整数一定会溢出，故此时可以当即进行溢出处理，直接返回最大值MAX_INT，从而也就免去了计算n*10这一步骤。

也就是说，计算n*10前,先比较n与MAX_INT/10大小，若n>MAX_INT/10，那么n*10肯定大于MAX_INT，即代表最后得到的整数n肯定溢出，既然溢出，不能再计算n*10，直接提前返回MAX_INT就行了。

一直以来，我们努力的目的归根结底是为了更好的处理溢出，但上述做法最重要的是巧妙的规避了计算n*10这一乘法步骤，转换成计算除法MAX_INT/10代替，不能不说此法颇妙。

他的代码如下，如有问题请指出：

[cpp] view plain copy print ?

//copyright@njnu_mjn 2013
int StrToDecInt(const char* str)
{
static const int MAX = (int)((unsigned)~0 >> 1);
static const int MIN = -(int)((unsigned)~0 >> 1) - 1;
unsigned int n = 0;
int sign = 1;
int c;
while (isspace(*str))
++str;
if (*str == '+' || *str == '-')
{
if (*str == '-')
sign = -1;
++str;
}
while (isdigit(*str))
{
c = *str - '0';
if (sign > 0 && (n > MAX/10 || (n == MAX/10 && c > MAX%10)))
{
n = MAX;
break;
}
else if (sign < 0 && (n > (unsigned)MIN/10
|| (n == (unsigned)MIN/10 && c > (unsigned)MIN%10)))
{
n = MIN;
break;
}
n = n * 10 + c;
++str;
}
return sign > 0 ? n : -n;
}

//copyright@njnu_mjn 2013
int StrToDecInt(const char* str)    
{    
    static const int MAX = (int)((unsigned)~0 >> 1);    
    static const int MIN = -(int)((unsigned)~0 >> 1) - 1;    
    unsigned int n = 0;    
    int sign = 1;    
    int c;    
    
    while (isspace(*str))    
        ++str;    
    if (*str == '+' || *str == '-')    
    {    
        if (*str == '-')    
            sign = -1;    
        ++str;    
    }    
    while (isdigit(*str))    
    {    
        c = *str - '0';    
        if (sign > 0 && (n > MAX/10 || (n == MAX/10 && c > MAX%10)))    
        {    
            n = MAX;    
            break;    
        }    
        else if (sign < 0 && (n > (unsigned)MIN/10     
                              || (n == (unsigned)MIN/10 && c > (unsigned)MIN%10)))    
        {    
            n = MIN;    
            break;    
        }    
        n = n * 10 + c;    
        ++str;    
    }    
    return sign > 0 ? n : -n;    
}

上述代码从测试结果来看，暂未发现什么问题

输入输出
10522545459 : 2147483647
-10522545459 : -2147483648

咱们再来总结下上述代码是如何处理溢出情况的。对于正数来说，它溢出的可能性有两种：

一种是诸如2147483650，即n > MAX/10 的；
一种是诸如2147483649，即n == MAX/10 && c > MAX%10。

故咱们上面处理溢出情况的代码便是：

[cpp] view plain copy print ?

c = *str - '0';
if (sign > 0 && (n > MAX/10 || (n == MAX/10 && c > MAX%10)))
{
n = MAX;
break;
}
else if (sign < 0 && (n > (unsigned)MIN/10
|| (n == (unsigned)MIN/10 && c > (unsigned)MIN%10)))
{
n = MIN;
break;
}

c = *str - '0';  
        if (sign > 0 && (n > MAX/10 || (n == MAX/10 && c > MAX%10)))  
        {  
            n = MAX;  
            break;  
        }  
        else if (sign < 0 && (n > (unsigned)MIN/10   
                              || (n == (unsigned)MIN/10 && c > (unsigned)MIN%10)))  
        {  
            n = MIN;  
            break;  
        }

不过，即便如此，有些细节是改进的，如他自己所说：

n的声明及定义应该为
[cpp] view plain copy print ?
1. int n = 0;
```
int n = 0;  
```
将MAX/10,MAX%10,(unsigned)MIN/10及(unsigned)MIN%10保存到变量中, 防止重复计算

这样，优化后的代码为：

[cpp] view plain copy print ?

int StrToDecInt(const char* str)
{
static const int MAX = (int)((unsigned)~0 >> 1);
static const int MIN = -(int)((unsigned)~0 >> 1) - 1;
static const int MAX_DIV = (int)((unsigned)~0 >> 1) / 10;
static const int MIN_DIV = (int)((((unsigned)~0 >> 1) + 1) / 10);
static const int MAX_R = (int)((unsigned)~0 >> 1) % 10;
static const int MIN_R = (int)((((unsigned)~0 >> 1) + 1) % 10);
int n = 0;
int sign = 1;
int c;
while (isspace(*str))
++str;
if (*str == '+' || *str == '-')
{
if (*str == '-')
sign = -1;
++str;
}
while (isdigit(*str))
{
c = *str - '0';
if (sign > 0 && (n > MAX_DIV || (n == MAX_DIV && c >= MAX_R)))
{
n = MAX;
break;
}
else if (sign < 0 && (n > MIN_DIV
|| (n == MIN_DIV && c >= MIN_R)))
{
n = MIN;
break;
}
n = n * 10 + c;
++str;
}
return sign > 0 ? n : -n;
}

//copyright@njnu_mjn 2013
int StrToDecInt(const char* str)  
{  
    static const int MAX = (int)((unsigned)~0 >> 1);  
    static const int MIN = -(int)((unsigned)~0 >> 1) - 1;  
    static const int MAX_DIV = (int)((unsigned)~0 >> 1) / 10;  
    static const int MIN_DIV = (int)((((unsigned)~0 >> 1) + 1) / 10);  
    static const int MAX_R = (int)((unsigned)~0 >> 1) % 10;  
    static const int MIN_R = (int)((((unsigned)~0 >> 1) + 1) % 10);  
    int n = 0;  
    int sign = 1;  
    int c;  
  
    while (isspace(*str))  
        ++str;  
    if (*str == '+' || *str == '-')  
    {  
        if (*str == '-')  
            sign = -1;  
        ++str;  
    }  
    while (isdigit(*str))  
    {  
        c = *str - '0';  
        if (sign > 0 && (n > MAX_DIV || (n == MAX_DIV && c >= MAX_R)))  
        {  
            n = MAX;  
            break;  
        }  
        else if (sign < 0 && (n > MIN_DIV   
                                                    || (n == MIN_DIV && c >= MIN_R)))  
        {  
            n = MIN;  
            break;  
        }  
        n = n * 10 + c;  
        ++str;  
    }  
    return sign > 0 ? n : -n;  
}

部分数据的测试结果如下图所示：

输入            输出
10522545459  : 2147483647
-10522545459 : -2147483648
2147483648   : 2147483647
-2147483648  : -2147483648

是否已是完美？如MJN君本人所说“ 我的实现与linux内核的atoi函数的实现, 都有一个共同的问题: 即使出错, 函数也返回了一个值, 导致调用者误认为自己传入的参数是正确的, 但是可能会导致程序的其他部分产生莫名的错误且很难调试”。

9、最后看下Nut/OS中atoi的实现，同时，本小节内容主要来自参考文献条目9，即MJN的博客：

[cpp] view plain copy print ?

00077 #include <compiler.h>
00078 #include <stdlib.h>
00079
00084
00092 int atoi(CONST char *str)
00093 {
00094 return ((int) strtol(str, (char **) NULL, 10));
00095 }

00077 #include <compiler.h>
00078 #include <stdlib.h>
00079 
00084 
00092 int atoi(CONST char *str)
00093 {
00094     return ((int) strtol(str, (char **) NULL, 10));
00095 }

上述代码中strtol实现的思想跟上文第7节所述的MJN君的思路类似，也是除法代替乘法。加上测试函数后的具体代码如下：

[cpp] view plain copy print ?

#include <errno.h>
#include <stdio.h>
#include <ctype.h>
#include <limits.h>
#define CONST const
long mstrtol(CONST char *nptr, char **endptr, int base)
{
register CONST char *s;
register long acc, cutoff;
register int c;
register int neg, any, cutlim;
/*
* Skip white space and pick up leading +/- sign if any.
* If base is 0, allow 0x for hex and 0 for octal, else
* assume decimal; if base is already 16, allow 0x.
*/
s = nptr;
do {
c = (unsigned char) *s++;
} while (isspace(c));
if (c == '-') {
neg = 1;
c = *s++;
} else {
neg = 0;
if (c == '+')
c = *s++;
}
if ((base == 0 || base == 16) && c == '0' && (*s == 'x' || *s == 'X')) {
c = s[1];
s += 2;
base = 16;
}
if (base == 0)
base = c == '0' ? 8 : 10;
/*
* Compute the cutoff value between legal numbers and illegal
* numbers. That is the largest legal value, divided by the
* base. An input number that is greater than this value, if
* followed by a legal input character, is too big. One that
* is equal to this value may be valid or not; the limit
* between valid and invalid numbers is then based on the last
* digit. For instance, if the range for longs is
* [-2147483648..2147483647] and the input base is 10,
* cutoff will be set to 214748364 and cutlim to either
* 7 (neg==0) or 8 (neg==1), meaning that if we have accumulated
* a value > 214748364, or equal but the next digit is > 7 (or 8),
* the number is too big, and we will return a range error.
*
* Set any if any `digits' consumed; make it negative to indicate
* overflow.
*/
cutoff = neg ? LONG_MIN : LONG_MAX;
cutlim = cutoff % base;
cutoff /= base;
if (neg) {
if (cutlim > 0) {
cutlim -= base;
cutoff += 1;
}
cutlim = -cutlim;
}
for (acc = 0, any = 0;; c = (unsigned char) *s++) {
if (isdigit(c))
c -= '0';
else if (isalpha(c))
c -= isupper(c) ? 'A' - 10 : 'a' - 10;
else
break;
if (c >= base)
break;
if (any < 0)
continue;
if (neg) {
if ((acc < cutoff || acc == cutoff) && c > cutlim) {
any = -1;
acc = LONG_MIN;
errno = ERANGE;
} else {
any = 1;
acc *= base;
acc -= c;
}
} else {
if ((acc > cutoff || acc == cutoff) && c > cutlim) {
any = -1;
acc = LONG_MAX;
errno = ERANGE;
} else {
any = 1;
acc *= base;
acc += c;
}
}
}
if (endptr != 0)
*endptr = (char *) (any ? s - 1 : nptr);
return (acc);
}
int matoi2(CONST char *str)
{
return ((int) mstrtol(str, (char **) NULL, 10));
}
int mgetline(char* buf, size_t n) {
size_t idx = 0;
int c;
while (--n > 0 && (c = getchar()) != EOF && c != '\n') {
buf[idx++] = c;
}
buf[idx] = '\0';
return idx;
}
#define MAX_LINE 200
int main() {
char buf[MAX_LINE];
while (mgetline(buf, MAX_LINE) >= 0) {
if (strcmp(buf, "quit") == 0) break;
printf("matoi2=%d\n", matoi2(buf));
}
return 0;
}

#include <errno.h>
#include <stdio.h>
#include <ctype.h>
#include <limits.h>

#define CONST      const

long mstrtol(CONST char *nptr, char **endptr, int base)
{
    register CONST char *s;
    register long acc, cutoff;
    register int c;
    register int neg, any, cutlim;

    /*
     * Skip white space and pick up leading +/- sign if any.
     * If base is 0, allow 0x for hex and 0 for octal, else
     * assume decimal; if base is already 16, allow 0x.
     */
    s = nptr;
    do {
        c = (unsigned char) *s++;
    } while (isspace(c));
    if (c == '-') {
        neg = 1;
        c = *s++;
    } else {
        neg = 0;
        if (c == '+')
            c = *s++;
    }
    if ((base == 0 || base == 16) && c == '0' && (*s == 'x' || *s == 'X')) {
        c = s[1];
        s += 2;
        base = 16;
    }
    if (base == 0)
        base = c == '0' ? 8 : 10;

    /*
     * Compute the cutoff value between legal numbers and illegal
     * numbers.  That is the largest legal value, divided by the
     * base.  An input number that is greater than this value, if
     * followed by a legal input character, is too big.  One that
     * is equal to this value may be valid or not; the limit
     * between valid and invalid numbers is then based on the last
     * digit.  For instance, if the range for longs is
     * [-2147483648..2147483647] and the input base is 10,
     * cutoff will be set to 214748364 and cutlim to either
     * 7 (neg==0) or 8 (neg==1), meaning that if we have accumulated
     * a value > 214748364, or equal but the next digit is > 7 (or 8),
     * the number is too big, and we will return a range error.
     *
     * Set any if any `digits' consumed; make it negative to indicate
     * overflow.
     */
    cutoff = neg ? LONG_MIN : LONG_MAX;
    cutlim = cutoff % base;
    cutoff /= base;
    if (neg) {
        if (cutlim > 0) {
            cutlim -= base;
            cutoff += 1;
        }
        cutlim = -cutlim;
    }
    for (acc = 0, any = 0;; c = (unsigned char) *s++) {
        if (isdigit(c))
            c -= '0';
        else if (isalpha(c))
            c -= isupper(c) ? 'A' - 10 : 'a' - 10;
        else
            break;
        if (c >= base)
            break;
        if (any < 0)
            continue;
        if (neg) {
            if ((acc < cutoff || acc == cutoff) && c > cutlim) {
                any = -1;
                acc = LONG_MIN;
                errno = ERANGE;
            } else {
                any = 1;
                acc *= base;
                acc -= c;
            }
        } else {
            if ((acc > cutoff || acc == cutoff) && c > cutlim) {
                any = -1;
                acc = LONG_MAX;
                errno = ERANGE;
            } else {
                any = 1;
                acc *= base;
                acc += c;
            }
        }
    }
    if (endptr != 0)
        *endptr = (char *) (any ? s - 1 : nptr);
    return (acc);
}

int matoi2(CONST char *str)
{
    return ((int) mstrtol(str, (char **) NULL, 10));
}

int mgetline(char* buf, size_t n) {
  size_t idx = 0;
  int c;
  
  while (--n > 0 && (c = getchar()) != EOF && c != '\n') {
    buf[idx++] = c;
  }
  buf[idx] = '\0';
  return idx;
}

#define MAX_LINE 200

int main() {
    char buf[MAX_LINE];
    while (mgetline(buf, MAX_LINE) >= 0) {
        if (strcmp(buf, "quit") == 0) break;
        printf("matoi2=%d\n", matoi2(buf));
    }
    return 0;
}

同样，MJN对上述实现测试了下，结果如下:

10522545459
matoi2=2147483647
-10522545459
matoi2=-2147483648

程序貌似对溢出的处理是正确的, 真的吗? 再把测试数据换成"10522545454"（与"10522545459"的区别在于最后一个字符）

10522545454
matoi2=1932610862
-10522545454
matoi2=-1932610862

症结就在于下面这段代码：

[cpp] view plain copy print ?

if (neg) {
if ((acc < cutoff || acc == cutoff) && c > cutlim) {
any = -1;
acc = LONG_MIN;
errno = ERANGE;
} else {
any = 1;
acc *= base;
acc -= c;
}
} else {
if ((acc > cutoff || acc == cutoff) && c > cutlim) {
any = -1;
acc = LONG_MAX;
errno = ERANGE;

if (neg) {
            if ((acc < cutoff || acc == cutoff) && c > cutlim) {
                any = -1;
                acc = LONG_MIN;
                errno = ERANGE;
            } else {
                any = 1;
                acc *= base;
                acc -= c;
            }
        } else {
            if ((acc > cutoff || acc == cutoff) && c > cutlim) {
                any = -1;
                acc = LONG_MAX;
                errno = ERANGE;

要想得到正确的输出结果，需要改动两个地方：

①其中这行：

[cpp] view plain copy print ?

if ((acc > cutoff || acc == cutoff) && c > cutlim)

if ((acc > cutoff || acc == cutoff) && c > cutlim)

应该改为：

[cpp] view plain copy print ?

if ( acc > cutoff || (acc == cutoff) && c > cutlim) )

if ( acc > cutoff ||  (acc == cutoff) && c > cutlim)  )

②与此同时，这行:

[cpp] view plain copy print ?

if ((acc < cutoff || acc == cutoff) && c > cutlim) {

if ((acc < cutoff || acc == cutoff) && c > cutlim) {

改为：

[cpp] view plain copy print ?

if (acc < cutoff || (acc == cutoff && c > cutlim)) {

if (acc < cutoff || (acc == cutoff && c > cutlim)) {

为何要这样修改呢？细心的读者相信还是会记得上文第8节中关于正数的两种溢出情况的可能性：“对于正数来说，它溢出的可能性有两种：

一种是诸如2147483650，即n > MAX/10 的；
一种是诸如2147483649，即n == MAX/10 && c > MAX%10。”

也就是说无论是"10522545459"，还是"10522545454"，都是属于第1种情况，即“诸如2147483650，即n > MAX/10的”，此时直接返回MAX_INT即可，所以不需要也不能再去判断n == MAX/10的情况。

这个处理思路类似于上文第8节处理溢出情况的代码：

[cpp] view plain copy print ?

if (sign > 0 && (n > MAX/10 || (n == MAX/10 && c > MAX%10)))
{
n = MAX;
break;
}
else if (sign < 0 && (n > (unsigned)MIN/10
|| (n == (unsigned)MIN/10 && c > (unsigned)MIN%10)))
{
n = MIN;
break;
}

if (sign > 0 && (n > MAX/10 || (n == MAX/10 && c > MAX%10)))    
        {    
            n = MAX;    
            break;    
        }    
        else if (sign < 0 && (n > (unsigned)MIN/10     
                              || (n == (unsigned)MIN/10 && c > (unsigned)MIN%10)))    
        {    
            n = MIN;    
            break;    
        }

So，修改过后的代码测试正常:

10522545459
matoi2=2147483647
-10522545459\
matoi2=-2147483648
10522545454
matoi2=2147483647
-10522545454
matoi2=-2147483648
quit

OK，字符串转换成整数这一问题已基本解决。但如果面试官继续问你，如何把整数转换成字符串呢？欢迎于本文评论下或hero上show出你的思路或代码。

和几米的约定

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
第一章字符串转换成整数

第三十章、字符串转换成整数先看题目：输入一个表示整数的字符串，把该字符串转换成整数并输出，例如输入字符串"345"，则输出整数345。给定函数原型int StrToInt(const char *str)，完成函数StrToInt，实现字符串转换成整数的功能，不得用库函数atoi（即便准许使用，其对于溢出情况的处理也达不到题目的要求，详情请参看下文第7节末）。
复制链接

扫一扫