《CS：APP》 chapter 2 Representing and Manipulating Information 笔记

最新推荐文章于 2023-03-10 08:56:21 发布

JasonLeaster

最新推荐文章于 2023-03-10 08:56:21 发布

阅读量3k

点赞数

分类专栏： CSAPP Days_with_CS:APP 文章标签：计算机二进制数据

本文链接：https://blog.csdn.net/cinmyheart/article/details/27526631

版权

CSAPP 同时被 2 个专栏收录

20 篇文章 2 订阅

订阅专栏

Days_with_CS:APP

17 篇文章 5 订阅

订阅专栏

Representing and Manipulating Information

首先，普及一下历史知识，原来十进制数都使用1000多年了。。。其实我真不知道。。。斐波拉契(Fibonacci)很屌的说(废话。。。。)

The familiar decimal, or base-10, representation has been in use for over 1000 years, having been developed in India, improved by Arab mathematicians in the 12th century, and brought to the West in the 13th century by the Italian mathematician Leonardo Pisano (c. 1170 – c. 1250), better known as Fibonacci.

为什么计算机选择二进制进行基础运算，而不是十进制？

Binary values work better when building machines that store and process information. Two-valued signals can
readily be represented, stored, and transmitted

数据Overflow的根本原因

Computer representations use a limited number of bits to encode a number, and hence some operations can overflow when the results are too large to be rep-resented

2.1 Information Storage

Rather than accessing individual bits in memory, most computers use blocks of eight bits, or bytes, as the smallest addressable unit of memory. Every byte of memory is identified by a unique number, known as its
address.

2.1.1 Hexadecimal Notation
A single byte consists of 8 bits. In binary notation, its value ranges from 00000000 to 11111111. When viewed as a decimal integer, its value ranges from 0 to 255.Neither notation is very convenient for describing bit patterns. Binary notation is too verbose, while with decimal notation, it is tedious to convert to and from bit patterns. Instead, we write bit patterns as base-16, or hexadecimal numbers. Hexadecimal (or simply “hex”) uses digits ‘0’ through ‘9’ along with characters ‘A’ through ‘F’ to represent 16 possible values.

关于10进制，16进制，8进制，2进制之间的相互转换不在此赘述，已经讲烂了。。。

2.1.2 Words
Every computer has aword size , indicating the nominal size of integer and pointer data.

As hardware costs drop over time, even desktop and laptop machines will switch to 64-bit word sizes, and so we will consider the general case of a w -bit word size, as well as the special cases of w = 32 and w = 64.

2.1.3 Data Sizes

　　　 A　“long” integer uses the full word size of the machine. The “long long” integer　data type, introduced in ISO C99, allows the full range of 64-bit integers. For 32-bit　machines, the compiler must compile operations for this data type by generating　code that performs sequences of 32-bit operations.

2.1.4 Addressing and Byte Ordering

　　　　　The former convention—where the least signifi-cant byte comes first—is referred to as little endian. This convention is followed　by most Intel-compatible machines. The latter convention—where the most sig-nificant byte comes first—is referred to as big endian. This convention is followed　by most machines from IBM and Sun Microsystems. Note that we said “most.”　The conventions do not split precisely along corporate boundaries.

Byte ordering becomes an issue. The first is when binary data are communicated over a network between different machines. A common problem is for data produced by a little-endian machine to be sent to a big-endian machine, or vice versa, leading to the bytes within the words being in reverse order for the receiving program.

A second case where byte ordering becomes important is when looking at the byte sequences representing integer data. This occurs often when inspecting machine-level programs. As an example, the following line occurs in a file that gives a text representation of the machine-level code for an Intel IA32 processor:

80483bd: 01 05 64 94 04 08 add %eax,0x8049464

This line was generated by adisassembler, a tool that determines the instruction sequence represented by an executable program file. We will learn more about disassemblers and how to interpret lines such as this in Chapter 3. For now, we simply note that this line states that the hexadecimal byte sequence 01 05 64 94 04 08 is the byte-level representation of an instruction that adds a word ofdata to the value stored at address 0x8049464 . If we take the final 4 bytes of the sequence, 64940408, and write them in reverse order, we have 08049464. Dropping the leading 0, we have the value 0x8049464 , the numeric value written on the right.

2.1.5 Representing Strings

A string in C is encoded by an array of characters terminated by the null (having value 0) character. Each character is represented by some standard encoding, with the most common being the ASCII character code. Thus, if we run our routine show_byteswith arguments "12345" and 6 (to include the terminating character), we get the result 313233343500 .

对于数据的位运算和逻辑运算不在此赘述

2.2 Integer Representations

2.2.1 Integral Data Types

回答为什么二进制数串能表示十进制数

The unsigned binary representation has the important property that every number between 0 and 2^w− 1 has a unique encoding as a w -bit value. For example, there is only one representation of decimal value 11 as an unsigned, 4-bit number, namely [1011]. This property is captured in mathematical terms by stating that function B2Uw is a bijection —it associates a unique value to each bit vector of length w ; conversely, each integer between 0 and 2^w− 1 has a unique binary representation as a bit vector of length w.

2.2.3 Two’s-Complement Encodings

我前几天才知道，所谓的“补码”是two's-complement。。。。只有中国大陆说补码，香港和台湾说的是二补数

The most com-mon computer representation of signed numbers is known as two’s-complement form. This is defined by interpreting the most significant bit of the word to have negative weight. We express this interpretation as a function B2Tw(for “binary to two’s-complement” length w ):

We see that the bit patterns are identical for Figures 2.11 and 2.12 (as well as for Equations 2.2 and 2.4), but the values differ when the most significant bit is 1, since in one case it has weight +8, and in the other case it has weight −8.)

The C standards do not require signed integers to be represented in two’s-complement form, but nearly all machines do so.

C 标准并没有要求用two's complement 来储存负数，但是大多数极其就是这么干的

对于one's complement and sign-magnitude,分别是大陆说的反码和有符号数

对于理解取相反数以及无符号数，有符号数之间的强制类型转换,极其重要的一张图：

2.2.4 Conversions Between Signed and Unsigned

For example, consider the following code:

1 short int v = -12345;
2 unsigned short uv = (unsigned short) v;
3 printf("v = %d, uv = %u\n", v, uv);

When run on a two’s-complement machine, it generates the following output:

v = -12345, uv = 53191

What we see here is that the effect of casting is to keep the bit values identical but change how these bits are interpreted.We saw in Figure 2.14 that the 16-bit two’s-complement representation of −12,345 is identical to the 16-bit unsigned representation of 53,191.

Casting from shortint to unsignedshortchanged the numeric value, but not the bit representation. In casting from unsignedintto int , the underlying bit representa-tion stays the same.

2.2.5 Signed vs. Unsigned in C

Although the C standard does not specify a particu-lar representation of signed numbers, almost all machines use two’s complement. Generally, most numbers are signed by default.

C allows conversion between unsigned and signed. The rule is that the under-lying bit representation is not changed.

C允许强制类型转换

对于赋值符号，右边的操作数被强制类型转换成左边的数据类型

而对于比较操作符，操作数中任意一个是无符号数，则将两个操作数都强制类型转换成为无符号的

2.2.6 Expanding the Bit Representation of a Number

数据的位扩展，到这里我就真的不知道怎么组织语言了....T-T

数据类型位数多的向数据类型位数少的扩展，如果是负数，高位补1，如果是正数高位补0。。。就这样吧。。。。

2.2.7 Truncating Numbers

数据的截断

说白了，就是转向位数小的数据类型时，数据进行取模

切记 malloc的参数是size_t ，而size_t 是unsigned int！！！忘记这个会引发血案的！哈哈

2.3 Integer Arithmetic

2.3.1 Unsigned Addition

无符号的加法

一般的加法都很简单，一个坑就是溢出，主要谈的就是这个坑。。。

值得一提的是无符号数只有向上溢出(upward overflow)，没有向下溢出，向下溢出只可能出现在有符号运算中

2.3.2 Two’s-Complement Addition

In fact, most computers use the same machine instruction to perform eitherunsigned or signed addition.

四种加法情况

给出了向上溢出和向下溢出的情况

两个很有意思的练习题，有必要贴一下

Practice Problem 2.30

Write a function with the following prototype:

/* Determine whether arguments can be added without overflow */

int tadd_ok(int x, int y);

This function should return 1 if argumentsx and y can be added withoutcausing overflow.

Solution to Problem 2.30

This function is a direct implementation of the rules given to determine whether
or not a two’s-complement addition overflows.
/* Determine whether arguments can be added without overflow */

int tadd_ok(int x, int y) {
int sum = x+y;
int neg_ove r=x< 0&&y< 0&&sum>=0;
int pos_ove r=x>=0&&y>=0&&sum< 0;
return !neg_over && !pos_over;
}

Practice Problem 2.31

Your coworker gets impatient with your analysis of the overflow conditionsfor two’s-complement addition and presents you with the followingimplementation of tadd_ok :

/* Determine whether arguments can be added without overflow */

/* WARNING: This code is buggy. */

int tadd_ok(int x, int y) {

int sum = x+y;

return (sum-x == y) && (sum-y == x);

}

You look at the code and laugh. Explain why.

Solution to Problem 2.31

Your coworker could have learned, by studying Section 2.3.2, that two’s-complement addition forms an abelian group, and so the expression (x+y)-x will evaluate to y regardless of whether or not the addition overflows, and that
(x+y)-y will always evaluate to x.

2.3.3 Two’s-Complement Negation

取相反数基本上就可以概括为一句话——取反加1

In C, we can state that for any integer valuex, computing the expressions -xand~x+1 will give identical results.

2.3.4 Unsigned Multiplication

乘法不是很单纯的数学上的乘法，由于溢出的问题，所有的乘法都会有溢出的可能，有溢出就有取模

2.3.5 Two’s-Complement Multiplication

非常有名的一个vulnerability如下:

int tmult_ok(int x,int y)
{

    Long long pll = (long long)x*y;//分别扩展成为64位的数据再做乘法运算

    Return pll == (int)pll;

}

如果把Long long pll = (long long) x*y;

改写成为

Long long pll = x*y;

结果将是32位的数据x*y，然后扩展成64位的数据

2.3.6 Multiplying by Constants

On most machines, the integer multiply instruction is fairly slow,requiring 10 or more clock cycles, whereas other integer operations—such asaddition, subtraction, bit-level operations, and shifting—require only 1 clockcycle.

2.3.7 Dividing by Powers of Two

Integer division on most machines is even slower than integer multiplication—requiring30 or more clock cycles. The two different shifts—logical and arithmetic—serve this purpose forunsigned and two’s-complement numbers, respectively.

2.4 Floating Point

让人感觉很高深的float point，开始！

2.4.1 Fractional Binary Numbers

2.4.2 IEEE Floating-Point Representation

The sign s determines whether the number is negative (s = 1) or positive (s = 0), where the interpretation of the sign bit for numeric value 0 is handled as a special case.

The significand M is a fractional binary number that ranges either between 1 and 2 −theta or between 0 and 1− theta

The exponent E weights the value by a (possibly negative) power of 2.

The bit representation of a floating-point number is divided into three fields to
encode these values:

The single sign bit s directly encodes the sign s .

The k -bit exponent field exp = ek −1...e1e0 encodes the exponent E .

The n-bit fraction field frac= fn−1...f1f0 encodes the significand M , but the value encoded also depends on whether or not the exponent field equals 0.

In the single-precision floating-point format (a float in C), fieldss, exp , and frac are 1, k = 8, and n = 23 bits each, yielding a 32-bit representation. In the double-precision floating-point format (a double in C), fields s, exp , and fracare 1, k = 11, andn = 52 bits each, yielding a 64-bit representation.

Case 1: Normalized Values
This is the most common case. It occurs when the bit pattern of exp is neither all zeros (numeric value 0) nor all ones (numeric value 255 for single precision, 2047 for double). In this case, the exponent field is interpreted as representing a signed integer inbiased form. That is, the exponent value is E = e − Bias where e
is the unsigned number having bit representation ek −1...e1e0, and Bias is a bias value equal to 2^(k −1) − 1 (127 for single precision and 1023 for double). This yields exponent ranges from −126 to +127 for single precision and −1022 to+1023 for double precision. The fraction field fracis interpreted as representing the fractional value f ,
where 0 ≤ f< 1, having binary representation 0 .fn−1...f1f0, that is, with the binary point to the left of the most significant bit. The significand is defined to be M = 1 + f . This is sometimes called an implied leading 1 representation, because we can viewM to be the number with binary representation 1 .fn−1fn−2...f0. Thisrepresentation is a trick for getting an additional bit of precision for free, since we can always adjust the exponent E so that significand M is in the range 1 ≤ M< 2 (assuming there is no overflow). We therefore do not need to explicitly represent the leading bit, since it always equals 1

Case 2: Denormalized Values
When the exponent field is all zeros, the represented number is indenormalized form. In this case, the exponent value is E = 1 − Bias , and the significand value is M = f , that is, the value of the fraction field without an implied leading 1.

Denormalized numbers serve two purposes. First, they provide a way to represent numeric value 0, since with a normalized number we must always have M ≥ 1, and hence we cannot represent 0. In fact the floating-point representation of +0 .0 has a bit pattern of all zeros: the sign bit is 0, the exponent field is all zeros (indicating a denormalized value), and the fraction field is all zeros, giving M = f = 0. Curiously, when the sign bit is 1, but the other fields are all zeros, we get the value −0 .0. With IEEE floating-point format, the values −0 .0 and +0 .0 are considered different in some ways and the same in others. A second function of denormalized numbers is to represent numbers that are very close to 0.0. They provide a property known as gradual underflow in which possible numeric values are spaced evenly near 0.0

Case 3: Special Values
A final category of values occurs when the exponent field is all ones. When the fraction field is all zeros, the resulting values represent infinity, either +∞ when s = 0, or −∞ whens = 1. Infinity can represent results that overflow , as when we multiply two very large numbers, or when we divide by zero. When the fraction field is nonzero, the resulting value is called a “NaN,” short for “Not a Number.” Such values are returned as the result of an operation where the result cannot be given as a real number or as infinity, as when computing √−1or ∞−∞. They can also be useful in some applications for representing uninitialized data

对于理解极其重要的一个demo

e : The value represented by considering the exponent field to be an unsigned integer
E : The value of the exponent after biasing
2^E: The numeric weight of the exponent
f : The value of the fraction

Exponent ！= 0 M = f+1

else

M = f;

fraction bits 对应f

exponent bits 对应e

到这里差不多啦。。。。

后面的rounding和浮点运算看书吧。。。。浮点运算感觉略水

Jean-Baptiste Édouard Detaille (5 October 1848 – 23 December 1912), was a French academic painter and military artist noted for his precision and realistic detail. He was regarded as the "semi-official artist of the French army"

《爱德华循序渐进的梦想》