Homework (Simulation):
This program, x86.py, allows you to see how different thread interleavings either cause or avoid race conditions. See the README for details on how the program works and answer the questions below.
Questions:
1. Examine flag.s. This code “implements” locking with a single memory flag. Can you understand the assembly?
flag.s is an assembly implementation of the program in Figure 28.1.
python3 x86.py -p flag.s -c
2. When you run with the defaults, does flag.s work? Use the -M and -R flags to trace variables and registers (and turn on -c to see their values). Can you predict what value will end up in flag?
count ends up as 2 and flag ends up as 0 (each thread clears it on release), so the program works as expected.
python3 x86.py -p flag.s -M flag,count -R ax,bx -c
3. Change the value of the register %bx with the -a flag (e.g., -a bx=2,bx=2 if you are running just two threads). What does the code do? How does it change your answer for the question above?
count ends up as 4. With bx=2 each thread goes through the loop twice; Thread 0 runs to completion first, then Thread 1 runs.
python3 x86.py -p flag.s -a bx=2,bx=2 -M flag,count -R ax,bx -c
4. Set bx to a high value for each thread, and then use the -i flag to generate different interrupt frequencies; what values lead to bad outcomes? Which lead to good outcomes?
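The program reliably works as expected when the interrupt interval (-i) is a multiple of 11, because the loop is 11 instructions long (addresses 1000-1010) and every switch then lands on a loop boundary.
The 12th instruction (1011 halt) executes only once, so it is not counted.
With other intervals that interrupt a thread mid-loop, the problem in Figure 28.2 can occur: an interrupt arrives after flag is loaded (1000) but before it is set to 1 (1003), so both threads see the lock as free and the lock fails.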
python3 x86.py -p flag.s -a bx=50,bx=50 -i 11 -M flag,count -R ax,bx -c
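A toy model of the 11-instruction loop can make the effect of the interval concrete. This is a sketch, not x86.py itself: it assumes two threads, strict round-robin switching every `interval` instructions, and the names `step` and `final_count` are invented for illustration.

```python
HALT = 11  # index of the halt instruction; indices 0-10 form the loop body


def step(thread, mem):
    """Execute one instruction of `thread` against shared memory `mem`."""
    pc = thread["pc"]
    next_pc = pc + 1
    if pc == 0:                      # mov flag, %ax
        thread["ax"] = mem["flag"]
    elif pc == 2:                    # jne .acquire (index 1 is the test)
        if thread["ax"] != 0:
            next_pc = 0
    elif pc == 3:                    # mov $1, flag
        mem["flag"] = 1
    elif pc == 4:                    # mov count, %ax
        thread["ax"] = mem["count"]
    elif pc == 5:                    # add $1, %ax
        thread["ax"] += 1
    elif pc == 6:                    # mov %ax, count
        mem["count"] = thread["ax"]
    elif pc == 7:                    # mov $0, flag
        mem["flag"] = 0
    elif pc == 8:                    # sub $1, %bx
        thread["bx"] -= 1
    elif pc == 10:                   # jgt .top (index 9 is the test)
        if thread["bx"] > 0:
            next_pc = 0
    thread["pc"] = next_pc


def final_count(interval, loops):
    """Run two threads round-robin and return the final shared count."""
    mem = {"flag": 0, "count": 0}
    threads = [{"pc": 0, "ax": 0, "bx": loops, "done": False} for _ in range(2)]
    current, budget = 0, interval
    while not all(t["done"] for t in threads):
        thread = threads[current]
        if thread["pc"] == HALT:
            thread["done"] = True
            budget = 0               # a halted thread gives up the CPU
        else:
            step(thread, mem)
            budget -= 1
        if budget <= 0:              # timer interrupt: switch if possible
            other = 1 - current
            if not threads[other]["done"]:
                current = other
            budget = interval
    return mem["count"]


for interval in (1, 11, 22):
    print(interval, final_count(interval, loops=50))
```

In this model, intervals of 11 and 22 give the full count of 100, while an interval of 1 keeps both threads in lock-step and loses one update per pass, ending at 50.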
5. Now let’s look at the program test-and-set.s. First, try to understand the code, which uses the xchg instruction to build a simple locking primitive. How is the lock acquire written? How about lock release?
Acquire the lock:
mov $1, %ax
xchg %ax, mutex # atomic swap of 1 and mutex
test $0, %ax # if we get 0 back: lock is free!
jne .acquire # if not, try again
Release the lock:
mov $0, mutex
6. Now run the code, changing the value of the interrupt interval (-i) again, and making sure to loop for a number of times. Does the code always work as expected? Does it sometimes lead to an inefficient use of the CPU? How could you quantify that?
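The program always produces the expected result, but it can use the CPU inefficiently because a thread spins while it waits for the lock.
This can be quantified as the ratio of spin instructions to total instructions.
Let bx=5:
With -i 11, spin instructions / total instructions = 0, so no spinning occurs.
With -i 4, spin instructions / total instructions = 44/156 = 28.2%, so 28.2% of the executed instructions are spinning and waste CPU time.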
python3 x86.py -p test-and-set.s -a bx=5,bx=5 -i 4 -M mutex,count -R ax,bx -c
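The ratio above can be reproduced from the total instruction count alone. This sketch assumes each pass through the test-and-set.s loop costs 11 instructions when the lock is acquired on the first try, plus one halt per thread; `LOOP_LEN`, `HALT_LEN`, and the function names are illustrative, not part of x86.py.

```python
LOOP_LEN = 11   # assumed instructions per uncontended pass through the loop
HALT_LEN = 1    # the final halt, executed once per thread


def useful_instructions(threads, loops):
    """Instructions the threads would run if no one ever had to spin."""
    return threads * (loops * LOOP_LEN + HALT_LEN)


def spin_fraction(total, threads, loops):
    """Return (spin instruction count, spin share of all instructions)."""
    spin = total - useful_instructions(threads, loops)
    return spin, spin / total


spin, frac = spin_fraction(total=156, threads=2, loops=5)
print(spin, f"{frac:.1%}")  # 44 28.2%
```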
7. Use the -P flag to generate specific tests of the locking code. For example, run a schedule that grabs the lock in the first thread, but then tries to acquire it in the second. Does the right thing happen? What else should you test?
The program executes correctly under this schedule.
python3 x86.py -p test-and-set.s -a bx=5,bx=5 -i 4 -M mutex,count -R ax,bx -P 000111 -c
8. Now let’s look at the code in peterson.s, which implements Peterson’s algorithm (mentioned in a sidebar in the text). Study the code and see if you can make sense of it.
peterson.s is an assembly implementation of Peterson's algorithm.
These three instructions prevent the situation shown in Figure 28.2:
mov turn, %ax
test %cx, %ax # compare 'turn' and '1 - self'
je .spin1 # if turn==1-self, go back and start spin again
9. Now run the code with different values of -i. What kinds of different behavior do you see? Make sure to set the thread IDs appropriately (using -a bx=0,bx=1 for example) as the code assumes it.
python3 x86.py -p peterson.s -a bx=0,bx=1 -i 50 -M flag,turn,count -R ax,bx,cx -c
10. Can you control the scheduling (with the -P flag) to “prove” that the code works? What are the different cases you should show hold? Think about mutual exclusion and deadlock avoidance.
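Run 6 instructions of Thread 0, then 6 instructions of Thread 1. At that point turn = 0, so the three instructions shown above take effect: Thread 0 proceeds while Thread 1 spin-waits.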
python3 x86.py -p peterson.s -a bx=0,bx=1 -M flag,turn,count -R ax,bx,cx -P 000000111111
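Beyond hand-picked -P schedules, every interleaving can be checked by brute force on a small model. The sketch below is a simplified abstraction of peterson.s, not the simulator: each thread is reduced to five steps (set flag, set turn, read the other flag, read turn, release), and all names are hypothetical.

```python
from collections import deque

CS, DONE = 4, 5  # program-counter values: inside critical section, finished
START = ((0, 0), (0, 0), 0)  # (pcs, flags, turn) for the two threads


def step(state, me):
    """Return the state after thread `me` executes one instruction."""
    pc, flag, turn = list(state[0]), list(state[1]), state[2]
    other = 1 - me
    if pc[me] == 0:      # flag[self] = 1
        flag[me], pc[me] = 1, 1
    elif pc[me] == 1:    # turn = 1 - self
        turn, pc[me] = other, 2
    elif pc[me] == 2:    # .spin1: enter if flag[1 - self] != 1
        pc[me] = 3 if flag[other] == 1 else CS
    elif pc[me] == 3:    # .spin2: spin again if turn == 1 - self
        pc[me] = 2 if turn == other else CS
    elif pc[me] == CS:   # release: flag[self] = 0
        flag[me], pc[me] = 0, DONE
    return (tuple(pc), tuple(flag), turn)


def reachable(start):
    """Breadth-first search over every possible interleaving."""
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        for me in (0, 1):
            nxt = step(state, me)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def mutual_exclusion_holds():
    # No reachable state has both threads in the critical section.
    return all(s[0] != (CS, CS) for s in reachable(START))


def no_deadlock():
    # From every reachable state, some schedule lets both threads finish.
    return all(
        any(t[0] == (DONE, DONE) for t in reachable(s))
        for s in reachable(START)
    )


print(mutual_exclusion_holds(), no_deadlock())
```

The two checks correspond to the two cases the question names: mutual exclusion (never both in the critical section) and deadlock avoidance (the threads can always still finish).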
11. Now study the code for the ticket lock in ticket.s. Does it match the code in the chapter? Then run with the following flags: -a bx=1000,bx=1000 (causing each thread to loop through the critical section 1000 times). Watch what happens; do the threads spend much time spin-waiting for the lock?
Yes, ticket.s is an assembly implementation of Figure 28.7.
python3 x86.py -p ticket.s -a bx=1000,bx=1000 -i 50 -M ticket,turn,count -R ax,bx,cx -c
12. How does the code behave as you add more threads?
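Compared with the test-and-set lock, the fetch-and-add ticket lock guarantees fairness: threads are served in ticket order, so every thread gets its turn and none starves.
Spin-waiting still occurs, however.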
python3 x86.py -p ticket.s -t 4 -a bx=5 -i 2 -M ticket,turn,count -R ax,bx,cx -c
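Both points (ticket order and growing spin time) show up in a small cooperative model. This is a sketch under stated assumptions, not ticket.s: each thread is a Python generator, the scheduler runs one step per thread in round-robin order, and a "spin" is counted per failed check of turn. The names `ticket_worker` and `run` are invented here.

```python
def ticket_worker(tid, lock, shared, loops, log):
    """One thread: take a ticket, wait for its turn, update, release."""
    for _ in range(loops):
        my_ticket = lock["ticket"]   # fetch-and-add: read the old value
        lock["ticket"] += 1          # and increment, within a single step
        yield "ticket"
        while lock["turn"] != my_ticket:
            yield "spin"             # not our turn yet: burn a step
        shared["count"] += 1         # critical section
        log.append(tid)
        yield "critical"
        lock["turn"] += 1            # release: hand the lock to the next ticket
        yield "release"


def run(num_threads, loops):
    """Return (final count, total spin steps, order of lock holders)."""
    lock = {"ticket": 0, "turn": 0}
    shared = {"count": 0}
    log = []
    spins = 0
    workers = [ticket_worker(t, lock, shared, loops, log)
               for t in range(num_threads)]
    while workers:
        for worker in list(workers):     # one step per thread, round-robin
            try:
                if next(worker) == "spin":
                    spins += 1
            except StopIteration:
                workers.remove(worker)
    return shared["count"], spins, log


for n in (1, 2, 4):
    count, spins, _ = run(n, loops=1)
    print(n, count, spins)
```

In this model the lock is always granted in ticket order, and the number of spin steps rises as threads are added, since every waiting thread spins while one holds the lock.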
13. Now examine yield.s, in which a yield instruction enables one thread to yield control of the CPU (realistically, this would be an OS primitive, but for simplicity, we assume an instruction does the task). Find a scenario where test-and-set.s wastes cycles spinning, but yield.s does not.
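yield.s builds on test-and-set.s by changing the jump condition and adding three lines (yield, j .acquire, and the .acquire_done label):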
mov $1, %ax
xchg %ax, mutex # atomic swap of 1 and mutex
test $0, %ax # if we get 0 back: lock is free!
je .acquire_done
yield # if not, yield and try again
j .acquire
.acquire_done
python3 x86.py -p yield.s -i 5 -a bx=5,bx=5 -M mutex,count -R ax,bx -c
14. Finally, examine test-and-test-and-set.s. What does this lock do? What kind of savings does it introduce as compared to test-and-set.s?
Compared with test-and-set.s, it adds these three instructions:
mov mutex, %ax
test $0, %ax
jne .acquire
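These three instructions are exactly the ones flag.s uses to check whether the lock is held. A waiting thread therefore spins on an ordinary load and only attempts the atomic xchg once the lock appears free, which avoids repeated writes to mutex while it is held.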