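This is only a rough translation of the assignment. I haven't done the exercise yet, and there were some sentences I couldn't translate.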
Exercise 4
Profiling Lab: Understanding Program Performance
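In this exercise you are given a program to optimize. There are many places that could be optimized, but you should focus on the optimizations that reduce the run time most significantly.
This program performs string substitutions on a list of files. Its input is specified on the command line. For example: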
substitute.exe replacements.txt file1.txt file2.txt ... fileN.txt
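In this example, the file replacements.txt contains a list of substitutions to perform. Each string substitution is specified in 3 lines: the first line is the search string (the first string), the next line is the replacement string (the second string), and the third line is blank. (It feels like find-and-replace in Office.) For example: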
the
that

his
Her

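The remaining files on the command line are the files to be modified. The program reads each file, performs the substitutions one line at a time, and then writes the file. To perform a substitution, the program searches the file for an exact match of the first string. The matched characters are replaced by the second string. The program then continues searching the file for further matches of the first string. This match is performed on the new state of the file, so it may include characters from earlier replacements. In fact, if the replacement string contains the search string, the program will enter an infinite loop. When no more matches are found, the program moves on to the next line of the replacements file.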
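The matching rule above can be sketched in a few lines of C++. This is only my reading of the description, not the course's substitute.cpp: the helper name substitute_rescan, the choice to restart the search from the beginning of the text after every replacement, and the max_replacements safety cap (added so the sketch cannot loop forever) are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper for illustration; NOT the course's substitute.cpp.
// Replaces the first exact match of `search`, then searches the UPDATED
// text again from the start, so a later match may include characters that
// came from an earlier replacement.
// `max_replacements` is a safety cap added for this sketch only; the
// program described above has no such cap and can loop forever.
std::string substitute_rescan(std::string text,
                              const std::string& search,
                              const std::string& replacement,
                              std::size_t max_replacements = 1000) {
    assert(!search.empty() && "search string must not be empty");
    std::size_t done = 0;
    std::size_t pos = text.find(search);
    while (pos != std::string::npos && done < max_replacements) {
        text.replace(pos, search.size(), replacement);
        ++done;
        pos = text.find(search);  // rescan the new state of the text
    }
    return text;
}
```

For instance, with search string "ab" and replacement "b", the text "aab" first becomes "ab" and then "b", because the second match is made of one original character and one replaced character. Rescanning from the start after every replacement is also the kind of repeated work a profiler might flag, but that is a guess until measured.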
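Good programming style would avoid reading an entire file into memory at once, because files can be very large. For this exercise, however, you may assume there is always enough memory to read the whole file.
To run substitute.exe, move it into any directory and type the following at the command line: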
substitute.exe replace.txt call.cpp compiler.cpp driver.cpp getopt.cpp jnk.cpp mach.cpp math.cpp semantics.cpp test.cpp
Profiling substitute.exe
Once everything is running (check the test files to see that the substitutions were applied), you are ready to start optimizing. The first step will be to use a profiler to find out where the program is spending its time and what it is doing with that time. Consult Appendix A Profiler Customized for SSD6 for more information about the profiler.
You should make a new version of substitute.exe and demonstrate, using profiling output, that it runs faster. You should be able to obtain at least a factor of 2 speedup (old run time divided by new run time). You do not have to use Microsoft Foundation Class objects, but given that these are well written and probably correct, you should only replace code that is doing unnecessary work as reflected in profiler measurements.
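As a small worked example of the speedup definition (old run time divided by new run time), here is a minimal C++ sketch. The function name speedup_factor is my own, and any timings passed to it would come from your own profiler runs.

```cpp
#include <cassert>

// Hypothetical helper: speedup = old run time / new run time.
// Both times must be in the same unit (for example, seconds).
double speedup_factor(double old_seconds, double new_seconds) {
    assert(old_seconds >= 0.0 && new_seconds > 0.0);
    return old_seconds / new_seconds;
}
```

With made-up timings of 3.0 seconds before and 1.5 seconds after, speedup_factor(3.0, 1.5) gives 2.0, which would just meet the factor-of-2 target.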
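Submit two files:
1 Your modified substitute.cpp
2 A file containing the following:
a A clear, concise description of what you observed before optimizing. This should be backed up by profiler output.
b The bottlenecks you identified.
c What you did to address these bottlenecks, and what improvement you observed (again, with empirical evidence).
d If you were to continue, the piece of code that would benefit most from optimization next, and an explanation of why you did not attempt to optimize it.
To check correctness, compare the output files produced before and after your source changes. Your optimizations must not change what the program does, so the corresponding output files should be identical. You can use the comp command to check whether two files are identical.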
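If you would rather run the before/after check from code than with comp, a byte-by-byte comparison can be sketched as below. The function name files_identical is hypothetical, and treating an unreadable file as "not identical" is my own choice for the sketch.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: true only if both files can be opened and contain
// exactly the same bytes. A file that cannot be opened counts as "not
// identical" (an assumption made for this sketch).
bool files_identical(const std::string& path_a, const std::string& path_b) {
    assert(!path_a.empty() && !path_b.empty());
    std::ifstream a(path_a, std::ios::binary);
    std::ifstream b(path_b, std::ios::binary);
    if (!a || !b) {
        return false;
    }
    std::istreambuf_iterator<char> it_a(a), it_b(b), end;
    while (it_a != end && it_b != end) {
        if (*it_a != *it_b) {
            return false;  // first differing byte found
        }
        ++it_a;
        ++it_b;
    }
    // Identical only if both files ended at the same point.
    return it_a == end && it_b == end;
}
```

The idea would be to keep a copy of each output file from the original substitute.exe and compare it with the output of the optimized build for the same input.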
Also, Unix tools such as find and sed, and languages such as awk and Perl, make this kind of substitution easy.
=======================
Profiler Customized for SSD6
The Visual C++.Net software bundle does not include a code profiler. However, the bundle includes a set of APIs (along with some examples) that tool developers can use to build a profiler. iCarnegie has customized the profiler code provided by Microsoft to suit the purposes of this course.
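Unzip the downloaded zip archive, open a command prompt, and run EnableProfiler.bat. This batch file adds some entries to your registry. Then run your program from the same command prompt. The profiling data is recorded in a file named output.log. Open output.log in Excel as a csv (comma separated values) file.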
Note: In Microsoft Excel 2007, choose the "Data" menu, and click on "From Text". Choose Semicolon as the delimiter.
There are nine columns in the csv file:
1 Thread ID: The thread under which the function executed
2 Function: Name of the function
3 Times Called: The number of times the function was called
4 Exclusive Time: Amount of time (in seconds) spent in the function excluding time spent in its callees, Suspended Time and Profiler Time
5 Callee Exclusive Time: Amount of time (in seconds) spent in the function and its callees (children) excluding Suspended Time and Profiler Time
6 Inclusive Time: Amount of time (in seconds) spent in the function including Callee Time, Suspended Time and Profiler Time
7 Callee Time: Amount of time (in seconds) spent in the callees (including Suspended Time and Profiler Time spent under the callees)
8 Suspended Time: Amount of suspended time (in seconds)
9 Profiler Time: Amount of time spent by profiler (in seconds)
Clean the profiler data file: Sometimes you will see functions such as static void System.AppDomain::OnExitProcess( ) that run under a different Thread ID. Such functions may be included in your output.log file (typically as the last few rows). If you see them listed, delete all rows corresponding to that thread. Also, delete all rows that are empty.
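The cleaning step can also be sketched in C++ instead of being done by hand in Excel. This assumes each row of output.log is one line of text, that fields are separated by semicolons (the delimiter chosen in the Excel note above), and that Thread ID is the first field. The function name clean_profile_lines and the idea of passing in the thread ID to drop are my own.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: drops empty rows and every row whose first
// semicolon-separated field (assumed to be the Thread ID) equals
// `drop_thread_id`. All other rows, including a header row, are kept.
std::vector<std::string> clean_profile_lines(const std::vector<std::string>& lines,
                                             const std::string& drop_thread_id) {
    assert(!drop_thread_id.empty());
    std::vector<std::string> kept;
    for (const std::string& line : lines) {
        if (line.find_first_not_of(" \t\r") == std::string::npos) {
            continue;  // empty or whitespace-only row
        }
        // Text before the first ';' (the whole line if there is no ';').
        const std::string thread_id = line.substr(0, line.find(';'));
        if (thread_id == drop_thread_id) {
            continue;  // row belongs to the thread being removed
        }
        kept.push_back(line);
    }
    return kept;
}
```

You would still need to look at the file once to see which Thread ID the OnExitProcess rows run under before calling it.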
To calculate the fraction of time spent in each function (as a percentage of the total time spent across all functions) in column J, perform the following sequence of actions:
1 Type an appropriate heading in J1 (for example, Function%)
2 Place the cursor in J2
3 Enter this formula: =D2/SUM(D:D)
4 Copy the formula down column J
5 To show the ratios as percentages, right-click column J, choose "Format Cells...", select the "Number" tab, and specify "Percentage" as the Category.
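The same Function% calculation can be written as a short C++ sketch. It mirrors the formula =D2/SUM(D:D), taking the Exclusive Time column as a vector of seconds. The function name function_percent is hypothetical, and it returns values on a 0 to 100 scale, whereas the Excel formula gives a 0 to 1 fraction that is then formatted as a percentage.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: each function's Exclusive Time as a percentage
// of the total Exclusive Time across all functions (column D above).
std::vector<double> function_percent(const std::vector<double>& exclusive_seconds) {
    const double total = std::accumulate(exclusive_seconds.begin(),
                                         exclusive_seconds.end(), 0.0);
    assert(total > 0.0 && "total exclusive time must be positive");
    std::vector<double> percent;
    percent.reserve(exclusive_seconds.size());
    for (double seconds : exclusive_seconds) {
        percent.push_back(100.0 * seconds / total);
    }
    return percent;
}
```

With made-up exclusive times of 1.0 and 3.0 seconds, the result is 25 and 75.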
To calculate the fraction of time spent by each function and its callees (as a percentage of the total time spent by the program) in column K, do the following:
6 Type an appropriate heading in K1 (for example, Function+Child%).
7 Sort column E (Callee Exclusive Time) in descending order. This brings the main function to the top.
8 Name cell E2 TotalTime (since cell E2 gives the time spent in main and its callees) by doing the following:
a Select E2
b Choose Insert > Name > Define. (In Excel 2007, choose the "Formulas" menu and click "Define Name")
c Type TotalTime as the name
d Click OK
9 Select K2
10 Enter =E2/TotalTime
11 Copy the formula into every cell of column K
12 To show the ratios as percentages, right-click column K, choose "Format Cells...", select the "Number" tab, and specify "Percentage" as the Category.
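Function+Child% can be sketched the same way. The steps above use the Callee Exclusive Time of main as TotalTime, and main ends up first after the descending sort, so this sketch assumes the largest value in the column belongs to main. The function name function_plus_child_percent is hypothetical, and the result is on a 0 to 100 scale.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: each function's Callee Exclusive Time (column E)
// as a percentage of TotalTime. TotalTime is taken to be the largest
// value in the column, assumed here to be the entry for main.
std::vector<double> function_plus_child_percent(
        const std::vector<double>& callee_exclusive_seconds) {
    assert(!callee_exclusive_seconds.empty());
    const double total_time = *std::max_element(callee_exclusive_seconds.begin(),
                                                callee_exclusive_seconds.end());
    assert(total_time > 0.0 && "TotalTime must be positive");
    std::vector<double> percent;
    percent.reserve(callee_exclusive_seconds.size());
    for (double seconds : callee_exclusive_seconds) {
        percent.push_back(100.0 * seconds / total_time);
    }
    return percent;  // same order as the input; no sorting is done here
}
```

With made-up values of 2.0, 8.0 and 4.0 seconds, the result is 25, 100 and 50, with the 100 belonging to the row assumed to be main.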
Now you can examine where your program spends significant amounts of time by sorting the data by Function% or Function+Child%.