Individual Project: word_frequency (final requirement)

 

Implement a console application to tally the frequency of words under a directory (2 modes).

 

For all text files under a directory (recursively) (file extensions: "txt", "cpp", "h", "cs"), calculate the frequency of each word and output the result to a text file. Write the code in C++ or C#, using the .NET Framework; the running environment is 32-bit Windows 7 or Windows Vista.
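
The recursive scan described above could be sketched as follows. This is only an illustration: it assumes a C++17 compiler with std::filesystem, which is newer than the Win7-era toolchain the assignment targets, and the names hasTargetExtension and collectFiles are hypothetical. Matching extensions case-insensitively is also an assumption; the requirement only lists the four extensions.

```cpp
#include <algorithm>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: true if the file has one of the four extensions
// named in the requirement. Lower-casing the extension is an assumption.
bool hasTargetExtension(const fs::path& file) {
    std::string ext = file.extension().string();  // includes the leading dot
    for (char& c : ext) {
        if (c >= 'A' && c <= 'Z') c = static_cast<char>(c - 'A' + 'a');
    }
    return ext == ".txt" || ext == ".cpp" || ext == ".h" || ext == ".cs";
}

// Hypothetical helper: walks `root` recursively and returns every regular
// file with a matching extension, sorted by path for a stable order.
std::vector<fs::path> collectFiles(const fs::path& root) {
    std::vector<fs::path> files;
    for (const fs::directory_entry& entry : fs::recursive_directory_iterator(root)) {
        if (entry.is_regular_file() && hasTargetExtension(entry.path())) {
            files.push_back(entry.path());
        }
    }
    std::sort(files.begin(), files.end());
    return files;
}
```

An empty sub-directory simply contributes no entries to the result, so it needs no special handling in this sketch.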

 

Run a performance analysis tool on your code, find the performance bottlenecks, and remove them.

 

Enable Code Quality Analysis for your code and get rid of all warnings.

Code Quality Analysis:  http://msdn.microsoft.com/en-us/library/dd264897.aspx 

 

 

 

Write 10 simple test cases to make sure your program handles them correctly (e.g. a good test case could be: one of the sub-directories is empty).

 

Submission:

  • Submit your source code and exe to the TA. The TA will run it in his testing environment and check for
    • correctness (an incorrect program will get 0 points)
    • performance
  • Write a blog post (see the blog requirement below)

 

Definition:

  • A word: a string that starts with at least 3 English alphabetic letters, optionally followed by alphanumerical characters. Words are separated by delimiters. If a string contains non-alphanumerical characters, it is not a word. Words are case insensitive, i.e. “file”, “FILE” and “File” are considered the same word.

“hao123” is a word, and “123hao” is NOT a word.

 

  • Alphabetic letters:  A-Z, a-z.
  • Alphanumerical characters: A-Z, a-z, 0-9.
  • Delimiter: space, or any non-alphanumerical character (,.<>|\)[]{!@#$%^&*()_+=-}”).
  • Output text file: filename is <your email name>.txt
    • Each line has this format

<word>: number

 

Where <word> is the string; it must be shown in exactly the upper/lower case in which it occurs in the text files. E.g. if only “File” and “file” appear in the test cases, the program should not show “FILE”.

 

        

Where “number” is the number of times this word appears in the scan. The output should be sorted with the most frequent word first. If 2 words have the same frequency, list them in dictionary order.
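
The definitions above can be sketched in C++ as follows. This is a minimal illustration, not a reference solution. All function and type names are hypothetical. Two points are assumptions, because the definition leaves them open for simple mode: the displayed spelling is the smallest one in plain character-code order among the spellings seen, and ties in frequency are broken by comparing the displayed strings the same way.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

bool isLetter(char c) { return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); }
bool isDigit(char c) { return c >= '0' && c <= '9'; }

// A token is a maximal run of alphanumerical characters; it is a word
// only if its first three characters are letters.
bool isWord(const std::string& token) {
    return token.size() >= 3 && isLetter(token[0]) && isLetter(token[1]) && isLetter(token[2]);
}

// Splits text on every non-alphanumerical character and keeps the words.
std::vector<std::string> extractWords(const std::string& text) {
    std::vector<std::string> words;
    std::string token;
    for (std::size_t i = 0; i <= text.size(); ++i) {
        if (i < text.size() && (isLetter(text[i]) || isDigit(text[i]))) {
            token += text[i];
        } else {
            if (isWord(token)) words.push_back(token);
            token.clear();
        }
    }
    return words;
}

std::string toLower(std::string s) {
    for (char& c : s) {
        if (c >= 'A' && c <= 'Z') c = static_cast<char>(c - 'A' + 'a');
    }
    return s;
}

struct Entry {
    std::string display;  // a spelling that really occurred in the input
    int count;
};

// Counts words case-insensitively, then sorts by frequency (descending)
// and by displayed spelling (ascending) for equal frequencies.
std::vector<Entry> rankWords(const std::vector<std::string>& words) {
    std::map<std::string, Entry> table;  // key: lower-cased word
    for (const std::string& w : words) {
        const std::string key = toLower(w);
        std::map<std::string, Entry>::iterator it = table.find(key);
        if (it == table.end()) {
            table.insert(std::make_pair(key, Entry{w, 1}));
        } else {
            ++it->second.count;
            if (w < it->second.display) it->second.display = w;  // assumption: keep smallest spelling
        }
    }
    std::vector<Entry> ranked;
    for (const auto& kv : table) ranked.push_back(kv.second);
    std::sort(ranked.begin(), ranked.end(), [](const Entry& a, const Entry& b) {
        if (a.count != b.count) return a.count > b.count;
        return a.display < b.display;
    });
    return ranked;
}

// One output line in the "<word>: number" format.
std::string formatLine(const Entry& e) {
    return e.display + ": " + std::to_string(e.count);
}
```

Because the displayed string is always one of the spellings actually read, the sketch never prints “FILE” when only “File” and “file” occur.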

 

Requirements:

1)     Simple mode.   Output simple word frequency.

Myapp.exe <directory-name>

This will output a <your-name>.txt file in the current directory; the text file contains the word ranking list.

2)    Extended mode. 

This only applies to some special cases of words. If 2 words differ only in their ending numbers, we consider them the same word. For example, “win”, “win95” and “win7” are ONE WORD; “Office” and “Office15” are the same; “iPhone4” and “Iphone5” are the same word. “win” and “win32a” are DIFFERENT words, as the difference is more than just the ending numbers.

 

When running with the “-e” command line parameter,

Myapp.exe -e <directory-name>

 

The app will output a <your-name>.txt file in the current directory; the text file contains the word ranking list, but the frequency is calculated based on the extended mode definition.
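
Choosing between the two modes from the command line could look like the sketch below. The Options struct and the parseArgs function are hypothetical names. The sketch assumes the arguments are passed without the program name, e.g. built in main as std::vector<std::string>(argv + 1, argv + argc). Rejecting any other argument shape is also an assumption; the requirement does not say how invalid input should be handled.

```cpp
#include <string>
#include <vector>

struct Options {
    bool extended;          // true when "-e" was given
    std::string directory;  // the directory to scan
    bool valid;             // false if the arguments match neither mode
};

// Accepts either "<directory-name>" or "-e <directory-name>".
Options parseArgs(const std::vector<std::string>& args) {
    Options opts{false, "", false};
    if (args.size() == 1 && args[0] != "-e") {
        opts.directory = args[0];
        opts.valid = true;
    } else if (args.size() == 2 && args[0] == "-e") {
        opts.extended = true;
        opts.directory = args[1];
        opts.valid = true;
    }
    return opts;
}
```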

 

In extended mode, the output <word> should be the first word in dictionary order among all matching results, e.g. if we have “win95”, “win98” and “win2000” in the files, “win2000” should be displayed.
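
One way to realize the extended-mode grouping is to reduce every word to a key by stripping its trailing digits and lower-casing the rest, then keep the smallest spelling per key. The sketch below assumes its input already consists of valid words, and it takes "dictionary order" to mean plain character-code comparison, which puts “win2000” before “win95”. The names extendedKey, Group and groupExtended are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Group {
    std::string display;  // first spelling in dictionary order
    int count;            // total occurrences of all matching words
};

// "win95" -> "win", "Iphone5" -> "iphone", "win32a" -> "win32a"
// (only digits at the very end are removed).
std::string extendedKey(const std::string& word) {
    std::size_t end = word.size();
    while (end > 0 && word[end - 1] >= '0' && word[end - 1] <= '9') --end;
    std::string key = word.substr(0, end);
    for (char& c : key) {
        if (c >= 'A' && c <= 'Z') c = static_cast<char>(c - 'A' + 'a');
    }
    return key;
}

// Groups words that differ only in case and ending numbers.
std::map<std::string, Group> groupExtended(const std::vector<std::string>& words) {
    std::map<std::string, Group> groups;
    for (const std::string& w : words) {
        const std::string key = extendedKey(w);
        std::map<std::string, Group>::iterator it = groups.find(key);
        if (it == groups.end()) {
            groups.insert(std::make_pair(key, Group{w, 1}));
        } else {
            ++it->second.count;
            if (w < it->second.display) it->second.display = w;
        }
    }
    return groups;
}
```

With the inputs “win95”, “win98” and “win2000”, this sketch produces a single group with count 3 and the display form “win2000”, while “win32a” stays in a group of its own.
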

Blog Requirement:

You can publish this to BOTH your own blog and your team blog (to help your team blog get some traffic).

 

1)      Before you implement this project, record your estimate of the time you WILL spend on each component of your program.

2)      After you have implemented this project, record the ACTUAL time you spent on each component of your program.

3)      Describe how much time you spent on improving the performance of your program, and show a performance analysis graph (generated by the VS2012 perf analysis tool). If possible, show the most costly function in your program.

4)      Share your 10 test cases, and explain how you made sure your program produces the correct result. (Programs with incorrect results will get 0 points, regardless of speed.)

5)      Describe what you have learned in this exercise.

Reposted from: https://www.cnblogs.com/codingcook/archive/2012/09/25/2701416.html
