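ZJU 1050 Solution Report: Strings

This problem is a simplified version of a real search-engine task: weighting a document by how often the words of the user's query occur in it. The solution itself is simple. The key is to understand the statement and to be comfortable with the STL map container.

The problem statement follows: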

Start Up the Startup

Time limit: 1 Seconds   Memory limit: 32768K  
Total Submit: 805   Accepted Submit: 182  

Clearly the economy is bound to pick up again soon. As a forward-thinking Internet entrepreneur, you think that the 'Net will need a new search engine to serve all the people buying new computers. Because you're frustrated with the poor results most search engines produce, your search engine will be better.

You've come up with what you believe is an innovative approach to document matching. By giving weight to the number of times a term appears in both the search string and in the document being checked, you believe you can produce a more accurate search result.

Your program will be given a search string, followed by a set of documents. You will calculate the score for each document and print it to output in the order the document appears in the input. To calculate the score for a document you must first calculate the term score for each term appearing in the search string. A term score is the number of times a term occurs in the search string multiplied by the number of times it occurs in the document. The document score is the sum of the square roots of each term score.


Input Format:

The input consists of a set of documents separated by single lines containing only ten dashes, “----------”. No line will be longer than 250 characters. No document will be longer than 100 lines. The first document is the search string. The input terminates with two lines of ten dashes in a row.

The input documents will use the full ASCII character set. You must parse each document into a set of terms.

Terms are separated by whitespace in the input document. Comparisons between terms are case-insensitive. Punctuation is removed from terms prior to comparisons, e.g. “don't” becomes “dont”. The resulting terms should contain only the characters {[a-z],[0-9]}. A term in the input consisting only of punctuation should be ignored. You may assume the search string and each document will have at least one valid term.


Output Format:

The output is a series of scores, one per line, printed to two decimal places. The scores are printed in the order the documents occur in the input. No other characters may appear in the output.


This problem contains multiple test cases!

The first line of a multiple input is an integer N, then a blank line followed by N input blocks. Each input block is in the format indicated in the problem description. There is a blank line between input blocks.

The output format consists of N output blocks. There is a blank line between output blocks.


Sample Input:

1

fee fi fo fum
----------
fee, fi, fo! fum!!
----------
fee fee fi, me me me
----------
----------


Sample Output:


4.00
2.41

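The approach is to first build a hash table for the words of the search string, recording how many times each word occurs in it. Then, while processing a document, build a hash table for the words of that document, recording how many times each word occurs there. For every word, the term score is (occurrences in the search string * occurrences in the document). Finally, the document score is the sum of the square roots of the term scores, taken over the words that also appear in the search string.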
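The scoring step can be sketched on its own as follows. This is a minimal sketch, not the code of the program below: the helper name `documentScore` and the two count maps it takes are assumptions made for illustration, and both maps are assumed to hold already-normalized terms.

```cpp
#include <cmath>
#include <map>
#include <string>

// Hypothetical helper (not part of the original program).
// queryCounts: term -> occurrences in the search string
// docCounts:   term -> occurrences in the document
// Returns the sum of sqrt(queryCount * docCount) over the search terms.
double documentScore(const std::map<std::string, int>& queryCounts,
                     const std::map<std::string, int>& docCounts)
{
    double score = 0.0;
    for (std::map<std::string, int>::const_iterator q = queryCounts.begin();
         q != queryCounts.end(); ++q)
    {
        std::map<std::string, int>::const_iterator d = docCounts.find(q->first);
        if (d != docCounts.end())
        {
            // Term score = count in search string * count in document.
            score += std::sqrt(static_cast<double>(q->second) * d->second);
        }
    }
    return score;
}
```

On the sample input, the second document contains "fee" twice and "fi" once, so its score is sqrt(1*2) + sqrt(1*1), roughly 2.41, while "me" contributes nothing because it does not occur in the search string.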

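The hash table is implemented with the STL map container. Strictly speaking, std::map is an ordered, tree-based associative container rather than a hash table, but it plays the same role here.

Source code: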
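As an illustration of how a map can hold the per-term counts, here is a small sketch. The function names `normalizeTerm` and `countTerms` are hypothetical and do not appear in the program below; the sketch assumes the whole document is available as one string and splits it on whitespace with `std::istringstream`.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: lower-case the token and keep only [a-z] and [0-9],
// as the problem statement requires. May return an empty string.
std::string normalizeTerm(const std::string& raw)
{
    std::string term;
    for (std::string::size_type i = 0; i < raw.size(); ++i)
    {
        char c = raw[i];
        if (c >= 'A' && c <= 'Z')
            c = static_cast<char>(c - 'A' + 'a');
        if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
            term += c;
    }
    return term;
}

// Hypothetical helper: count the normalized terms of a text.
std::map<std::string, int> countTerms(const std::string& text)
{
    std::map<std::string, int> counts;
    std::istringstream in(text);
    std::string word;
    while (in >> word)
    {
        std::string term = normalizeTerm(word);
        if (!term.empty())
            ++counts[term];   // operator[] starts a new entry at 0
    }
    return counts;
}
```

Using `operator[]` keeps the counting step to one line, because a missing key is value-initialized to 0 before the increment.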

#include <iostream>
#include <iomanip>
#include <map>
#include <string>
#include <cstring>
#include <cmath>
using namespace std;
#define MAXLINELENGTH 250   // longest input line, so also the longest token
#define MAXTERMCNT 100      // not used below
#define SEPLINE "----------"
inline bool validTermChar(char c)
{
 // Only letters and digits survive; everything else counts as punctuation.
 return (c>='a' && c<='z') || (c>='A' && c<='Z') || (c>='0' && c<='9');
}
inline char toLower(char c)
{
 char temp = c;
 if(temp >= 'A' && temp <= 'Z')
  temp += ('a'-'A');
 return temp;
}
void calculateScore()
{
 map<string,int> searchTermMap;
 string term;
 char termStr[MAXLINELENGTH+1];
 while(true)
 {
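  // Read one whitespace-separated token of the search string.
  // setw bounds the read to the buffer; stop if the input ends early.
  if(!(cin>>setw(sizeof(termStr))>>termStr))
   break;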
  if(!strcmp(termStr,SEPLINE))
   break;
  term = "";
  int len = strlen(termStr);
  for(int i=0;i<len;i++)
  {
   if(validTermChar(termStr[i]))
   {
    termStr[i] = toLower(termStr[i]);
    term += termStr[i];
   }
  }
  if(term.length() > 0)
  {
   // operator[] creates the entry with a count of 0 on first use.
   ++searchTermMap[term];
  }
 }
 bool tenLineDash = false;   // true when the previous token was a separator line
 map<string,int> docTermMap; // term -> (count in search string * count in document)
 while(true)
 {
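  // Read one whitespace-separated token of the current document.
  // setw bounds the read to the buffer; stop if the input ends early.
  if(!(cin>>setw(sizeof(termStr))>>termStr))
   break;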
  if(!strcmp(termStr,SEPLINE))
  {
   // Two separator lines in a row end the input block.
   if(tenLineDash)
    break;
   tenLineDash = true;
   // End of a document: add up the square roots of its term scores.
   double docScore = 0.0;
   for(map<string,int>::iterator itr1 = docTermMap.begin();
       itr1 != docTermMap.end();
       ++itr1)
   {
    docScore = docScore + sqrt((double)(itr1->second));
   }
   cout<<fixed<<setprecision(2)<<docScore<<endl;
   docTermMap.clear();
   continue;
  }
  tenLineDash = false;
  term = "";
  int len = strlen(termStr);
  for(int i=0;i<len;i++)
  {
   if(validTermChar(termStr[i]))
   {
    termStr[i] = toLower(termStr[i]);
    term += termStr[i];
   }
  }
  // Only terms that also occur in the search string contribute to the score.
  map<string,int>::iterator searchItr;
  if(term.length() > 0 && (searchItr = searchTermMap.find(term)) != searchTermMap.end())
  {
   // Adding the search-string count once per occurrence in the document
   // leaves docTermMap[term] equal to (search count * document count).
   docTermMap[term] += searchItr->second;
  }
 }
}
int main(int argc,char **argv)
{
 int testCase;
 cin>>testCase;
 for(int i=0;i<testCase;i++)
 {
  calculateScore();
  if(i < testCase - 1)
   cout<<endl;
 }
 return 0;
}