原题网址:http://www.lintcode.com/en/problem/inverted-index/
Create an inverted index with given documents.
Notice
Ensure that data does not include punctuation.
Example
方法:使用哈希影射。
Given a list of documents with id and content. (class Document)
[
{
"id": 1,
"content": "This is the content of document 1 it is very short"
},
{
"id": 2,
"content": "This is the content of document 2 it is very long bilabial bilabial heheh hahaha ..."
},
]
Return an inverted index (HashMap with key is the word and value is a list of document ids).
{
"This": [1, 2],
"is": [1, 2],
...
}
/**
* Definition of Document:
* class Document {
* public int id;
* public String content;
* }
*/
public class Solution {
/**
* @param docs a list of documents
* @return an inverted index
*/
public Map<String, List<Integer>> invertedIndex(List<Document> docs) {
// Write your code here
Map<String, Set<Integer>> map = new HashMap<>();
for(Document doc : docs) {
String[] strs = doc.content.split(" ");
for(String str : strs) {
if ("".equals(str)) continue;
update(str, doc.id, map);
}
}
Map<String, List<Integer>> results = new HashMap<>();
for(Map.Entry<String, Set<Integer>> entry : map.entrySet()) {
List<Integer> list = new ArrayList<>(entry.getValue());
Collections.sort(list);
results.put(entry.getKey(), list);
}
return results;
}
private void update(String str, int id, Map<String, Set<Integer>> results) {
Set<Integer> set = results.get(str);
if (set == null) {
set = new HashSet<>();
results.put(str, set);
}
set.add(id);
}
}

本文介绍了一种利用哈希映射实现倒排索引的方法,该方法通过处理一系列文档内容,去除标点符号,并建立单词到文档ID列表的映射,从而实现高效的文本检索功能。
2558

被折叠的 条评论
为什么被折叠?



