Inverted Index Implementation in Java


Jipon


Column: Search Engine · Tags: inverted index, search engine

Article link: https://blog.csdn.net/chen19920219/article/details/71091314


Suppose there are three articles, file1, file2, and file3, with the following contents:

    file1 (word1, word2, word3, word4, ...)
    file2 (wordA, wordB, wordC, wordD, ...)
    file3 (word1, wordA, word3, wordD, ...)

The inverted index built from them then looks like this:

    word1 (file1, file3)
    word2 (file1)
    word3 (file1, file3)
    wordA (file2, file3)
    ...
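To make the mapping concrete, here is a minimal in-memory sketch (Java 9+ for `List.of`) that rebuilds the index above from the words listed explicitly for file1 to file3; the elided words are simply left out. `AbstractIndexDemo`, `build`, and `sampleDocs` are hypothetical names used only for illustration. A `TreeMap` keeps the words sorted so the printout lines up with the listing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class: builds an in-memory inverted index from the abstract
// example (only the words listed explicitly, not the elided ones).
class AbstractIndexDemo {

    // word -> files containing it, in the order the files were added
    static Map<String, List<String>> build(Map<String, List<String>> docs) {
        Map<String, List<String>> index = new TreeMap<>(); // sorted by word
        for (Map.Entry<String, List<String>> doc : docs.entrySet()) {
            for (String word : doc.getValue()) {
                List<String> files = index.computeIfAbsent(word, k -> new ArrayList<>());
                if (!files.contains(doc.getKey())) { // record each file at most once per word
                    files.add(doc.getKey());
                }
            }
        }
        return index;
    }

    // The three example files; LinkedHashMap keeps file1, file2, file3 in order
    static Map<String, List<String>> sampleDocs() {
        Map<String, List<String>> docs = new LinkedHashMap<>();
        docs.put("file1", List.of("word1", "word2", "word3", "word4"));
        docs.put("file2", List.of("wordA", "wordB", "wordC", "wordD"));
        docs.put("file3", List.of("word1", "wordA", "word3", "wordD"));
        return docs;
    }

    public static void main(String[] args) {
        build(sampleDocs()).forEach((word, files) -> System.out.println(word + " " + files));
    }
}
```

Running `main` prints one line per word, for example `word1 [file1, file3]` and `wordA [file2, file3]`, matching the index shown above.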

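Term frequency is the number of times a word appears in a file. This post instead computes, for each word, the total number of times it appears across all files. Suggestions for a more concise or efficient implementation are welcome.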
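The difference between the two kinds of frequency can be sketched as follows. This is an illustration under assumptions, not the code used in this post: the hypothetical `TermFrequencyDemo` keeps a per-file count for every word (`word -> file -> count`) and derives the all-files total from it. Its sample input reuses two of the text files given at the end of the post.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class: contrasts per-file term frequency with the
// all-files total.
class TermFrequencyDemo {

    // word -> (file -> number of occurrences of the word in that file)
    static Map<String, Map<String, Integer>> perFile(Map<String, String> docs) {
        Map<String, Map<String, Integer>> tf = new HashMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            for (String word : doc.getValue().trim().split("\\s+")) {
                if (word.isEmpty()) continue; // an empty or all-blank text yields one empty token
                tf.computeIfAbsent(word, k -> new HashMap<>()).merge(doc.getKey(), 1, Integer::sum);
            }
        }
        return tf;
    }

    // word -> total occurrences across all files (sum of the per-file counts)
    static Map<String, Integer> total(Map<String, Map<String, Integer>> tf) {
        Map<String, Integer> totals = new HashMap<>();
        tf.forEach((word, byFile) ->
                totals.put(word, byFile.values().stream().mapToInt(Integer::intValue).sum()));
        return totals;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("2.txt", "i love you i love you");
        docs.put("3.txt", "i love you today is a good day");
        Map<String, Map<String, Integer>> tf = perFile(docs);
        System.out.println(tf.get("love"));        // per-file counts for "love"
        System.out.println(total(tf).get("love")); // total count for "love"
    }
}
```

With these two texts, `love` occurs twice in 2.txt and once in 3.txt, so its total is 3. The per-file map keeps strictly more information: the total can always be recovered from it, but not the other way round.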

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;


public class InvertedIndex {

	// word -> list of files that contain the word
	private final Map<String, List<String>> map = new HashMap<>();
	// word -> total number of occurrences across all files
	private final Map<String, Integer> nums = new HashMap<>();

	public void createIndex(String filepath) {
		// try-with-resources closes the reader even if an exception is thrown
		try (BufferedReader reader = new BufferedReader(new FileReader(filepath))) {
			String s;
			while ((s = reader.readLine()) != null) {
				// split each line into words and index them line by line,
				// so that every line of the file is counted
				for (String word : s.trim().split("\\s+")) {
					if (word.isEmpty()) {
						continue; // blank line
					}
					List<String> list = map.computeIfAbsent(word, k -> new ArrayList<>());
					// add the file name only if it is not already recorded for this word
					if (!list.contains(filepath)) {
						list.add(filepath);
					}
					// total frequency of the word across all files
					nums.merge(word, 1, Integer::sum);
				}
			}
		} catch (IOException e) {
			e.printStackTrace();
		}
	}

	public static void main(String[] args) {
		InvertedIndex index = new InvertedIndex();

		for (int i = 1; i <= 3; i++) {
			String path = "E:\\data\\" + i + ".txt";
			index.createIndex(path);
		}
		for (Map.Entry<String, List<String>> entry : index.map.entrySet()) {
			System.out.println(entry.getKey() + ":" + entry.getValue());
		}

		for (Map.Entry<String, Integer> num : index.nums.entrySet()) {
			System.out.println(num.getKey() + ":" + num.getValue());
		}
	}
}

File contents:

1.txt: i live in hangzhou where are you

2.txt: i love you i love you

3.txt: i love you today is a good day

Run results: [screenshot]
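To check the indexing logic without creating files under `E:\data`, the sketch below feeds the three sample texts to the same algorithm from strings. `SampleRun` and `addDocument` are hypothetical names, and `TreeMap` is used instead of `HashMap` only so the output comes out sorted by word.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class: same indexing logic as the program above, but it reads
// the three sample texts from strings instead of files on disk.
class SampleRun {
    final Map<String, List<String>> postings = new TreeMap<>(); // word -> files containing it
    final Map<String, Integer> counts = new TreeMap<>();        // word -> total occurrences

    void addDocument(String name, String text) {
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) continue; // skip the empty token produced by blank text
            List<String> files = postings.computeIfAbsent(word, k -> new ArrayList<>());
            if (!files.contains(name)) files.add(name); // each file once per word
            counts.merge(word, 1, Integer::sum);        // every occurrence counts
        }
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("1.txt", "i live in hangzhou where are you");
        docs.put("2.txt", "i love you i love you");
        docs.put("3.txt", "i love you today is a good day");
        SampleRun run = new SampleRun();
        docs.forEach(run::addDocument);
        run.postings.forEach((w, f) -> System.out.println(w + ":" + f));
        run.counts.forEach((w, c) -> System.out.println(w + ":" + c));
    }
}
```

For these inputs there are 13 distinct words: `i` and `you` each appear 4 times in total (in all three files), `love` appears 3 times (in 2.txt and 3.txt), and every other word appears once, in a single file.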
