Solr


java小小小萌新

Original article: https://blog.csdn.net/weixin_45726045/article/details/103208014

I. Introducing Solr

1 Using a fuzzy query
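Suppose we search for a product such as 手机 (mobile phone) with a fuzzy query:

Select * from goods where goods_name like '%手机%';

Performance problem: a fuzzy query whose pattern starts with % cannot use the index, so the database has to check every row and the query is very slow.
But we did create an index on goods_name, so why isn't it used?
An index is essentially a sorted search tree (in MySQL's InnoDB engine, a B+ tree).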
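By the leftmost-prefix matching rule, the pattern must not start with %. The tree is ordered starting from the first character of each value, so when the prefix is % (meaning "anything") there is nothing to compare against, and the tree's ordering cannot be used to narrow the search.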
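To make the leftmost-prefix rule concrete, here is a minimal sketch that uses Java's TreeSet as a stand-in for a sorted index. This is an assumption for illustration only: a database index is a B+ tree on disk, not an in-memory set, and the class name and product names are hypothetical. A prefix pattern such as 'phone%' becomes one contiguous range of the sorted set, while '%phone%' gives no range to seek to, so every entry has to be checked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class PrefixVsContains {
    // Hypothetical "index": values kept sorted from their first character onward.
    static final NavigableSet<String> INDEX = new TreeSet<>(List.of(
            "xiaomi phone", "phone case", "apple phone", "phone stand", "huawei phone"));

    // LIKE 'prefix%': the prefix bounds one contiguous range of the sorted set,
    // so we can jump straight to it (upper bound: prefix followed by Character.MAX_VALUE).
    static NavigableSet<String> prefixSearch(String prefix) {
        return INDEX.subSet(prefix, true, prefix + Character.MAX_VALUE, false);
    }

    // LIKE '%word%': there is no bound to seek to, so every entry is inspected.
    static List<String> containsSearch(String word) {
        List<String> hits = new ArrayList<>();
        for (String value : INDEX) {
            if (value.contains(word)) {
                hits.add(value);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(prefixSearch("phone"));   // [phone case, phone stand]
        System.out.println(containsSearch("phone")); // all 5 entries
    }
}
```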

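2 Using NoSQL
The NoSQL data model is a Map<K,V>. Using Redis here, the key is "手机" (mobile phone) and the value is a list of all phone products:

{"手机": List<Goods>}

Each query fetches a whole precomputed result list with a single keyword, which only works for keywords that were stored as keys in advance.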
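A minimal sketch of this approach, assuming a plain Java HashMap as a hypothetical stand-in for Redis (no Redis client is used). Each keyword maps to a precomputed result list, so a lookup is a single exact-key get. It also shows the limitation: a query such as "huawei phone" misses because no key was prepared for it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KeywordCache {
    // Hypothetical stand-in for Redis: keyword -> precomputed list of products.
    static final Map<String, List<String>> CACHE = new HashMap<>();

    // One exact-key lookup; keywords without a prepared key return an empty list.
    static List<String> lookup(String keyword) {
        return CACHE.getOrDefault(keyword, List.of());
    }

    public static void main(String[] args) {
        CACHE.put("phone", List.of("huawei phone", "xiaomi phone"));
        System.out.println(lookup("phone"));        // [huawei phone, xiaomi phone]
        System.out.println(lookup("huawei phone")); // [] (no key was prepared for it)
    }
}
```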

3 Using an inverted index
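Segment each product name into words and store them in a Map<String, List<Long>>: each word is a key and its value is the list of ids of the products containing it. At query time, segment the query, look up the product ids for each word, then fetch the products by id from the product map.
An online word-segmentation tool: http://www.pullword.com/
Implementation:
JiebaSegmenter is the jieba segmenter, a Chinese word-segmentation library; to use it, add the jieba jar to the project.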

import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

import com.huaban.analysis.jieba.JiebaSegmenter;
import com.huaban.analysis.jieba.JiebaSegmenter.SegMode;
import com.huaban.analysis.jieba.SegToken;

public class ImportServiceImpl implements ImportService {

	// GOODS: key = product id, value = the product
	private static final Map<Long, Goods> GOODS = new HashMap<>();
	// GOODS_INDEX (inverted index): key = segmented keyword, value = ids of the products containing it
	private static final Map<String, List<Long>> GOODS_INDEX = new HashMap<>();
	// jieba word segmenter
	private static final JiebaSegmenter ANALYSIS = new JiebaSegmenter();

	/**
	 * Imports a product, adding its keywords to the inverted index GOODS_INDEX.
	 */
	@Override
	public void importGoods(Goods goods) {
		// validate the input
		if (goods == null || goods.getId() == null || goods.getGoodsName() == null) {
			throw new RuntimeException("Incomplete product data");
		}
		// segment the product name into keywords
		List<SegToken> keywordTokens = ANALYSIS.process(goods.getGoodsName(), SegMode.INDEX);
		for (SegToken segToken : keywordTokens) {
			String word = segToken.word.trim();
			if (word.isEmpty()) {
				continue;
			}
			// create the id list the first time this keyword is seen, then record the product id
			List<Long> ids = GOODS_INDEX.computeIfAbsent(word, k -> new ArrayList<>());
			if (!ids.contains(goods.getId())) { // a word repeated in one name is indexed once
				ids.add(goods.getId());
			}
		}
		// store the product itself, keyed by id
		GOODS.put(goods.getId(), goods);
	}

	@Override
	public List<Goods> search(String goodsName) {
		List<SegToken> keyWords = ANALYSIS.process(goodsName, SegMode.SEARCH);
		// LinkedHashSet: a product matched by several keywords is returned only once, in match order
		Set<Goods> result = new LinkedHashSet<>();
		for (SegToken segToken : keyWords) {
			String word = segToken.word.trim();
			if (word.isEmpty()) {
				continue;
			}
			// HashMap lookup: O(1) on average via hashing (not a binary search)
			List<Long> ids = GOODS_INDEX.get(word);
			if (ids != null) {
				for (Long id : ids) {
					result.add(GOODS.get(id));
				}
			}
		}
		return new ArrayList<>(result);
	}

}
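The service above depends on jieba and on the author's Goods and ImportService types. The sketch below is a self-contained version of the same inverted index under one stated assumption: a hypothetical whitespace tokenizer replaces jieba, so it only suits space-separated text. searchAny mirrors the search method above (any matching term, each product returned once); searchAll is an added illustration that intersects the posting lists so that every term must match.

```java
import java.util.*;

public class InvertedIndexDemo {
    private final Map<Long, String> goods = new HashMap<>();      // id -> product name
    private final Map<String, Set<Long>> index = new HashMap<>(); // term -> sorted posting list of ids

    // Hypothetical tokenizer (ASSUMPTION, replaces jieba): lower-case, split on whitespace.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\s+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    void add(long id, String name) {
        goods.put(id, name);
        for (String term : tokenize(name)) {
            index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    // OR query: products matching at least one term, each returned once.
    List<String> searchAny(String query) {
        Set<Long> ids = new TreeSet<>();
        for (String term : tokenize(query)) {
            ids.addAll(index.getOrDefault(term, Collections.emptySet()));
        }
        return names(ids);
    }

    // AND query: intersect the posting lists so that every term must match.
    List<String> searchAll(String query) {
        Set<Long> ids = null;
        for (String term : tokenize(query)) {
            Set<Long> posting = index.getOrDefault(term, Collections.emptySet());
            if (ids == null) ids = new TreeSet<>(posting);
            else ids.retainAll(posting);
        }
        return ids == null ? new ArrayList<>() : names(ids);
    }

    private List<String> names(Set<Long> ids) {
        List<String> out = new ArrayList<>();
        for (long id : ids) out.add(goods.get(id));
        return out;
    }

    public static void main(String[] args) {
        InvertedIndexDemo idx = new InvertedIndexDemo();
        idx.add(1, "Huawei Mate phone");
        idx.add(2, "Xiaomi phone");
        idx.add(3, "Huawei laptop");
        System.out.println(idx.searchAny("huawei phone")); // [Huawei Mate phone, Xiaomi phone, Huawei laptop]
        System.out.println(idx.searchAll("huawei phone")); // [Huawei Mate phone]
    }
}
```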

4 Introducing the Trie

The problem with the inverted index above:
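Every keyword is stored as a separate map key, which wastes memory when many keywords share a prefix (e.g. 中, 中国, 中间 and 中国人 all start with 中). A Trie stores such a common prefix only once.

Implementation: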

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Trie implementation
 * @author WHSXT-LTD
 */
public class TrieNode {

	// the character stored in this node (unused for the root)
	private char c;

	// child nodes, keyed by their character
	private Map<Character, TrieNode> children;

	// ids of the products whose keyword ends at this node
	private List<Long> ids;

	public TrieNode() {}

	public TrieNode(char c) {
		this.c = c;
	}

	public List<Long> getIds() {
		return ids;
	}

	/**
	 * Inserts a keyword into the trie; keywords share their common prefix, e.g.
	 *      中
	 *      中 国
	 *      中 间
	 *      中 国 人
	 * 		keyword  the keyword to insert
	 * 		id  the id associated with it
	 */
	public boolean insert(String keyword, Long id) {
		if (keyword == null || keyword.trim().equals("")) {
			return false;
		}
		return insert(keyword, 0, id);
	}

	/*
	 * pos: index of the current character in keyword, starting at 0
	 */
	private boolean insert(String keyword, int pos, Long id) {
		if (pos >= keyword.length()) {
			return false;
		}
		if (this.children == null) {
			this.children = new HashMap<Character, TrieNode>();
		}
		// the character of keyword at index pos
		Character character = keyword.charAt(pos);
		// look for an existing child for this character
		TrieNode trieNode = this.children.get(character);
		if (trieNode == null) { // not in the trie yet: create the node and attach it
			trieNode = new TrieNode(character);
			this.children.put(character, trieNode);
		}
		if (pos == keyword.length() - 1) {
			// last character: record the id on this node
			if (trieNode.ids == null) {
				trieNode.ids = new ArrayList<Long>();
			}
			trieNode.ids.add(id);
			return true;
		} else {
			// recurse into the child with the next character
			return trieNode.insert(keyword, pos + 1, id);
		}
	}

	/**
	 * Finds the node for a keyword; returns null if the keyword is not in the trie.
	 */
	public TrieNode find(String keyword) {
		if (keyword == null || keyword.isEmpty()) {
			return null;
		}
		return find(keyword, 0);
	}

	// pos: index starting at 0
	private TrieNode find(String keyword, int pos) {
		if (this.children == null) {
			return null;
		}
		TrieNode node = this.children.get(keyword.charAt(pos));
		if (node == null) {
			return null;
		}
		if (pos == keyword.length() - 1) {
			return node;
		}
		return node.find(keyword, pos + 1);
	}
}
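A small self-contained sketch (hypothetical class TriePrefixDemo, separate from the author's TrieNode) that checks the prefix-sharing claim: inserting car, card, care and cart creates 6 nodes below the root, whereas storing the four keywords as separate map keys keeps 15 characters. Because every keyword below a node shares that node's prefix, the same structure can also answer prefix queries by walking the subtree.

```java
import java.util.*;

public class TriePrefixDemo {
    static final class Node {
        final Map<Character, Node> children = new TreeMap<>(); // sorted for stable output
        final List<Long> ids = new ArrayList<>();              // ids of keywords ending here
    }

    final Node root = new Node();
    int nodeCount = 0; // nodes created, not counting the root

    void insert(String word, long id) {
        Node cur = root;
        for (char c : word.toCharArray()) {
            Node next = cur.children.get(c);
            if (next == null) {          // only characters not already on the path cost a node
                next = new Node();
                cur.children.put(c, next);
                nodeCount++;
            }
            cur = next;
        }
        cur.ids.add(id);
    }

    // All ids whose keyword starts with prefix: walk to the prefix node, then collect its subtree.
    List<Long> idsWithPrefix(String prefix) {
        Node cur = root;
        for (char c : prefix.toCharArray()) {
            cur = cur.children.get(c);
            if (cur == null) return List.of();
        }
        List<Long> out = new ArrayList<>();
        collect(cur, out);
        return out;
    }

    private static void collect(Node n, List<Long> out) {
        out.addAll(n.ids);
        for (Node child : n.children.values()) collect(child, out);
    }

    public static void main(String[] args) {
        TriePrefixDemo t = new TriePrefixDemo();
        String[] words = {"car", "card", "care", "cart"};
        for (int i = 0; i < words.length; i++) t.insert(words[i], i + 1L);
        System.out.println(t.nodeCount);             // 6
        System.out.println(t.idsWithPrefix("car"));  // [1, 2, 3, 4]
        System.out.println(t.idsWithPrefix("cart")); // [4]
    }
}
```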

Using the Trie in product search:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import com.huaban.analysis.jieba.JiebaSegmenter;
import com.huaban.analysis.jieba.JiebaSegmenter.SegMode;
import com.huaban.analysis.jieba.SegToken;

public class ImportServiceImpl implements ImportService {

	// key = product id, value = the product
	private static final Map<Long, Goods> GOODS = new HashMap<>();

	// the inverted index, now stored in a Trie instead of a HashMap
	private static final TrieNode GOODS_INDEX = new TrieNode();

	// jieba word segmenter
	private static final JiebaSegmenter ANALYSIS = new JiebaSegmenter();

	/**
	 * Imports a product
	 */
	@Override
	public void importGoods(Goods goods) {
		if (goods == null || goods.getId() == null || goods.getGoodsName() == null) {
			throw new RuntimeException("Incomplete product data");
		}
		List<SegToken> terms = ANALYSIS.process(goods.getGoodsName(), SegMode.INDEX);
		for (SegToken segToken : terms) {
			if (segToken.word.trim().equals("")) {
				continue;
			}
			GOODS_INDEX.insert(segToken.word, goods.getId());
		}
		GOODS.put(goods.getId(), goods);
	}

	@Override
	public List<Goods> search(String goodsName) {
		if (goodsName == null || goodsName.trim().equals("")) {
			throw new RuntimeException("Search keyword must not be empty");
		}
		List<Goods> goodss = new ArrayList<Goods>();
		List<SegToken> terms = ANALYSIS.process(goodsName, SegMode.SEARCH);
		for (SegToken segToken : terms) {
			if (segToken.word.trim().equals("")) {
				continue;
			}
			TrieNode node = GOODS_INDEX.find(segToken.word);
			// find() returns null when the keyword was never indexed
			if (node == null) {
				continue;
			}
			List<Long> ids = node.getIds();
			if (ids != null && !ids.isEmpty()) {
				for (Long id : ids) {
					goodss.add(GOODS.get(id));
				}
			}
		}
		return goodss;
	}

}

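V. FST and Trie
Trie: can only share prefixes, so there is still room for optimization.
FST (finite state transducer): evolved from the Trie; it can also share suffixes.
Lucene (http://lucene.apache.org/), the search library Solr is built on, uses FSTs internally.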
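Lucene's own FST classes are not reproduced here. As a hedged illustration of suffix sharing only, the sketch below builds a minimal acyclic automaton from sorted words using the incremental construction of Daciuk et al.: after each word, nodes that are equivalent (same final flag, same outgoing edges) are merged. It is an acceptor without outputs, whereas a real FST also maps each word to an output value; the class and method names are hypothetical.

```java
import java.util.*;

public class SuffixSharingDemo {
    private static int nextId = 0;

    static final class Node {
        final int id = nextId++;
        boolean isFinal;
        final TreeMap<Character, Node> edges = new TreeMap<>();

        // Equivalence key: final flag plus every outgoing edge. Children are already
        // canonical (merged) when this is called, so their ids identify whole subtrees.
        String signature() {
            StringBuilder sb = new StringBuilder(isFinal ? "F" : "N");
            for (Map.Entry<Character, Node> e : edges.entrySet()) {
                sb.append('|').append(e.getKey()).append(':').append(e.getValue().id);
            }
            return sb.toString();
        }
    }

    private static final class Step {
        final Node parent; final char label; final Node child;
        Step(Node parent, char label, Node child) { this.parent = parent; this.label = label; this.child = child; }
    }

    final Node root = new Node();
    private final Map<String, Node> register = new HashMap<>(); // canonical nodes by signature
    private final List<Step> unchecked = new ArrayList<>();     // path of the last word, not yet merged
    private String previous = "";

    /** Words must be non-empty and added in strictly increasing order. */
    void add(String word) {
        if (word.compareTo(previous) <= 0) throw new IllegalArgumentException("unsorted or duplicate: " + word);
        int common = 0;
        while (common < Math.min(word.length(), previous.length())
                && word.charAt(common) == previous.charAt(common)) common++;
        minimize(common); // the previous word's tail beyond the shared prefix is now final
        Node node = unchecked.isEmpty() ? root : unchecked.get(unchecked.size() - 1).child;
        for (int i = common; i < word.length(); i++) {
            Node next = new Node();
            node.edges.put(word.charAt(i), next);
            unchecked.add(new Step(node, word.charAt(i), next));
            node = next;
        }
        node.isFinal = true;
        previous = word;
    }

    void finish() { minimize(0); }

    // Deepest first: replace each unchecked node by an equivalent registered node, if any.
    private void minimize(int downTo) {
        for (int i = unchecked.size() - 1; i >= downTo; i--) {
            Step s = unchecked.remove(i);
            Node existing = register.putIfAbsent(s.child.signature(), s.child);
            if (existing != null) s.parent.edges.put(s.label, existing);
        }
    }

    int countNodes() {
        Set<Integer> seen = new HashSet<>();
        Deque<Node> stack = new ArrayDeque<>(List.of(root));
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (seen.add(n.id)) stack.addAll(n.edges.values());
        }
        return seen.size();
    }

    boolean contains(String word) {
        Node n = root;
        for (char c : word.toCharArray()) {
            n = n.edges.get(c);
            if (n == null) return false;
        }
        return n.isFinal;
    }

    public static void main(String[] args) {
        SuffixSharingDemo fst = new SuffixSharingDemo();
        for (String w : new String[] {"tap", "taps", "top", "tops"}) fst.add(w);
        fst.finish();
        System.out.println(fst.countNodes());     // 5
        System.out.println(fst.contains("tops")); // true
        System.out.println(fst.contains("to"));   // false
    }
}
```

With tap, taps, top and tops a trie needs 8 nodes (counting the root); here the a and o branches point to the same shared p → s tail, leaving 5.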

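VI. Importing data into Solr

(Multithreaded) full import
Annotate a method with @PostConstruct so that it runs when the application starts and performs the first full import of all the data.
(Multithreaded) incremental import
A scheduled import, usually run once every night, for less important fields that do not need to be up to date in real time.
It needs two timestamps, e.g. midnight last night and midnight tonight, and imports only the records changed between them.
A large import can block Solr, so users may be unable to search while it runs; this is why it is scheduled at night.
Real-time import (via an MQ)
For important fields such as product price and stock, which must be kept up to date in real time.
Each such update is very small, so Solr imports it quickly without affecting users.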
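A hypothetical, JDK-only sketch of the two batch strategies (no Solr client is used; the Item class and the indexed list are stand-ins for a goods row and for sending documents to Solr). fullImport splits the rows into batches on a thread pool; changedBetween selects the rows whose update time falls between the two timestamps. In a Spring application the full import would be called from the @PostConstruct method and the incremental one from a nightly scheduler; both hooks are left out here.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ImportSketch {
    // Hypothetical stand-in for a row of the goods table.
    static final class Item {
        final long id;
        final String name;
        final Instant updatedAt; // last modification time of the row
        Item(long id, String name, Instant updatedAt) {
            this.id = id;
            this.name = name;
            this.updatedAt = updatedAt;
        }
    }

    // Full import: split all rows into batches and index each batch on a thread pool.
    static List<Long> fullImport(List<Item> all, int batchSize, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Long> indexed = Collections.synchronizedList(new ArrayList<>());
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int start = 0; start < all.size(); start += batchSize) {
                List<Item> batch = all.subList(start, Math.min(start + batchSize, all.size()));
                futures.add(pool.submit(() -> {
                    for (Item it : batch) indexed.add(it.id); // placeholder for "send to Solr"
                }));
            }
            for (Future<?> f : futures) f.get(); // wait for every batch and surface failures
        } finally {
            pool.shutdown();
        }
        return indexed;
    }

    // Incremental import: only rows changed inside the half-open window [from, to).
    static List<Item> changedBetween(List<Item> all, Instant from, Instant to) {
        List<Item> changed = new ArrayList<>();
        for (Item it : all) {
            if (!it.updatedAt.isBefore(from) && it.updatedAt.isBefore(to)) changed.add(it);
        }
        return changed;
    }

    public static void main(String[] args) throws Exception {
        List<Item> rows = List.of(
                new Item(1, "phone", Instant.parse("2023-12-31T12:00:00Z")),
                new Item(2, "phone case", Instant.parse("2024-01-01T09:30:00Z")),
                new Item(3, "laptop", Instant.parse("2024-01-02T08:00:00Z")));
        System.out.println(fullImport(rows, 2, 2).size()); // 3
        Instant lastMidnight = Instant.parse("2024-01-01T00:00:00Z");
        Instant thisMidnight = Instant.parse("2024-01-02T00:00:00Z");
        for (Item it : changedBetween(rows, lastMidnight, thisMidnight)) {
            System.out.println(it.id + " " + it.name); // 2 phone case
        }
    }
}
```

The window is half-open, [from, to), so a row updated exactly at midnight is picked up by exactly one nightly run.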
