Big Data Course: JingTao Project, Part 14
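Traditional search approaches
1. Text search / Windows file search
- Full-text scan: traverse the entire text
- The content is loaded into memory
- Drawback: once the data grows, queries can no longer run efficiently
2. Database search
- select * from tb where name like '%X%';
- Problems
- Huge data volumes are hard to store
- like queries are slow: the leading % wildcard prevents an index lookup, so every row has to be scanned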
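Both approaches above come down to a linear scan: every record is examined on every query, so the cost grows with the data. A minimal sketch of what `like '%X%'` effectively does, with an in-memory list standing in for the table (`LikeScan` and `likeContains` are hypothetical names):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for "select * from tb where name like '%X%'":
// with a leading wildcard there is no index to seek into, so every row is checked.
class LikeScan {
    static List<String> likeContains(List<String> names, String x) {
        List<String> result = new ArrayList<>();
        for (String name : names) {      // O(n): touches every row on every query
            if (name.contains(x)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList("华为麦芒5", "三星 G3608", "三星 I9158V");
        System.out.println(likeContains(table, "三星")); // [三星 G3608, 三星 I9158V]
    }
}
```

Doubling the table doubles the work per query, which is the problem the inverted index below avoids.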
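Modern full-text search
- Performance is tied to disk I/O
- B-tree indexes
- Lucene, a search engine toolkit
- Process diagram
- Technique
- Inverted index
- Key points: terms (the smallest units, which cannot be split any further) and documents (concrete key-value pairs)
- Repeated terms are merged into a single index entry
- Each entry records where the term occurs, i.e. which documents (files) contain it
- Each entry records how many times the term occurs
- Lookup: go from a term straight to the complete records that contain it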
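The inverted index described above can be sketched as a map from each term to a posting list recording which documents contain it, at which positions, and therefore how often. This is a toy under stated assumptions (whitespace tokenization instead of a real analyzer, documents added one at a time in increasing id order, hypothetical class names), not Lucene's on-disk format:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy inverted index: term -> postings (document id + positions of the term in that document).
class InvertedIndex {
    static final class Posting {
        final int docId;
        final List<Integer> positions = new ArrayList<>();

        Posting(int docId) {
            this.docId = docId;
        }

        // Occurrence count of the term in this document
        int count() {
            return positions.size();
        }
    }

    // Sorted term dictionary
    private final Map<String, List<Posting>> index = new TreeMap<>();

    // Whitespace splitting stands in for a real analyzer (assumption)
    void addDocument(int docId, String text) {
        String trimmed = text.trim().toLowerCase();
        if (trimmed.isEmpty()) {
            return;
        }
        String[] tokens = trimmed.split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            List<Posting> postings = index.computeIfAbsent(tokens[pos], t -> new ArrayList<>());
            // A repeated term in the same document is merged into that document's posting
            Posting last = postings.isEmpty() ? null : postings.get(postings.size() - 1);
            if (last == null || last.docId != docId) {
                last = new Posting(docId);
                postings.add(last);
            }
            last.positions.add(pos);
        }
    }

    // Lookup goes from the term directly to the documents; no document is scanned
    List<Posting> lookup(String term) {
        return index.getOrDefault(term.toLowerCase(), new ArrayList<>());
    }

    public static void main(String[] args) {
        InvertedIndex idx = new InvertedIndex();
        idx.addDocument(1, "lucene is a search library");
        idx.addDocument(2, "solr wraps lucene and lucene indexes");
        for (Posting p : idx.lookup("lucene")) {
            System.out.println("doc " + p.docId + " positions " + p.positions + " count " + p.count());
        }
    }
}
```

Looking up `lucene` returns both documents directly; document 2 reports positions [2, 4] and a count of 2.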
Lucene
- The low-level toolkit underneath search engines; Lucene provides the API for creating indexes
- Dependencies
- Version used for testing: 4.10.2
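<!-- https://mvnrepository.com/artifact/org.apache.lucene/lucene-core -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>4.10.2</version>
</dependency>
<!-- StandardAnalyzer is not in lucene-core; it ships in lucene-analyzers-common -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-common</artifactId>
    <version>4.10.2</version>
</dependency>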
- Version used in JingTao: 5.2.1
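<!-- https://mvnrepository.com/artifact/org.apache.lucene/lucene-core -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>5.2.1</version>
</dependency>
<!-- StandardAnalyzer is not in lucene-core; it ships in lucene-analyzers-common -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-common</artifactId>
    <version>5.2.1</version>
</dependency>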
- Steps
- Create the Document object
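import org.apache.lucene.document.Document;
import org.apache.lucene.document.DoubleField;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

// Create the Document object
Document doc = new Document();
// Sample data; Store.YES keeps the original value in the index so it can be read back from search hits
doc.add(new LongField("id", 120119, Store.YES));
// TextField is analyzed (split into terms); StringField is indexed as one untokenized value
doc.add(new TextField("title", "华为麦芒5", Store.YES));
doc.add(new DoubleField("price", 2345D, Store.YES));
doc.add(new StringField("img", "http://com.img.jt/aa.png", Store.YES));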
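- Build the index with an analyzer, which splits the text into terms; once the index is built, the terms can be inspected with an index viewer tool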
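import java.io.File;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// Open (or create) the index directory on disk
// (Lucene 5.x takes a java.nio.file.Path here and drops the Version argument of IndexWriterConfig)
Directory dir = FSDirectory.open(new File("./index"));
// Standard analyzer
Analyzer analyzer = new StandardAnalyzer();
// IndexWriterConfig holds the settings used when writing the index
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_10_2, analyzer);
IndexWriter writer = new IndexWriter(dir, config);
writer.addDocument(doc);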
- Close the writer and the directory
writer.close();
dir.close();
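The analyzer decides which terms end up in the index. `StandardAnalyzer` splits Chinese text into one term per character and lowercases Latin words, which is why a single character can serve as a `TermQuery` term. A `TextField` such as `title` goes through the analyzer, while a `StringField` such as `img` is indexed as one untokenized term. The class below is a hypothetical, simplified imitation of the resulting terms (stop-word removal and Lucene's full Unicode word-break rules are omitted); it is not Lucene's tokenizer:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical imitation of StandardAnalyzer output for mixed Chinese/Latin text:
// each Han character becomes its own term; runs of other letters/digits form one lowercased term.
class CjkUnigramTokenizer {
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                flush(word, terms);
                terms.add(new String(Character.toChars(cp))); // one term per ideograph
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(Character.toLowerCase(cp));
            } else {
                flush(word, terms); // whitespace or punctuation ends the current word
            }
        }
        flush(word, terms);
        return terms;
    }

    private static void flush(StringBuilder word, List<String> terms) {
        if (word.length() > 0) {
            terms.add(word.toString());
            word.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("华为麦芒5"));     // [华, 为, 麦, 芒, 5]
        System.out.println(tokenize("Huawei Mate 5")); // [huawei, mate, 5]
    }
}
```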
Searching the index
- The results depend on which documents have been indexed
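import java.io.File;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Open the index written in the previous step
Directory dir = FSDirectory.open(new File("./index"));
IndexReader reader = DirectoryReader.open(dir);
IndexSearcher searcher = new IndexSearcher(reader);
// TermQuery matches one exact indexed term in the "title" field
Query query = new TermQuery(new Term("title", "啊"));
// Keep only the 10 best-scoring hits
TopDocs topDocs = searcher.search(query, 10);
for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
    System.out.println("score: " + scoreDoc.score);
    // Load the stored fields of the matching document
    Document document = searcher.doc(scoreDoc.doc);
    System.out.println(document.get("title"));
}
reader.close();
dir.close();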
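`searcher.search(query, 10)` does not return every match; it keeps only the 10 highest-scoring hits, while `topDocs.totalHits` still reports how many documents matched. A hypothetical sketch of that top-N selection with a min-heap, where the scores are plain numbers supplied by the caller rather than Lucene's relevance scores:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of top-N hit collection: keep the n best scores, return them best first.
class TopN {
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }

        @Override
        public String toString() {
            return doc + ":" + score;
        }
    }

    static List<Hit> topN(float[] scores, int n) {
        // Min-heap on score: the weakest of the current top n sits at the head
        PriorityQueue<Hit> heap = new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
        for (int doc = 0; doc < scores.length; doc++) {
            if (scores[doc] <= 0) {
                continue; // score 0 means the document did not match
            }
            heap.offer(new Hit(doc, scores[doc]));
            if (heap.size() > n) {
                heap.poll(); // drop the weakest hit
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort((a, b) -> Float.compare(b.score, a.score)); // highest score first
        return result;
    }

    public static void main(String[] args) {
        float[] scores = {0.5f, 0f, 2.0f, 1.25f, 0.75f};
        System.out.println(topN(scores, 3)); // [2:2.0, 3:1.25, 4:0.75]
    }
}
```

Memory stays bounded by n no matter how many documents match; document 1 (score 0) counts as a non-match.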
Solr
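- A search server built on Lucene; Solr includes tools for importing data and generating indexes from it
- Essentially a shell wrapped around Lucene
- Example
- Build the index for the jtdb database
- Query title:三星 (三星 is Samsung)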
{
  "responseHeader": {
    "status": 0,
    "QTime": 11,
    "params": { "q": "title:三星", "indent": "true", "wt": "json", "_": "1517312970678" }
  },
  "response": {
    "numFound": 388,
    "start": 0,
    "docs": [
      { "image": ["http://image.jt.com/jd/8544c3283b324d3691c0c6f0068e8648.jpg"], "price": 115000, "created": "2015-03-08T21:28:01Z", "num": 99999, "id": 1109730, "title": "三星 E1200R 黑色 移动联通2G", "updated": "2015-03-08T21:28:01Z", "_version_": 1591017924469981200 },
      { "image": ["http://image.jt.com/jd/cddc143c8614435282be89015b130ce4.jpg"], "price": 829000, "created": "2015-03-08T21:28:09Z", "num": 99999, "id": 1145177, "title": "三星 G3586V 白色 联通4G手机", "updated": "2015-03-08T21:28:09Z", "_version_": 1591017924501438500 },
      { "image": ["http://image.jt.com/jd/6adf501eb5f8486097ab945428d96803.jpg"], "price": 766000, "created": "2015-03-08T21:27:49Z", "num": 99999, "id": 1170772, "title": "三星 SM-G3568V 白色 移动4G手机", "updated": "2015-03-08T21:27:49Z", "_version_": 1591017924517167000 },
      { "image": ["http://image.jt.com/jd/f04c8974fdf3491bb2dcbf5683f5e303.jpg"], "price": 1399000, "created": "2015-03-08T21:27:54Z", "num": 99999, "id": 1186132, "title": "三星 I9158V 炭蓝 移动4G手机", "updated": "2015-03-08T21:27:54Z", "_version_": 1591017924533944300 },
      { "image": ["http://image.jt.com/jd/50013d1209104b9da40056658578c2ef.jpg"], "price": 1359000, "created": "2015-03-08T21:27:54Z", "num": 99999, "id": 1270603, "title": "三星 SM-G5108 白色 移动4G手机", "updated": "2015-03-08T21:27:54Z", "_version_": 1591017924653482000 },
      { "image": ["http://image.jt.com/jd/a91d08e216404d99b2c12ad48e2ce6d8.jpg"], "price": 1358000, "created": "2015-03-08T21:28:09Z", "num": 99999, "id": 1280796, "title": "三星 SM-G5108 炭灰 移动4G手机", "updated": "2015-03-08T21:28:09Z", "_version_": 1591017924741562400 },
      { "image": ["http://image.jt.com/jd/642927742e1849a38b1dabaafda52cff.jpg"], "price": 838000, "created": "2015-03-08T21:27:54Z", "num": 99999, "id": 1282430, "title": "三星 G3608 炭灰 移动4G手机", "updated": "2015-03-08T21:27:54Z", "_version_": 1591017924744708000 },
      { "image": ["http://image.jt.com/jd/2f3bc0b09b214cf8bd6d639202189613.jpg"], "price": 838000, "created": "2015-03-08T21:27:49Z", "num": 99999, "id": 1282431, "title": "三星 G3608 白色 移动4G手机", "updated": "2015-03-08T21:27:49Z", "_version_": 1591017924744708000 },
      { "image": ["http://image.jt.com/jd/3b5801fa22524a44924b01643805cc56.jpg"], "price": 1399000, "created": "2015-03-08T21:27:42Z", "num": 99999, "id": 1284030, "title": "三星 I9158V 白色 移动4G手机", "updated": "2015-03-08T21:27:42Z", "_version_": 1591017924744708000 },
      { "image": ["http://image.jt.com/jd/c82fb80f985440cebd98abe879a1515d.jpg"], "price": 299000, "created": "2015-03-08T21:31:36Z", "num": 99999, "id": 1295910, "title": "三星(SAMSUNG) I699I 白色 电信3G手机", "updated": "2015-03-08T21:31:36Z", "_version_": 1591017924750999600 }
    ]
  }
}
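The JSON above is Solr's HTTP response; the request behind it is a plain GET whose parameters (`q`, `wt`, `indent`) are echoed in `params`. A sketch that builds such a URL, assuming the core URL `http://solr.jt.com/solr/jt` used by JingTao and Solr's standard `/select` handler (`SolrSelectUrl` is a hypothetical helper; `start` and `rows` are Solr's paging parameters):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a Solr select URL. Base URL and "/select" handler are assumptions.
class SolrSelectUrl {
    static String build(String baseUrl, String q, int start, int rows) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("q", q);
        params.put("start", String.valueOf(start));
        params.put("rows", String.valueOf(rows));
        params.put("wt", "json");
        params.put("indent", "true");
        StringBuilder url = new StringBuilder(baseUrl).append("/select?");
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) {
                url.append('&');
            }
            url.append(e.getKey()).append('=').append(encode(e.getValue()));
            first = false;
        }
        return url.toString();
    }

    // Percent-encode as UTF-8 so Chinese keywords survive the trip
    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("http://solr.jt.com/solr/jt", "title:三星", 0, 10));
    }
}
```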
Search in JingTao
- Configuration files
- Solr connection properties
SOLR.URL=http://solr.jt.com/solr/jt
- Spring integration configuration
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:context="http://www.springframework.org/schema/context"
       xmlns:p="http://www.springframework.org/schema/p"
       xmlns:aop="http://www.springframework.org/schema/aop"
       xmlns:tx="http://www.springframework.org/schema/tx"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-4.0.xsd
           http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-4.0.xsd
           http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop-4.0.xsd
           http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx-4.0.xsd
           http://www.springframework.org/schema/util http://www.springframework.org/schema/util/spring-util-4.0.xsd">

    <bean id="httpSolrServer" class="org.apache.solr.client.solrj.impl.HttpSolrServer">
        <constructor-arg index="0" value="${SOLR.URL}"/>
        <!-- Response parser: SolrJ provides no JSON parser, so the XML parser is commonly used -->
        <property name="parser">
            <bean class="org.apache.solr.client.solrj.impl.XMLResponseParser"/>
        </property>
        <!-- Number of retries; 1 is recommended -->
        <property name="maxRetries" value="1"/>
        <!-- Maximum time to establish a connection, in milliseconds -->
        <property name="connectionTimeout" value="500"/>
    </bean>
</beans>
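`${SOLR.URL}` in the bean definition is filled in from the properties file at startup. In Spring this is done by a registered property placeholder configurer (for example `<context:property-placeholder>`), which the XML above does not show. A toy version of the substitution, for illustration only (`PlaceholderDemo` is hypothetical and is not Spring's implementation):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy ${...} substitution, illustrating what Spring's placeholder configurer does to bean values.
class PlaceholderDemo {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    static String resolve(String value, Properties props) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String key = m.group(1);
            String replacement = props.getProperty(key);
            if (replacement == null) {
                throw new IllegalArgumentException("Unresolved placeholder: " + key);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader("SOLR.URL=http://solr.jt.com/solr/jt")); // same line as the properties file
        System.out.println(resolve("${SOLR.URL}", props)); // http://solr.jt.com/solr/jt
    }
}
```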
- Controller layer
package com.peng.controller;

import java.io.UnsupportedEncodingException;
import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.RequestMapping;

import com.peng.pojo.Item;
import com.peng.service.SearchService;

@Controller("searchController")
public class SearchController {

    @Autowired
    @Qualifier("searchService")
    private SearchService searchService;

    @RequestMapping("/search")
    public String search(Model model, String q, Integer page) {
        try {
            // Fix garbled Chinese: re-decode the ISO-8859-1 request parameter as UTF-8
            q = new String(q.getBytes("ISO-8859-1"), "UTF-8");
            Integer rows = 20;
            List<Item> itemList = searchService.queryItemList(q, page, rows);
            model.addAttribute("itemList", itemList);
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        return "search";
    }
}
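The re-decoding in the controller works because the browser sends the keyword as UTF-8 bytes, while a container such as Tomcat 7 decodes GET parameters as ISO-8859-1 by default. ISO-8859-1 maps every byte to exactly one character, so turning the garbled string back into bytes recovers the original UTF-8 bytes losslessly. A stdlib-only reproduction (class and method names are hypothetical):

```java
import java.nio.charset.StandardCharsets;

// Reproduces the garbling and the fix; the ISO-8859-1 container default is an assumption (e.g. Tomcat 7).
class EncodingFix {
    // What the container hands to the controller: UTF-8 bytes decoded as Latin-1
    static String simulateContainerDecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // The controller's fix: back to the raw bytes, then decode them correctly
    static String fix(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = simulateContainerDecode("三星");
        System.out.println(garbled.length()); // 6: one char per UTF-8 byte
        System.out.println(fix(garbled));     // 三星
    }
}
```

Setting `URIEncoding="UTF-8"` on the Tomcat connector avoids the problem at the source, in which case the conversion in the controller must be removed, or it would garble correctly decoded input.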
- Service layer
- Interface
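package com.peng.service;

import java.util.List;

import com.peng.pojo.Item;

public interface SearchService {
    /**
     * Search for products.
     *
     * @param q    search keyword, matched against the title field
     * @param page page number, starting from 1
     * @param rows number of items per page
     * @return the matching items for the requested page
     */
    List<Item> queryItemList(String q, Integer page, Integer rows);
}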
- Implementation class
package com.peng.service.impl;

import java.util.Collections;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import com.peng.pojo.Item;
import com.peng.service.SearchService;

@Service("searchService")
public class SearchServiceImpl implements SearchService {

    @Autowired
    private HttpSolrClient client;

    @Override
    public List<Item> queryItemList(String q, Integer page, Integer rows) {
        if (null == page || page < 1) {
            page = 1;
        }
        // Start position: Solr's start is a zero-based row offset, not a page number
        int start = (page - 1) * rows;
        SolrQuery query = new SolrQuery();
        query.setQuery("title:" + q);
        query.setStart(start);
        query.setRows(rows);
        // Send the query to Solr
        try {
            QueryResponse response = client.query(query);
            // getBeans maps each result document onto Item; Item's fields need SolrJ @Field annotations
            return response.getBeans(Item.class);
        } catch (Exception e) {
            e.printStackTrace();
            return Collections.emptyList();
        }
    }
}
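Solr's `start` parameter is a zero-based row offset rather than a page number, so a 1-based page has to be converted before calling `setStart`: page 1 starts at row 0, page 3 with 20 rows per page starts at row 40. A minimal sketch of that conversion (`SolrPaging` is a hypothetical name):

```java
// Hypothetical helper: converts a 1-based page number into Solr's zero-based "start" offset.
class SolrPaging {
    static int startOffset(Integer page, int rows) {
        int p = (page == null || page < 1) ? 1 : page; // missing or invalid pages fall back to page 1
        return (p - 1) * rows;
    }

    public static void main(String[] args) {
        System.out.println(startOffset(1, 20));    // 0
        System.out.println(startOffset(3, 20));    // 40
        System.out.println(startOffset(null, 20)); // 0
    }
}
```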
The JingTao project is nearly finished, so here is a recap
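- The JingTao project as a whole: the overall approach and way of thinking are the focus; everything else is secondary
- Separation of the storefront (front end) and the admin system (back end)
- Horizontal and vertical splitting of the project
- Cross-domain access
- E-commerce design concerns: high concurrency, high availability, business workflow
- Deployment to cloud servers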
- EasyUI
- Page display technology
- JSON, JSONP
- NGINX
- Path mapping
- Load balancing
- Redis
- Master-slave replication
- Sentinel
- Cluster
- Simulated website access
- HttpClient
- Database
- Master-slave replication
- Amoeba
- Read/write splitting
- Message queue
- RabbitMQ
- Scheduled tasks
- Quartz
- Search
- Lucene
- Solr