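Elasticsearch: JD.com Hands-On Project
This project builds hands-on familiarity with Elasticsearch. A crawler fetches product data from JD.com, the data is indexed into Elasticsearch, and the index is then queried.
Materials for this chapter: https://pan.baidu.com/s/1fu_KHu5VCBKorbgJgbvVwA (extraction code: 3ij8)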
I. Create the Spring Boot project
- Omitted.
II. Write the code
1. Add the dependencies
```xml
<properties>
    <java.version>1.8</java.version>
    <!-- Override Spring Boot's managed Elasticsearch version so the client matches the local ES installation -->
    <elasticsearch.version>7.8.0</elasticsearch.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-thymeleaf</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-devtools</artifactId>
        <scope>runtime</scope>
        <optional>true</optional>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-configuration-processor</artifactId>
        <optional>true</optional>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>
    <!-- jsoup: parses web pages (for crawling videos, Apache Tika is worth a look) -->
    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.10.2</version>
    </dependency>
    <!-- fastjson: serializes objects to JSON -->
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.70</version>
    </dependency>
</dependencies>
```
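2. Import the front-end assets: page templates such as index.html go in src/main/resources/templates, and the static files referenced by the page (the css, js, and images folders) go in src/main/resources/static.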
3. Configure application.properties
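```properties
# Server port
server.port=8011
# Disable the Thymeleaf template cache so template changes are picked up during development
spring.thymeleaf.cache=false
```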
4. Write IndexController and test the index page
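```java
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.GetMapping;

@Controller
public class IndexController {

    // Render templates/index.html for both "/" and "/index"
    @GetMapping({"/", "/index"})
    public String index() {
        return "index";
    }
}
```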
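5. Write the crawler
Analyze the JD search page:
http://search.jd.com/search?keyword=java
- Inspect the page elements.
- The product list has the id J_goodsList.
Crawl the data: fetch the page returned by the request and extract the useful fields.
Create HtmlParseUtil with a first, simple version: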
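```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;
import java.net.URL;

public class HtmlParseUtil {
    public static void main(String[] args) throws IOException {
        // Requires network access
        String url = "http://search.jd.com/search?keyword=java";
        // 1. Fetch and parse the page; jsoup returns a Document that supports
        //    the same kind of DOM operations JavaScript does in a browser
        Document document = Jsoup.parse(new URL(url), 30000);
        // 2. Get the product list by its id
        Element j_goodsList = document.getElementById("J_goodsList");
        if (j_goodsList == null) {
            System.out.println("J_goodsList not found: the request may have been blocked or the page changed");
            return;
        }
        // 3. Every product is an <li> inside J_goodsList
        Elements lis = j_goodsList.getElementsByTag("li");
        // 4. Read the image, name and price of each product
        for (Element li : lis) {
            String img = li.getElementsByTag("img").eq(0).attr("src"); // first image in the li
            String name = li.getElementsByClass("p-name").eq(0).text();
            String price = li.getElementsByClass("p-price").eq(0).text();
            System.out.println("=======================");
            System.out.println("img : " + img);
            System.out.println("name : " + name);
            System.out.println("price : " + price);
        }
    }
}
```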
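Image-heavy sites usually lazy-load their images, so the printed img values come out empty. Printing the raw li markup shows what the tags actually contain: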
```java
// Print the raw <li> markup to see which attributes the <img> tags carry
Elements lis = j_goodsList.getElementsByTag("li");
System.out.println(lis);
```
The output shows that the img tags have no src attribute; the image URL is stored in data-lazy-img instead.
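The practical fix is to read data-lazy-img and fall back to src when it is empty. Below is a minimal sketch of that fallback, kept free of jsoup so it can be tried in isolation. The class and method names are hypothetical, and the handling of protocol-relative URLs (`//host/path` becoming `https://host/path`) is an assumption; check it against what your own crawl prints.

```java
public class LazyImage {

    /**
     * Hypothetical helper: pick the real image URL from an <img> tag's attributes.
     * Lazy-loaded pages keep the URL in data-lazy-img and leave src empty.
     * jsoup's attr() returns "" for a missing attribute, so "" is treated as absent.
     */
    public static String resolveImageUrl(String dataLazyImg, String src) {
        String url = (dataLazyImg != null && !dataLazyImg.isEmpty()) ? dataLazyImg : src;
        if (url == null || url.isEmpty()) {
            return "";
        }
        // Assumption: give protocol-relative URLs ("//host/path") an explicit https: scheme
        if (url.startsWith("//")) {
            return "https:" + url;
        }
        return url;
    }

    public static void main(String[] args) {
        System.out.println(resolveImageUrl("//img.example.com/a.jpg", ""));
        System.out.println(resolveImageUrl("", "https://img.example.com/b.jpg"));
    }
}
```

In a crawler loop this would be called as `resolveImageUrl(img.attr("data-lazy-img"), img.attr("src"))` on the first `<img>` of each `<li>`.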
Implement the JD crawler
Create the entity class:
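```java
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

import java.io.Serializable;

@Data               // getters, setters, equals/hashCode, toString
@AllArgsConstructor // Content(name, img, price), in field order
@NoArgsConstructor  // no-arg constructor
public class Content implements Serializable {
    private String name;
    private String img;
    private String price;
}
```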
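For readers without the Lombok plugin, here is a rough plain-Java sketch of what those three annotations produce for Content: a no-arg constructor, an all-args constructor in field order, getters and setters, and value-based equals/hashCode/toString. Lombok's real output differs in details (for example, it also generates a canEqual method), so treat this as an approximation. The class is named PlainContent, a hypothetical name, only to avoid clashing with the Lombok version.

```java
import java.io.Serializable;
import java.util.Objects;

public class PlainContent implements Serializable {
    private String name;
    private String img;
    private String price;

    // @NoArgsConstructor
    public PlainContent() {
    }

    // @AllArgsConstructor: parameters follow the field declaration order
    public PlainContent(String name, String img, String price) {
        this.name = name;
        this.img = img;
        this.price = price;
    }

    // @Data: getters and setters
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getImg() { return img; }
    public void setImg(String img) { this.img = img; }
    public String getPrice() { return price; }
    public void setPrice(String price) { this.price = price; }

    // @Data: equals/hashCode over all fields
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PlainContent)) return false;
        PlainContent that = (PlainContent) o;
        return Objects.equals(name, that.name)
                && Objects.equals(img, that.img)
                && Objects.equals(price, that.price);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, img, price);
    }

    // @Data: toString in Lombok's "Class(field=value, ...)" style
    @Override
    public String toString() {
        return "PlainContent(name=" + name + ", img=" + img + ", price=" + price + ")";
    }
}
```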
Wrap the crawler into a utility class:
```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;
import java.net.URL;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

public class HtmlParseUtil {

    public static void main(String[] args) throws IOException {
        System.out.println(parseJD("java"));
    }

    public static List<Content> parseJD(String keyword) throws IOException {
        // Requires network access; URL-encode the keyword so Chinese keywords are sent correctly
        String url = "http://search.jd.com/search?keyword=" + URLEncoder.encode(keyword, "UTF-8");
        // 1. Fetch and parse the page into a Document
        Document document = Jsoup.parse(new URL(url), 30000);
        // If J_goodsList comes back null, fetch with a browser User-Agent instead of the line above:
        // Document document = Jsoup.connect(url).userAgent("Mozilla/5.0 (Windows NT 5.1; zh-CN) AppleWebKit/535.12 (KHTML, like Gecko) Chrome/22.0.1229.79 Safari/535.12").timeout(30000).get();
        // 2. Get the product list by its id
        Element j_goodsList = document.getElementById("J_goodsList");
        List<Content> contents = new ArrayList<>();
        if (j_goodsList == null) {
            return contents; // nothing could be parsed
        }
        // 3. Every product is an <li> inside J_goodsList
        Elements lis = j_goodsList.getElementsByTag("li");
        // 4. Read the image, name and price of each product
        for (Element li : lis) {
            // Images are lazy-loaded, so read data-lazy-img instead of src
            String img = li.getElementsByTag("img").eq(0).attr("data-lazy-img");
            String name = li.getElementsByClass("p-name").eq(0).text();
            String price = li.getElementsByClass("p-price").eq(0).text();
            contents.add(new Content(name, img, price));
        }
        // 5. Return the list
        return contents;
    }
}
```
Running main prints the list of parsed Content objects.
6. Write the Elasticsearch configuration
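```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticSearchConfig {

    // Client for the local Elasticsearch node; Spring calls its close() method on shutdown
    @Bean
    public RestHighLevelClient restHighLevelClient() {
        return new RestHighLevelClient(
                RestClient.builder(new HttpHost("127.0.0.1", 9200, "http")));
    }
}
```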
7. Write the service
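```java
import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.MatchQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

@Service
public class ContentService {

    @Autowired
    private RestHighLevelClient client;

    // 1. Crawl the data and bulk-index it into the jd_goods index
    public Boolean parseContent(String keyword) throws IOException {
        List<Content> contents = HtmlParseUtil.parseJD(keyword);
        if (contents.isEmpty()) {
            return false; // a BulkRequest with no actions fails validation
        }
        BulkRequest request = new BulkRequest();
        request.timeout("2m"); // adjust to your workload
        for (int i = 0; i < contents.size(); i++) {
            // Fixed ids 1..n mean a later crawl overwrites earlier documents;
            // drop .id(...) to let Elasticsearch generate ids instead
            request.add(new IndexRequest("jd_goods")
                    .id(String.valueOf(i + 1))
                    .source(JSON.toJSONString(contents.get(i)), XContentType.JSON));
        }
        BulkResponse responses = client.bulk(request, RequestOptions.DEFAULT);
        return !responses.hasFailures();
    }

    // 2. Search by keyword with pagination; pageIndex is 1-based (the page requests /search/{keyword}/1/20)
    public List<Map<String, Object>> search(String keyword, Integer pageIndex, Integer pageSize) throws IOException {
        if (pageIndex == null || pageIndex < 1) {
            pageIndex = 1;
        }
        SearchRequest request = new SearchRequest("jd_goods");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        // match query: the keyword is analyzed the same way as the indexed "name" text.
        // A termQuery skips analysis, so e.g. "Java" never matches the lowercased token "java".
        MatchQueryBuilder matchQuery = QueryBuilders.matchQuery("name", keyword);
        sourceBuilder.query(matchQuery);
        sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
        // Pagination: from is an offset into the hits, not a page number
        sourceBuilder.from((pageIndex - 1) * pageSize);
        sourceBuilder.size(pageSize);
        // Highlight matches in "name"
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        highlightBuilder.field("name");
        highlightBuilder.preTags("<span style='color:red'>");
        highlightBuilder.postTags("</span>");
        sourceBuilder.highlighter(highlightBuilder);
        request.source(sourceBuilder);

        SearchResponse searchResponse = client.search(request, RequestOptions.DEFAULT);
        // Parse the hits
        SearchHits hits = searchResponse.getHits();
        List<Map<String, Object>> results = new ArrayList<>();
        for (SearchHit hit : hits.getHits()) {
            Map<String, Object> sourceAsMap = hit.getSourceAsMap();
            // Replace the plain name with its highlighted version, if there is one
            Map<String, HighlightField> highlightFields = hit.getHighlightFields();
            HighlightField name = highlightFields.get("name");
            if (name != null) {
                StringBuilder newName = new StringBuilder();
                for (Text fragment : name.fragments()) {
                    newName.append(fragment);
                }
                sourceAsMap.put("name", newName.toString());
            }
            results.add(sourceAsMap);
        }
        return results;
    }
}
```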
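Why a term query on name can miss: the keyword in a term query is used as-is, while the name text was analyzed at index time. The toy model below approximates Elasticsearch's default standard analyzer by lowercasing runs of ASCII letters and digits and emitting each CJK character as its own token. It is a simplification for illustration, not the real tokenizer, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AnalyzerDemo {

    // Toy approximation of the standard analyzer: lowercase ASCII word runs,
    // one token per Han character, everything else acts as a separator.
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128 && Character.isLetterOrDigit(c)) {
                word.append(c);
                continue;
            }
            // Any other character ends the current ASCII word
            if (word.length() > 0) {
                tokens.add(word.toString().toLowerCase(Locale.ROOT));
                word.setLength(0);
            }
            if (Character.UnicodeScript.of(c) == Character.UnicodeScript.HAN) {
                tokens.add(String.valueOf(c));
            }
        }
        if (word.length() > 0) {
            tokens.add(word.toString().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    // Term query: the raw keyword must equal one indexed token
    public static boolean termMatches(String indexedText, String keyword) {
        return analyze(indexedText).contains(keyword);
    }

    // Match query (default OR operator): any analyzed keyword token matches an indexed token
    public static boolean matchMatches(String indexedText, String keyword) {
        List<String> indexed = analyze(indexedText);
        for (String token : analyze(keyword)) {
            if (indexed.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String name = "Java编程思想 第4版";
        System.out.println(analyze(name));              // [java, 编, 程, 思, 想, 第, 4, 版]
        System.out.println(termMatches(name, "Java"));  // false
        System.out.println(matchMatches(name, "Java")); // true
        System.out.println(termMatches(name, "编程"));   // false
        System.out.println(matchMatches(name, "编程"));  // true
    }
}
```

This is why a match query is the safer choice for a free-text field such as name.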
8. Write the controller
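```java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

import java.io.IOException;
import java.util.List;
import java.util.Map;

@RestController
public class ContentController {

    @Autowired
    private ContentService contentService;

    // Crawl JD for the keyword and index the results
    @GetMapping("/parse/{keyword}")
    public Boolean parse(@PathVariable("keyword") String keyword) throws IOException {
        return contentService.parseContent(keyword);
    }

    // Paginated search by keyword
    @GetMapping("/search/{keyword}/{pageIndex}/{pageSize}")
    public List<Map<String, Object>> search(@PathVariable("keyword") String keyword,
                                            @PathVariable("pageIndex") Integer pageIndex,
                                            @PathVariable("pageSize") Integer pageSize) throws IOException {
        return contentService.search(keyword, pageIndex, pageSize);
    }
}
```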
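Elasticsearch's from and size are an offset and a count, not a page number and a page size, so a 1-based page number taken from the URL has to be converted before it reaches the search request. A small sketch of that conversion with clamping for bad input follows; the class and method names are hypothetical, and the fallback size of 10 is an assumption chosen to match Elasticsearch's default size. By default from + size may not exceed 10,000 (index.max_result_window), so very deep pages need search_after instead.

```java
public class PageOffset {

    /** Hypothetical helper: convert a 1-based page number into an Elasticsearch "from" offset. */
    public static int fromOffset(Integer pageIndex, Integer pageSize) {
        int page = (pageIndex == null || pageIndex < 1) ? 1 : pageIndex; // clamp to the first page
        int size = (pageSize == null || pageSize < 1) ? 10 : pageSize;    // assumed fallback size
        return (page - 1) * size;
    }

    public static void main(String[] args) {
        System.out.println(fromOffset(1, 20)); // 0: hits 0..19
        System.out.println(fromOffset(3, 20)); // 40: hits 40..59
        System.out.println(fromOffset(0, 20)); // 0: clamped to the first page
    }
}
```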
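9. Test the results: with Elasticsearch running on 127.0.0.1:9200 and the application started, open http://localhost:8011/parse/java to crawl and index the data, then http://localhost:8011/search/java/1/20 to query it.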
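When testing with a Chinese keyword, the keyword sits inside a URL path segment, so it should be percent-encoded. The sketch below builds test URLs for the /parse/{keyword} and /search/{keyword}/{pageIndex}/{pageSize} endpoints, assuming the app runs locally on server.port=8011. The class and method names are hypothetical. It targets Java 8, so it uses the String-charset overload of URLEncoder.encode.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class TestUrls {

    // Assumption: the application runs locally with server.port=8011
    private static final String BASE = "http://localhost:8011";

    // Percent-encode one path segment. URLEncoder does form encoding (space -> '+'),
    // so '+' is turned back into %20; a literal '+' was already encoded as %2B.
    static String encodeSegment(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static String parseUrl(String keyword) {
        return BASE + "/parse/" + encodeSegment(keyword);
    }

    public static String searchUrl(String keyword, int pageIndex, int pageSize) {
        return BASE + "/search/" + encodeSegment(keyword) + "/" + pageIndex + "/" + pageSize;
    }

    public static void main(String[] args) {
        System.out.println(parseUrl("java"));        // http://localhost:8011/parse/java
        System.out.println(searchUrl("手机", 1, 20)); // http://localhost:8011/search/%E6%89%8B%E6%9C%BA/1/20
    }
}
```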
III. Separate the front end from the back end (basic Vue usage)
1. Download and include vue.min.js and axios.min.js
If Node.js is installed, use the steps below; otherwise download the two files manually.
Create a folder, open a terminal in it, and run:
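```shell
npm install vue@2
npm install axios
```
Then copy node_modules/vue/dist/vue.min.js and node_modules/axios/dist/axios.min.js into src/main/resources/static/js. The @2 matters: the page below uses the Vue 2 `new Vue({...})` API, while a plain `npm install vue` now installs Vue 3, which has no `new Vue` constructor and ships no vue.min.js.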
2. Reference the scripts in the page
```html
<script th:src="@{/js/vue.min.js}"></script>
<script th:src="@{/js/axios.min.js}"></script>
```
The complete front-end page (templates/index.html):
```html
<!DOCTYPE html>
<html xmlns:th="http://www.thymeleaf.org">
<head>
    <meta charset="utf-8"/>
    <title>狂神说Java - ES JD.com clone</title>
    <link rel="stylesheet" th:href="@{/css/style.css}"/>
</head>
<body class="pg">
<div class="page" id="app">
    <div id="mallPage" class=" mallist tmall- page-not-market ">
        <!-- Header search bar -->
        <div id="header" class=" header-list-app">
            <div class="headerLayout">
                <div class="headerCon ">
                    <!-- Logo -->
                    <h1 id="mallLogo">
                        <img th:src="@{/images/jdlogo.png}" alt="">
                    </h1>
                    <div class="header-extra">
                        <!-- Search -->
                        <div id="mallSearch" class="mall-search">
                            <form name="searchTop" class="mallSearch-form clearfix">
                                <fieldset>
                                    <legend>Tmall search</legend>
                                    <div class="mallSearch-input clearfix">
                                        <div class="s-combobox" id="s-combobox-685">
                                            <div class="s-combobox-input-wrap">
                                                <!-- v-model drives the value, so no static value attribute -->
                                                <input v-model="keyword" type="text" autocomplete="off" id="mq"
                                                       class="s-combobox-input" aria-haspopup="true">
                                            </div>
                                        </div>
                                        <button type="submit" @click.prevent="searchKey" id="searchbtn">Search</button>
                                    </div>
                                </fieldset>
                            </form>
                            <ul class="relKeyTop">
                                <li><a>狂神说Java</a></li>
                                <li><a>狂神说前端</a></li>
                                <li><a>狂神说Linux</a></li>
                                <li><a>狂神说大数据</a></li>
                                <li><a>狂神聊理财</a></li>
                            </ul>
                        </div>
                    </div>
                </div>
            </div>
        </div>
        <!-- Product listing -->
        <div id="content">
            <div class="main">
                <!-- Brand filter -->
                <form class="navAttrsForm">
                    <div class="attrs j_NavAttrs" style="display:block">
                        <div class="brandAttr j_nav_brand">
                            <div class="j_Brand attr">
                                <div class="attrKey">
                                    Brand
                                </div>
                                <div class="attrValues">
                                    <ul class="av-collapse row-2">
                                        <li><a href="#"> 狂神说 </a></li>
                                        <li><a href="#"> Java </a></li>
                                    </ul>
                                </div>
                            </div>
                        </div>
                    </div>
                </form>
                <!-- Sort options -->
                <div class="filter clearfix">
                    <a class="fSort fSort-cur">Overall<i class="f-ico-arrow-d"></i></a>
                    <a class="fSort">Popularity<i class="f-ico-arrow-d"></i></a>
                    <a class="fSort">Newest<i class="f-ico-arrow-d"></i></a>
                    <a class="fSort">Sales<i class="f-ico-arrow-d"></i></a>
                    <a class="fSort">Price<i class="f-ico-triangle-mt"></i><i class="f-ico-triangle-mb"></i></a>
                </div>
                <!-- Products -->
                <div class="view grid-nosku">
                    <div class="product" v-for="result in results">
                        <div class="product-iWrap">
                            <!-- Product image -->
                            <div class="productImg-wrap">
                                <a class="productImg">
                                    <img :src="result.img">
                                </a>
                            </div>
                            <!-- Price -->
                            <p class="productPrice">
                                <em>{{result.price}}</em>
                            </p>
                            <!-- Title: v-html renders the highlight <span> tags -->
                            <p class="productTitle">
                                <a v-html="result.name"></a>
                            </p>
                            <!-- Shop name -->
                            <div class="productShop">
                                <span>Shop: 狂神说Java </span>
                            </div>
                            <!-- Sales info -->
                            <p class="productStatus">
                                <span>Sold this month <em>999</em></span>
                                <span>Reviews <a>3</a></span>
                            </p>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </div>
</div>
<script th:src="@{/js/axios.min.js}"></script>
<script th:src="@{/js/vue.min.js}"></script>
<script>
    new Vue({
        el: '#app',
        data: {
            keyword: '',
            results: []
        },
        methods: {
            searchKey() {
                let keyword = this.keyword;
                console.log(keyword);
                // GET /search/{keyword}/{pageIndex}/{pageSize}: first page, 20 results
                axios.get('search/' + encodeURIComponent(keyword) + '/1/20').then(response => {
                    console.log(response);
                    this.results = response.data;
                });
            }
        }
    })
</script>
</body>
</html>
```