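Project Overview
This hands-on project comes from the Elasticsearch video tutorial by 狂神说Java on Bilibili.
It has three parts: a crawler, data storage and search, and page display.
Technologies: jsoup, Spring Boot, Elasticsearch, and Vue
Final result:
1. Crawler
The crawler uses jsoup to parse JD.com search result pages and extract each product's name, price, and image.
When scraping images, watch out for lazy loading: the src attribute of the img tag holds only a default placeholder, and the real image URL is stored in the source-data-lazy-img attribute.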
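import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;
import java.net.URL;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

public class jsoupUtils {
    public static List<goods> getTargetGoods(String keywords) throws IOException {
        // Encode the keyword so Chinese characters and spaces survive in the URL
        String url = "https://search.jd.com/Search?keyword=" + URLEncoder.encode(keywords, "UTF-8");
        // Fetch and parse the page with a 3-second timeout
        Document document = Jsoup.parse(new URL(url), 3000);
        List<goods> goodsArrayList = new ArrayList<>();
        Element list = document.getElementById("J_goodsList");
        if (list == null) {
            // The result container is missing (for example, the page layout changed)
            return goodsArrayList;
        }
        Elements li = list.getElementsByTag("li");
        for (Element element : li) {
            // Lazy-loaded images keep the real URL in source-data-lazy-img, not in src
            String img = element.getElementsByTag("img").eq(0).attr("source-data-lazy-img");
            String name = element.getElementsByClass("p-name").eq(0).text();
            String price = element.getElementsByClass("p-price").eq(0).text();
            goods goods = new goods();
            goods.setImg(img);
            goods.setName(name);
            goods.setPrice(price);
            goodsArrayList.add(goods);
        }
        return goodsArrayList;
    }
}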
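Keywords that contain Chinese characters or spaces need to be percent-encoded before they go into the search URL. The following is a minimal sketch of that step using only the standard library. The class name SearchUrlBuilder and its build method are hypothetical and not part of the original project.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration; not part of the original project.
class SearchUrlBuilder {
    private static final String BASE = "https://search.jd.com/Search?keyword=";

    // Percent-encodes the keyword as UTF-8 so that non-ASCII characters
    // and spaces are safe to place in the query string.
    static String build(String keyword) {
        try {
            return BASE + URLEncoder.encode(keyword, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to exist on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("java"));
        System.out.println(build("手机"));
    }
}
```

URLEncoder targets form-style query strings, so a space becomes "+", which is fine for a query parameter like keyword.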
2. Data Storage
Store the product data scraped by the crawler in Elasticsearch.
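// Scrape products for the keyword and index them into jd_goods in one bulk request
public Boolean BulkGoods(String keywords) throws IOException {
    List<goods> goodsList = jsoupUtils.getTargetGoods(keywords);
    if (goodsList.isEmpty()) {
        // An empty bulk request fails validation, so there is nothing to send
        return false;
    }
    BulkRequest bulkRequest = new BulkRequest();
    bulkRequest.timeout("10s");
    for (goods item : goodsList) {
        // No ID is set, so Elasticsearch generates one for each document
        bulkRequest.add(new IndexRequest("jd_goods").source(JSON.toJSONString(item), XContentType.JSON));
    }
    BulkResponse bulk = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
    // Report success only if every item was indexed
    return !bulk.hasFailures();
}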
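At the REST level, a bulk request body is newline-delimited JSON: one action line followed by one source line per document, each terminated by a newline. The sketch below builds such a body by hand, purely to illustrate the wire format; the project itself relies on BulkRequest for this. The class name BulkBodySketch and the sample documents are made up.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the _bulk body format; not part of the original project.
class BulkBodySketch {
    // Builds the NDJSON body of a _bulk request that indexes each JSON
    // document into the given index, letting Elasticsearch assign the IDs.
    static String build(String index, List<String> jsonDocs) {
        StringBuilder body = new StringBuilder();
        for (String doc : jsonDocs) {
            // Action line: which operation to run and on which index
            body.append("{\"index\":{\"_index\":\"").append(index).append("\"}}\n");
            // Source line: the document itself, on a single line
            body.append(doc).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "{\"name\":\"book A\",\"price\":\"10.00\"}",
                "{\"name\":\"book B\",\"price\":\"20.00\"}");
        System.out.print(build("jd_goods", docs));
    }
}
```

Because all documents travel in one request, the response has to be checked per item, which is what hasFailures() summarizes.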
Search the index and highlight the matched field
public List<Map<String, Object>> SearchGoods(String keywords, int pageNum, int pageSize) throws Exception {
    if (pageNum <= 0) {
        pageNum = 1;
    }
    SearchRequest jd_goods = new SearchRequest("jd_goods");
    // Build the search conditions
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // Highlight matches in the name field with a red <span>
    HighlightBuilder highlightBuilder = new HighlightBuilder();
    highlightBuilder.field("name");
    highlightBuilder.requireFieldMatch(false);
    highlightBuilder.preTags("<span style='color:red'>");
    highlightBuilder.postTags("</span>");
    searchSourceBuilder.highlighter(highlightBuilder);
    // from is a zero-based document offset, so convert the one-based page number
    searchSourceBuilder.from((pageNum - 1) * pageSize);
    searchSourceBuilder.size(pageSize);
    // termQuery looks up the keyword as a single exact term, without analyzing it
    TermQueryBuilder queryBuilder = QueryBuilders.termQuery("name", keywords);
    searchSourceBuilder.query(queryBuilder);
    searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
    jd_goods.source(searchSourceBuilder);
    SearchResponse searchResponse = restHighLevelClient.search(jd_goods, RequestOptions.DEFAULT);
    ArrayList<Map<String, Object>> goodlist = new ArrayList<>();
    for (SearchHit hit : searchResponse.getHits().getHits()) {
        Map<String, HighlightField> highlightFields = hit.getHighlightFields();
        HighlightField name = highlightFields.get("name");
        Map<String, Object> sourceAsMap = hit.getSourceAsMap();
        if (name != null) {
            // Replace the stored name with the concatenated highlighted fragments
            StringBuilder title = new StringBuilder();
            for (Text fragment : name.fragments()) {
                title.append(fragment);
            }
            sourceAsMap.put("name", title.toString());
        }
        goodlist.add(sourceAsMap);
    }
    return goodlist;
}
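In Elasticsearch, from is a zero-based document offset rather than a page number, so a one-based page number has to be converted before it is passed to the search request. A minimal sketch of that conversion follows; the class name Paging and its offset method are hypothetical.

```java
// Hypothetical helper for illustration; not part of the original project.
class Paging {
    // Converts a one-based page number into the zero-based document offset
    // expected by Elasticsearch's "from" parameter.
    static int offset(int pageNum, int pageSize) {
        int page = Math.max(pageNum, 1);   // treat invalid page numbers as page 1
        int size = Math.max(pageSize, 0);  // a negative page size makes no sense
        return (page - 1) * size;
    }

    public static void main(String[] args) {
        System.out.println(offset(1, 10)); // prints 0
        System.out.println(offset(3, 10)); // prints 20
    }
}
```

With pageNum = 3 and pageSize = 10 the offset is 20, so the request returns documents 21 through 30.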
3. Page Display
The controller layer exposes REST endpoints that receive the requests
// Scrape products for the keyword and bulk-index them into Elasticsearch
@GetMapping("/parse/{keyword}")
public Boolean BulkIntoEs(@PathVariable("keyword") String keyword) throws IOException {
    return bulkService.BulkGoods(keyword);
}

// Return one page of highlighted search results for the keyword
@GetMapping("/search/{keyword}/{pageNum}/{pageSize}")
public List<Map<String, Object>> SearchGoods(
        @PathVariable("keyword") String keyword,
        @PathVariable("pageNum") int pageNum,
        @PathVariable("pageSize") int pageSize
) throws Exception {
    return bulkService.SearchGoods(keyword, pageNum, pageSize);
}
The front end is omitted here.
Project code:
https://github.com/Icedzzz/ElasticsearchDemoAndProject
Final result: