Scraping Historical Prices from Taobao, JD, Tmall and Suning with Java (Part 2)


菜C++鸡java


Tags: java, experience sharing, crawler, backend

Article link: https://blog.csdn.net/qq_40921561/article/details/110950930


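Scraping Historical Prices from Taobao, JD, Tmall and Suning with Java

Before We Start
Summary of Scraping Problems
Solving the Scraping Problems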

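Before We Start

This was written in a hurry, so some parts may be under-explained; if anything is unclear, feel free to comment or message me. The site used to scrape historical prices is http://www.tool168.cn/history/. If you only want the code, skip to the end: apart from the HTML request helper (which is easy to write), everything there can be used as-is.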

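Summary of Scraping Problems

1. Some products are not indexed (newly released products or products with zero orders), but the vast majority are, which is enough for our needs.
2. The anti-scraping measures are above average and take some effort to work out: request-header checks, token verification, a per-IP limit on burst request rate, and a per-IP daily request cap.
3. The page changes slightly from time to time (itself a form of anti-scraping).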
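To stay under the per-IP limits listed above, the client can keep its own budget before sending anything. The following is a minimal sketch, not something from the original code: the class name `RequestBudget`, the daily cap and the minimum interval are all hypothetical, and the real thresholds of the site are unknown and would have to be found by experiment.

```java
import java.time.LocalDate;

/** Tracks a per-day request cap and a minimum gap between requests for one IP. */
final class RequestBudget {
    private final int maxPerDay;          // assumed daily cap; the site's real value is unknown
    private final long minIntervalMillis; // assumed minimum spacing between two requests
    private LocalDate day;                // day the counter below belongs to
    private int usedToday;
    private long lastRequestMillis;
    private boolean hasRequested;

    RequestBudget(int maxPerDay, long minIntervalMillis) {
        this.maxPerDay = maxPerDay;
        this.minIntervalMillis = minIntervalMillis;
    }

    /** Returns true and records the request if it is allowed at the given time. */
    synchronized boolean tryAcquire(LocalDate today, long nowMillis) {
        if (!today.equals(day)) {
            // A new day started: reset the daily counter.
            day = today;
            usedToday = 0;
        }
        if (usedToday >= maxPerDay) {
            return false; // daily cap reached
        }
        if (hasRequested && nowMillis - lastRequestMillis < minIntervalMillis) {
            return false; // too close to the previous request
        }
        usedToday++;
        lastRequestMillis = nowMillis;
        hasRequested = true;
        return true;
    }

    /** Convenience overload that uses the system clock. */
    boolean tryAcquire() {
        return tryAcquire(LocalDate.now(), System.currentTimeMillis());
    }
}
```

A caller would check `budget.tryAcquire()` before each request and sleep or switch IP when it returns false. Passing the date and time explicitly keeps the logic easy to test.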

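Solving the Scraping Problems

The scraping flow has four steps: build the request URL, obtain the checkCode from the returned page, exchange the checkCode for a code, and use that code to fetch and parse the price history.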

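Building the Request URL

The request URL for http://www.tool168.cn/history/ has the format below. Note that requesting this address does not return the product's details; it only yields the checkCode.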

http://www.tool168.cn/?m=history&a=view&k=url

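Replace url with the product link, using the link format of each e-commerce site:
JD: https://item.jd.com/100009082466.html
Taobao: https://detail.tmall.com/item.htm?id=599621867171
Suning: https://product.suning.com/0000000000/11343357124.html
…
Send a GET request, parse the returned HTML page, and read the value of checkCodeId. That completes the first step.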
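The first step can be sketched with the standard library alone. Two things here are assumptions rather than facts from the article: the product link is URL-encoded before being appended (the original code appends it as-is), and the checkCode is pulled out with a regular expression that expects an `<input>` whose `id="checkCodeId"` attribute comes before `value`. The article itself uses Jsoup for this; the regex is only swapped in to keep the example dependency-free. `HistoryUrls` is a hypothetical helper name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HistoryUrls {
    private static final String VIEW_PREFIX = "http://www.tool168.cn/?m=history&a=view&k=";

    // Assumption: the page contains <input ... id="checkCodeId" ... value="...">,
    // with the id attribute appearing before the value attribute.
    private static final Pattern CHECK_CODE = Pattern.compile(
            "<input[^>]*\\bid=[\"']checkCodeId[\"'][^>]*\\bvalue=[\"']([^\"']*)[\"']");

    /** Builds the step-one URL; the product link is URL-encoded (an assumption). */
    static String buildViewUrl(String itemUrl) {
        try {
            return VIEW_PREFIX + URLEncoder.encode(itemUrl, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    /** Returns the checkCode found in the page, or null when the input is missing. */
    static String extractCheckCode(String html) {
        Matcher m = CHECK_CODE.matcher(html);
        return m.find() ? m.group(1) : null;
    }
}
```

Returning null for a missing checkCode lets the caller treat a changed page layout as "no data" instead of failing later with an empty form field.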

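Request Verification

Step two: send a POST request to http://www.tool168.cn/dm/ptinfo.php. The server validates the checkCode obtained in the previous step and returns the corresponding data. Do not forget the con parameter; its value is the product link from step one.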
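As a rough illustration of this step, the sketch below builds the form body with the two fields named in the text (`checkCode` and `con`) and posts it with `HttpURLConnection`. The class name `FormPost`, the timeouts and the choice of `HttpURLConnection` over the Apache HttpClient used in the article are assumptions made to keep the example self-contained; any extra headers the site may check are not covered.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class FormPost {
    /** Fields of the ptinfo.php request as described in the text. */
    static Map<String, String> ptinfoFields(String checkCode, String itemUrl) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps insertion order
        fields.put("checkCode", checkCode);
        fields.put("con", itemUrl);
        return fields;
    }

    /** Encodes the fields as application/x-www-form-urlencoded. */
    static String encodeForm(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : fields.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always available", ex);
        }
        return sb.toString();
    }

    /** Posts the form and returns the response body as a UTF-8 string. */
    static String post(String url, Map<String, String> fields) throws IOException {
        byte[] body = encodeForm(fields).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setConnectTimeout(5000);  // assumed timeouts
        conn.setReadTimeout(10000);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

A call would look like `FormPost.post("http://www.tool168.cn/dm/ptinfo.php", FormPost.ptinfoFields(checkCode, itemUrl))`.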

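Fetching the Data

Open the response body and you will see a code field. This is the verification code that really matters. Use it to send a POST request to http://www.tool168.cn/dm/history.php?code=「code」&t=, where the code parameter is that verification code and t can be omitted.
Right here! This is the place to add a proxy: the site enforces its burst request-rate limit on this request, so add one if you want to raise your scraping ceiling.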
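One simple way to spread the history.php requests over several proxies is round-robin rotation. The sketch below is only an illustration under stated assumptions: `ProxyRotator` is a hypothetical class, and the proxy addresses passed to it are placeholders you would replace with proxies you actually control.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hands out proxy addresses in round-robin order. */
final class ProxyRotator {
    private final List<InetSocketAddress> proxies = new ArrayList<>();
    private final AtomicInteger counter = new AtomicInteger();

    /** Each entry has the form "host:port". */
    ProxyRotator(List<String> hostPorts) {
        if (hostPorts.isEmpty()) {
            throw new IllegalArgumentException("at least one proxy is required");
        }
        for (String hostPort : hostPorts) {
            int colon = hostPort.lastIndexOf(':');
            String host = hostPort.substring(0, colon);
            int port = Integer.parseInt(hostPort.substring(colon + 1));
            // createUnresolved avoids a DNS lookup at construction time.
            proxies.add(InetSocketAddress.createUnresolved(host, port));
        }
    }

    /** Returns the next proxy; safe to call from several threads. */
    InetSocketAddress next() {
        int index = Math.floorMod(counter.getAndIncrement(), proxies.size());
        return proxies.get(index);
    }
}
```

With Apache HttpClient 4.x, the returned address could be applied to the history.php request along the lines of `RequestConfig.custom().setProxy(new HttpHost(addr.getHostString(), addr.getPort())).build()` followed by `priceHttpPost.setConfig(...)`.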

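Parsing the Data

Finally, after clearing every hurdle, we reach the last step. Look at the response of the previous request: it contains the data you want (the Preview tab of the browser's developer tools shows it in a readable form). At this point you have the product's price data.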
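The parsing step can also be sketched with a regular expression. The exact response format is not shown in the article; the sample used here is hypothetical and only inferred from the string handling in the author's code, which implies entries shaped like `(year,month,day),price,"..."` inside a `chart("...")` call. `ChartParser` and `PricePoint` are illustrative names, and the three date integers are kept exactly as they appear in the payload.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ChartParser {
    /** One entry of the price trend; date fields are stored as found in the payload. */
    static final class PricePoint {
        final int year;
        final int month;
        final int day;
        final double price;

        PricePoint(int year, int month, int day, double price) {
            this.year = year;
            this.month = month;
            this.day = day;
            this.price = price;
        }
    }

    // Assumed entry shape: "(2020,10,1),99.00" somewhere inside the response.
    private static final Pattern POINT =
            Pattern.compile("\\((\\d+),(\\d+),(\\d+)\\),(\\d+(?:\\.\\d+)?)");

    /** Extracts every "(y,m,d),price" entry; returns an empty list when none match. */
    static List<PricePoint> parse(String response) {
        List<PricePoint> points = new ArrayList<>();
        Matcher m = POINT.matcher(response);
        while (m.find()) {
            points.add(new PricePoint(
                    Integer.parseInt(m.group(1)),
                    Integer.parseInt(m.group(2)),
                    Integer.parseInt(m.group(3)),
                    Double.parseDouble(m.group(4))));
        }
        return points;
    }

    /** Lowest price, or NaN for an empty list. */
    static double min(List<PricePoint> points) {
        return points.stream().mapToDouble(p -> p.price).min().orElse(Double.NaN);
    }

    /** Highest price, or NaN for an empty list. */
    static double max(List<PricePoint> points) {
        return points.stream().mapToDouble(p -> p.price).max().orElse(Double.NaN);
    }
}
```

Compared with chained `substring` and `split` calls, a single pattern simply skips entries that do not match, so a small layout change in the response is less likely to throw an exception.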

Code

    // Imports used by the two methods below (Apache HttpClient 4.x, Jsoup, Gson).
    // ItemDetailsVO, HttpClientUtils, ExceptionFactory and DateUtil are the project's own classes.
    // ParseException is assumed to be org.apache.http.ParseException, which EntityUtils.toString can throw.
    import java.io.IOException;
    import java.io.UnsupportedEncodingException;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.LinkedList;
    import java.util.List;
    import java.util.Map;

    import org.apache.http.NameValuePair;
    import org.apache.http.ParseException;
    import org.apache.http.client.entity.UrlEncodedFormEntity;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpPost;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.message.BasicNameValuePair;
    import org.apache.http.util.EntityUtils;
    import org.jsoup.Jsoup;

    import com.google.gson.Gson;

    /**
     * Fetches the price history of one product and stores it in itemDetailsVO.
     *
     * @param itemDetailsVO object that receives the trend and the current, highest and lowest price
     * @param itemUrl       product page URL (JD, Taobao/Tmall, Suning, ...)
     * @throws IOException    if a request fails
     * @throws ParseException if a response cannot be parsed
     */
    @SuppressWarnings("unchecked")
    private static void parsePrice(ItemDetailsVO itemDetailsVO, String itemUrl) throws ParseException, IOException {
        // try-with-resources closes the client and both responses even when a request fails
        try (CloseableHttpClient httpclient = HttpClients.createDefault()) {
            // Step 1: this is not the data page; it only carries the checkCode.
            // itemUrl is appended as-is; URL-encode it first if it contains '&'.
            String html = HttpClientUtils.getHtml("http://www.tool168.cn/?m=history&a=view&k=" + itemUrl, new HashMap<String, Object>());
            String checkCode = Jsoup.parse(html).select("#checkCodeId").attr("value");

            // Step 2: exchange the checkCode and the product link for the real code
            HttpPost httpPost = new HttpPost("http://www.tool168.cn/dm/ptinfo.php");
            // Optional proxy (needs org.apache.http.HttpHost and org.apache.http.client.config.RequestConfig).
            // The rate limit is enforced on the history.php request, so the same config can be set on priceHttpPost.
//            RequestConfig requestConfig = RequestConfig.custom()
//                    .setProxy(new HttpHost("myotherproxy", 8080))
//                    .build();
//            httpPost.setConfig(requestConfig);
            List<NameValuePair> nvps = new ArrayList<NameValuePair>();
            nvps.add(new BasicNameValuePair("checkCode", checkCode));
            nvps.add(new BasicNameValuePair("con", itemUrl));
            try {
                httpPost.setEntity(new UrlEncodedFormEntity(nvps));
            } catch (UnsupportedEncodingException e) {
                throw ExceptionFactory.encodingException();
            }
            try (CloseableHttpResponse codeResponse = httpclient.execute(httpPost)) {
                String str = EntityUtils.toString(codeResponse.getEntity(), "UTF-8");
                Map<String, Object> hashMap = new Gson().fromJson(str, Map.class);
                String code = hashMap.get("code").toString();

                // Step 3: request the price history (POST); t may be left empty
                List<NameValuePair> priceNvps = new ArrayList<NameValuePair>();
                priceNvps.add(new BasicNameValuePair("code", code));
                priceNvps.add(new BasicNameValuePair("t", ""));
                HttpPost priceHttpPost = new HttpPost("http://www.tool168.cn/dm/history.php?code=" + code + "&t=");
                priceHttpPost.setEntity(new UrlEncodedFormEntity(priceNvps));
                try (CloseableHttpResponse priceResponse = httpclient.execute(priceHttpPost)) {
                    String str2 = EntityUtils.toString(priceResponse.getEntity(), "UTF-8");
                    // Step 4: the payload starts with chart(" or chart(' ; both markers are 7 characters long
                    int start = str2.indexOf("chart(\"");
                    if (start < 0) {
                        start = str2.indexOf("chart('");
                    }
                    if (start >= 0) {
                        // Cut out the data array and split it into one string per entry
                        str2 = str2.substring(start + 7, str2.lastIndexOf("]") + 1);
                        String[] reStrings = str2.split("]");
                        for (int i = 0; i < reStrings.length; i++) {
                            // Keep "date),price" and turn it into "date-price"
                            reStrings[i] = reStrings[i].substring(reStrings[i].indexOf("(") + 1, reStrings[i].indexOf("\"") - 1).replaceAll("\\),", "-");
                        }
                        setMaxMin(itemDetailsVO, reStrings);
                    } else {
                        // No data, or the response could not be parsed
                        itemDetailsVO.setTrend(new LinkedList<Map<String, Object>>());
                        itemDetailsVO.setPrice("-1");
                        itemDetailsVO.setMaxPrice("-1");
                        itemDetailsVO.setMinPrice("-1");
                    }
                }
            }
        }
    }
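    /**
     * Sets the highest, lowest and current price on itemDetailsVO.
     *
     * @param itemDetailsVO object that receives the trend and the prices
     * @param reStrings     entries in the form "date-price"
     */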
    private static void setMaxMin(ItemDetailsVO itemDetailsVO, String[] reStrings) {
        List<Map<String, Object>> list = new LinkedList<Map<String,Object>>();
        double maxPrice = 0;
        double minPrice = Double.MAX_VALUE;
        for (String temp : reStrings)
        {
            Map<String, Object> map = new HashMap<String, Object>();
            String[] tempStrings = temp.split("-");
            double price = Double.parseDouble(tempStrings[1]);
            if (maxPrice < price)
            {
                maxPrice = price;
            }
            if (minPrice > price)
            {
                minPrice = price;
            }
            map.put("date", DateUtil.formatDateYYYYMMDD(tempStrings[0]));
            map.put("price", price);
            list.add(map);
        }
//        Gson gson = new Gson();
//        JsonArray trendArray= new JsonParser().parse(gson.toJson(list)).getAsJsonArray();
        itemDetailsVO.setTrend(list);
        itemDetailsVO.setPrice(list.get(list.size()-1).get("price").toString());
        itemDetailsVO.setMaxPrice(String.valueOf(maxPrice));
        itemDetailsVO.setMinPrice(String.valueOf(minPrice));
    }