A Simple Web Crawler in Node.js

什什么都绘

Category: Node    Tags: nodejs

Article link: https://blog.csdn.net/qq_39406353/article/details/108417142

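I had some free time today, so I tried scraping product information from a JD.com search page. Ha!

Preparation

Two packages do all the work here. Both come from npm; neither is part of Node.js itself:
      superagent: sends HTTP or HTTPS requests from the server side
      cheerio:    parses the page and returns a function, $, that works like a jQuery selector

Install the two packages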

 npm i superagent cheerio

Import them

const superagent = require('superagent')
const cheerio = require('cheerio')

Create an array to hold the data

const goodsList = []

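Usage

 1. Use superagent to request the page you want to scrape
    => the callback passed to end() runs once the request has finished
 2. Parse the response with cheerio
    => call cheerio.load(the content you want to parse)
    => return value: a function that behaves like jQuery's $
 3. Pull out the content you need
    => prepare an array in advance
    => push each item into the array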

superagent.get('https://search.jd.com/Search?keyword=%E7%89%9B%E4%BB%94%E8%A3%A4%E7%94%B7&enc=utf-8&pvid=olt4tati.7ri5o7').end((err, data) => {
    if (err) return console.log('Scrape failed')
    // data.text is the full HTML of the page
    parseData(data.text)
})

function parseData(page) {
    // Parse the HTML with cheerio
    const $ = cheerio.load(page)
    // Each .gl-item is one product in the result list
    $('.gl-warp > .gl-item').each((index, item) => {
        const obj = {
            goods_img: $(item).find('img').prop('src'),
            goods_price: $(item).find('.p-price i').text(),
            goods_title: $(item).find('.p-name i').text(),
            goods_name: $(item).find('.p-name em').text(),
            goods_commit: $(item).find('.p-commit strong a').text(),
            goods_shop: $(item).find('.p-shop .J_im_icon a').text(),
            goods_icon: $(item).find('.p-icons i').text()
        }
        goodsList.push(obj)
    })
    // Print once, after every item has been collected
    console.log(goodsList)
}
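The keyword in the request URL is percent-encoded UTF-8 (here it decodes to 牛仔裤男). Rather than pasting an encoded string, the URL can be assembled with Node's built-in URL class. This is only a sketch: buildSearchUrl is a hypothetical helper, and it leaves out the pvid parameter from the original URL on the assumption that the search does not depend on it.

```javascript
// Hypothetical helper, not part of the original script
function buildSearchUrl(keyword) {
    const url = new URL('https://search.jd.com/Search')
    // searchParams.set() percent-encodes the value as UTF-8
    url.searchParams.set('keyword', keyword)
    url.searchParams.set('enc', 'utf-8')
    // Assumption: pvid from the original URL is omitted here
    return url.toString()
}

const searchUrl = buildSearchUrl('牛仔裤男')
console.log(searchUrl)

// Going the other way: decode the keyword used in the original URL
const decodedKeyword = decodeURIComponent('%E7%89%9B%E4%BB%94%E8%A3%A4%E7%94%B7')
console.log(decodedKeyword)
```

The result can be passed straight to superagent.get() in place of the hard-coded string.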

The scraped data looks like this:

[Image: screenshot of the scraped data]
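Every field collected with .text() is a string, and it may carry stray whitespace from the page markup. If the records need tidying before further use, something like the following could be applied to each one. It is a sketch under assumptions: normalizeGoods is a hypothetical helper, and it assumes the price text is a plain decimal number.

```javascript
// Hypothetical helper: tidy one scraped record
function normalizeGoods(raw) {
    // Collapse runs of whitespace and trim; a missing field becomes ''
    const clean = (value) => (value || '').replace(/\s+/g, ' ').trim()
    // Assumption: the price text is a plain decimal such as "99.00"
    const price = Number.parseFloat(clean(raw.goods_price))
    return {
        ...raw,
        goods_name: clean(raw.goods_name),
        goods_shop: clean(raw.goods_shop),
        // null signals that no usable price was found
        goods_price: Number.isNaN(price) ? null : price
    }
}

const tidy = normalizeGoods({
    goods_price: ' 99.00 ',
    goods_name: '\n  Slim   jeans \n',
    goods_shop: undefined
})
console.log(tidy)
```

With the script above, goodsList.map(normalizeGoods) would produce a cleaned copy of the list.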
