A Simple Node.js Web Crawler


疾风亦有归途ぃ


Tags: node.js javascript html5

Article link: https://blog.csdn.net/uijj556/article/details/118324002


Use Node.js to request JD's laptop listing page
https://list.jd.com/list.html?cat=670%2C671%2C672&go=0
and scrape the product data from it.

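Workflow

1. Use Node.js to request https://list.jd.com/list.html?cat=670%2C671%2C672&go=0
  => Use the third-party package superagent to request this JD page
  => Install: npm i superagent
  => Import: require()
  => Use
2. Get the page's HTML
  => The text property of the superagent response (data.text) holds the page's HTML
3. Extract the content of selected elements and assemble it into objects
  => Parse the HTML and store each product as an object
  => Use the third-party package cheerio
  => Install: npm i cheerio
  => Import
  => Use
4. Store the data in a database
  => Use the mysql package
  => Install: npm i mysql
  => Import
  => Use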
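
Step 4 presupposes that the target table already exists. The post does not show its schema, so the following is only a sketch of how it might be created from Node. The table name `jd_list` and the five-value layout come from the INSERT statement in the code; the column names, the column types, and the helper `ensureTable` are assumptions made for illustration.

```javascript
// Sketch only: column names and types are assumptions inferred from the
// INSERT statement (a null first value followed by four scraped strings).
const CREATE_TABLE_SQL = `
  CREATE TABLE IF NOT EXISTS \`jd_list\` (
    id INT PRIMARY KEY AUTO_INCREMENT,
    goods_img VARCHAR(255),
    goods_price VARCHAR(32),
    goods_title VARCHAR(255),
    goods_name VARCHAR(255)
  )
`

// Hypothetical helper. `db` is expected to be a mysql pool or connection
// that exposes query(sql, callback).
function ensureTable(db, callback) {
  db.query(CREATE_TABLE_SQL, (err) => callback(err || null))
}
```

Calling `ensureTable(db, ...)` once before the crawl starts would let the later inserts run against an empty database. Prices are kept as strings here because the scraper stores the text of the price element unchanged.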

The complete JavaScript code is as follows:

const superagent = require('superagent')
const cheerio = require('cheerio')
const mysql = require('mysql')

// Connection pool; replace the placeholder values with your own settings
const db = mysql.createPool({
  host: '127.0.0.1',
  port: 3306, // database port
  user: 'username',
  password: 'password',
  database: 'database_name'
})
const list = []

// Request the listing page; data.text holds the returned HTML
superagent.get('https://list.jd.com/list.html?cat=670%2C671%2C672&page=3&s=55&click=0', (err, data) => {
  if (err) return console.log('Failed to fetch the page')
  parsePage(data.text)
})

function parsePage(page) {
  const $ = cheerio.load(page)

  // Each <li> directly under ul.gl-warp is treated as one product entry
  $('ul.gl-warp > li').each(function (index, item) {
    const obj = {
      goods_img: $(item).find('.p-img a img').prop('src'),
      goods_price: $(item).find('.p-price i').text(),
      goods_title: $(item).find('.p-name em').text(),
      goods_name: $(item).find('.p-name i').text()
    }

    list.push(obj)

    // The leading null fills the table's first column (presumably an auto-increment id)
    const sql = 'INSERT INTO `jd_list` VALUES(null, ?, ?, ?, ?)'
    const info = [obj.goods_img, obj.goods_price, obj.goods_title, obj.goods_name]
    db.query(sql, info, (err, data) => {
      if (err) return console.log(err)
    })
  })

  // Print everything that was scraped from this page
  console.log(list)
}
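
The code above issues one INSERT per product. As a possible variation, the rows could be collected first and written with a single query, since the mysql package expands a nested array bound to one `?` into grouped value lists. This is a minimal sketch and not the author's method: `toRows` and `saveAll` are hypothetical helpers, and the row layout simply mirrors the `VALUES(null, ?, ?, ?, ?)` statement used above.

```javascript
// Sketch only: `toRows` and `saveAll` are hypothetical helpers, not part of
// the original code.

// Convert scraped objects into arrays in the same order as the original
// INSERT: a leading null for the first column, then the four scraped fields.
function toRows(list) {
  return list.map((obj) => [
    null,
    obj.goods_img,
    obj.goods_price,
    obj.goods_title,
    obj.goods_name
  ])
}

// Write all rows with one query. `db` is expected to be a mysql pool or
// connection. Binding [rows] to a single `?` makes the mysql package emit
// (v1, v2, ...), (v1, v2, ...), ... after VALUES.
function saveAll(db, list, callback) {
  if (list.length === 0) return callback(null, 0)
  const sql = 'INSERT INTO `jd_list` VALUES ?'
  db.query(sql, [toRows(list)], (err, result) => {
    if (err) return callback(err)
    callback(null, result.affectedRows)
  })
}
```

With these helpers, `parsePage` could push every object into `list` and call `saveAll(db, list, ...)` once after the `each` loop, which reduces the number of round trips to the database from one per product to one per page.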