Using axios, and extending it into a simple web crawler

Latest recommended article published on 2021-12-29 20:43:56

williams_zhong

Article link: https://blog.csdn.net/qq_26026975/article/details/78981908

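axios is a JavaScript library for sending HTTP/HTTPS requests. It can be used both in the browser and in a Node.js backend to send requests to a server.

npm page: https://www.npmjs.com/package/axios

A simple promise-style wrapper looks like this: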


const axios = require('axios');

const fs = require('fs');

/**
 * @version 1.0 Module for sending HTTP and HTTPS requests to other sites
 * 
 */
class AxiosMannage {

    /***
     * @version 1.0 Promise-based wrapper for GET requests
     * 
     * @param url request URL
     * 
     * @param data query parameters, sent as the query string
     * 
     */
    static async get({ url, data }) {

        // axios.get already returns a promise, so no extra Promise wrapper is needed.
        return axios.get(url, { params: data });

    }

    /***
     * @version 1.0 Promise-based wrapper for POST requests
     * 
     * @param url request URL
     * 
     * @param data request body
     * 
     */
    static async post({ url, data }) {

        // axios.post already returns a promise, so it can be returned directly.
        return axios.post(url, data);

    }


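    /***
     * @version 1.0 Download a network resource
     * 
     * @param type get/post 
     * 
     * @param url address of the file resource
     * 
     * @param path location to save the file
     * 
     * @returns a promise that resolves once the file has been fully written
     * 
     */
    static download({ type = 'get', url, path }) {

        return axios({

            method: type,
            url,
            responseType: 'stream'

        }).then((response) => new Promise((resolve, reject) => {

            const writer = fs.createWriteStream(path);

            // Resolve when writing finishes; reject if either stream fails.
            writer.on('finish', resolve);
            writer.on('error', reject);
            response.data.on('error', reject);

            response.data.pipe(writer);

        }));

    }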

}

module.exports = { AxiosMannage };

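The code above is fairly simple. For token authentication, config options, and so on, refer to the official documentation and extend it as needed.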
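As an illustration of the token and config point, axios accepts a request config object as the second argument of `axios.get` (the third of `axios.post`). The following is only a minimal sketch under stated assumptions: `buildRequestConfig` is a hypothetical helper, not part of axios, and the Bearer scheme, the 5-second default timeout, and the URL in the usage comment are placeholders chosen for the example.

```javascript
// Hypothetical helper: builds an axios request config object.
// `headers`, `params`, and `timeout` are standard axios request config fields.
function buildRequestConfig({ token, params = {}, timeout = 5000 } = {}) {
    const headers = {};
    if (token) {
        // Assumes the server expects a Bearer token; adjust for other schemes.
        headers.Authorization = `Bearer ${token}`;
    }
    return { headers, params, timeout };
}

// Usage (requires axios; the URL is a placeholder):
// const axios = require('axios');
// axios.get('https://example.com/api/items',
//     buildRequestConfig({ token: 'my-token', params: { page: 1 } }))
//     .then((response) => console.log(response.status));
```

Keeping the config in one helper means the wrapper's `get` and `post` methods could accept it as an extra argument without each caller repeating the header logic.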

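Extension: web crawling (an application of data mining)

1. If you want to crawl content from other websites, you can use axios to request the pages and scrape them.

2. A simple data-mining pipeline looks like this: crawl the links on the home page -> parse and extract the target link addresses -> visit the target links, fetch the data and store it -> follow-up analysis, persistence, and so on.

3. Taking Baidu's search result links for "axios" as an example, the code is as follows: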

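// Assumes the wrapper above is saved as ./AxiosMannage.js (hypothetical path).
const { AxiosMannage } = require('./AxiosMannage');

const cheerio = require('cheerio');

const root = "http://www.baidu.com/s?word=axios";

AxiosMannage.get({ url: root }).then(

    (response) => {

        const content = response.data; // HTML of the search results page

        // Use cheerio to pick out all the useful links.
        // Note: cheerio is a tool dedicated to parsing HTML text.

        // https://www.npmjs.com/package/cheerio   cheerio usage

        // The target page addresses can be extracted here.
        const $ = cheerio.load(content);

        const listsUrl = $("#content_left").find(".result");

        for (let i = 0; i < listsUrl.length; i++) {

            // Link address of the target page (href of the first <a> in the result)
            const links = $(listsUrl[i]).find("a").attr("href");

            console.log("Link address: " + links);

        }

    }

).catch((err) => {

    // Report request or parsing failures instead of leaving the rejection unhandled.
    console.error("Request failed: " + err.message);

});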
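The pipeline described in step 2 (root page -> extract links -> visit targets -> store) can be sketched end to end. This is an illustrative outline and not the author's implementation: `crawl`, `fetchHtml`, `extractLinks`, and `store` are hypothetical names, and the fetch, parse, and storage steps are passed in as functions so the sketch runs without a network connection. Relative links are resolved with Node's built-in `URL` class.

```javascript
// Sketch of the pipeline: fetch root page -> extract target links ->
// fetch each target -> store the result. The three steps are injected,
// so fetchHtml, extractLinks, and store are hypothetical names.
async function crawl({ rootUrl, fetchHtml, extractLinks, store }) {
    const rootHtml = await fetchHtml(rootUrl);

    // Resolve hrefs against the root URL, keep only http(s), drop duplicates.
    const seen = new Set();
    for (const href of extractLinks(rootHtml)) {
        if (typeof href !== 'string' || href === '') continue;
        let absolute;
        try {
            absolute = new URL(href, rootUrl);
        } catch (err) {
            continue; // skip malformed hrefs
        }
        if (absolute.protocol === 'http:' || absolute.protocol === 'https:') {
            seen.add(absolute.href);
        }
    }

    // Visit targets one at a time to keep the load on the remote site low.
    const results = [];
    for (const link of seen) {
        try {
            const html = await fetchHtml(link);
            await store(link, html);
            results.push({ link, ok: true });
        } catch (err) {
            results.push({ link, ok: false, error: err.message });
        }
    }
    return results;
}

// Usage with the wrapper and cheerio from this article (requires both packages):
// crawl({
//     rootUrl: 'http://www.baidu.com/s?word=axios',
//     fetchHtml: (url) => AxiosMannage.get({ url }).then((res) => res.data),
//     extractLinks: (html) => {
//         const $ = cheerio.load(html);
//         return $('#content_left .result a').map((i, el) => $(el).attr('href')).get();
//     },
//     store: async (link, html) => console.log(link, html.length),
// }).then((results) => console.log(results));
```

Recording a per-link `ok` flag instead of throwing lets one failed page be skipped while the rest of the crawl continues, which is usually what the later analysis and persistence steps need.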