Sharing 60 NodeJs Crawler Source Code Projects: One of Them Is Bound to Be What You Want

记忆的小河

Published 2024-07-20 11:39:17

Tags: crawler

Article link: https://blog.csdn.net/zdh13370188237/article/details/140568610

Link: https://pan.baidu.com/s/1FE0eRLU_WHFcpi6Q-SGd6w?pwd=8888
Extraction code: 8888

People who love to learn never have bad luck.

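Project names:

a vue node score recommender-system: a guitar tab crawler and recommender system

amazon crawler: a distributed crawler system that has run stably in production for two years, handling 50 million+ crawls per day

casperjs + nodejs Weibo crawler

Instagram crawler solution

Node js crawler for one-click teacher evaluations in the Zhengfang (正方) academic administration system

NodeJs Headless crawler

nodejs puppeteer crawler development scaffold

nodejs selenium crawler

Nodejs novel crawler

nodeJS crawler

nodejs crawler that scrapes free proxies

nodejs crawler in practice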

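Nodejs crawler (superagent + cheerio)

nodejs academic paper crawler

nodejs + cheerio crawler for the 2345 weather site

NodeJS + Express web crawler

nodejs + mysql crawler

Simple image crawler implemented in nodejs

Web novel crawler script implemented in nodejs

Nodejs crawler for 微商相册 (Weishang Xiangce), with QR-code login

nodejs crawler

nodejs crawler + echarts visualization

Trying out a nodejs crawler

nodejs crawler framework

nodejs crawler examples

nodejs crawler that scrapes exam question data from the 学科网 (Xueke) site

nodejs crawler for assorted anime rankings, information, and more

nodejs crawler that automatically generates a sitemap for a given website

nodejs learning material: crawlers, server side, file operations, mysql

one six three cloud crawler: building a worthwhile music chart!

pSITE image crawler in nodejs

SINABLOG favorites image crawler, using nodejs + async + ts to scrape images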

import os
import shutil

def void_folder(path):
    # Recursively delete the empty folders under path
    for name in os.listdir(path):
        # Join the name onto path to get the full path of this entry
        real_path = os.path.join(path, name)
        # Only folders matter here; regular files are skipped
        if not os.path.isdir(real_path):
            continue
        # Clean the subfolder first, so that a folder holding nothing but
        # empty folders becomes empty itself and is removed in the same pass
        void_folder(real_path)
        if not os.listdir(real_path):
            print("void_folder(): " + name)
            os.rmdir(real_path)


def void_file(dirPath):
    # Delete archive files (.rar, .zip, .gz, .tgz) located directly inside dirPath
    for file in os.listdir(dirPath):
        file_full_name = os.path.join(dirPath, file)
        # Skip folders: os.remove only works on files
        if not os.path.isfile(file_full_name):
            continue
        # splitext keeps the leading dot and returns "" when there is no extension
        file_ext = os.path.splitext(file_full_name)[-1]
        if file_ext in (".rar", ".zip", ".gz", ".tgz"):
            os.remove(file_full_name)

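# Find every file named fileName under dirPath (searching recursively) and delete it
def search_file(dirPath, fileName):
    dirs = os.listdir(dirPath)  # list all files and folders directly inside this folder
    for currentFile in dirs:  # walk through the list
        absPath = os.path.join(dirPath, currentFile)
        if os.path.isdir(absPath):  # a folder: recurse and keep searching inside it
            search_file(absPath, fileName)
        elif currentFile == fileName:
            print(absPath)  # a match: print its full path, then delete it
            os.remove(absPath)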

if __name__ == "__main__":
    # Raw string, so the backslashes in the Windows path are not read as escape sequences
    dirPath = r'D:\Spider\Html\DIV+CSS模板\98个DIV+CSS模板\DIV+CSS模板'

    search_file(dirPath, "ReadMe.txt")
    search_file(dirPath, "下载网页模板.url")
    search_file(dirPath, "下载网页特效.url")
    search_file(dirPath, "下载字体.url")
    search_file(dirPath, "轻松设计漂亮的网页-mobanwang.com.url")
    search_file(dirPath, "松设计漂亮的网页-mobanwang.com.url")
    void_file(dirPath)

    # search_file(dirPath, "php中文网下载站.url")
    # search_file(dirPath, "php中文网免费下载站.txt")
    #
    # search_file(dirPath, "访问懒人之家.url")
    # search_file(dirPath, "lanrenzhijia.com下载说明.txt")
    #
    #
    # search_file(dirPath, "服务器软件.url")
    # search_file(dirPath, "downcode.com.txt")
    # search_file(dirPath, "中国源码下载站.url")
    #
    # search_file(dirPath, "脚本之家.url")
    # search_file(dirPath, 'jb51.net.txt')
    # search_file(dirPath, '说明.htm')
    # search_file(dirPath, "cnzzz.com.txt")
    # search_file(dirPath, "源码之家说明.txt")
    # search_file(dirPath, "服务器常用软件.html")
    # search_file(dirPath, "服务器常用软件.html")
    # search_file(dirPath, "访问脚本之家.html")
    # search_file(dirPath, "chinaz.com.txt")
    # search_file(dirPath, "访问查看.url")
    # fileName4 = '服务器软件.url'
    # fileName3 = '脚本之家.url'
    # fileName2 = 'Readme-说明.htm'
    # fileName5 = 'jb51.net.txt'
    # search_file(dirPath, fileName2)
    # search_file(dirPath, fileName3)
    # search_file(dirPath, fileName4)
    # search_file(dirPath, fileName5)
    # void_folder(dirPath)
    # void_folder(dirPath)
    # void_folder(dirPath)
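
The script above deletes files as soon as it finds them, which is hard to undo. As a minimal sketch (not the author's code), the same three cleanup steps could be combined into one bottom-up pass with a dry-run switch. The names `clean_tree`, `dry_run`, and `ARCHIVE_SUFFIXES` are assumptions introduced here for illustration. Unlike `void_file`, this version matches archives at every depth and ignores letter case in extensions.

```python
import os
from pathlib import Path

# Hypothetical constant: the same four extensions that void_file checks.
ARCHIVE_SUFFIXES = {".rar", ".zip", ".gz", ".tgz"}


def clean_tree(root, unwanted_names, dry_run=True):
    """Return the paths that were removed (or would be, in a dry run)."""
    root = Path(root)
    removed = []   # paths in removal order
    gone = set()   # the same paths, for fast membership tests
    # Walk bottom-up so every folder is visited after its children.
    for current, dirnames, filenames in os.walk(root, topdown=False):
        current = Path(current)
        for name in filenames:
            target = current / name
            is_archive = target.suffix.lower() in ARCHIVE_SUFFIXES
            if name in unwanted_names or is_archive:
                if not dry_run:
                    target.unlink()
                removed.append(target)
                gone.add(target)
        # A folder is removable once everything inside it is gone.
        # The root folder itself is always kept.
        children = [current / name for name in dirnames + filenames]
        if current != root and all(child in gone for child in children):
            if not dry_run:
                current.rmdir()
            removed.append(current)
            gone.add(current)
    return removed
```

Calling `clean_tree(dirPath, {"ReadMe.txt"})` only reports what would be deleted; passing `dry_run=False` performs the deletion. Reviewing the dry-run list first is the point of this design, since neither `os.remove` nor `Path.unlink` goes through a recycle bin.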

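Torrent file crawler

A crawler written in nodejs

A small crawler that scrapes Sunshine Long-Distance Run (阳光长跑) data from Hangzhou Dianzi University

A nodejs crawler that scrapes car-related information

A crawler for a model photo gallery site, written in Node js

A simple crawler demo built with the cheerio library

Fetching comic resources with a crawler written in Node js

A site video crawler written in nodejs

Using a node crawler to scrape one's own zhihu answers and convert them to md files

Raising a nodejs crawler to roam the DAYDAY fund site: Egg MVC + ES2015 + React SSR

Crawler using phantomjs and nodejs

Crawler implemented with native nodejs + ES6

Simple crawler demo based on nodeJs

Golang-based distributed crawler management platform, supporting Python, NodeJS, Go, Java, PHP, and other programming languages, as well as a variety of crawler frameworks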

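nodejs-based crawler for the pixiv daily rankings

A NodeJs-based crawler program for collecting the latest front-end news

nodeJS-based comic crawler for the 动漫之家 (Dongman Zhijia) site

New energy web crawler project

A crawler tool that needs no Twitter API, written in node.js, running in headless Chrome or Firefox

Pear Video (梨视频) API crawler: interface documentation and source code

A crawler demo written in nodejs

Image crawler written in Nodejs

A crawler program for oschina that scrapes salary information

A first small crawler, scraping the songs in the hottest playlists on a certain "du" music site

First attempt at writing a crawler in Node js

Simple nodejs crawler

Simple nodejs image crawler

This crawler is written in nodejs and can scrape comics from DM5 and DMZJ

Douban crawler implemented in NodeJS, server side

A small crawler example built with node js, with the front end rendered by react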

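People who love to learn never have bad luck.

Learning takes real effort,

collecting and organizing is harder still.

Paying for knowledge is a joy,

all for the good of us coders.

Thank you for your support.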
