Scraping Douban Top Movie Data into a MongoDB Database

风清俊

Published 2020-10-19 22:06:32

Column: Crawlers. Tags: crawler, mongodb

Article link: https://blog.csdn.net/weixin_43447957/article/details/109170326

This post uses the Scrapy framework to crawl the Douban Top 250 movie list and store the movie information in a MongoDB database.
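The article does not show its spider, so here is a minimal sketch of how the Top 250 list pages could be enumerated. It assumes the list is served at `https://movie.douban.com/top250` with 25 movies per page and a `start` offset query parameter. The helper name `top250_page_urls` is hypothetical and is not the author's code.

```python
# Hypothetical helper: builds the Top 250 list-page URLs for a spider's start_urls.
# Assumption: Douban paginates the list 25 movies per page via the `start` offset.
BASE_URL = "https://movie.douban.com/top250"
PAGE_SIZE = 25


def top250_page_urls(total=250, page_size=PAGE_SIZE):
    """Return one URL per list page: start=0, 25, 50, ..., total - page_size."""
    return [f"{BASE_URL}?start={offset}" for offset in range(0, total, page_size)]


if __name__ == "__main__":
    for url in top250_page_urls():
        print(url)
```

In a spider, this could be used as `start_urls = top250_page_urls()`. Generating all ten pages up front avoids having to follow "next page" links, and the `DOWNLOAD_DELAY` in the settings still spaces the requests out.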

1. settings.py: crawler configuration

```python
# -*- coding: utf-8 -*-

# Scrapy settings for crawlerprc01 project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
#     https://docs.scrapy.org/en/latest/topics/settings.html
#     https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
#     https://docs.scrapy.org/en/latest/topics/spider-middleware.html

BOT_NAME = 'crawlerprc01'

SPIDER_MODULES = ['crawlerprc01.spiders']
NEWSPIDER_MODULE = 'crawlerprc01.spiders'


# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'crawlerprc01 (+http://www.yourdomain.com)'

# Obey robots.txt rules
ROBOTSTXT_OBEY = False

# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 32

# Configure a delay for requests for the same website (default: 0)
# See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
DOWNLOAD_DELAY = 1
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16

# Disable cookies (enabled by default)
#COOKIES_ENABLED = False

# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False

# Override the default request headers:
DEFAULT_REQUEST_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'
}

# Enable or disable spider middlewares
# See https://docs.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
#    'crawlerprc01.middlewares.Crawlerprc01SpiderMiddleware': 543,
#}
```
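The article does not show how scraped items reach MongoDB. The following is a hedged sketch of an item pipeline modelled on the MongoDB pipeline pattern in the Scrapy documentation. The class name `MongoPipeline`, the custom setting names `MONGO_URI` / `MONGO_DATABASE`, their defaults, and the collection name `douban_top250` are all assumptions, not the author's code. It uses the real `pymongo.MongoClient` and `Collection.insert_one` calls.

```python
class MongoPipeline:
    """Hypothetical pipeline that stores each scraped movie item in MongoDB."""

    # Hypothetical collection name; the original article does not show one.
    collection_name = "douban_top250"

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db
        self.client = None
        self.db = None

    @classmethod
    def from_crawler(cls, crawler):
        # MONGO_URI / MONGO_DATABASE are assumed custom settings, not Scrapy built-ins.
        return cls(
            mongo_uri=crawler.settings.get("MONGO_URI", "mongodb://localhost:27017"),
            mongo_db=crawler.settings.get("MONGO_DATABASE", "douban"),
        )

    def open_spider(self, spider):
        # Imported here so the module can be loaded even where pymongo is absent.
        import pymongo

        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        # dict(item) works for scrapy.Item and plain dicts; inserting a copy keeps
        # the "_id" field that insert_one adds from leaking back into the item.
        self.db[self.collection_name].insert_one(dict(item))
        return item
```

To enable it, the pipeline would be registered in settings.py, e.g. `ITEM_PIPELINES = {'crawlerprc01.pipelines.MongoPipeline': 300}`. This assumes the class lives in `crawlerprc01/pipelines.py`. Returning the item lets any later pipelines keep processing it.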
