Creating a Scrapy project for a CSDN blog
To avoid conflicts, open the generated csdnSpider folder separately, on its own.
1. Write csdn.py
# -*- coding: utf-8 -*-
import scrapy


class CsdnSpider(scrapy.Spider):
    name = 'csdn'
    allowed_domains = ['csdn.net']
    start_urls = ['https://blog.csdn.net/weixin_40543283']

    def parse(self, response):
        pass
2. items.py
import scrapy


class csdnItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    title = scrapy.Field()
    url = scrapy.Field()
    content = scrapy.Field()
3. Parse the page: csdn.py -> parse
Inspect the elements -> find the pattern -> parse the page
Each blog post turns out to be stored inside a tag like <div class="article-item-box csdn-tracking-statistics" data-articleid="87871160">.
Expand the tag to analyze its contents.