scrapy-mysql
items.py
Defines the item that holds the content and author fields.
import scrapy


class ScrapymysqlItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    content = scrapy.Field()
    author = scrapy.Field()
scrapymysql.py
Fetches the page, extracts the relevant fields, and hands them to the pipeline.
import scrapy
from scrapymysql.items import ScrapymysqlItem


class scrapymysql(scrapy.Spider):
    name = 'scrapymysql'
    start_urls = ['http://lab.scrapyd.cn/']

    def parse(self, response):
        # Each quote on the page sits in its own div.quote block
        for quote in response.css('div.quote'):
            # Build a fresh item per quote so yielded items stay independent
            item = ScrapymysqlItem()
            item['content'] = quote.css('.text::text').extract_first()
            item['author'] = quote.css('.author::text').extract_first()
            yield item
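One detail in parse is worth a sketch: if a single item object is created before the loop and refilled on each pass, every yield hands out the same object, so a consumer that collects items before processing them may see only the last quote. The snippet below illustrates this with plain dicts standing in for ScrapymysqlItem. The function names parse_shared and parse_fresh and the sample data are hypothetical.

```python
def parse_shared(quotes):
    # One dict created up front and mutated on every iteration
    item = {}
    for content, author in quotes:
        item['content'] = content
        item['author'] = author
        yield item


def parse_fresh(quotes):
    # A new dict per quote, so earlier results are never overwritten
    for content, author in quotes:
        yield {'content': content, 'author': author}


# Hypothetical sample data, not taken from the real site
quotes = [('first quote', 'author A'), ('second quote', 'author B')]

shared = list(parse_shared(quotes))
fresh = list(parse_fresh(quotes))

print([i['author'] for i in shared])  # both entries show the last author
print([i['author'] for i in fresh])   # one entry per original quote
```

Creating the item inside the loop costs almost nothing and removes this class of surprise.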
pipelines.py
Processes the items submitted by the spider.
# -*- coding: utf-8 -*-
import pymysql

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html


class ScrapymysqlPipeline(object):
    def __init__(self):
        # Keyword arguments: PyMySQL 1.0+ no longer accepts positional ones
        self.connect = pymysql.connect(host='localhost', user='root',
                                       password='wzh960823', database='test')
        self.cursor = self.connect.cursor()

    def process_item(self, item, spider):
        # Parameterized query: the driver escapes the values
        sql = """INSERT INTO mingyan (content,author) VALUES (%s,%s) """
        val = (item['content'], item['author'])
        self.cursor.execute(sql, val)
        self.connect.commit()
        return item
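The pipeline above opens its connection in __init__ and never closes it. Scrapy also calls open_spider and close_spider on a pipeline when they are defined, which gives a natural place to open and release the connection. The following is a minimal sketch of that lifecycle, with the standard-library sqlite3 module swapped in for pymysql so it runs without a MySQL server. Note that sqlite3 uses ? placeholders where pymysql uses %s. The class name SqliteQuotePipeline, the in-memory database, and the CREATE TABLE statement are assumptions for illustration, since the mingyan schema is not shown here.

```python
import sqlite3


class SqliteQuotePipeline:
    """Stand-in for ScrapymysqlPipeline using sqlite3 instead of pymysql."""

    def open_spider(self, spider):
        # Called once when the spider starts
        self.connect = sqlite3.connect(':memory:')
        self.cursor = self.connect.cursor()
        # Assumed schema; the real mingyan table is not shown in the post
        self.cursor.execute(
            'CREATE TABLE IF NOT EXISTS mingyan (content TEXT, author TEXT)')

    def process_item(self, item, spider):
        # Parameterized insert; sqlite3 uses ? where pymysql uses %s
        sql = 'INSERT INTO mingyan (content, author) VALUES (?, ?)'
        self.cursor.execute(sql, (item['content'], item['author']))
        self.connect.commit()
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes
        self.cursor.close()
        self.connect.close()


# Drive the pipeline by hand with a hypothetical item
pipeline = SqliteQuotePipeline()
pipeline.open_spider(spider=None)
pipeline.process_item({'content': 'a quote', 'author': 'someone'}, spider=None)
rows = pipeline.cursor.execute('SELECT content, author FROM mingyan').fetchall()
pipeline.close_spider(spider=None)
print(rows)
```

With pymysql the same three methods would apply; only the connect call and the placeholder style would differ.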
settings.py
Enable the pipeline.
ITEM_PIPELINES = {
    'scrapymysql.pipelines.ScrapymysqlPipeline': 300,
}
Results
Three errors encountered and how to fix them
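1. IndentationError: unindent does not match any outer indentation level
Cause: misaligned indentation.
A line is indented to a level that matches none of the enclosing blocks. Fix: delete the line's indentation, go back to the end of the previous line, and press Enter so the editor re-indents it automatically.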
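2. TabError: inconsistent use of tabs and spaces in indentation
Cause: tabs and spaces are mixed in the indentation.
Fix: delete the tabs and spaces, go back to the end of the previous line, let the editor re-indent automatically, and add any remaining indentation with spaces.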
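3. SyntaxError: EOL while scanning string literal
Cause: the line ended before the string literal was closed.
The call at the time opened the SQL string with """ but never closed it: self.cursor.execute("""INSERT INTO mingyan (content,author) VALUES (%s,%s) , followed directly by the parameter tuple.
Fix: add the matching closing """ after the SQL text, before the comma.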
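The three errors can be reproduced in isolation by passing small source strings to the built-in compile(), which can help confirm which kind of mistake a traceback points at. The sample strings and the helper name error_name are hypothetical. The exact message for the unterminated string varies between Python versions, so this sketch reports only the exception class.

```python
def error_name(source):
    """Return the exception class name raised when compiling source, or None."""
    try:
        compile(source, '<example>', 'exec')
    except SyntaxError as exc:  # IndentationError and TabError are subclasses
        return type(exc).__name__
    return None


# Dedent to 4 spaces when the enclosing levels are 0 and 8 spaces
bad_unindent = "if True:\n        x = 1\n    y = 2\n"
# First body line indented with a tab, second with 8 spaces
mixed_tabs = "if True:\n\tx = 1\n        y = 2\n"
# Opening quote with no matching closing quote on the same line
open_string = "sql = 'INSERT INTO mingyan\n"

for label, src in [('unindent', bad_unindent),
                   ('tabs', mixed_tabs),
                   ('string', open_string)]:
    print(label, error_name(src))
```

Configuring the editor to insert spaces when Tab is pressed avoids the first two errors in most cases.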