NotImplementedError raised in Python

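When I try to run my code I get this error. I have already defined a real-time request for this crawl, but it still doesn't work. Does anyone know how to handle this in Python?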
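For context, NotImplementedError is what a base class conventionally raises from a placeholder method that subclasses are expected to override. The minimal sketch below reproduces that pattern with hypothetical classes (`BaseSpider`, `GoodSpider` and `ForgetfulSpider` are illustrative names, not Scrapy's). Scrapy's own `Spider.parse` appears to follow the same idea: it raises NotImplementedError when a spider does not define `parse`.

```python
class BaseSpider:
    """Hypothetical stand-in for a framework base class."""

    def parse(self, response):
        # Placeholder that subclasses are expected to override.
        raise NotImplementedError(f"{type(self).__name__}.parse callback is not defined")


class GoodSpider(BaseSpider):
    def parse(self, response):
        # Overrides the placeholder, so no error is raised.
        return [{"link": response}]


class ForgetfulSpider(BaseSpider):
    # No parse() here, so the base-class placeholder is inherited.
    pass


def run(spider, response):
    """Call parse() and turn a missing override into a readable message."""
    try:
        return spider.parse(response)
    except NotImplementedError as exc:
        return f"error: {exc}"


print(run(GoodSpider(), "http://example.com/a"))
print(run(ForgetfulSpider(), "http://example.com/a"))
```

Catching the exception only makes the message clearer; the actual fix is to make sure the subclass really defines the method.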

How important is the sitemap in this case?
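On the sitemap question: for a SitemapSpider-style crawler the sitemap is the source of the URLs to crawl, and according to Scrapy's documentation each URL is dispatched to the callback of the first `(regex, callback)` pair in `sitemap_rules` that matches it. The documented default is `[('', 'parse')]`, which sends every URL to `parse`. So if the sitemap cannot be fetched or no rule matches, nothing gets parsed. The pure-Python sketch below only imitates that dispatch; `pick_callback`, the custom rules and the example.com URLs are hypothetical, and this is not Scrapy's actual code.

```python
import re
from typing import List, Optional, Tuple

# Default documented for Scrapy's SitemapSpider: the empty pattern
# matches every URL and routes it to the "parse" callback.
DEFAULT_RULES: List[Tuple[str, str]] = [("", "parse")]


def pick_callback(url: str, rules: List[Tuple[str, str]]) -> Optional[str]:
    """Return the callback name of the first rule whose regex matches url."""
    for pattern, callback in rules:
        if re.search(pattern, url):
            return callback
    return None  # no rule matched: the URL would not be requested


# Hypothetical custom rules: first match wins, so order matters.
custom_rules = [(r"/video/", "parse_video"), (r"/\d{4}/\d{2}/\d{2}/", "parse")]

print(pick_callback("http://example.com/2014/03/03/story/", DEFAULT_RULES))
print(pick_callback("http://example.com/video/clip", custom_rules))
print(pick_callback("http://example.com/about", custom_rules))
```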

Thanks in advance.

import logging
import re
from urllib.parse import urljoin, urlparse

# scrapy.contrib.spiders was relocated to scrapy.spiders in Scrapy 1.0;
# the old scrapy.contrib path is not available in current Scrapy releases.
from scrapy.spiders import CrawlSpider, Rule
from scrapy import Request
from scrapy.spiders import SitemapSpider
from scrapy.selector import Selector
from scrapy.linkextractors import LinkExtractor
from scrapy.shell import inspect_response
from sqlalchemy.orm import sessionmaker

from content.spiders.templates.sitemap_template import ModSitemapSpider
from content.models import db_connect, create_db_table, Articles
from content.items import ContentItems
from content.item_functions import (process_item,
                                    process_singular_item,
                                    process_date_item,
                                    process_array_item,
                                    process_plural_texts,
                                    process_external_links,
                                    process_article_text)

HEADER_XPATH = ['//h1[@class="article-title"]//text()']
AUTHOR_XPATH = ['//span[@class="cnnbyline"]//text()',
                '//span[@class="byline"]//text()']
PUBDATE_XPATH = ['//span[@class="cnnDateStamp"]//text()']
# Empty placeholders: '' is not a valid XPath expression, so these only
# work if the process_* helpers skip empty entries instead of evaluating them.
TAGS_XPATH = ['']
CATEGORY_XPATH = ['']
TEXT = ['//div[@id="storytext"]//text()',
        '//div[@id="storycontent"]//p//text()']
INTERLINKS = ['//span[@class="inStoryHeading"]//a/@href']
DATE_FORMAT_STRING = '%Y-%m-%d'
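The XPath lists above suggest that helpers such as `process_singular_item` try each expression in turn until one matches. Their real implementation is not shown, so the following is only a hypothetical sketch of that fallback idea (`first_match` is an illustrative name), using the standard library's `xml.etree.ElementTree` instead of Scrapy selectors. ElementTree supports only a subset of XPath, so the expressions are written relative (`.//span[@class='byline']`) and without `//text()`. Empty placeholders like the one in `TAGS_XPATH` are skipped rather than evaluated.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def first_match(root: ET.Element, paths: List[str]) -> Optional[str]:
    """Return the text of the first path that yields non-empty text, else None.

    Hypothetical helper: it only illustrates the "try several XPaths" idea
    and is not the poster's process_singular_item().
    """
    for path in paths:
        if not path:
            # Skip empty placeholders such as TAGS_XPATH = [''].
            continue
        texts = ["".join(node.itertext()).strip() for node in root.findall(path)]
        texts = [t for t in texts if t]
        if texts:
            return " ".join(texts)
    return None


# Tiny well-formed sample page (hypothetical markup).
page = ET.fromstring(
    "<html><body>"
    "<h1 class='article-title'>Sample headline</h1>"
    "<span class='byline'>By A. Writer</span>"
    "</body></html>"
)

author_paths = [".//span[@class='cnnbyline']", ".//span[@class='byline']"]
print(first_match(page, author_paths))  # falls back to the second path
print(first_match(page, [""]))          # only an empty placeholder
```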

class CNNnewsSpider(ModSitemapSpider):
    name = 'cnn'
    allowed_domains = ["cnn.com"]
    sitemap_urls = ["http://edition.cnn.com/sitemaps/sitemap-news.xml"]

    # parse() must be indented inside the class. Assuming ModSitemapSpider keeps
    # SitemapSpider's default sitemap_rules [('', 'parse')], every sitemap URL is
    # sent to self.parse; if this method is not part of the class, Scrapy's base
    # Spider.parse runs instead and raises NotImplementedError.
    def parse(self, response):
        items = []
        item = ContentItems()
        item['title'] = process_singular_item(self, response, HEADER_XPATH, single=True)
        item['resource'] = urlparse(response.url).hostname
        item['author'] = process_array_item(self, response, AUTHOR_XPATH, single=False)
        item['pubdate'] = process_date_item(self, response, PUBDATE_XPATH, DATE_FORMAT_STRING, single=True)
        item['tags'] = process_plural_texts(self, response, TAGS_XPATH, single=False)
        item['category'] = process_array_item(self, response, CATEGORY_XPATH, single=False)
        item['article_text'] = process_article_text(self, response, TEXT)
        item['external_links'] = process_external_links(self, response, INTERLINKS, single=False)
        item['link'] = response.url
        items.append(item)
        return items
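A quick way to confirm that the method really belongs to the class (and did not end up at module level because the indentation was lost when the code was pasted) is to look it up in the class's own namespace. The sketch uses hypothetical classes (`Base`, `Indented`, `Dedented`) and hypothetical helper names; with the real spider the same check would be `'parse' in vars(CNNnewsSpider)`, assuming the module imports cleanly.

```python
import inspect


class Base:
    def parse(self, response):
        raise NotImplementedError


class Indented(Base):
    def parse(self, response):  # indented: becomes a method of Indented
        return "parsed"


class Dedented(Base):
    name = "dedented"


def parse(self, response):  # dedented: a plain module-level function, not a method
    return "parsed"


def defines_own(cls, attr: str) -> bool:
    """True if attr is defined in cls itself, not merely inherited."""
    return attr in vars(cls)


def defining_class(cls, attr: str):
    """Return the first class in the MRO that defines attr, or None."""
    for klass in inspect.getmro(cls):
        if attr in vars(klass):
            return klass
    return None


print(defines_own(Indented, "parse"), defining_class(Indented, "parse").__name__)
print(defines_own(Dedented, "parse"), defining_class(Dedented, "parse").__name__)
```

If the second check reports the base class, calls to `parse` on that spider hit the placeholder that raises NotImplementedError.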

Here is my output:

