Using spiders

jian1104612147

Article link: https://blog.csdn.net/jian1104612147/article/details/80649371

items.py in the first_scrapy folder:

import scrapy


class FirstScrapyItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    name = scrapy.Field()
    url = scrapy.Field()
    desc = scrapy.Field()

first_spider.py in the spiders folder:

import scrapy
from first_scrapy.items import FirstScrapyItem


class firstSpider(scrapy.Spider):
    # Spider name, used on the command line: scrapy crawl first
    name = "first"
    # Not important here: optionally restricts the crawl to this domain
    allowed_domains = ["blog.eastmoney.com"]
    # Blog pages on the Eastmoney site
    start_urls = [
        "http://blog.eastmoney.com/xuedaolaozu",
        "http://blog.eastmoney.com/sg15837988958sg",
    ]

    def parse(self, response):
        # Debugging aid: save each raw page under the last segment of its URL
        # filename = response.url.split("/")[-1]
        # print('Current URL =>', filename)
        # with open(filename, 'wb') as f:
        #     f.write(response.body)

        for sel in response.xpath('//div[@class="articleTit"]/span[@class="title"]'):
            item = FirstScrapyItem()
            # extract() returns a list of strings, so calling .encode() on it fails
            item["name"] = sel.xpath('a/text()').extract()
            item["url"] = sel.xpath('a/@href').extract()
            # This field comes back empty
            item["desc"] = sel.xpath('text()').extract()
            yield item
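
To see what the XPath in parse() picks out without running a crawl, here is a rough sketch that uses only the standard library. It assumes a made-up HTML fragment (SAMPLE) in which each title link sits inside span.title; the real page markup is not shown in this post, so the fragment, the example.com URLs, and the extract_items helper are all hypothetical. It uses xml.etree.ElementTree rather than Scrapy's own selectors, which are more capable.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment standing in for a blog listing page.
SAMPLE = """
<html><body>
  <div class="articleTit">
    <span class="title"><a href="http://example.com/post/1">First post</a></span>
  </div>
  <div class="articleTit">
    <span class="title"><a href="http://example.com/post/2">Second post</a></span>
  </div>
</body></html>
"""


def extract_items(markup):
    """Return one dict per title span, with the same keys as FirstScrapyItem."""
    root = ET.fromstring(markup.strip())
    items = []
    # ElementTree supports a subset of XPath; ".//" plays the role of "//".
    path = ".//div[@class='articleTit']/span[@class='title']"
    for span in root.iterfind(path):
        link = span.find("a")
        items.append({
            "name": link.text if link is not None else None,
            "url": link.get("href") if link is not None else None,
            # Text placed directly inside <span> before <a>; there is none here.
            "desc": span.text,
        })
    return items


items = extract_items(SAMPLE)
for item in items:
    print(item)
```

Under this assumed markup the title text belongs to the a element, not to the span itself, which would explain why desc comes back empty in the spider. In Scrapy, extract() on the same selections returns lists of strings rather than single values.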