Python adds blank lines to output: Scrapy CSV output has blank lines between rows

Latest recommended article published 2023-06-29 20:19:19

weixin_39615991

I am getting unwanted blank lines between every row of Scrapy output in the resulting CSV file.

I have migrated from Python 2 to Python 3, and I am on Windows 10. I am therefore adapting my small Scrapy project for Python 3.

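My current (and, for now, only) problem is that when I write the Scrapy output to a CSV file, I get a blank line between each row. This has been raised in several posts here (it is Windows-related), but I have not been able to get any of the solutions to work.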
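The blank rows are commonly explained by newline translation: the csv module ends each row with \r\n by default, and a Windows text-mode file then translates the \n into \r\n once more, producing \r\r\n. The sketch below reproduces that effect on any platform by forcing the translation with newline="\r\n". The helper name write_rows and the sample values are made up for illustration.

```python
import csv
import io


def write_rows(newline):
    """Write two CSV rows through a text wrapper and return the raw bytes."""
    raw = io.BytesIO()
    # newline="\r\n" mimics Windows text mode: every "\n" written becomes "\r\n".
    # newline="" disables translation, as the csv module docs recommend.
    text = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    writer = csv.writer(text)  # default lineterminator is "\r\n"
    writer.writerow(["plotid", "plotprice"])
    writer.writerow(["plot-1", "100000"])  # made-up sample values
    text.flush()
    data = raw.getvalue()
    text.detach()  # stop the wrapper from closing the BytesIO later
    return data


translated = write_rows("\r\n")
untranslated = write_rows("")

print(translated)    # rows end in \r\r\n, which many viewers show as a blank row
print(untranslated)  # rows end in a single \r\n
```

If this is the cause, the cure is to make sure the stream handed to the CSV writer does no newline translation of its own.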

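As it happens, I have also added some code to the pipelines.py file to make sure the CSV output follows a given column order rather than a random one. As a result, I can run the spider with a plain scrapy crawl charleschurch instead of scrapy crawl charleschurch -o charleschurch2017xxxx.csv.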

Does anyone know how to skip or omit the blank lines in the CSV output?
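As a stopgap, an already written file can be cleaned after the crawl. The following is a minimal sketch using only the standard library; the function name drop_blank_rows, the file names, and the sample bytes are hypothetical. It treats the symptom rather than the cause, and it would also drop rows that are intentionally empty.

```python
import csv
import os
import tempfile


def drop_blank_rows(src_path, dst_path):
    """Copy a CSV file, skipping rows that contain no non-empty cells."""
    kept = 0
    # newline="" on both files leaves line endings to the csv module.
    with open(src_path, newline="", encoding="utf-8") as src_file, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst_file:
        writer = csv.writer(dst_file)
        for row in csv.reader(src_file):
            if any(cell.strip() for cell in row):
                writer.writerow(row)
                kept += 1
    return kept


# Demo on a temporary file that has the doubled carriage returns.
with tempfile.TemporaryDirectory() as tmp:
    dirty = os.path.join(tmp, "with_blanks.csv")
    clean = os.path.join(tmp, "clean.csv")
    with open(dirty, "wb") as f:
        f.write(b"plotid,plotprice\r\r\nplot-1,100000\r\r\n")
    kept_rows = drop_blank_rows(dirty, clean)
    with open(clean, "rb") as f:
        cleaned = f.read()

print(kept_rows, cleaned)
```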

My pipelines.py code is below (I probably do not need the import csv line, but I suspect I may need it for the final answer):

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html

import csv

from scrapy import signals
from scrapy.exporters import CsvItemExporter


class CSVPipeline(object):

    def __init__(self):
        # Maps each running spider to its open output file.
        self.files = {}

    @classmethod
    def from_crawler(cls, crawler):
        # Hook the pipeline into the spider open/close signals.
        pipeline = cls()
        crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
        crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
        return pipeline

    def spider_opened(self, spider):
        # One binary-mode output file per spider, named after the spider.
        file = open('%s_items.csv' % spider.name, 'w+b')
        self.files[spider] = file
        self.exporter = CsvItemExporter(file)
        # Fix the column order of the CSV output.
        self.exporter.fields_to_export = ["plotid", "plotprice", "plotname", "name", "address"]
        self.exporter.start_exporting()

    def spider_closed(self, spider):
        self.exporter.finish_exporting()
        file = self.files.pop(spider)
        file.close()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item
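One way to keep a fixed column order while avoiding newline translation is to write with the standard library's csv.DictWriter on a file opened with newline="". This swaps CsvItemExporter for a different technique, so it is a sketch under assumptions rather than a description of the exporter. The class name OrderedCsvPipeline, the directory argument, and the sample item are hypothetical; open_spider and close_spider are the hook names described in the Scrapy item pipeline documentation, and the demo calls them by hand with a stand-in spider object.

```python
import csv
import os
import tempfile
from types import SimpleNamespace

FIELDS = ["plotid", "plotprice", "plotname", "name", "address"]


class OrderedCsvPipeline:
    """Hypothetical pipeline: fixed column order, no newline translation."""

    def __init__(self, directory="."):
        self.directory = directory
        self.file = None
        self.writer = None

    def open_spider(self, spider):
        path = os.path.join(self.directory, "%s_items.csv" % spider.name)
        # Text mode with newline="" so only the csv module writes line endings.
        self.file = open(path, "w", newline="", encoding="utf-8")
        self.writer = csv.DictWriter(
            self.file, fieldnames=FIELDS, extrasaction="ignore"
        )
        self.writer.writeheader()

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        # dict(item) accepts plain dicts and mapping-like item objects.
        self.writer.writerow(dict(item))
        return item


# Demo with a stand-in spider and a made-up item.
spider = SimpleNamespace(name="charleschurch")
sample = {
    "plotname": "Plot 1",
    "plotid": "/plot-1",
    "plotprice": "100000",
    "name": "The Ridings",
    "address": "XX1 1XX",
}
with tempfile.TemporaryDirectory() as tmp:
    pipeline = OrderedCsvPipeline(tmp)
    pipeline.open_spider(spider)
    pipeline.process_item(sample, spider)
    pipeline.close_spider(spider)
    with open(os.path.join(tmp, "charleschurch_items.csv"), "rb") as f:
        written = f.read()

print(written)
```

The columns follow FIELDS regardless of the key order in each item, and every row ends in a single \r\n.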

I added this line to the settings.py file (I am not sure what the 300 is for):

[The settings line was not preserved in this copy of the post.]
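A settings entry of this kind typically takes the form sketched below. The dotted module path is an assumption, inferred from the CharlesChurch package name in the spider's import and from the CSVPipeline class name; it is not taken from the original line. According to the Scrapy documentation, the integer sets the order in which enabled pipelines run (lower values first, conventionally in the 0 to 1000 range), so with a single pipeline the exact value has no practical effect.

```python
# settings.py (sketch; the module path is assumed, not taken from the source)
ITEM_PIPELINES = {
    "CharlesChurch.pipelines.CSVPipeline": 300,
}
```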

My spider code is as follows:

import scrapy
from urllib.parse import urljoin

from CharlesChurch.items import CharleschurchItem


class charleschurchSpider(scrapy.Spider):
    name = "charleschurch"
    allowed_domains = ["charleschurch.com"]
    start_urls = ["https://www.charleschurch.com/county-durham_willington/the-ridings-1111"]

    def parse(self, response):
        for sel in response.xpath('//*[@id="aspnetForm"]/div[4]'):
            item = CharleschurchItem()

            # Development-level details shared by every plot on the page.
            item['name'] = sel.xpath('//*[@id="XplodePage_ctl12_dsDetailsSnippet_pDetailsContainer"]/span[1]/b/text()').extract()
            item['address'] = sel.xpath('//*[@id="XplodePage_ctl12_dsDetailsSnippet_pDetailsContainer"]/div/*[@itemprop="postalCode"]/text()').extract()

            # Names, links and prices of all plots not marked as sold.
            plotnames = sel.xpath('//div[@class="housetype js-filter-housetype"]/div[@class="housetype__col-2"]/div[@class="housetype__plots"]/div[not(contains(@data-status,"Sold"))]/div[@class="plot__name"]/a/text()').extract()
            plotnames = [plotname.strip() for plotname in plotnames]

            plotids = sel.xpath('//div[@class="housetype js-filter-housetype"]/div[@class="housetype__col-2"]/div[@class="housetype__plots"]/div[not(contains(@data-status,"Sold"))]/div[@class="plot__name"]/a/@href').extract()
            plotids = [plotid.strip() for plotid in plotids]

            plotprices = sel.xpath('//div[@class="housetype js-filter-housetype"]/div[@class="housetype__col-2"]/div[@class="housetype__plots"]/div[not(contains(@data-status,"Sold"))]/div[@class="plot__price"]/text()').extract()
            plotprices = [plotprice.strip() for plotprice in plotprices]

            # Pair the three lists up and yield one row per plot.
            result = zip(plotnames, plotids, plotprices)
            for plotname, plotid, plotprice in result:
                item['plotname'] = plotname
                item['plotid'] = plotid
                item['plotprice'] = plotprice
                yield item
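The zip step can be tried without Scrapy. The sketch below builds a new dict for each plot instead of reusing one item object, and resolves each relative href against the page URL with urljoin, which the spider imports but does not use. Resolving the link is only a guess at the intent; the function name build_plot_items and all sample values are made up. Note that zip stops at the shortest list, so a plot missing its price would be dropped silently.

```python
from urllib.parse import urljoin


def build_plot_items(base_url, name, address, plotnames, plotids, plotprices):
    """Yield one new dict per plot so earlier items are never overwritten."""
    for plotname, plotid, plotprice in zip(plotnames, plotids, plotprices):
        yield {
            "name": name,
            "address": address,
            "plotname": plotname.strip(),
            # Turn the relative href into an absolute URL (assumed intent).
            "plotid": urljoin(base_url, plotid.strip()),
            "plotprice": plotprice.strip(),
        }


items = list(build_plot_items(
    "https://www.charleschurch.com/county-durham_willington/the-ridings-1111",
    "The Ridings",
    "XX1 1XX",
    [" Plot 1 ", "Plot 2"],
    ["/plot-1", "/plot-2"],
    ["100000", " 120000"],
))

for plot in items:
    print(plot["plotid"], plot["plotname"], plot["plotprice"])
```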