Python writes only one row to the CSV file

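I apologize for raising this question again, but it still has not been resolved.

It is not a very complicated problem, and I am sure the cause is fairly straightforward, but I simply cannot see it.

My code for parsing the XML file opens the file and reads it in the format I want; the print statement in the final for loop proves this.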

For example, it prints the following:

Pivoting support handle D0584129 20090106 US

Hinge D0584130 20090106 US

Deadbolt turnpiece D0584131 20090106 US

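This is exactly how I want my data written to the CSV file. However, when I try to write these as rows to the CSV itself, the file ends up with only one row, the last one from the XML file, in this form:

Flashlight package,D0584138,20090106,US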

Below is my full code, since it may help in understanding the whole process. The area of interest starts at the line "for xml_string in separated_xml(infile):".

    from bs4 import BeautifulSoup
    import csv
    import unicodecsv as csv

    infile = "C:\\Users\\Grisha\\Documents\\Inventor\\2009_Data\\Jan\\ipg090106.xml"

    # separated_xml splits the file into individual XML documents so that each
    # one can be read and parsed on its own. A document runs from one root
    # declaration (<?xml ...) up to the next one.
    def separated_xml(infile):
        file = open(infile, "r")    # Open the XML file
        buffer = [file.readline()]  # Start the buffer with the first line

        # This loop slices the USPTO XML file into sections. It is needed
        # because a parser expects a single root element, but the declaration
        # appears many times in each file, which causes read errors.
        for line in file:
            if line.startswith("<?xml "):
                # yield hands back one complete document per iteration;
                # "".join(buffer) concatenates the buffered lines into one string
                yield "".join(buffer)
                buffer = []  # Start a new buffer for the next document
            buffer.append(line)  # Collect the current line
        yield "".join(buffer)  # Emit the final document
        file.close()

    # The nested loops below parse each separated document
    for xml_string in separated_xml(infile):
        soup = BeautifulSoup(xml_string, "lxml")  # Parse the XML string
        pub_ref = soup.findAll("publication-reference")  # Every publication instance
        lst = []  # Empty list to append into

        with open('./output.csv', 'wb') as f:
            writer = csv.writer(f, dialect='excel')

            for info in pub_ref:  # Loop over all publication instances
                # Find every invention name, patent number, date, and country
                for inv_name, pat_num, date_num, country in zip(soup.findAll("invention-title"), soup.findAll("doc-number"), soup.findAll("date"), soup.findAll("country")):
                    print(inv_name.text, pat_num.text, date_num.text, country.text)
                    lst.append((inv_name.text, pat_num.text, date_num.text, country.text))
                    writer.writerow([inv_name.text, pat_num.text, date_num.text, country.text])
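
For anyone who wants to try the splitting step without the USPTO file, here is a minimal sketch of the same idea in isolation. It assumes Python 3 and a small in-memory sample. The names split_documents, sample, and docs are hypothetical and do not appear in the code above, and the "and buffer" guard is an addition so that an empty first document is never yielded.

```python
import io

def split_documents(lines):
    """Yield one string per XML document from an iterable of lines.

    A new document is assumed to start at every line beginning with "<?xml ".
    """
    buffer = []
    for line in lines:
        # A new declaration closes the previous document, if there is one.
        if line.startswith("<?xml ") and buffer:
            yield "".join(buffer)
            buffer = []
        buffer.append(line)
    # Emit whatever is left once the input is exhausted.
    if buffer:
        yield "".join(buffer)

# Hypothetical sample: two tiny documents concatenated in one stream.
sample = io.StringIO(
    '<?xml version="1.0"?>\n<a>first</a>\n'
    '<?xml version="1.0"?>\n<a>second</a>\n'
)
docs = list(split_documents(sample))
print(len(docs))
```

Each yielded string contains exactly one declaration, so it can be handed to a parser on its own.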

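I also tried moving the open and writer calls outside the for loops to see where the problem arises, but to no avail. I know the file is being written one row at a time and that the same row is overwritten again and again (which is why only one row is left in the CSV file); I just cannot see where it happens.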
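
The overwriting behavior described here can be reproduced in isolation. The sketch below is not a diagnosis of the exact script above; it assumes Python 3 and the standard csv module (text mode "w" with newline="") instead of unicodecsv with 'wb', and batches is hypothetical stand-in data in which each inner list plays the role of the rows parsed from one XML document. It compares reopening the output file in write mode on every pass with opening it once around the loop.

```python
import csv
import os
import tempfile

# Hypothetical stand-in data: one inner list per parsed XML document.
batches = [
    [("Pivoting support handle", "D0584129", "20090106", "US")],
    [("Hinge", "D0584130", "20090106", "US")],
    [("Deadbolt turnpiece", "D0584131", "20090106", "US")],
]

def count_rows(path):
    """Return the number of CSV rows stored in the file at path."""
    with open(path, newline="") as f:
        return sum(1 for _ in csv.reader(f))

tmpdir = tempfile.mkdtemp()
reopened = os.path.join(tmpdir, "reopened.csv")
opened_once = os.path.join(tmpdir, "opened_once.csv")

# Pattern A: reopen in write mode for every batch. Mode "w" truncates
# the file on each open, so only the last batch survives.
for batch in batches:
    with open(reopened, "w", newline="") as f:
        csv.writer(f, dialect="excel").writerows(batch)

# Pattern B: open once, then loop over the batches inside the with-block.
with open(opened_once, "w", newline="") as f:
    writer = csv.writer(f, dialect="excel")
    for batch in batches:
        writer.writerows(batch)

rows_reopened = count_rows(reopened)
rows_opened_once = count_rows(opened_once)
print(rows_reopened, rows_opened_once)  # prints: 1 3
```

In pattern A the file holds a single row at the end, which matches the symptom of a CSV containing only the last record.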

Thanks in advance for your help.
