I'm trying to write the output of a web scrape to a CSV file. Here is my code:
import bs4
import requests
import csv
#get webpage for Apple inc. September income statement
page = requests.get("https://au.finance.yahoo.com/q/is?s=AAPL")
#put into beautiful soup
soup = bs4.BeautifulSoup(page.content)
#select table that holds data of interest
table = soup.find("table", class_="yfnc_tabledata1")
#creates headers for table
headers = table.find('tr', class_="yfnc_modtitle1")
#walk from the header row to the rows that follow it (total revenue, cost of revenue, gross profit)
total_revenue = headers.next_sibling
cost_of_revenue = total_revenue.next_sibling
gross_profit = cost_of_revenue.next_sibling.next_sibling
wang = headers.find_next_siblings("tr")
#writes each row from above to the CSV file
with open('/home/kwal0203/Desktop/Apple.csv', 'a') as csvfile:
    writer = csv.writer(csvfile,delimiter="|")
    writer.writerow([value.get_text(strip=True).encode("utf-8") for value in headers])
    writer.writerow([value.get_text(strip=True).encode("utf-8") for value in total_revenue])
    writer.writerow([value.get_text(strip=True).encode("utf-8") for value in cost_of_revenue])
    writer.writerow([value.get_text(strip=True).encode("utf-8") for value in gross_profit])
    for dude in wang:
        writer.writerow([dude.get_text(strip=True).encode("utf-8")])
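The problem is that I repeat a lot of code to build each row and write it to the CSV. As you can see, I keep chaining next_sibling to get the next row of values. I found the .find_next_siblings() function in Beautiful Soup, which does almost what I want, but every row it reads ends up in a single cell of the CSV file.
Any ideas? Let me know if the question is unclear.
Thanks.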
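For reference, the single-cell output seems to come from calling get_text() on the whole row, which joins all of its cells into one string. A minimal sketch of the loop I was aiming for is below, assuming Python 3 and assuming each row keeps its values in th/td cells. The SAMPLE_HTML markup and its numbers are made-up placeholders (not the real Yahoo page), and row_cells and write_table are hypothetical helper names.

```python
import csv
import io

import bs4

# Made-up stand-in for the scraped table; the real page markup may differ.
SAMPLE_HTML = """
<table class="yfnc_tabledata1">
  <tr class="yfnc_modtitle1"><th>Period Ending</th><th>Year 1</th><th>Year 2</th></tr>
  <tr><td>Total Revenue</td><td>100</td><td>90</td></tr>
  <tr><td>Cost of Revenue</td><td>60</td><td>55</td></tr>
</table>
"""


def row_cells(tr):
    """Return the text of each th/td cell in one table row (hypothetical helper)."""
    return [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]


def write_table(headers, csvfile):
    """Write the header row and every following sibling row, one CSV field per cell."""
    writer = csv.writer(csvfile, delimiter="|")
    for tr in [headers, *headers.find_next_siblings("tr")]:
        cells = row_cells(tr)
        if cells:  # skip rows that contain no cells
            writer.writerow(cells)


soup = bs4.BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", class_="yfnc_tabledata1")
headers = table.find("tr", class_="yfnc_modtitle1")

# An in-memory buffer stands in for the real file here.
buffer = io.StringIO()
write_table(headers, buffer)
output = buffer.getvalue()
print(output)
```

To write to a real file instead of the buffer, something like open(path, "a", newline="") should work in Python 3; passing newline="" is what the csv module documentation recommends for files handed to csv.writer.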