Convert an HTML table to CSV in Python

Question

I'm trying to scrape a table from a dynamic page. After running the code below (which requires selenium), I manage to get the contents of the <table> element.

I'd like to convert this table into a csv and I have tried 2 things, but both fail:

pandas.read_html returns an error saying I don't have html5lib installed, but I do and in fact I can import it without problems.

soup.find_all('tr') returns an error 'NoneType' object is not callable after I run soup = BeautifulSoup(tablehtml)

Here is my code:

import time
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.keys import Keys
import pandas as pd

main_url = "http://data.stats.gov.cn/english/easyquery.htm?cn=E0101"
driver = webdriver.Firefox()
driver.get(main_url)
time.sleep(7)
driver.find_element_by_partial_link_text("Industry").click()
time.sleep(7)
driver.find_element_by_partial_link_text("Main Economic Indicat").click()
time.sleep(6)
driver.find_element_by_id("mySelect_sj").click()
time.sleep(2)
driver.find_element_by_class_name("dtText").send_keys("last72")
time.sleep(3)
driver.find_element_by_class_name("dtTextBtn").click()
time.sleep(2)
table = driver.find_element_by_id("table_main")
tablehtml = table.get_attribute('innerHTML')

Answer 1:

Without access to the table you're actually trying to scrape, I used this example:

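<table>
<tr><td>Header1</td><td>Header2</td><td>Header3</td></tr>
<tr><td>Row 11</td><td>Row 12</td><td>Row 13</td></tr>
<tr><td>Row 21</td><td>Row 22</td><td>Row 23</td></tr>
<tr><td>Row 31</td><td>Row 32</td><td>Row 33</td></tr>
</table>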

and scraped it using:

from bs4 import BeautifulSoup as BS

content = "..."  # the HTML of that table, as a string
soup = BS(content, 'html5lib')
# one list of <td> tags per table row
rows = [tr.find_all('td') for tr in soup.find_all('tr')]

This rows object is a list of lists:

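[
 [<td>Header1</td>, <td>Header2</td>, <td>Header3</td>],
 [<td>Row 11</td>, <td>Row 12</td>, <td>Row 13</td>],
 [<td>Row 21</td>, <td>Row 22</td>, <td>Row 23</td>],
 [<td>Row 31</td>, <td>Row 32</td>, <td>Row 33</td>]
]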

...and you can write it to a file:

with open('result.csv', 'a') as f:
    for it in rows:
        # strip the <td> tags from each cell, then join the cells with ", "
        f.write(", ".join(str(e).replace('<td>', '').replace('</td>', '') for e in it) + '\n')

which looks like this:

Header1, Header2, Header3
Row 11, Row 12, Row 13
Row 21, Row 22, Row 23
Row 31, Row 32, Row 33
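
One caveat with joining cells by hand: a cell that itself contains a comma (for example "1,234") produces a malformed row. Below is a minimal sketch of the same table-to-rows idea that hands the rows to csv.writer, which quotes such cells. It swaps BeautifulSoup for the standard library's html.parser so it runs without extra installs. The names TableRows, html_table_to_csv and sample are made up for illustration, and the sketch assumes well-formed, non-nested tables.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the <tr> currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside an open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_csv(html):
    """Return the table rows found in html as CSV text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()


# Hypothetical sample input; the second row has a cell containing a comma.
sample = (
    "<table>"
    "<tr><td>Header1</td><td>Header2</td></tr>"
    "<tr><td>Row 11</td><td>1,234</td></tr>"
    "</table>"
)
csv_text = html_table_to_csv(sample)
```

To save the result, write csv_text to a file opened with newline='' so the line endings produced by csv.writer are not translated a second time.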

Answer 2:

Using the csv module and selenium selectors would probably be more convenient here:

import csv
from selenium import webdriver

driver = webdriver.Firefox()
driver.get("http://example.com/")

# find_element_by_* is the Selenium 3 API; Selenium 4 uses find_element(By.CSS_SELECTOR, ...)
table = driver.find_element_by_css_selector("#tableid")

with open('eggs.csv', 'w', newline='') as csvfile:
    wr = csv.writer(csvfile)
    # one CSV row per <tr>, one field per <td>
    for row in table.find_elements_by_css_selector('tr'):
        wr.writerow([d.text for d in row.find_elements_by_css_selector('td')])
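
Selenium 4 dropped the find_element_by_* helpers in favour of find_element(by, value) and find_elements(by, value). Below is a rough sketch of the same loop in that style, written as a function so it can be tried without a browser. The name table_to_csv is hypothetical, and selecting "th, td" rather than only "td" is an assumption made here so that header cells are kept. The string "css selector" is the value of selenium's By.CSS_SELECTOR, used directly so that the block has no selenium import.

```python
import csv

CSS = "css selector"  # same string as selenium.webdriver.common.by.By.CSS_SELECTOR


def table_to_csv(table, csvfile):
    """Write each <tr> of a table element to csvfile; return the row count.

    `table` is expected to behave like a Selenium WebElement, i.e. expose
    find_elements(by, value) and return elements that have a .text attribute.
    """
    writer = csv.writer(csvfile)
    count = 0
    for row in table.find_elements(CSS, "tr"):
        # Assumption: keep header cells (<th>) as well as data cells (<td>).
        cells = row.find_elements(CSS, "th, td")
        writer.writerow([cell.text for cell in cells])
        count += 1
    return count
```

With a live driver this would be called as table_to_csv(driver.find_element(By.CSS_SELECTOR, "#tableid"), csvfile), with csvfile opened as in the answer above.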

Source: https://stackoverflow.com/questions/33633416/convert-html-table-to-csv-in-python
