Reading table data from an HTML file in Python: using pd.read_html to parse an HTML table whose cells themselves contain complete tables...

温暖如故

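Copyright notice: this is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/weixin_42117340/article/details/114464460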

You cannot use pd.read_html to read nested tables directly, but you can roll your own HTML reader and call read_html on the individual table cells:

    from io import StringIO

    import bs4
    import pandas as pd

    with open('up_pf00344.test.html') as f:
        html = f.read()

    soup = bs4.BeautifulSoup(html, 'lxml')
    results = soup.find(attrs={'id': 'results'})

    # Use the first visible header row as the DataFrame column names
    for row in results.thead.find_all('tr'):
        if 'display:none' not in row.get('style', ''):
            df = pd.DataFrame(columns=[col.get_text() for col in row.find_all('th')])
            break

    # Append every visible table row to the DataFrame
    for row in results.tbody.find_all('tr', recursive=False):
        if 'display:none' in row.get('style', ''):
            continue
        df_row = []
        for col in row.find_all('td', recursive=False):
            # A cell holding a nested table becomes a DataFrame; any other cell becomes its text
            table = col.find_all('table')
            # StringIO avoids the deprecation warning newer pandas emits for literal HTML strings
            df_row.append(pd.read_html(StringIO(str(col)))[0] if table else col.get_text())
        df.loc[len(df)] = df_row

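df.iloc[0].map(type) reports the type stored in each cell of the first row: str for plain-text cells and a pandas DataFrame for cells that contained a nested table.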
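The substring test 'display:none' in row.get('style', '') only matches that exact spelling, so a style written as display: none or DISPLAY:NONE would not be detected. Below is a minimal sketch of a more tolerant check, assuming inline styles are the only way rows are hidden. The helper name is_hidden is hypothetical and not part of the original answer.

```python
import re

# Matches a `display` declaration whose value is `none`, ignoring case and spacing.
# The lookbehind keeps longer property names such as `foo-display` from matching.
_DISPLAY_NONE = re.compile(r"(?<![\w-])display\s*:\s*none\b", re.IGNORECASE)


def is_hidden(style):
    """Return True if an inline style string hides the element via display:none."""
    # `style` may be None when the tag has no style attribute at all.
    return bool(_DISPLAY_NONE.search(style or ""))
```

In the loops above, the check would then read if is_hidden(row.get('style')): continue. Rows hidden through a CSS class or an external stylesheet are still not covered by this approach.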

Bonus: since each table row has an id, you can use it as the DataFrame index by writing df.loc[row.get('id')] = df_row instead of df.loc[len(df)] = df_row.
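Here is a small sketch of that indexing variant, using made-up row ids and values in place of the parsed HTML. The ids row_a and row_b, the column names, and the parsed_rows list are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-ins for the (row.get('id'), df_row) pairs collected while parsing.
parsed_rows = [
    ("row_a", ["alpha", "1"]),
    ("row_b", ["beta", "2"]),
]

df = pd.DataFrame(columns=["name", "value"])
for row_id, df_row in parsed_rows:
    # Assigning to a label that does not exist yet appends a row under that label.
    df.loc[row_id] = df_row

# Rows can now be looked up by their HTML id instead of by position.
beta_value = df.loc["row_b", "value"]
```

A row without an id attribute makes row.get('id') return None, so a fallback label such as len(df) may be needed for such rows.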
