Python Data Processing (18): HTML Tables (reading table data from HTML with Python)

Latest recommended article published 2023-11-25 14:54:23

王Evey


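Copyright notice: this is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, include a link to the original source and this notice.

Article link: https://blog.csdn.net/weixin_35664081/article/details/114464436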


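This article shows how to use the read_html function of the pandas library to read table data from an HTML file, an HTML string, or a URL. Several examples cover reading from different sources, advanced options such as setting match conditions, selecting specific columns, and converters, and how to write DataFrames out to an HTML file.

(Summary generated automatically by CSDN.)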

HTML

1 Reading HTML content

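The top-level read_html() function accepts an HTML string, a file, or a URL, parses every HTML table it finds, and returns the tables as a list of pandas DataFrames.

Note: read_html returns a list of DataFrame objects even when the HTML content contains only one table.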
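A minimal, self-contained sketch of the note above, assuming pandas and an HTML parser (lxml, or BeautifulSoup4 together with html5lib) are installed. The table contents are made up for illustration. The HTML text is wrapped in io.StringIO because pandas 2.1 and later deprecate passing literal HTML strings to read_html.

```python
from io import StringIO

import pandas as pd

# A small, made-up HTML snippet that contains exactly one <table>.
html = """
<table>
  <thead>
    <tr><th>name</th><th>score</th></tr>
  </thead>
  <tbody>
    <tr><td>alice</td><td>90</td></tr>
    <tr><td>bob</td><td>85</td></tr>
  </tbody>
</table>
"""

# Wrap the literal HTML in StringIO (the form recommended since pandas 2.1).
tables = pd.read_html(StringIO(html))

# Even with a single table, the result is a list of DataFrames.
print(type(tables).__name__, len(tables))  # list 1

# Index into the list to get the DataFrame itself.
df = tables[0]
print(df)
```

Indexing with tables[0] is the usual way to get at the DataFrame when you know the page holds one table.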

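Let's look at a few examples. They assume pandas has been imported with import pandas as pd. read_html also needs an HTML parser library (lxml, or BeautifulSoup4 together with html5lib), and the URL example needs network access.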

In [295]: url = (
   .....:     "https://raw.githubusercontent.com/pandas-dev/pandas/master/"
   .....:     "pandas/tests/io/data/html/spam.html"
   .....: )
   .....: 

In [296]: dfs = pd.read_html(url)

In [297]: dfs
Out[297]: 
[                              Nutrient        Unit  Value per 100.0g  oz 1 NLEA serving 56g  Unnamed: 4  Unnamed: 5
 0                           Proximates  Proximates        Proximates             Proximates  Proximates  Proximates
 1                                Water           g             51.70                  28.95         NaN         NaN
 2                               Energy        kcal               315                    176         NaN         NaN
 3                              Protein           g             13.40                   7.50         NaN         NaN
 4                    Total lipid (fat)           g             26.60                  14.90         NaN         NaN
 ..                                 ...         ...               ...                    ...         ...         ...
 32  Fatty acids, total monounsaturated           g            13.505                  7.563         NaN         NaN
 33  Fatty acids, total polyunsaturated           g             2.019                  1.131         NaN         NaN
 34                         Cholesterol          mg                71                     40         NaN         NaN
 35                               Other       Other             Other                  Other       Other       Other
 36                            Caffeine          mg                 0                      0         NaN         NaN

 [37 rows x 6 columns]]
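When a page contains several tables, read_html's match argument (a string or regular expression) keeps only the tables whose text matches it. The sketch below is self-contained and does not touch the network; the two tables are invented for illustration, and the same parser requirements (lxml, or BeautifulSoup4 with html5lib) apply.

```python
from io import StringIO

import pandas as pd

# Two made-up tables; only the second one contains the word "Cholesterol".
html = """
<table>
  <tr><th>Bank Name</th><th>City</th></tr>
  <tr><td>Sunrise Bank</td><td>Valdosta</td></tr>
</table>
<table>
  <tr><th>Nutrient</th><th>Unit</th></tr>
  <tr><td>Water</td><td>g</td></tr>
  <tr><td>Cholesterol</td><td>mg</td></tr>
</table>
"""

# Without match, every table on the page is returned.
all_tables = pd.read_html(StringIO(html))
print(len(all_tables))  # 2

# With match, only tables whose text matches the string/regex are kept.
nutrient_tables = pd.read_html(StringIO(html), match="Cholesterol")
print(len(nutrient_tables))  # 1
print(nutrient_tables[0]["Nutrient"].tolist())  # ['Water', 'Cholesterol']
```

Because the first row of each table consists of th cells, read_html uses it as the header row even though there is no explicit thead element.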

Read the contents of the banklist.html file (file_path holds the path to a local copy) and pass them to read_html as a string. Since pandas 2.1, passing literal HTML text directly is deprecated, so the string is wrapped in io.StringIO:

In [298]: from io import StringIO
   .....: with open(file_path, "r") as f:
   .....:     dfs = pd.read_html(StringIO(f.read()))
   .....: 

In [299]: dfs
Out[299]: 
[                                    Bank Name          City  ...  Closing Date  Updated Date
 0    Banks of Wisconsin d/b/a Bank of Kenosha       Kenosha  ...  May 31, 2013  May 31, 2013
 1                        Central Arizona Bank    Scottsdale  ...  May 14, 2013  May 20, 2013
 2                                Sunrise Bank      Valdosta  ...  May 10, 2013  May 21, 2013
 3                       Pisgah Community Bank     Asheville  ...  May 10, 2013  May 14, 2013
 4                         Douglas County Bank  Douglasville  ...  April 26, 2013  May 16, 2013
 ..                                        ...           ...  ...             ...           ...
 500                        Superior Bank, FSB      Hinsdale  ...  July 27, 2001  June 5
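Reading the file into a string first is not strictly necessary: read_html also accepts a file path or an open file object. The sketch below writes a small made-up file to a temporary directory (the name banklist_sample.html and its rows are hypothetical, not the real banklist.html) and reads it both ways. It assumes pandas plus lxml, or BeautifulSoup4 with html5lib, is installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample content standing in for banklist.html.
html = """<html><body>
<table>
  <tr><th>Bank Name</th><th>City</th><th>Closing Date</th></tr>
  <tr><td>Sunrise Bank</td><td>Valdosta</td><td>May 10, 2013</td></tr>
  <tr><td>Pisgah Community Bank</td><td>Asheville</td><td>May 10, 2013</td></tr>
</table>
</body></html>"""

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "banklist_sample.html")
    with open(file_path, "w", encoding="utf-8") as f:
        f.write(html)

    # 1) Pass the path itself; read_html opens and parses the file.
    from_path = pd.read_html(file_path)

    # 2) Pass an open file object (anything with a read() method).
    with open(file_path, "r", encoding="utf-8") as f:
        from_handle = pd.read_html(f)

print(from_path[0].equals(from_handle[0]))  # True
print(from_path[0]["City"].tolist())  # ['Valdosta', 'Asheville']
```

Passing a path or file object sidesteps the literal-string deprecation entirely, since pandas only warns when a plain string is neither an existing file nor a URL.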
