Reading Table Data from HTML with Python: Python Data Processing (18), HTML Tables

HTML

1 Reading HTML content

The top-level read_html() function accepts an HTML string, a file, or a URL, and parses the HTML tables it finds into a list of pandas DataFrames.

Note: read_html returns a list of DataFrame objects even when the HTML content contains only a single table.
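
As a minimal sketch of the note above, the snippet below parses a small, made-up HTML document that contains a single table and shows that the result is still a list. The markup and variable names are assumptions for illustration. The string is wrapped in io.StringIO because recent pandas releases warn when literal HTML is passed directly, and the example assumes an HTML parser backend such as lxml is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical one-table document, made up for this illustration.
html = """
<table>
  <thead><tr><th>Nutrient</th><th>Unit</th></tr></thead>
  <tbody>
    <tr><td>Water</td><td>g</td></tr>
    <tr><td>Energy</td><td>kcal</td></tr>
  </tbody>
</table>
"""

# StringIO gives read_html a file-like object instead of a raw string.
dfs = pd.read_html(StringIO(html))

print(type(dfs))  # the result is a list, even for a single table
print(len(dfs))

# Index into the list to get the DataFrame itself.
df = dfs[0]
print(df)
```

In practice this means code that expects exactly one table usually ends with `[0]`.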

Let's look at a few examples.

In [295]: url = (
   .....:     "https://raw.githubusercontent.com/pandas-dev/pandas/master/"
   .....:     "pandas/tests/io/data/html/spam.html"
   .....: )
   .....:

In [296]: dfs = pd.read_html(url)

In [297]: dfs
Out[297]:
[                              Nutrient        Unit  Value per 100.0g  oz 1 NLEA serving 56g  Unnamed: 4  Unnamed: 5
 0                           Proximates  Proximates        Proximates             Proximates  Proximates  Proximates
 1                                Water           g             51.70                  28.95         NaN         NaN
 2                               Energy        kcal               315                    176         NaN         NaN
 3                              Protein           g             13.40                   7.50         NaN         NaN
 4                    Total lipid (fat)           g             26.60                  14.90         NaN         NaN
 ..                                 ...         ...               ...                    ...         ...         ...
 32  Fatty acids, total monounsaturated           g            13.505                  7.563         NaN         NaN
 33  Fatty acids, total polyunsaturated           g             2.019                  1.131         NaN         NaN
 34                         Cholesterol          mg                71                     40         NaN         NaN
 35                               Other       Other             Other                  Other       Other       Other
 36                            Caffeine          mg                 0                      0         NaN         NaN

 [37 rows x 6 columns]]
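
Because read_html returns every table it finds, a page with several tables yields a longer list. One way to narrow the result is the `match` parameter of read_html, which keeps only tables containing text that matches the given string or regular expression. The sketch below uses a made-up two-table document (not the spam.html page above) and assumes a parser backend such as lxml is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables, made up for this illustration.
html = """
<table>
  <thead><tr><th>Nutrient</th><th>Unit</th></tr></thead>
  <tbody><tr><td>Water</td><td>g</td></tr></tbody>
</table>
<table>
  <thead><tr><th>Bank Name</th><th>City</th></tr></thead>
  <tbody><tr><td>Sunrise Bank</td><td>Valdosta</td></tr></tbody>
</table>
"""

# Without match, every table on the page is returned.
all_tables = pd.read_html(StringIO(html))
print(len(all_tables))

# With match, only tables containing the text "Water" are kept.
nutrient_tables = pd.read_html(StringIO(html), match="Water")
print(len(nutrient_tables))
print(nutrient_tables[0])
```

The return value is still a list, so the selected table is taken with `[0]` as before.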

Read the contents of the banklist.html file and pass them to read_html as a string. Here file_path holds the path to a local copy of that file.

In [298]: with open(file_path, "r") as f:
   .....:     dfs = pd.read_html(f.read())
   .....:

In [299]: dfs
Out[299]:
[                                    Bank Name          City  ...    Closing Date  Updated Date
 0    Banks of Wisconsin d/b/a Bank of Kenosha       Kenosha  ...    May 31, 2013  May 31, 2013
 1                        Central Arizona Bank    Scottsdale  ...    May 14, 2013  May 20, 2013
 2                                Sunrise Bank      Valdosta  ...    May 10, 2013  May 21, 2013
 3                       Pisgah Community Bank     Asheville  ...    May 10, 2013  May 14, 2013
 4                         Douglas County Bank  Douglasville  ...  April 26, 2013  May 16, 2013
 ..                                        ...           ...  ...             ...           ...
 500                        Superior Bank, FSB      Hinsdale  ...   July 27, 2001  June 5
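
The example above reads the file itself and hands the text to read_html. As a hedged sketch of the same idea, the snippet below writes a small made-up table to a temporary file (a stand-in for banklist.html, which is not reproduced here) and parses it two ways: from the text wrapped in io.StringIO, and directly from the file path. The file name and contents are assumptions for illustration, and a parser backend such as lxml is assumed to be installed.

```python
import os
import tempfile
from io import StringIO

import pandas as pd

# Hypothetical stand-in for banklist.html, made up for this illustration.
html = """
<table>
  <thead><tr><th>Bank Name</th><th>City</th></tr></thead>
  <tbody>
    <tr><td>Sunrise Bank</td><td>Valdosta</td></tr>
    <tr><td>Douglas County Bank</td><td>Douglasville</td></tr>
  </tbody>
</table>
"""

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "banks.html")
    with open(file_path, "w", encoding="utf-8") as f:
        f.write(html)

    # Option 1: read the text ourselves and pass it as a file-like object.
    with open(file_path, "r", encoding="utf-8") as f:
        dfs_from_text = pd.read_html(StringIO(f.read()))

    # Option 2: let read_html open the file from its path.
    dfs_from_path = pd.read_html(file_path)

# Both routes should yield the same single table.
print(dfs_from_text[0].equals(dfs_from_path[0]))
```

Passing the path is shorter. Reading the text first is useful when the HTML needs to be inspected or cleaned before parsing.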
