HTML
1 Reading HTML content
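The top-level read_html() function accepts an HTML string, file, or URL, and parses the HTML tables it contains into a list of pandas DataFrames.
Note: read_html returns a list of DataFrame objects even if the HTML content contains only a single table.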
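A minimal sketch of the note above, assuming pandas and one of its HTML parser backends (lxml, or BeautifulSoup4 with html5lib) are installed. The HTML holds exactly one made-up table, yet the result is still a list, so the DataFrame is taken with [0]. The string is wrapped in io.StringIO because passing literal HTML strings to read_html is deprecated as of pandas 2.1; file-like objects are also accepted by earlier versions.

```python
from io import StringIO

import pandas as pd

# A made-up HTML document containing exactly one table.
html = """
<table>
  <thead>
    <tr><th>a</th><th>b</th></tr>
  </thead>
  <tbody>
    <tr><td>1</td><td>2</td></tr>
    <tr><td>3</td><td>4</td></tr>
  </tbody>
</table>
"""

# Wrap the literal string in StringIO so read_html treats it as a file-like object.
dfs = pd.read_html(StringIO(html))

print(type(dfs), len(dfs))  # a list with a single element
df = dfs[0]                 # the DataFrame itself
print(df)
```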
Let's look at a few examples.
In [295]: url = (
.....: "https://raw.githubusercontent.com/pandas-dev/pandas/master/"
.....: "pandas/tests/io/data/html/spam.html"
.....: )
.....:
In [296]: dfs = pd.read_html(url)
In [297]: dfs
Out[297]:
[                              Nutrient        Unit  Value per 100.0g  oz 1 NLEA serving 56g  Unnamed: 4  Unnamed: 5
0                           Proximates  Proximates        Proximates             Proximates  Proximates  Proximates
1                                Water           g             51.70                  28.95         NaN         NaN
2                               Energy        kcal               315                    176         NaN         NaN
3                              Protein           g             13.40                   7.50         NaN         NaN
4                    Total lipid (fat)           g             26.60                  14.90         NaN         NaN
..                                 ...         ...               ...                    ...         ...         ...
32  Fatty acids, total monounsaturated           g            13.505                  7.563         NaN         NaN
33  Fatty acids, total polyunsaturated           g             2.019                  1.131         NaN         NaN
34                         Cholesterol          mg                71                     40         NaN         NaN
35                               Other       Other             Other                  Other       Other       Other
36                            Caffeine          mg                 0                      0         NaN         NaN

[37 rows x 6 columns]]
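A page often holds several tables, and read_html returns one DataFrame per table it finds. Its match argument (a string or regular expression, default '.+') keeps only the tables whose text matches. The sketch below uses two made-up tables and assumes pandas plus an HTML parser backend (lxml, or BeautifulSoup4 with html5lib) is installed; the input is wrapped in io.StringIO since literal HTML strings are deprecated as read_html input from pandas 2.1 on.

```python
from io import StringIO

import pandas as pd

# Two made-up tables in one HTML document.
html = """
<table>
  <thead><tr><th>Nutrient</th><th>Value</th></tr></thead>
  <tbody><tr><td>Water</td><td>51.7</td></tr></tbody>
</table>
<table>
  <thead><tr><th>Bank Name</th><th>City</th></tr></thead>
  <tbody><tr><td>Sunrise Bank</td><td>Valdosta</td></tr></tbody>
</table>
"""

# Without match, every <table> becomes its own DataFrame.
all_tables = pd.read_html(StringIO(html))
print(len(all_tables))

# Keep only the tables whose text matches the pattern "Water".
water_tables = pd.read_html(StringIO(html), match="Water")
print(len(water_tables))
print(water_tables[0])
```

A fresh StringIO is created for each call because the first read_html call consumes the buffer.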
Read in the content of the banklist.html file and pass it to read_html as a string:
In [298]: with open(file_path, "r") as f:
.....:     dfs = pd.read_html(f.read())
.....:
In [299]: dfs
Out[299]:
[                                    Bank Name          City  ...    Closing Date  Updated Date
0    Banks of Wisconsin d/b/a Bank of Kenosha       Kenosha  ...    May 31, 2013  May 31, 2013
1                        Central Arizona Bank    Scottsdale  ...    May 14, 2013  May 20, 2013
2                                Sunrise Bank      Valdosta  ...    May 10, 2013  May 21, 2013
3                       Pisgah Community Bank     Asheville  ...    May 10, 2013  May 14, 2013
4                         Douglas County Bank  Douglasville  ...  April 26, 2013  May 16, 2013
..                                        ...           ...  ...             ...           ...
500                        Superior Bank, FSB      Hinsdale  ...   July 27, 2001  June 5