The pandas read_html function parses the <table> elements of a web page and returns them as a list of DataFrames, one per table.
Example:
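import pandas as pd  # read_html also needs lxml, or beautifulsoup4 + html5lib, installed

# URLs of pages that contain <table> elements
url_list = ['http://www.espn.com/nba/salaries/_/seasontype/4']

frames = []
for url in url_list:
    # read_html returns a list of DataFrames, one per <table> on the page
    frames.extend(pd.read_html(url))
# DataFrame.append was removed in pandas 2.0, so concatenate with pd.concat instead
data = pd.concat(frames, ignore_index=True)

# str.startswith checks whether a string begins with the given prefix; keeping only rows
# whose column 3 (salary) starts with '$' drops non-data rows such as repeated headers.
# astype(str) prevents errors on missing (NaN) cells.
data = data[data[3].astype(str).str.startswith('$')]

# Save the data
data.to_csv('NBA_salaries.csv', header=['RK', 'NAME', 'TEAM', 'SALARY'], index=False)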
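To see what the '$' filter does without fetching the page, here is a minimal offline sketch. It assumes, as the filter in the example implies, that the scraped table has integer column labels 0 to 3 and contains non-salary rows such as a repeated header row. The DataFrame raw and all player, team, and salary values are hypothetical stand-ins, not real ESPN data.

```python
import pandas as pd

# Hypothetical stand-in for the scraped table: integer column labels 0-3,
# with the header row repeated inside the data. All values are made up.
raw = pd.DataFrame([
    ["RK", "NAME", "TEAM", "SALARY"],
    ["1", "Player A", "Team X", "$40,000,000"],
    ["2", "Player B", "Team Y", "$35,000,000"],
    ["RK", "NAME", "TEAM", "SALARY"],
    ["3", "Player C", "Team Z", "$30,000,000"],
])

# Keep only rows whose salary column (label 3) starts with '$';
# the repeated header rows fail this test and are dropped.
clean = raw[raw[3].astype(str).str.startswith("$")].reset_index(drop=True)

# Give the columns the same names that the to_csv call writes as the header.
clean.columns = ["RK", "NAME", "TEAM", "SALARY"]
print(clean)
```

Filtering on the salary prefix rather than on row position keeps the clean-up correct even if the page repeats its header a different number of times.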