Target URL:
https://www.phb123.com/renwu/fuhao/shishi_1.html
First, create an empty DataFrame to hold the results:
df=pd.DataFrame()
Browsing the site shows that the top 350 entries are spread across pages 1 to 15, so the page URLs can be generated like this:
urls=[]
for i in range(1,16):
    url="https://www.phb123.com/renwu/fuhao/shishi_%s.html"%i
    urls.append(url)
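The same list of page URLs can also be built in one step with a list comprehension. This is only a minimal sketch: the helper name `build_urls` and its parameters are illustrative and not part of the original script.

```python
# URL pattern taken from the post; %s is replaced by the page number.
BASE = "https://www.phb123.com/renwu/fuhao/shishi_%s.html"

def build_urls(first_page=1, last_page=15):
    """Return the ranking-page URLs for first_page..last_page, inclusive."""
    return [BASE % page for page in range(first_page, last_page + 1)]

urls = build_urls()
print(len(urls))   # number of pages to fetch
print(urls[0])     # first page URL
```

Keeping the page range as parameters makes it easy to adjust if the site later spreads the ranking over a different number of pages.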
Then add the table from each URL to the DataFrame:
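# pd.read_html returns a list of DataFrames, one per table on the page.
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
for html in urls:
    df=pd.concat([df]+pd.read_html(html,encoding="utf-8"),ignore_index=True)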
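Growing a DataFrame inside a loop copies the accumulated rows on every pass. A common alternative is to collect the tables in a list and concatenate once at the end. The sketch below assumes that approach; `combine_pages` and its `reader` parameter are hypothetical names introduced here so the logic can be tried without network access.

```python
import pandas as pd

def combine_pages(urls, reader=None):
    """Read every table from every URL and stack them into one DataFrame.

    `reader` must return a list of DataFrames for a URL, as pd.read_html does.
    """
    if reader is None:
        # Default: same call as in the post.
        reader = lambda url: pd.read_html(url, encoding="utf-8")
    frames = []
    for url in urls:
        frames.extend(reader(url))  # one DataFrame per table found on the page
    if not frames:
        return pd.DataFrame()       # pd.concat raises on an empty list
    return pd.concat(frames, ignore_index=True)
```

With the real site this would be called as `df = combine_pages(urls)`; passing a stub `reader` lets the stacking logic be checked offline.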
Select the columns explicitly so that the tables from all pages are stitched together under the same column index:
df=df[[x for x in df]]
Finally, write the result to a CSV file, and that's it:
df.to_csv("福布斯排行榜.csv",header=["世界排名","名字","财富(10亿美元)","财富来源","国家/地区"],index=False)
Complete code:
import pandas as pd
df=pd.DataFrame()
urls=list()
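# Build the URLs for pages 1 to 15
for i in range(1,16):
    url="https://www.phb123.com/renwu/fuhao/shishi_%s.html"%i
    urls.append(url)
# pd.read_html returns a list of DataFrames; DataFrame.append was removed in pandas 2.0
for html in urls:
    df=pd.concat([df]+pd.read_html(html,encoding="utf-8"),ignore_index=True)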
df=df[[x for x in df]]
print(df)
df.to_csv("福布斯排行榜.csv",header=["世界排名","名字","财富(10亿美元)","财富来源","国家/地区"],index=False)