The lines commented out with # below are for checking/debugging; uncomment them as needed.
Import the required packages:
import requests
from bs4 import BeautifulSoup
import bs4
import urllib.request
Inspect the format of the page URLs and note the pattern:
http://quotes.money.163.com/trade/lsjysj_002137.html?year=2017&season=1
http://quotes.money.163.com/trade/lsjysj_002137.html?year=2017&season=2
...
first_url="http://quotes.money.163.com/trade/lsjysj_002137.html?"
Build a list of page URLs:
url_list = []
for year in range(2017, 2019):
    for season in range(1, 5):
        url = first_url + "year=" + str(year) + "&season=" + str(season)
        url_list.append(url)
#print(url_list)
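As an alternative sketch (not part of the original code), urllib.parse.urlencode can build the query string and handles escaping for you:

```python
from urllib.parse import urlencode

base = "http://quotes.money.163.com/trade/lsjysj_002137.html?"
url_list = [
    base + urlencode({"year": year, "season": season})
    for year in range(2017, 2019)  # 2017 and 2018
    for season in range(1, 5)      # seasons 1-4
]
print(len(url_list))  # 8 URLs in total: 2 years x 4 seasons
print(url_list[0])    # http://quotes.money.163.com/trade/lsjysj_002137.html?year=2017&season=1
```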
Loop over the URLs, open each one with urllib.request.urlopen, and append the parsed page to the list massage:
ulist = []
massage = []
for i in range(0, 8):
    page = urllib.request.urlopen(url_list[i])
    soup = BeautifulSoup(page, "html.parser")
    massage.append(soup)
    #print(soup)
All the data we need lives in the tr tags under the table tag. Loop through them, appending each row's td cells to ulist, then delete the empty entries; leaving them in later raises:
IndexError: list index out of range
for j in range(0, 8):
    table = massage[j].find("table", attrs={"class": "table_bg001 border_box limit_sale"})
    for tr in table.findAll('tr'):
        tds = tr("td")
        ulist.append(tds)
while [] in ulist:
    ulist.remove([])  # remove the empty entries
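To see why the empty entries appear, here is a self-contained sketch on a made-up HTML fragment (only the class attribute matches the real page; the rows are hypothetical): the header row uses th cells, so tr("td") returns an empty list for it.

```python
from bs4 import BeautifulSoup

# hypothetical fragment mimicking the table structure on the real page
html = """
<table class="table_bg001 border_box limit_sale">
  <tr><th>Date</th><th>Open</th><th>High</th><th>Low</th><th>Close</th></tr>
  <tr><td>2017-01-03</td><td>10.50</td><td>10.70</td><td>10.40</td><td>10.62</td></tr>
  <tr><td>2017-01-04</td><td>10.62</td><td>10.80</td><td>10.55</td><td>10.75</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", attrs={"class": "table_bg001 border_box limit_sale"})
rows = [tr("td") for tr in table.findAll("tr")]  # header row yields []
rows = [tds for tds in rows if tds]              # drop the empty entries
print(len(rows))          # 2 data rows survive
print(rows[0][0].string)  # 2017-01-03
```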
Save the data to a file at a location of your choice:
file = open(r"D:\maida.csv", "w", encoding="UTF-8")  # change the path to suit your needs
#print("Number of ulist", len(ulist))  # check how many rows ulist holds, to set x in range(0, x)
for i in range(0, 468):
    data1 = ulist[i][0].string
    data2 = ulist[i][4].string
    #print(ulist[i][0], ulist[i][4])
    #file.write(data1)
    #file.write(data2)
    file.write("{:^10}{:^10}\n".format(data1, data2))
print("Saved successfully!")
file.close()
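Since the output is a .csv file, the standard csv module is a safer alternative to hand-formatted writes, because it handles commas and quoting for you. A minimal sketch, with made-up rows standing in for the (ulist[i][0].string, ulist[i][4].string) pairs and an example file name:

```python
import csv

# hypothetical rows standing in for the scraped (date, price) pairs
rows = [("2017-01-03", "10.62"), ("2017-01-04", "10.75")]

with open("maida.csv", "w", newline="", encoding="UTF-8") as f:
    writer = csv.writer(f)
    writer.writerow(["date", "close"])  # header row
    writer.writerows(rows)
print("Saved successfully!")
```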
Now you can open the file and check the data.