Steps:
- Fetch the data
- Parse the data

- Fetch the data
    - Import the requests library
    - Parameters: url, params, headers
    - Send the request: requests.get(url, params=params, headers=headers)
    - Check the status code: r.status_code
    - Check the response body: r.text
import requests

url = "https://s.askci.com/stock/a/0-0"
para = {
    "reportTime": "2023-03-31",
    "pageNum": 1
}
header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36"
}
r = requests.get(url, params=para, headers=header)
r.status_code   # 200
r.text
- Parse the data
    - Import the BeautifulSoup library
    - Wrap the page in a document tree
    - Locate the table that holds the data
    - Parse the table header
    - Parse the table data
    - Tools: soup.find() / find_all(), list comprehensions
from bs4 import BeautifulSoup

# (1) Wrap the page in a document tree
soup = BeautifulSoup(r.text, "lxml")
soup

# (2) Locate the table that holds the data
table = soup.find(id="myTable04")
table

# (3) Parse the table header
ths = table.find_all("th")
ths
title = [th.text for th in ths]
title
# (4) Parse the table data
tbody = table.find("tbody")
trs = tbody.find_all("tr")
data = []
for tr in trs:
    tds = tr.find_all("td")
    tdsv = [td.text for td in tds]
    data.append(tdsv)
data
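The multi-page loop below calls three helpers — getHtml, parseTitle, and parseData — that are not defined anywhere above. A minimal sketch, assuming they simply package the single-page steps already shown (the url, reportTime, and User-Agent values are the ones used earlier), might look like:

```python
import requests
from bs4 import BeautifulSoup

URL = "https://s.askci.com/stock/a/0-0"
# Abbreviated here; use the full User-Agent string shown above
HEADER = {"User-Agent": "Mozilla/5.0"}

def getHtml(page):
    # Fetch one page of the listing; only pageNum changes between requests
    para = {"reportTime": "2023-03-31", "pageNum": page}
    r = requests.get(URL, params=para, headers=HEADER)
    return r.text

def parseTitle(soup):
    # Column titles come from the <th> cells of the target table
    table = soup.find(id="myTable04")
    return [th.text for th in table.find_all("th")]

def parseData(soup):
    # Each <tr> in <tbody> is one row; each <td> one cell
    tbody = soup.find(id="myTable04").find("tbody")
    return [[td.text for td in tr.find_all("td")] for tr in tbody.find_all("tr")]
```

These signatures are assumptions inferred from how the loop uses them, not definitions from the original notes.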
tableData = []
for page in range(1, 252):
    html = getHtml(page)
    soup = BeautifulSoup(html, "lxml")
    if page == 1:
        title = parseTitle(soup)
        tableData.append(title)
    pageData = parseData(soup)
    tableData.extend(pageData)
tableData[:1]
import csv

def saveCSV(data):
    # newline="" prevents blank rows on Windows;
    # utf-8 keeps the Chinese cell text intact regardless of the OS default encoding
    with open("mdata/stockData.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerows(data)

saveCSV(tableData)
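To verify the export, the file can be read back with the same csv module. A small sketch (the path mdata/stockData.csv is the one used above, and the loadCSV name is introduced here for illustration):

```python
import csv
import os

def loadCSV(path):
    # Read the saved table back into a list of rows (lists of strings)
    with open(path, "r", newline="", encoding="utf-8") as f:
        return list(csv.reader(f))

if os.path.exists("mdata/stockData.csv"):
    rows = loadCSV("mdata/stockData.csv")
    print(rows[0])        # the title row collected on page 1
    print(len(rows) - 1)  # number of data rows across all pages
```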