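BeautifulSoup lets us locate the specific data we want in a web page by way of the page's tags. This example traces, step by step, how the raw HTML document is transformed into the data we are after. The Python program is as follows: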
from bs4 import BeautifulSoup
import requests

url = 'http://new.cpc.com.tw/division/mb/oil-more4.aspx'
html = requests.get(url).text           # raw HTML of the page, as a string
bs = BeautifulSoup(html, 'html.parser')  # parsed document tree
# print(bs)
data = bs.find_all('span', {'id': 'Showtd'})  # every <span id="Showtd"> element
# print(data)
rows = data[0].find_all('tr')           # all table rows inside the first match
# print(rows)
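prices = []
for i, row in enumerate(rows):
    if i < 16:
        print(row)  # show only the first 16 raw rows
    cols = row.find_all("td")
    # Skip header rows and rows with too few cells, which would otherwise
    # raise an IndexError, as well as rows whose second cell is empty.
    if len(cols) >= 4 and len(cols[1].text) > 0:
        item = [cols[0].text, cols[1].text, cols[2].text, cols[3].text]
        prices.append(item)

# Print at most the first 16 extracted records.
for p in prices[:16]:
    print(p)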
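Because the live page may be unavailable or may change its layout, the same extraction steps can be tried offline. The following is only a minimal sketch: the markup in SAMPLE_HTML, the values in it, and the helper name extract_rows are all made up for illustration and are not taken from the real page. It assumes the target table sits inside a span with id "Showtd", as in the program above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with invented values; the real page may differ.
SAMPLE_HTML = """
<span id="Showtd">
  <table>
    <tr><th>Date</th><th>Product A</th><th>Product B</th><th>Product C</th></tr>
    <tr><td>2020/01/06</td><td>27.1</td><td>28.6</td><td>30.6</td></tr>
    <tr><td>2020/01/13</td><td></td><td></td><td></td></tr>
    <tr><td>2020/01/20</td><td>26.8</td><td>28.3</td><td>30.3</td></tr>
  </table>
</span>
"""


def extract_rows(html):
    """Return the first four cell texts of every usable table row."""
    soup = BeautifulSoup(html, "html.parser")
    spans = soup.find_all("span", {"id": "Showtd"})
    if not spans:
        return []  # the container was not found
    result = []
    for row in spans[0].find_all("tr"):
        cols = row.find_all("td")
        # Header rows contain <th> cells only, so cols is empty for them.
        if len(cols) >= 4 and cols[1].text.strip():
            result.append([c.text.strip() for c in cols[:4]])
    return result


sample_prices = extract_rows(SAMPLE_HTML)
for p in sample_prices:
    print(p)
```

With this sample, the header row and the row whose second cell is empty are both skipped, so only two records remain in sample_prices.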
Now, by following how the contents of each variable container change, let us understand the extraction