You can use `requests` (to maintain a web-scraping session) + `BeautifulSoup` (for HTML parsing) + a regex to extract the value of the JavaScript variable that holds the desired data inside a script tag, and `ast.literal_eval()` to turn the JS list into a Python list:

    from ast import literal_eval
    import re

    from bs4 import BeautifulSoup
    import requests

    url = "https://www.entsoe.eu/db-query/consumption/mhlv-a-specific-country-for-a-specific-month"

    # Form fields sent with the POST request
    payload = {
        'opt_period': '0',
        'opt_Country': '12',  # 12 stands for DE here
        'opt_Month': '1',
        'opt_Year': '2014',
        'opt_Response': '1',
        'send': 'send',
    }
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.111 Safari/537.36'}

    with requests.Session() as session:
        session.headers.update(headers)
        session.get(url)  # visit the page first so the session picks up its cookies
        response = session.post(url, data=payload)

        soup = BeautifulSoup(response.content, 'html.parser')
        # Locate the inline script that defines the data
        script = soup.find('script', string=re.compile(r'Ext\.onReady')).string
        # re.DOTALL lets the match span several lines if the list is wrapped
        data = literal_eval(re.search(r"var myData = (.*?);", script, re.DOTALL).group(1))

        for row in data:
            print(row)
The loop prints one row of the `myData` list per line.
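The regex + `literal_eval` step can be tried without hitting the site. Below is a minimal sketch of that step on its own; the sample script text, its values, and the helper name `extract_js_list` are made up for illustration and are not taken from the real page:

```python
import re
from ast import literal_eval

# Hypothetical script body standing in for the page's inline <script>;
# the values are invented for illustration only.
SAMPLE_SCRIPT = """
Ext.onReady(function() {
    var myData = [['DE', 1, 100.5], ['DE', 2, 98.0]];
    var other = 1;
});
"""


def extract_js_list(script_text, var_name):
    """Return the literal assigned to `var_name`, or None if it is absent."""
    # Non-greedy match up to the first ';' after the assignment
    pattern = r"var\s+" + re.escape(var_name) + r"\s*=\s*(.*?);"
    match = re.search(pattern, script_text, re.DOTALL)
    if match is None:
        return None
    return literal_eval(match.group(1))


rows = extract_js_list(SAMPLE_SCRIPT, "myData")
for row in rows:
    print(row)
```

Note that `literal_eval` only accepts Python-compatible literals, so a JS value containing `null`, `true` or `false` would raise `ValueError`, and a `;` inside a string value would cut the match short.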
A Selenium-specific approach might be less "magical", but I think this is enough for you (for a question that shows very little research effort).