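I checked this question, but it has only one answer, and that answer is a bit over my head (I'm just starting out with Python). I'm using Python 3.
I'm trying to scrape data from this page, but the page is quite different (and more useful) if you are logged in to a BP account. I need my program to log me in before I have BeautifulSoup fetch the data for me.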
This is what I have so far:
from bs4 import BeautifulSoup
import urllib.request
import requests
username = 'myUsername'
password = 'myPassword'
from requests import session
payload = {'action': 'Log in',
           'Username: ': username,
           'Password: ': password}
# the next 3 lines are pretty much copied from a different StackOverflow
# question. I don't really understand what they're doing, and obviously these
# are where the problem is.
with session() as c:
    c.post('https://www.baseballprospectus.com/manageprofile.php', data=payload)
    response = c.get('http://www.baseballprospectus.com/sortable/index.php?cid=1820315')

soup = BeautifulSoup(response.content, "lxml")

for row in soup.find_all('tr')[7:]:
    cells = row.find_all('td')
    name = cells[1].text
    print(name)
The script does run; it just pulls the data from the site as it looks before logging in, so it isn't the data I want.
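One thing I suspect, but have not confirmed: the keys in my payload ('Username: ' and 'Password: ') look like the visible labels on the page rather than the name attributes of the form's input tags, and a login POST normally has to use the latter. Below is a minimal sketch of how the real field names could be read out of a login form using only the standard library. The sample HTML, the field names in it (username, password, token) and the helper names FormFieldParser and build_payload are all made up for illustration; the actual BP form may well use different names.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect name/value pairs from every <input> tag that has a name."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        name = attr.get("name")
        if name:
            # Inputs without a value attribute (e.g. empty text boxes) become "".
            self.fields[name] = attr.get("value") or ""


def build_payload(form_html, credentials):
    """Start from the form's own fields (including hidden ones),
    then overwrite the credential fields with the supplied values."""
    parser = FormFieldParser()
    parser.feed(form_html)
    payload = dict(parser.fields)
    payload.update(credentials)
    return payload


# Hypothetical login form, NOT copied from the real site.
SAMPLE_FORM = """
<form action="/manageprofile.php" method="post">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="hidden" name="token" value="abc123">
  <input type="submit" name="action" value="Log in">
</form>
"""

payload = build_payload(
    SAMPLE_FORM,
    {"username": "myUsername", "password": "myPassword"},
)
print(payload)
```

If this guess is right, the dictionary built this way from the real login page's HTML is what would go into data= in the c.post(...) call, and the page fetched afterwards with c.get(...) in the same session should then be the logged-in version.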