I'm using Python 3 to write a script to log in to Amazon to grab my Kindle highlights. It is based on this article: https://blog.jverkamp.com/2015/07/02/scraping-kindle-highlights/
I am unable to successfully log in and instead get a message saying to enable cookies to continue:
Failed to login:
Please Enable Cookies to Continue
To continue shopping at Amazon.com, please enable cookies in your Web browser.
Learn more about cookies and how to enable them.
I have included requests sessions to handle cookies, but it doesn't seem to be working.
Here is the code I am using to try to do this:
import bs4, requests
session = requests.Session()
session.headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36'
}
# Log in to Amazon, we have to get the real login page to bypass CSRF
print('Logging in...')
response = session.get('https://kindle.amazon.com/login')
soup = bs4.BeautifulSoup(response.text, "html.parser")
signin_data = {}
signin_form = soup.find('form', {'name': 'signIn'})
for field in signin_form.find_all('input'):
    try:
        signin_data[field['name']] = field['value']
    except KeyError:
        # Skip inputs that lack a name or value attribute
        pass
signin_data[u'ap_email'] = 'myemail'
signin_data[u'ap_password'] = 'mypassword'
response = session.post('https://www.amazon.com/ap/signin', data = signin_data)
soup = bs4.BeautifulSoup(response.text, "html.parser")
warning = soup.find('div', {'id': 'message_warning'})
if warning:
    print('Failed to login: {0}'.format(warning.text))
Is there something I'm missing with my use of sessions?
Solution
Your sign-in form data is incorrect. The field names should be email and password, not ap_email and ap_password:
signin_data[u'email'] = 'your_email'
signin_data[u'password'] = 'your_password'
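To double-check which field names a login form really expects, you can list the names of its input elements before posting. The following is a minimal sketch that uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra dependencies. The FormFieldCollector class, the hidden_token field, and the sample markup are hypothetical and only illustrate the idea; they are not Amazon's real page.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs of <input> tags inside one named form."""

    def __init__(self, form_name):
        super().__init__()
        self.form_name = form_name
        self.in_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("name") == self.form_name:
            self.in_form = True
        elif tag == "input" and self.in_form and "name" in attrs:
            # Inputs without a value attribute are recorded as None.
            self.fields[attrs["name"]] = attrs.get("value")

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


# Hypothetical sample markup, standing in for the real login page.
SAMPLE_HTML = """
<form name="signIn" method="post">
  <input type="hidden" name="hidden_token" value="abc123">
  <input type="email" name="email">
  <input type="password" name="password">
</form>
"""

collector = FormFieldCollector("signIn")
collector.feed(SAMPLE_HTML)
field_names = sorted(collector.fields)
print(field_names)
```

If the printed names include email and password rather than ap_email and ap_password, those are the keys to set in signin_data.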
You can also avoid the try/except by using a CSS selector together with has_attr:
import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36'
}

with requests.Session() as s:
    s.headers = headers
    r = s.get('https://kindle.amazon.com/login')
    soup = BeautifulSoup(r.content, "html.parser")
    # Copy every named input that has a value (hidden fields included)
    signin_data = {field["name"]: field["value"]
                   for field in soup.select("form[name=signIn]")[0].select("input[name]")
                   if field.has_attr("value")}
    signin_data[u'email'] = 'your_em'
    signin_data[u'password'] = 'pass'
    response = s.post('https://www.amazon.com/ap/signin', data=signin_data)
    soup = BeautifulSoup(response.text, "html.parser")
    warning = soup.find('div', {'id': 'message_warning'})
    if warning:
        print('Failed to login: {0}'.format(warning.text))
    print(response.content)
In the first line of the output, you can see Amazon Kindle: Home at the end:
b'<?xml version="1.0" encoding="utf-8"?>\n\n\n\n Amazon Kindle: Home\n
If it still does not work, update your version of requests and maybe try another user agent. Once I changed ap_email and ap_password to email and password, I logged in fine.
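Rather than reading the printed bytes by eye, the check can be automated. The sketch below assumes that the Amazon Kindle: Home text is the title of the page returned after a successful login. The page_title and looks_logged_in helpers and both sample bodies are hypothetical, and the regular expression is a shortcut for illustration, not a full HTML parse.

```python
import re


def page_title(content):
    """Return the text of the first <title> element in raw bytes, or None."""
    match = re.search(rb"<title[^>]*>(.*?)</title>", content,
                      re.IGNORECASE | re.DOTALL)
    if match is None:
        return None
    return match.group(1).decode("utf-8", errors="replace").strip()


def looks_logged_in(content, expected="Amazon Kindle: Home"):
    """Heuristic: the page title contains the expected text."""
    title = page_title(content)
    return title is not None and expected in title


# Hypothetical response bodies, standing in for response.content.
ok_page = (b'<?xml version="1.0" encoding="utf-8"?>\n<html>\n<head>\n'
           b' <title>Amazon Kindle: Home</title>\n</head></html>')
cookie_page = (b"<html><head><title>Please Enable Cookies to Continue"
               b"</title></head></html>")

print(looks_logged_in(ok_page))      # True
print(looks_logged_in(cookie_page))  # False
```

In the real script you would pass response.content from the POST request to looks_logged_in.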