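After my own IP visited Taobao too many times, it triggered Taobao's anti-crawler mechanism, and slider CAPTCHAs and similar checks started popping up. So I built an IP pool and pick a random IP address from it for the simulated Taobao login.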
Free IPs from a certain website:
58.209.53.172:62330
106.110.91.240:20750
114.234.167.236:20693
180.124.87.81:20689
222.187.164.36:20820
113.123.119.218:50045
49.82.252.21:20685
121.224.106.53:12004
119.126.157.59:55201
222.187.165.85:20666
114.234.163.133:20752
114.235.200.49:20753
223.243.170.237:20713
183.165.225.113:20726
I picked 4 of these at random for the pool below. Some may have stopped working by now; just find others online.
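Because free proxies die quickly, it can help to screen the list before building the pool. Below is a minimal sketch, assuming each entry is an `ip:port` string like the ones above. The helper names `parse_proxies`, `is_port_open` and `filter_alive` are hypothetical, not from the original code. The check only confirms that the port accepts a TCP connection; it does not prove the host actually forwards traffic as a proxy, so treat it as a cheap first filter.

```python
import socket


def parse_proxies(text):
    """Parse one "ip:port" entry per line, skipping blank or malformed lines."""
    proxies = []
    for line in text.splitlines():
        line = line.strip()
        host, sep, port = line.rpartition(':')
        if sep and host and port.isdigit():
            proxies.append(line)
    return proxies


def is_port_open(proxy, timeout=3.0):
    """Return True if a TCP connection to "ip:port" succeeds within timeout.

    This only shows the port is listening; it does not verify that the
    host behaves as a working HTTP proxy.
    """
    host, _, port = proxy.rpartition(':')
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def filter_alive(proxies, timeout=3.0):
    """Keep only the proxies whose port still accepts connections."""
    return [p for p in proxies if is_port_open(p, timeout)]
```

For example, `filter_alive(parse_proxies(raw_text))`, where `raw_text` holds the list above, would return only the entries whose ports still answer; those can then be copied into the `proxies` list in the login code.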
Simulated login
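import random
import time

# Simulated browser (Selenium's Chrome WebDriver)
from selenium.webdriver import Chrome
# ChromeOptions: module for setting browser parameters
from selenium.webdriver import ChromeOptions
# Locator strategies; find_element_by_xpath was removed in Selenium 4.3
from selenium.webdriver.common.by import By


class taobaoInfo:
    # Magic method: constructor that sets up the simulated login
    def __init__(self):
        url = 'https://login.taobao.com/member/login.jhtml'
        self.url = url
        # IP pool
        proxies = [
            '183.165.225.113:20726',
            '223.243.170.237:20713',
            '114.235.200.49:20753',
            '114.234.163.133:20752'
        ]
        self.options = ChromeOptions()  # browser options (parameter setter)
        # Route the browser's traffic through one randomly chosen proxy IP
        self.options.add_argument('--proxy-server=' + random.choice(proxies))
        # Prevent Taobao from detecting a Selenium-driven login
        self.options.add_experimental_option('excludeSwitches', ['enable-automation'])
        self.options.add_experimental_option('useAutomationExtension', False)
        self.driver = Chrome(options=self.options)
        # Hide navigator.webdriver before any page script runs
        self.driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
            "source": """
                Object.defineProperty(navigator, 'webdriver', {
                    get: () => undefined
                })
            """
        })

    def login_Info(self):
        self.driver.get(self.url)
        # Type the account name (replace with your own), then pause randomly
        self.driver.find_element(By.XPATH, '//*[@id="fm-login-id"]').send_keys('your_phone_number')
        time.sleep(random.randint(1, 3))
        # Type the password (replace with your own), then pause randomly
        self.driver.find_element(By.XPATH, '//*[@id="fm-login-password"]').send_keys('your_password')
        time.sleep(random.randint(1, 3))
        # Click the login button
        self.driver.find_element(By.XPATH, '//*[@id="login-form"]/div[4]/button').click()
        time.sleep(random.randint(1, 6))


if __name__ == '__main__':
    taobaoInfo().login_Info()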
Reflections
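Although I used an IP pool, I did not make good use of the addresses in it: the code picks a single proxy with random.choice when the browser starts, so every request in a run goes out through the same IP. High-frequency scraping would require actually rotating through all the addresses in the pool.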
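One way to make fuller use of the pool is to rotate through it and drop proxies that keep failing. The sketch below assumes a hypothetical `ProxyPool` class that is not part of the original code: `get()` hands out proxies in round-robin order, and `report_failure()` evicts a proxy after `max_failures` consecutive failures (a success resets the count). It is a plain-Python illustration of rotation with eviction, not a tested production design.

```python
import random


class ProxyPool:
    """Hypothetical rotating proxy pool with failure-based eviction."""

    def __init__(self, proxies, max_failures=3, shuffle=True):
        self.proxies = list(proxies)
        if shuffle:
            random.shuffle(self.proxies)  # start from a random order
        self.max_failures = max_failures
        self.failures = {p: 0 for p in self.proxies}
        self._index = 0  # position of the next proxy to hand out

    def get(self):
        """Return the next proxy in round-robin order."""
        if not self.proxies:
            raise RuntimeError('proxy pool is empty')
        proxy = self.proxies[self._index]
        self._index = (self._index + 1) % len(self.proxies)
        return proxy

    def report_success(self, proxy):
        """Reset the failure count after a successful request."""
        if proxy in self.failures:
            self.failures[proxy] = 0

    def report_failure(self, proxy):
        """Count a failure; evict the proxy once it reaches max_failures."""
        if proxy not in self.failures:
            return
        self.failures[proxy] += 1
        if self.failures[proxy] >= self.max_failures:
            pos = self.proxies.index(proxy)
            self.proxies.pop(pos)
            del self.failures[proxy]
            # Keep the rotation pointing at the same "next" proxy
            if pos < self._index:
                self._index -= 1
            if self._index >= len(self.proxies):
                self._index = 0
```

In the login script, each new browser session could call `pool.get()` and pass `'--proxy-server=' + proxy` to `ChromeOptions.add_argument`, then call `report_success` or `report_failure` depending on whether the page loaded, so dead proxies drop out and traffic spreads across the live ones.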