Selenium (Part 1)

1、

1.1、Appium: an automation tool for mobile (H5) apps and pages that can also be used to drive scraping.
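1.2、CSS selector: [id="1"] matches elements whose id attribute equals 1. An ID selector such as #1 is not valid here because a CSS identifier cannot start with an unescaped digit, so the attribute form is used instead.
#content_left > div.result.c-container[id="1"]
https://www.w3.org/TR/selectors-3/#selectors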
1.3、XPath:
//div[@id="sa-header"]//div[@class="nav"]/ul//a[contains(text(), "监控查询")]  # substring (fuzzy) match
//div[@id="sa-header"]//div[@class="nav"]/ul//a[text()="监控查询"]  # exact match
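
To see how the exact and substring matches differ without launching a browser, here is a minimal sketch using Python's standard-library ElementTree. It is not Selenium: ElementTree supports only a subset of XPath (no contains() or text()), so [.='...'] stands in for the exact match and the substring match is done in Python. The markup and hrefs are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for the real page.
PAGE = """
<html><body>
  <div id="sa-header">
    <div class="nav">
      <ul>
        <li><a href="/query">监控查询</a></li>
        <li><a href="/report">监控查询报表</a></li>
        <li><a href="/help">帮助</a></li>
      </ul>
    </div>
  </div>
</body></html>
"""

root = ET.fromstring(PAGE)

# Exact match: ElementTree's [.='text'] plays the role of text()="...".
exact = root.findall(
    ".//div[@id='sa-header']//div[@class='nav']/ul//a[.='监控查询']"
)

# Substring match: ElementTree has no contains(), so filter in Python.
links = root.findall(".//div[@id='sa-header']//div[@class='nav']/ul//a")
fuzzy = [a for a in links if "监控查询" in (a.text or "")]

print([a.get("href") for a in exact])
print([a.get("href") for a in fuzzy])
```

The exact match returns only the first link, while the substring match also picks up the link whose text merely starts with the same words.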
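1.4、Switching into an iframe:

# find_element_by_* was removed in Selenium 4.3; there, use find_element('tag name', 'iframe')
self.browser.switch_to.frame(self.browser.find_element_by_tag_name('iframe'))  # by element
self.browser.switch_to.frame('_oid_ifr_')  # by id (or name)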
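
After switching into a frame, element lookups stay scoped to that frame until you switch back. One possible way to avoid forgetting that step is a small context manager. This is only a sketch: inside_frame is a hypothetical helper name, and it relies only on switch_to.frame() and switch_to.default_content() from Selenium.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_ref):
    """Switch into a frame for the duration of a with-block.

    frame_ref may be a WebElement, a frame id/name string, or an index,
    which are the forms Selenium's switch_to.frame() accepts.
    """
    driver.switch_to.frame(frame_ref)
    try:
        yield driver
    finally:
        # Return to the top-level document even if the body raised.
        driver.switch_to.default_content()


# Usage (assumes self.browser is a live WebDriver):
# with inside_frame(self.browser, '_oid_ifr_') as browser:
#     browser.find_element('css selector', 'input').click()
```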

2、A way for Selenium WebDriver to get past a site's anti-scraping checks
https://blog.csdn.net/javaer_lee/article/details/85160659
(tested and working as of 2019-3-25)
2.1、Install mitmproxy from the official site
2.2、inject_js_proxy.py

from mitmproxy import ctx
injected_javascript = '''
// overwrite the `languages` property to use a custom getter
Object.defineProperty(navigator, "languages", {
  get: function() {
    return ["zh-CN","zh","zh-TW","en-US","en"];
  }
});
// Overwrite the `plugins` property to use a custom getter.
Object.defineProperty(navigator, 'plugins', {
  get: () => [1, 2, 3, 4, 5],
});
// Pass the Webdriver test
Object.defineProperty(navigator, 'webdriver', {
  get: () => false,
});
// Pass the Chrome Test.
// We can mock this in as much depth as we need for the test.
window.navigator.chrome = {
  runtime: {},
  // etc.
};
// Pass the Permissions Test.
const originalQuery = window.navigator.permissions.query;
window.navigator.permissions.query = (parameters) => (
  parameters.name === 'notifications' ?
    Promise.resolve({ state: Notification.permission }) :
    originalQuery(parameters)
);
'''
 
def response(flow):
    # Only process 200 responses of HTML content.
    if flow.response.status_code != 200:
        return
    if 'text/html' not in flow.response.headers.get('content-type', ''):
        return

    # Inject a script tag containing the JavaScript.
    html = flow.response.text
    html = html.replace('<head>', '<head><script>%s</script>' % injected_javascript)
    flow.response.text = html
    ctx.log.info('>>>> JavaScript injected <<<<')

    # If the URL starts with target, replace the page body with the current URL.
    # target = 'https://target-url.com'
    # if flow.request.url.startswith(target):
    #     flow.response.text = flow.request.url
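
The script above only injects when the page contains the literal string <head>, so pages with <HEAD> or <head lang="..."> pass through untouched. A slightly more tolerant variant could look like the sketch below. inject_script and HEAD_OPEN are hypothetical names, not part of mitmproxy, and the function works on plain strings so it can be tried without a proxy running.

```python
import re

# Matches the first opening <head> tag, with or without attributes,
# in any letter case (e.g. <head>, <HEAD>, <head lang="zh">).
HEAD_OPEN = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)


def inject_script(html, javascript):
    """Return html with a <script> block placed right after the opening <head> tag.

    If no <head> tag is found, the html is returned unchanged.
    """
    def add_script(match):
        # A function replacement keeps backslashes in the JavaScript literal.
        return "%s<script>%s</script>" % (match.group(0), javascript)

    return HEAD_OPEN.sub(add_script, html, count=1)
```

Inside response(flow) it would be used as flow.response.text = inject_script(flow.response.text, injected_javascript).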

2.3、Run the injection script from the bin directory:

C:\Program Files (x86)\mitmproxy\bin>mitmdump -s inject_js_proxy.py

Loading script inject_js_proxy.py
Proxy server listening at http://*:8080
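
Before pointing the browser at the proxy, it can help to confirm that something is actually accepting connections on port 8080. The sketch below uses only the standard library. proxy_is_listening is a hypothetical helper, and the default host and port are assumptions taken from the log line above.

```python
import socket


def proxy_is_listening(host="127.0.0.1", port=8080, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

This only shows that the port is open, not that the listener is mitmproxy or that the script was loaded.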

2.4、Add the proxy to the Selenium WebDriver options:

...
ChromeOptions options = new ChromeOptions();
options.addArguments("--proxy-server=127.0.0.1:8080");
...
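
The snippet above is Java. For the Python bindings used in section 1, a rough counterpart might look like this. chrome_proxy_args and start_chrome_via_proxy are hypothetical helper names, and the sketch assumes Selenium and a matching Chrome driver are installed.

```python
def chrome_proxy_args(host="127.0.0.1", port=8080):
    """Build the Chrome command-line argument that routes traffic through the proxy."""
    return ["--proxy-server=%s:%d" % (host, port)]


def start_chrome_via_proxy(host="127.0.0.1", port=8080):
    """Start Chrome with the proxy argument applied and return the driver."""
    # Imported here so chrome_proxy_args() works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in chrome_proxy_args(host, port):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)


# Usage (needs a running mitmdump and a local Chrome):
# driver = start_chrome_via_proxy()
# driver.get("https://intoli.com/blog/not-possible-to-block-chrome-headless/chrome-headless-test.html")
# print(driver.execute_script("return navigator.webdriver"))
# driver.quit()
```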

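Restart your WebDriver, type navigator.webdriver in the browser console and press Enter. If it prints false, the injection worked.
Test site: https://intoli.com/blog/not-possible-to-block-chrome-headless/chrome-headless-test.html
If the WebDriver row on that page is green, the proxy is working too.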
