学习笔记-Python-爬虫2-SSL、js加密、ajax

- SSL
- SSL证书就是指遵守SSL安全套阶层协议的服务器数字证书(SercureSocketLayer)
- 美国网景公司开发
- CA(CertifacateAuthprity)是数字证书认证中心,是发放、管理、废除数字证书的授信人的第三方机构
- 遇到不信任的SSL证书,需要单独处理,案例v17

 

- js加密
- 有的反爬虫策略采用js对需要传输的数据进行加密处理(通常是取md5值)
- 经过加密、传输的就是密文
- 加密函数或者过程一定是在浏览器完成,也就是一定把代码(js代码)暴漏给使用者
- 通过阅读加密算法,就可以模拟出加密过程,从而达到破解
- 参看案例v18
- 破解有道查询单词js加密,案例v19
'''
案例v18
破解有道词典
'''
from urllib import request, parse

def youdao(key):
    url = "http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule"
    data = {
        "i": key,
        "from": "AUTO",
        "to": "AUTO",
        "smartresult": "dict",
        "client": "fanyideskweb",
        "salt": "1543290787192",
        "sign": "afb3deca4f9d25f946b6fa54ad9bdef4",
        "doctype": "json",
        "version": 2.1,
        "keyfrom": "fanyi.web",
        "action": "FY_BY_REALTIME",
        "typoResult": "false",
    }
    data = parse.urlencode(data).encode()
    headers = {
        "Accept": "application/json, text/javascript, */*; q=0.01",
        #"Accept-Encoding": "gzip, deflate",
        "Accept-Language": "zh-CN, zh;q=0.9",
        "Connection": "keep-alive",
        "Content-Length": 203,
        "Content-Type": "application/x-www-form-urlencoded;charset=UTF-8",
        "Cookie": '_ntes_nnid=e8766845039ffdc1e3db5ea6f4b8d288,1542789184238; OUTFOX_SEARCH_USER_ID_NCOO=1421081855.16469; OUTFOX_SEARCH_USER_ID="-1158124882@10.169.0.82"; JSESSIONID=aaa1FImCZD0bd-wSfttDw; ___rl__test__cookies=1543290787183',
        "Host": "fanyi.youdao.com",
        "Origin": "http://fanyi.youdao.com",
        "Referer": "http://fanyi.youdao.com/",
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest",
    }
    req = request.Request(url=url, headers=headers, data=data)
    rsp = request.urlopen(req)
    html = rsp.read().decode()
    print(html)
if __name__ == '__main__':
    youdao("girl")
'''
案例v19
在案例v18基础上进行改造
破解有道词典
处理js加密代码
通过js查找以下代码
var t = "" + ((new Date).getTime() + parseInt(10 * Math.random(), 10))
salt: t
sign: n.md5("fanyideskweb" + e + t + "sr_3(QOHT)L2dx#uuGR@r")
md5一共需要4个参数,第二个参数e是用户输入的要查找的单词
'''
from urllib import request, parse
def getSalt():
    '''
    把计算salt的js代码转写成python代码
    salt = "" + ((new Date).getTime() + parseInt(10 * Math.random(), 10))
    :return:
    '''
    import time, random
    salt = int(time.time()*1000) + random.randint(0,10)
    return salt
def getMd5(v):
    import hashlib
    md5 = hashlib.md5()
    # update需要一个bytes格式的参数
    md5.update(v.encode('utf-8'))
    md5_hexdigest = md5.hexdigest()
    return md5_hexdigest
def getSign(key, salt):
    sign = "fanyideskweb" + key + str(salt) + "sr_3(QOHT)L2dx#uuGR@r"
    sign = getMd5(sign)
    return sign
def youdao(key):
    url = "http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule"
    salt = getSalt()
    data = {
        "i": key,
        "from": "AUTO",
        "to": "AUTO",
        "smartresult": "dict",
        "client": "fanyideskweb",
        "salt": str(salt),
        "sign": getSign(key, salt),
        "doctype": "json",
        "version": 2.1,
        "keyfrom": "fanyi.web",
        "action": "FY_BY_REALTIME",
        "typoResult": "false",
    }
    print(data)
    data = parse.urlencode(data).encode()
    headers = {
        "Accept": "application/json, text/javascript, */*; q=0.01",
        #"Accept-Encoding": "gzip, deflate",
        "Accept-Language": "zh-CN, zh;q=0.9",
        "Connection": "keep-alive",
        "Content-Length": len(data),
        "Content-Type": "application/x-www-form-urlencoded;charset=UTF-8",
        "Cookie": '_ntes_nnid=e8766845039ffdc1e3db5ea6f4b8d288,1542789184238; OUTFOX_SEARCH_USER_ID_NCOO=1421081855.16469; OUTFOX_SEARCH_USER_ID="-1158124882@10.169.0.82"; JSESSIONID=aaa1FImCZD0bd-wSfttDw; ___rl__test__cookies=1543290787183',
        "Host": "fanyi.youdao.com",
        "Origin": "http://fanyi.youdao.com",
        "Referer": "http://fanyi.youdao.com/",
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest",
    }
    req = request.Request(url=url, headers=headers, data=data)
    rsp = request.urlopen(req)
    html = rsp.read().decode()
    print(html)
if __name__ == '__main__':
    youdao("girl")

- ajax
- 异步请求
- 一定会有url、请求方式、可能有数据
- 一般使用json格式
- 案例v20,豆瓣电影

 

 

转载于:https://www.cnblogs.com/Cloudloong/p/10021785.html

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值