Implementing Search Suggestions with a Python Crawler

Over the May Day holiday, just as I was about to cut loose, I noticed the search box suggesting completions while I was still typing.

Why can the browser anticipate what I am going to type?

Could I use this to do something fun?

So I pressed F12 and made the page show its true colors.

There was far too much in the list, so I cleared it and repeated the action.

Hmm, there is something worth a look in here.

And it is a GET request, which makes things easy.

import urllib.request
import urllib.parse

# The "Request URL" shown under "General" in the Network panel
_360_url = 'https://smart.sug.so.com/suggest?crec=0&pid=webpage&word=%E5%B9%82&srcg=&src=hao_360so_suggest&encodein=utf-8&encodeout=utf-8&count=10&callback=__jsonp28__&t=1651398804750'

# The "user-agent" value shown under "Request Headers"
headers = {
    'user-agent': "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36 QIHU 360SE/13.1.5330.0"
}

# Named req rather than re so it is not mistaken for the standard re module
req = urllib.request.Request(url=_360_url, headers=headers)
res = urllib.request.urlopen(req)
T = res.read().decode('utf-8')

T is now the data shown in the Preview (Response) tab, just without any line breaks.

>>> print(T)

 __jsonp28__({"abv":"b","errno":0,"data":{"query":"幂","errorcode":0,"ext":"nlpv=test_yc_18","version":"revise","result":[{"ci":"1.000000","ctrScore":"0.070790","recallType":"nginx'wangdun'ci'tail'xgb","recallScore":"0.000000","word":"幂的运算","rerankScore":"0.266065","rankScore":"0.266065"},{"ci":"1.000000","ctrScore":"0.064769","recallType":"nginx'wangdun'ci'tail'xgb","recallScore":"0.000000","word":"幂函数","rerankScore":"0.254498","rankScore":"0.254498"},{"ci":"1.000000","ctrScore":"0.014691","recallType":"nginx'wangdun'xgb","recallScore":"0.000000","word":"幂是什么","rerankScore":"0.121207","rankScore":"0.121207"},{"ci":"1.000000","ctrScore":"0.007204","recallType":"nginx'xgb","recallScore":"0.000000","word":"幂怎么读","rerankScore":"0.084879","rankScore":"0.084879"},{"ci":"1.000000","ctrScore":"0.004697","recallType":"nginx'wangdun'tail'xgb","recallScore":"0.000000","word":"幂的乘方与积的乘方","rerankScore":"0.068532","rankScore":"0.068532"},{"ci":"1.000000","ctrScore":"0.004360","recallType":"nginx'wangdun'ci'tail'xgb","recallScore":"0.000000","word":"幂函数图像","rerankScore":"0.066031","rankScore":"0.066031"},{"ci":"1.000000","ctrScore":"0.004268","recallType":"nginx'wangdun'xgb","recallScore":"0.000000","word":"幂函数公式","rerankScore":"0.065332","rankScore":"0.065332"},{"ci":"1.000000","ctrScore":"0.003952","recallType":"nginx'ci'tail'xgb","recallScore":"0.000000","word":"幂学在线","rerankScore":"0.062861","rankScore":"0.062861"},{"ci":"1.000000","ctrScore":"0.002698","recallType":"nginx'wangdun'xgb","recallScore":"0.000000","word":"幂次方","rerankScore":"0.051942","rankScore":"0.051942"},{"ci":"1.000000","ctrScore":"0.001830","recallType":"nginx'wangdun'tail'xgb","recallScore":"0.000000","word":"幂律分布","rerankScore":"0.042779","rankScore":"0.042779"}],"ssid":"cebbce4bdd6848cf8e25f8bdcf3ce3a3","src":"hao_360so_suggest"}});

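Here is how I wrangled such a long string:

import json

T = T[T.index('(') + 1:T.rindex(')')]  # keep only what sits between the JSONP parentheses
T = json.loads(T)                      # parse the JSON text into a dict
result = T['data']['result']
for i in result:
    print(i['word'])

It may look a little odd, but the gist is:

1. First cut out the part that looks like a dictionary, that is, everything between the parentheses of __jsonp28__(...). Locating the parentheses is more robust than a fixed slice such as T[13:-3], which breaks as soon as the callback name or the surrounding whitespace changes.

2. Turn it into a real dictionary with json.loads(). eval() may also work on a response like this one, but it executes whatever the server sends back and fails on JSON literals such as true, false and null, so json.loads() is the safer choice.

3. Inspect the dictionary T and pull out the one useful list, T['data']['result'].

4. That list is made up of several dictionaries, so loop over it with for and print the useful field, 'word', from each one.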
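The four steps above can also be packed into one small function. The following is only a minimal sketch under stated assumptions, not the code used in this post: `extract_words` is a name made up here, the sample string is a shortened copy of the response printed earlier, and the response is assumed to be a single `callback(...)` call wrapping one JSON object.

```python
import json
import re

# Matches "callbackName( ... )" with an optional trailing semicolon.
_JSONP_RE = re.compile(r'^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$', re.DOTALL)


def extract_words(jsonp_text):
    """Return the suggested words from a JSONP suggestion response."""
    match = _JSONP_RE.match(jsonp_text)
    if match is None:
        raise ValueError('response is not in callback(...) form')
    payload = json.loads(match.group(1))          # step 1 and step 2
    results = payload['data']['result']           # step 3
    return [item['word'] for item in results]     # step 4


# Shortened copy of the response shown earlier (only two results kept).
sample = ' __jsonp28__({"errno":0,"data":{"query":"幂","result":[{"word":"幂的运算"},{"word":"幂函数"}]}});'
print(extract_words(sample))
```

Because the wrapper is matched by pattern rather than by position, the same function should keep working if the callback name in the URL is changed.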

OK,

but not completely OK.

How can it take on anything you throw at it, rather than only knowing '幂'?

Simple: look at what the request URL looks like.

https://smart.sug.so.com/suggest?crec=0&pid=webpage&word=%E5%B9%82&srcg=&src=hao_360so_suggest&encodein=utf-8&encodeout=utf-8&count=10&callback=__jsonp28__&t=1651411357360

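Then look at what the 'Query String Parameters' look like.

Clearly, the URL is the base address (the red part) followed by the query string (the blue part), and the blue part is simply the 'Query String Parameters' joined together with '&'. The one difference is that the Chinese character '幂' has turned into '%E5%B9%82'. This is not encryption: non-ASCII characters in a URL are percent-encoded (URL-encoded), meaning each byte of the character's UTF-8 encoding is written as % followed by two hexadecimal digits.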
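To make the red-plus-blue structure concrete, the same URL can be assembled from a dict of parameters. This is a sketch using `urllib.parse.urlencode` from the standard library; the parameter names and values are copied from the request URL above, while `BASE_URL` and `build_url` are names chosen here purely for illustration.

```python
from urllib.parse import urlencode

# The part before the "?" (the "red part").
BASE_URL = 'https://smart.sug.so.com/suggest'


def build_url(word):
    """Join the query string parameters with '&' and append them to BASE_URL."""
    params = {
        'crec': 0,
        'pid': 'webpage',
        'word': word,          # urlencode percent-encodes this value
        'srcg': '',
        'src': 'hao_360so_suggest',
        'encodein': 'utf-8',
        'encodeout': 'utf-8',
        'count': 10,
        'callback': '__jsonp28__',
        't': 1651411357360,    # copied verbatim from the captured request
    }
    return BASE_URL + '?' + urlencode(params)


print(build_url('幂'))
```

With the values above, the printed URL is character for character the same as the captured request URL, which confirms that the query string is nothing more than the encoded parameters joined with '&'.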

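It can be URL-encoded like this:

from urllib.parse import quote
# The second positional argument of quote() is `safe`, so pass the encoding by keyword
s = quote('幂', encoding='utf-8')
>>> print(s)
%E5%B9%82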
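For the curious, here is a small sketch of what `quote` does to a non-ASCII character, written out by hand: encode the character as UTF-8 and write every byte as `%XX`. The variable names are illustrative only, and `unquote` is the standard-library function that reverses the process.

```python
from urllib.parse import quote, unquote

word = '幂'
utf8_bytes = word.encode('utf-8')                            # the bytes e5 b9 82
manual = ''.join('%{:02X}'.format(b) for b in utf8_bytes)    # one %XX per byte

print(manual)                   # %E5%B9%82
print(quote(word) == manual)    # True
print(unquote(manual))          # 幂
```

This hand-rolled version only covers characters that need escaping; letters, digits and a few punctuation marks are passed through unchanged by `quote`.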

Combining this with the code above gives the complete program:

from urllib.parse import quote
import urllib.request
import json

a = input()
s = quote(a, encoding='utf-8')  # URL-encode the keyword typed by the user

_360_url = 'https://smart.sug.so.com/suggest?crec=0&pid=webpage&word={}&srcg=&src=hao_360so_suggest&encodein=utf-8&encodeout=utf-8&count=10&callback=__jsonp28__&t=1651398804750'.format(s)
headers = {
    'user-agent': "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36 QIHU 360SE/13.1.5330.0"
}
req = urllib.request.Request(url=_360_url, headers=headers)
res = urllib.request.urlopen(req)
T = res.read().decode('utf-8')
T = T[T.index('(') + 1:T.rindex(')')]  # strip the __jsonp28__( ... ); wrapper
T = json.loads(T)  # json.loads rather than eval: it never executes the response
result = T['data']['result']
for i in result:
    print(i['word'])
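If the script is going to be reused, it may help to wrap it in a function with a timeout and basic error handling. What follows is a hedged sketch rather than a tested client for this endpoint: `suggest`, `URL_TEMPLATE`, `HEADERS` and the `opener` parameter are all names introduced here, and treating `t` as a millisecond Unix timestamp is an assumption based only on how the captured values look.

```python
import json
import time
import urllib.request
from urllib.parse import quote

# Same URL as in the post, with the keyword and the t value left as placeholders.
URL_TEMPLATE = (
    'https://smart.sug.so.com/suggest?crec=0&pid=webpage&word={word}&srcg='
    '&src=hao_360so_suggest&encodein=utf-8&encodeout=utf-8&count=10'
    '&callback=__jsonp28__&t={t}'
)
HEADERS = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36 '
                  'QIHU 360SE/13.1.5330.0'
}


def suggest(word, timeout=5, opener=urllib.request.urlopen):
    """Return the suggested words for `word`, or [] if the request fails.

    `opener` exists only so that the network call can be swapped out.
    """
    # Assumption: t is the current time in milliseconds.
    url = URL_TEMPLATE.format(word=quote(word), t=int(time.time() * 1000))
    req = urllib.request.Request(url, headers=HEADERS)
    try:
        with opener(req, timeout=timeout) as res:
            text = res.read().decode('utf-8')
    except OSError:  # covers URLError, HTTPError and socket timeouts
        return []
    payload = json.loads(text[text.index('(') + 1:text.rindex(')')])
    return [item['word'] for item in payload['data']['result']]
```

A call such as `for w in suggest(input()): print(w)` would then behave like the script above, except that a network failure yields an empty list instead of a traceback. A malformed response is deliberately not caught here, so it still raises and the problem stays visible.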

I don't ask for much; if you enjoyed this, please leave a like.
