Several Simple Web Crawlers in Python


Zero cool Ⅱ


Tags: web crawler, python, programming language

Article link: https://blog.csdn.net/m0_55017965/article/details/126256194


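At its core, a web crawler works like this. The user gives the crawler a website and sets specific rules, and the crawler then fetches the matching information from that site. You can also see it as another way of visiting a website: code simulates a browser and carries out the same front-end/back-end interaction a browser would.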
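
"Simulating a browser" mostly means sending the same HTTP request a browser would send, including a browser-like User-Agent header. The sketch below builds such requests but never sends them, so it runs without network access. It uses the standard library's `urllib.request` rather than the `requests` package used in the examples. `make_request` and `FIREFOX_UA` are names introduced only for this illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Browser-like User-Agent string, copied from the examples in this post
FIREFOX_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:101.0) "
              "Gecko/20100101 Firefox/101.0")

def make_request(url, user_agent=FIREFOX_UA, form=None):
    """Hypothetical helper: build (but do not send) a browser-like request.

    Without form data this is a GET, like typing a URL into the address bar.
    With form data, the fields are URL-encoded into the body and urllib
    treats it as a POST, like submitting a form.
    """
    data = urlencode(form).encode("utf-8") if form is not None else None
    return Request(url, data=data, headers={"User-Agent": user_agent})

get_req = make_request("https://www.baidu.com/")
post_req = make_request("https://fanyi.baidu.com/sug", form={"kw": "hello"})
# urllib stores header names via str.capitalize(), hence "User-agent"
print(get_req.get_method(), get_req.get_header("User-agent"))
print(post_req.get_method(), post_req.data)
```

The `requests` examples below correspond to these two shapes: `requests.get` with headers, and `requests.post` with form data.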

import requests
url = "https://www.baidu.com/baidu?tn=monline_3_dg&ie=utf-8&wd=ice%20cube"
# Send a browser-like User-Agent so the site serves the normal page
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:101.0) Gecko/20100101 Firefox/101.0'
}
b = requests.get(url, headers=headers)
print(b.text)
b.close()
# Crawl a specific query from the Baidu search engine
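
The URL in this example carries the search in its query string. `wd` holds the search keyword, and `ice%20cube` is "ice cube" with the space percent-encoded. The original does not explain `tn` and `ie`; `ie=utf-8` presumably names the input encoding, but that is an assumption. The standard library's `urllib.parse` can split such a URL so you can see exactly what is being sent:

```python
from urllib.parse import urlsplit, parse_qs

url = "https://www.baidu.com/baidu?tn=monline_3_dg&ie=utf-8&wd=ice%20cube"
parts = urlsplit(url)
# parse_qs decodes percent-escapes and returns a list of values per key
query = parse_qs(parts.query)
print(parts.netloc, parts.path)  # www.baidu.com /baidu
print(query)  # {'tn': ['monline_3_dg'], 'ie': ['utf-8'], 'wd': ['ice cube']}
```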

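import requests
star = input("Your favorite star is: ")
url = "https://www.baidu.com/baidu"
# Let requests build and percent-encode the query string, including the user's input
params = {"tn": "monline_3_dg", "ie": "utf-8", "wd": star}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:101.0) Gecko/20100101 Firefox/101.0'
}
b = requests.get(url, params=params, headers=headers)
print(b.text)
b.close()
# Crawl search-engine results for content specified by the user's input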
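
A keyword typed by the user may contain spaces or Chinese characters, so it must be percent-encoded before it goes into the `wd` parameter. Pasting it into a plain string literal does not do that. Below is a minimal sketch with the standard library's `urllib.parse.urlencode`. The helper name `build_search_url` is hypothetical; the endpoint and the fixed parameters are the ones from the examples above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BAIDU_SEARCH = "https://www.baidu.com/baidu"  # endpoint used in the examples above

def build_search_url(keyword):
    """Hypothetical helper: put a user-typed keyword into the wd parameter.

    urlencode percent-encodes non-ASCII text as UTF-8 and turns spaces into '+'.
    """
    query = urlencode({"tn": "monline_3_dg", "ie": "utf-8", "wd": keyword})
    return f"{BAIDU_SEARCH}?{query}"

print(build_search_url("ice cube"))
# https://www.baidu.com/baidu?tn=monline_3_dg&ie=utf-8&wd=ice+cube

url = build_search_url("周杰伦")
print(url)
# Decoding the query again recovers the original keyword
print(parse_qs(urlsplit(url).query)["wd"])  # ['周杰伦']
```

Passing the same dict as `params=` to `requests.get` should give an equivalent query string, because requests performs this encoding for you.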

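import requests
url = "https://fanyi.baidu.com/sug"
a = input("Enter an English word: ")
c = {
    "kw": a
}
# Send the word as form data in a POST request; the endpoint replies with JSON
b = requests.post(url, data=c)
print(b.json())
b.close()
# A crawler for Baidu's online translation service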
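
`b.json()` decodes the JSON response into Python dicts and lists, which are easier to use than the raw text. The sketch below assumes the `sug` endpoint replies with a `data` list of `{"k": word, "v": translation}` entries. That structure is an assumption, not something the original confirms, so compare it against what `print(b.json())` actually shows. The function name and the sample text are hypothetical and only let the code run offline.

```python
import json

def extract_suggestions(payload):
    """Hypothetical helper: turn a decoded sug response into (word, translation) pairs.

    ASSUMPTION: the response looks like {"data": [{"k": ..., "v": ...}, ...]};
    print b.json() first to confirm the real structure.
    """
    return [(item.get("k"), item.get("v")) for item in payload.get("data", [])]

# Hypothetical response text, used only to exercise the function offline
sample = '{"errno": 0, "data": [{"k": "hello", "v": "你好"}]}'
print(extract_suggestions(json.loads(sample)))  # [('hello', '你好')]
```

With a live response, this would be called as `extract_suggestions(b.json())`.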

import requests
url = "https://movie.douban.com/j/chart/top_list"
param = {
    "type": "24",
    "interval_id": "100:90",
    "action": "",
    "start": 0,
    "limit": 20,
}  # Query parameters (the crawl rules); you can add rules such as how many entries to fetch
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36 Edg/97.0.1072.55"
}
b = requests.get(url=url, params=param, headers=headers)
print(b.json())
b.close()
# A crawler for a movie website (Douban)
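
The `start` and `limit` parameters suggest the list is paginated. The assumption here, based only on their names and the values above, is that `start` is the offset of the first entry and `limit` is the number of entries per request. Under that assumption, fetching more entries means repeating the request with a shifted `start`. The hypothetical helper `paged_params` below generates one parameter dict per page without touching the network.

```python
def paged_params(base, pages, page_size=20):
    """Hypothetical helper: yield one params dict per page.

    ASSUMPTION: "start" is an offset and "limit" a page size, as their names
    and the values in the Douban example suggest.
    """
    for page in range(pages):
        params = dict(base)  # copy so each page gets its own dict
        params["start"] = page * page_size
        params["limit"] = page_size
        yield params

base = {"type": "24", "interval_id": "100:90", "action": ""}
for p in paged_params(base, pages=3):
    print(p["start"], p["limit"])
# 0 20
# 20 20
# 40 20
```

Each dict can then be passed as `params=` to `requests.get`. Pausing between requests, for example with `time.sleep`, keeps the load on the site low.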