Web Crawlers
A web crawler (also known as a web spider or web robot, and in the FOAF community more often called a web chaser) is a program or script that automatically fetches information from the World Wide Web according to certain rules. Other, less common names include ant, automatic indexer, emulator, and worm.
Disclaimer
Crawlers may only be used on publicly accessible websites. Encrypted or privacy-sensitive data must not be crawled; if you do, you bear the consequences yourself.
Required module: requests
Common methods
get
post
Compared with get, post does not append its parameters to the URL, so post is more secure than get.
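This difference can be seen without sending anything over the network, by preparing a GET and a POST request that carry the same parameters (the URL and credentials below are made up for illustration):

```python
import requests

# Build (but do not send) a GET and a POST request with identical parameters,
# to show where the data ends up in each case.
get_req = requests.Request("GET", "http://example.com/login",
                           params={"user": "tom", "pwd": "123"}).prepare()
post_req = requests.Request("POST", "http://example.com/login",
                            data={"user": "tom", "pwd": "123"}).prepare()

print(get_req.url)   # parameters are visible in the URL: ...?user=tom&pwd=123
print(post_req.url)  # URL stays clean; the data travels in the request body
print(post_req.body)
```

Anyone watching the address bar, server logs, or browser history sees the GET parameters, which is why credentials should go in a POST body.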
Examples
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time : 2018/1/15 16:37
# @Author : lingxiangxiang
# @File : demon3.py
import requests

# Route plain-HTTP traffic through a proxy server
# (the https entry is left commented out).
proxies = {
    "http": "http://163.125.197.244:9797",
    # "https": "http://112.117.184.219:9999",
}

# ip138 echoes back the client IP it sees, so comparing the two
# responses shows whether the proxy is actually in use.
r1 = requests.get("http://2017.ip138.com/ic.asp", proxies=proxies)  # via proxy
r2 = requests.get("http://2017.ip138.com/ic.asp")                   # direct
print(r1.text)
print(r2.text)
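Public proxies like the one above die quickly, so a real crawler should guard every request with a timeout and error handling rather than let one dead proxy crash the whole run. A minimal sketch, assuming nothing beyond requests itself (the fetch helper is our own, not part of the library):

```python
import requests

# Proxy from the example above; it may well be dead by the time you run this.
proxies = {"http": "http://163.125.197.244:9797"}

def fetch(url, proxies=None, timeout=5):
    """Fetch a URL, returning the body text, or None on any network error."""
    try:
        r = requests.get(url, proxies=proxies, timeout=timeout)
        r.raise_for_status()  # turn 4xx/5xx status codes into exceptions too
        return r.text
    except requests.RequestException as e:
        print("request failed:", e)
        return None

# Usage: html = fetch("http://2017.ip138.com/ic.asp", proxies=proxies)
```

requests.RequestException is the base class for all of the library's errors (timeouts, connection failures, bad status codes via raise_for_status), so one except clause covers them all.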
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time : 2018/1/15 16:37
# @Author : lingxiangxiang
# @File : demon3.py
import requests

# Same idea, but against a real JSON endpoint (Lagou's job-listing API).
proxies = {
    "http": "http://182.121.203.45:9999",
    # "https": "http://112.117.184.219:9999",
}

# Note: the URL below is https, and only an "http" proxy is configured,
# so both requests actually go out directly.
r1 = requests.get("https://www.lagou.com/jobs/positionAjax.json", proxies=proxies)
r2 = requests.get("https://www.lagou.com/jobs/positionAjax.json")
print(r1.text)
print(r2.text)
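Many sites inspect request headers and refuse obvious bots: by default requests identifies itself with a python-requests User-Agent, which is easy to block. Sending browser-like headers often helps. A sketch (the header values are illustrative, not required by any particular site):

```python
import requests

# Override the default "python-requests/x.y" User-Agent with a
# browser-like one, plus a Referer, so the request looks less like a bot.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Referer": "https://www.lagou.com/",
}

# Prepare (without sending) a request to inspect the headers it would carry.
req = requests.Request("GET", "https://www.lagou.com/jobs/positionAjax.json",
                       headers=headers).prepare()
print(req.headers["User-Agent"])  # the UA string the server would see

# To actually send it: requests.get(url, headers=headers)
```

Header spoofing is no guarantee (sites may also check cookies, rate, or IP reputation), but it is the usual first step when a direct request is rejected.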