Notes on setting up an HTTPS crawler proxy on Vultr
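I needed to call Google Translate from Python to translate Chinese into English and then publish the result to a website for SEO, and that requires a proxy. None of the tutorials I found online worked for me, mostly because they were vague, so I am recording the steps I went through. I am not good at formatting, so please bear with the layout.
I tested this procedure on several CentOS 7.x servers and it worked on all of them.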
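Preparation
- One Vultr server ($5 per month), with SELinux disabled;
- CentOS 7 or 8; turn the firewall off first, and configure the ports only after everything is reachable;
- Install squid: yum install squid;
- Install httpd-tools: yum -y install httpd-tools (used to generate the username and password);
- Get the plain HTTP proxy working first, then configure HTTPS.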
Install squid and httpd-tools directly with yum
1.[root@vultr ~]# yum install squid
2.[root@vultr ~]# yum -y install httpd-tools
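Set the username and password for authentication
1. Create the password file and set its ownership
Note: you can rename /etc/squid/passwd to anything you like, but the path in the squid configuration file must then be changed to match.
Command: [root@vultr ~]# touch /etc/squid/passwd && chown squid /etc/squid/passwd
2. Create the user and password
Note: yourusername is the username you choose.
Command: [root@vultr ~]# htpasswd /etc/squid/passwd yourusername
After you press Enter you are asked for the password twice. It is probably best to keep the password to 8 characters or fewer.
3. Verify that the password file and its format are correct
Command: [root@vultr ~]# /usr/lib64/squid/basic_ncsa_auth /etc/squid/passwd
After running the command, type the username, a space, and the password.
For example: yourusername yourpassword
If the helper prints OK, everything is fine; press Ctrl+C to exit and go on to the next step. If it prints ERR, the password format or the password file is wrong, so go back to the previous step and set the username and password again. If it still fails, httpd-tools may be at fault: reinstall the version that ships with httpd 2.2 rather than 2.4. I don't think I ever ran into this myself.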
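Before running the helper, it can be worth confirming that the password file has the shape htpasswd writes, one `username:hash` entry per line. The following is only a minimal sketch of such a check; the function name `check_passwd_line` is made up for illustration, and it validates the layout only, not the password itself.

```python
def check_passwd_line(line):
    """Return (username, hash) if the line looks like an htpasswd entry.

    Hypothetical helper: it only checks the 'username:hash' layout and
    does not verify that the hash matches any password.
    """
    line = line.rstrip("\n")
    # Split on the first colon only; everything after it is the hash.
    user, sep, hashed = line.partition(":")
    if not sep or not user or not hashed:
        raise ValueError(f"not a 'username:hash' entry: {line!r}")
    return user, hashed


def check_passwd_file(path="/etc/squid/passwd"):
    """Check every non-empty line of the password file and return the usernames."""
    users = []
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            if raw.strip():
                users.append(check_passwd_line(raw)[0])
    return users
```

Calling `check_passwd_file()` on the server should return the list of usernames you created; a `ValueError` points at a malformed line.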
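Configure squid as an HTTP proxy
Command: [root@vultr ~]# vi /etc/squid/squid.conf
1. Add the following below the acl block in the configuration file
Find the last line that starts with acl, press o to open a new line below it, and insert the following: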
auth_param basic program /usr/lib64/squid/basic_ncsa_auth /etc/squid/passwd
auth_param basic children 5
auth_param basic realm Squid Basic Authentication
auth_param basic credentialsttl 2 hours
acl auth_users proxy_auth REQUIRED
http_access allow auth_users
2. On the line below INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS, set the DNS server address
DNS server setting: dns_nameservers 8.8.8.8
Result after the change:
#INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
dns_nameservers 8.8.8.8
3. Either comment out http_access deny all, or replace it with http_access allow auth_users
Result after the change:
#And finally deny all other access to this proxy
http_access allow auth_users
4. Change the port number from the default 3128 to a port of your choice
#Squid normally listens to port 3128
http_port 21828
5. Make the proxy high-anonymity
# High-anonymity settings, appended at the end of the file
request_header_access X-Forwarded-For deny all
request_header_access From deny all
request_header_access Via deny all
6. Save the squid configuration file and restart the squid service
[root@vultr ~]#systemctl restart squid
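The HTTP proxy is now configured; I tested it successfully with SwitchyOmega installed in Chrome
Set up SwitchyOmega and visit ip138. If the page shows your server's IP, the proxy works. At this point HTTPS sites still cannot be reached, and since I need the HTTPS version of Google Translate, I kept going.
One note: connect using the subdomain you resolved to the server. I could not connect using the bare IP, and I don't know why.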
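The same check can be done from Python instead of the browser. The sketch below uses only the standard library; `proxy.example.com` in the usage note is a placeholder for your own subdomain, and the function names are mine, not part of any library. Credentials are percent-encoded so that characters such as `@` or `:` in a password do not break the proxy URL.

```python
import urllib.request
from urllib.parse import quote


def build_proxy_url(user, password, host, port, scheme="http"):
    """Build a proxy URL with percent-encoded credentials."""
    # safe='' encodes every reserved character, including '@' and ':'.
    cred = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"{scheme}://{cred}@{host}:{port}"


def fetch_via_proxy(proxy_url, target_url, timeout=10):
    """Fetch target_url through the proxy and return the body as text."""
    # ProxyHandler reads the credentials from the proxy URL and sends
    # them as a Proxy-Authorization header.
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    with opener.open(target_url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

For example, `fetch_via_proxy(build_proxy_url("yourusername", "yourpassword", "proxy.example.com", 21828), "http://httpbin.org/ip")` should print the server's IP if the proxy works; the httpbin.org echo page is just one assumed choice, and any page that shows the caller's IP will do.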
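Configure HTTPS support
1. Prepare a subdomain and apply for a free SSL certificate from Tencent. I tried generating a self-signed server certificate with openssl by following tutorials online, but never got it to work.
2. Point the domain at the server and upload the certificate to the server
3. Edit the configuration file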
vi /etc/squid/squid.conf
# Squid normally listens to port 3128
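http_port 21828 # the plain HTTP proxy port; change it to your own or comment it out
# HTTPS proxy port; the default would be 443, I changed it to 6618
https_port 6618 cert=/etc/squid/cert/your_cert_name.crt key=/etc/squid/cert/your_cert_name.key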
# Uncomment and adjust the following to add a disk cache directory.
cache_dir ufs /var/spool/squid 100 16 256 # remove the comment marker in front of this line
# Find the following block; it should be near the end of the file
# Add any of your own refresh_pattern entries above these.
#
refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
refresh_pattern . 0 20% 4320
via off # added
forwarded_for delete # added
dns_v4_first on # added
# High-anonymity settings, appended at the end of the file (added)
request_header_access X-Forwarded-For deny all
request_header_access From deny all
request_header_access Via deny all
4. Save the file and restart the squid service
Tested against an HTTPS website: success
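To confirm that the high-anonymity settings took effect, one option is to request a page that echoes back the request headers it received and look for the headers the configuration is meant to strip. This is only a sketch: the helper name is hypothetical, and it assumes you already have the echoed headers as a mapping. The check is meaningful for plain http targets, because for https targets the request travels through a CONNECT tunnel and squid cannot touch the inner headers anyway.

```python
# Headers that the squid settings above are meant to remove.
REVEALING_HEADERS = ("via", "x-forwarded-for", "from")


def leaked_proxy_headers(headers):
    """Return the proxy-revealing header names present in an echoed header mapping.

    Header names are compared case-insensitively; the result is sorted
    and lower-cased.
    """
    present = {name.lower() for name in headers}
    return sorted(h for h in REVEALING_HEADERS if h in present)
```

An empty list means none of the three headers reached the target site.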
Configure the firewall
Firewall commands for CentOS 7.x
[root@vultr ~]# firewall-cmd --zone=public --add-port=6618/tcp --permanent
success
[root@vultr ~]# firewall-cmd --reload
success
[root@vultr ~]# firewall-cmd --zone=public --add-port=21828/tcp --permanent
success
[root@vultr ~]# firewall-cmd --reload
success
Firewall on CentOS 6.x
[root@vultr ~]# vi /etc/sysconfig/iptables
# Add the following lines to the file
-A INPUT -m state --state NEW -m tcp -p tcp --dport 6618 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 21828 -j ACCEPT
[root@vultr ~]# service iptables restart
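After opening the ports, a quick way to confirm they are reachable from outside is to attempt a TCP connection from your own machine. The sketch below uses only the standard library; the function name is mine, and `proxy.example.com` in the usage note stands in for your own subdomain.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and connects over TCP.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, `port_is_open("proxy.example.com", 6618)` and `port_is_open("proxy.example.com", 21828)` should both return True once the firewall rules are in place. This only shows that the port accepts connections, not that squid authentication works.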
That completes the setup.
My own squid configuration file is attached below as a backup.
#
# Recommended minimum configuration:
#
# Example rule allowing access from your local networks.
# Adapt to list your (internal) IP networks from where browsing
# should be allowed
acl localnet src 10.0.0.0/8 # RFC1918 possible internal network
acl localnet src 182.110.0.0/12 # RFC1918 possible internal network
acl localnet src 192.168.0.0/16 # RFC1918 possible internal network
acl localnet src fc00::/7 # RFC 4193 local private network range
acl localnet src fe80::/10 # RFC 4291 link-local (directly plugged) machines
acl SSL_ports port 443
acl Safe_ports port 80 # http
acl Safe_ports port 21 # ftp
acl Safe_ports port 443 # https
acl Safe_ports port 70 # gopher
acl Safe_ports port 210 # wais
acl Safe_ports port 1025-65535 # unregistered ports
acl Safe_ports port 280 # http-mgmt
acl Safe_ports port 488 # gss-http
acl Safe_ports port 591 # filemaker
acl Safe_ports port 777 # multiling http
acl CONNECT method CONNECT
auth_param basic program /usr/lib64/squid/basic_ncsa_auth /etc/squid/passwd
auth_param basic children 5
auth_param basic realm Squid Basic Authentication
auth_param basic credentialsttl 2 hours
acl auth_users proxy_auth REQUIRED
http_access allow auth_users
#
# Recommended minimum Access Permission configuration:
#
# Deny requests to certain unsafe ports
http_access deny !Safe_ports
# Deny CONNECT to other than secure SSL ports
http_access deny CONNECT !SSL_ports
# Only allow cachemgr access from localhost
http_access allow localhost manager
http_access deny manager
# We strongly recommend the following be uncommented to protect innocent
# web applications running on the proxy server who think the only
# one who can access services on "localhost" is a local user
#http_access deny to_localhost
#
# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
dns_nameservers 8.8.8.8
# Example rule allowing access from your local networks.
# Adapt localnet in the ACL section to list your (internal) IP networks
# from where browsing should be allowed
http_access allow localnet
http_access allow localhost
# And finally deny all other access to this proxy
http_access allow auth_users
# Squid normally listens to port 3128
http_port 21828
https_port 6618 cert=/etc/squid/cert/pachong.com.crt key=/etc/squid/cert/pachong.com.key
# Uncomment and adjust the following to add a disk cache directory.
cache_dir ufs /var/spool/squid 100 16 256 # this line must be uncommented
# Leave coredumps in the first cache dir
coredump_dir /var/spool/squid
#
# Add any of your own refresh_pattern entries above these.
#
refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
refresh_pattern . 0 20% 4320
via off
forwarded_for delete
dns_v4_first on
# High-anonymity settings, appended at the end of the file
request_header_access X-Forwarded-For deny all
request_header_access From deny all
request_header_access Via deny all