Notes on setting up an HTTPS crawler proxy on Vultr

I need Python to call Google Translate, turn Chinese text into English, and publish the result to a website for SEO, which requires a proxy. None of the tutorials I found online ever worked for me, mostly because they were vague, so here is a record of my own process. My formatting skills are limited, so bear with it.
I have tested this walkthrough myself on several CentOS 7.x servers and it worked on every one of them.

Preparation

  1. A Vultr server at $5/month, with SELinux disabled;
  2. CentOS 7 or 8; disable the firewall first and only open specific ports once everything is reachable (the usual commands for disabling both are sketched right after this list);
  3. Install squid: yum install squid;
  4. Install httpd-tools: yum -y install httpd-tools, used to generate the username and password;
  5. Get the plain HTTP proxy working first, then move on to HTTPS.
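
For reference, a sketch of the usual CentOS 7 commands for disabling SELinux and the firewall (on CentOS 6 the firewall service is iptables rather than firewalld):

# disable SELinux for this boot and permanently in its config file
[root@vultr ~]# setenforce 0
[root@vultr ~]# sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
# stop the firewall while testing; the ports are opened properly at the end
[root@vultr ~]# systemctl stop firewalld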

Install squid and httpd-tools directly with yum

[root@vultr ~]# yum install squid
[root@vultr ~]# yum -y install httpd-tools

Set the authentication username and password

1. Create the password file and give squid ownership of it.
Note: you can rename /etc/squid/passwd to whatever you like; if you do, use the same name in the config file later.

Command: [root@vultr ~]# touch /etc/squid/passwd && chown squid /etc/squid/passwd

2. Create the user and password.
Note: yourusername is a username of your choice.

Command: [root@vultr ~]# htpasswd /etc/squid/passwd yourusername

After pressing Enter you will be prompted to type the password twice; it is probably best to keep it to 8 characters or fewer.

3. Verify that the password file and its format are correct

Command: [root@vultr ~]# /usr/lib64/squid/basic_ncsa_auth /etc/squid/passwd
After running the command, type the username, a space, and the password directly,
for example: yourusername yourpassword
If it prints OK, everything is fine; press Ctrl+C to stop and continue to the next step. If it prints ERR, the password file or its format is wrong, so go back to the previous step and recreate the username and password. If it still fails, it may be an httpd-tools problem; reinstall the httpd 2.2 version rather than 2.4, although I have never actually run into this myself.
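
You can also check non-interactively: basic_ncsa_auth reads "username password" pairs from stdin and answers OK or ERR, so a one-liner works too:

[root@vultr ~]# echo "yourusername yourpassword" | /usr/lib64/squid/basic_ncsa_auth /etc/squid/passwd
OK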

Configure Squid as an HTTP proxy

Command: [root@vultr ~]# vi /etc/squid/squid.conf

1. Add the authentication settings below the acl block in the config file.
Find the last line starting with acl, press o to open a new line below it, and insert:

auth_param basic program /usr/lib64/squid/basic_ncsa_auth /etc/squid/passwd
auth_param basic children 5
auth_param basic realm Squid Basic Authentication
auth_param basic credentialsttl 2 hours
acl auth_users proxy_auth REQUIRED
http_access allow auth_users

2. On the line right after INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS, add the DNS server setting: dns_nameservers 8.8.8.8
After the change:

#INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
dns_nameservers 8.8.8.8

3. Comment out http_access deny all, or change it to http_access allow auth_users.
After the change:

#And finally deny all other access to this proxy
http_access allow auth_users 

4. Change the port number: replace the default 3128 with whatever port you want.

#Squid normally listens to port 3128
http_port 21828

5. Set up high anonymity.

# append the high-anonymity settings at the end of the file

request_header_access X-Forwarded-For deny all
request_header_access From deny all
request_header_access Via deny all

6. Save the squid config file and restart the squid service:
[root@vultr ~]#systemctl restart squid
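
If the restart fails, squid can check its own config file for syntax errors:

[root@vultr ~]# squid -k parse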

The HTTP proxy is now configured; I installed the SwitchyOmega extension in Chrome and tested it successfully.

Configure SwitchyOmega and visit ip138: if the page shows your server's IP, the proxy works. At this point HTTPS sites still cannot be reached through it, and since my goal is the HTTPS Google Translate endpoint, the tinkering continues.

One caveat: connect using the resolved subdomain; connecting by IP never worked for me, and I do not know why.
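
Besides SwitchyOmega, you can test the HTTP proxy from any machine with curl (proxy.yourdomain.com, the port, and the credentials below are placeholders for your own values; httpbin.org/ip should echo back the server's IP):

[root@vultr ~]# curl -x http://proxy.yourdomain.com:21828 -U yourusername:yourpassword http://httpbin.org/ip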

Configure HTTPS support

1. Prepare a subdomain and apply for one of Tencent's free SSL certificates; following the online tutorials, I never managed to get a self-signed certificate generated with openssl to work.
2. Resolve the domain to the server and upload the certificate.
3. Edit the config file:

vi /etc/squid/squid.conf
# Squid normally listens to port 3128
http_port 21828 # the default HTTP proxy port; change it to your own or comment it out
# the 6618 below would normally be 443; I changed it to my own port
https_port 6618 cert=/etc/squid/cert/your-cert-name.crt key=/etc/squid/cert/your-cert-name.key

# Uncomment and adjust the following to add a disk cache directory.
cache_dir ufs /var/spool/squid 100 16 256 # remove the leading # on this line

# find the lines below, they should be near the end of the file
# Add any of your own refresh_pattern entries above these.
#
refresh_pattern ^ftp:           1440    20%     10080
refresh_pattern ^gopher:        1440    0%      1440
refresh_pattern -i (/cgi-bin/|\?) 0     0%      0
refresh_pattern .               0       20%     4320
via off # new
forwarded_for delete # new
dns_v4_first on # new
# append at the end of the file: high-anonymity settings # new
request_header_access X-Forwarded-For deny all
request_header_access From deny all
request_header_access Via deny all
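
If squid then refuses to start after cache_dir is enabled, the on-disk cache directories probably do not exist yet; squid can create them itself:

[root@vultr ~]# squid -z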

4. Save the file and restart the squid service.
Test a connection to an HTTPS site: success.
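
From the command line, a recent curl (7.52.0 or later, built with HTTPS-proxy support) can speak TLS to the proxy itself; the subdomain, port, and credentials below are placeholders for your own values:

[root@vultr ~]# curl --proxy https://proxy.yourdomain.com:6618 --proxy-user yourusername:yourpassword https://translate.google.com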

Configure the firewall

Firewall commands on CentOS 7.x

[root@vultr ~]# firewall-cmd --zone=public --add-port=6618/tcp --permanent
success
[root@vultr ~]# firewall-cmd --reload
success
[root@vultr ~]# firewall-cmd --zone=public --add-port=21828/tcp --permanent
success
[root@vultr ~]# firewall-cmd --reload
success
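
You can confirm both ports are open with:

[root@vultr ~]# firewall-cmd --zone=public --list-ports
6618/tcp 21828/tcp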

Firewall on CentOS 6.x

[root@vultr ~]# vi /etc/sysconfig/iptables
# add these lines to the file
-A INPUT -m state --state NEW -m tcp -p tcp --dport 6618 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 21828 -j ACCEPT
[root@vultr ~]# service iptables restart

And that's it, everything is done.

Finally, here is my own squid config file, kept as a backup.

#
# Recommended minimum configuration:
#
# Example rule allowing access from your local networks.
# Adapt to list your (internal) IP networks from where browsing
# should be allowed
acl localnet src 10.0.0.0/8     # RFC1918 possible internal network
acl localnet src 172.16.0.0/12  # RFC1918 possible internal network
acl localnet src 192.168.0.0/16 # RFC1918 possible internal network
acl localnet src fc00::/7       # RFC 4193 local private network range
acl localnet src fe80::/10      # RFC 4291 link-local (directly plugged) machines

acl SSL_ports port 443
acl Safe_ports port 80          # http
acl Safe_ports port 21          # ftp
acl Safe_ports port 443         # https
acl Safe_ports port 70          # gopher
acl Safe_ports port 210         # wais
acl Safe_ports port 1025-65535  # unregistered ports
acl Safe_ports port 280         # http-mgmt
acl Safe_ports port 488         # gss-http
acl Safe_ports port 591         # filemaker
acl Safe_ports port 777         # multiling http
acl CONNECT method CONNECT

auth_param basic program /usr/lib64/squid/basic_ncsa_auth /etc/squid/passwd
auth_param basic children 5
auth_param basic realm Squid Basic Authentication
auth_param basic credentialsttl 2 hours
acl auth_users proxy_auth REQUIRED
http_access allow auth_users

#
# Recommended minimum Access Permission configuration:
#
# Deny requests to certain unsafe ports
http_access deny !Safe_ports

# Deny CONNECT to other than secure SSL ports
http_access deny CONNECT !SSL_ports

# Only allow cachemgr access from localhost
http_access allow localhost manager
http_access deny manager

# We strongly recommend the following be uncommented to protect innocent
# web applications running on the proxy server who think the only
# one who can access services on "localhost" is a local user
#http_access deny to_localhost

#
# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
dns_nameservers 8.8.8.8

# Example rule allowing access from your local networks.
# Adapt localnet in the ACL section to list your (internal) IP networks
# from where browsing should be allowed
http_access allow localnet
http_access allow localhost

# And finally deny all other access to this proxy
http_access allow auth_users 

# Squid normally listens to port 3128
http_port 21828 

https_port 6618 cert=/etc/squid/cert/pachong.com.crt key=/etc/squid/cert/pachong.com.key
# Uncomment and adjust the following to add a disk cache directory.
cache_dir ufs /var/spool/squid 100 16 256 # this one must be uncommented

# Leave coredumps in the first cache dir
coredump_dir /var/spool/squid

#
# Add any of your own refresh_pattern entries above these.
#
refresh_pattern ^ftp:           1440    20%     10080
refresh_pattern ^gopher:        1440    0%      1440
refresh_pattern -i (/cgi-bin/|\?) 0     0%      0
refresh_pattern .               0       20%     4320
via off
forwarded_for delete
dns_v4_first on
# appended at the end of the file: high-anonymity settings
request_header_access X-Forwarded-For deny all
request_header_access From deny all
request_header_access Via deny all