Web Crawlers
程序员勾践
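Redis service fails to start after modifying the Redis config file
1. Create a new config file with comments and blank lines removed: cat redis.conf | grep -v "#" | grep -v "^$" > redis-test.conf
2. Edit the settings in redis-test.conf.
3. Start the server: redis-server /etc/redis-test.conf
4. Test the remote connection: redis-cli -h xxx.xxx.xxx.xxx, then run ping.
Resolved...
Original 2022-02-10 12:10:44 · 1210 views · 0 comments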
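The filtering step can also be sketched in Python. This is a minimal illustration, not part of the original post: the function name `strip_config` and the sample config text are assumptions. It mirrors the two `grep -v` filters literally, so any line that contains "#" anywhere is dropped, not only lines that start with it.

```python
def strip_config(text):
    """Return config text without comment lines or empty lines.

    Mirrors `grep -v "#" | grep -v "^$"`: a line is dropped if it
    contains "#" anywhere, or if it is completely empty.
    """
    kept = [line for line in text.splitlines() if "#" not in line and line != ""]
    return "\n".join(kept) + ("\n" if kept else "")


# Hypothetical sample config, used only to show the effect.
sample = "# Redis sample\nbind 127.0.0.1\n\nport 6379  # default\nprotected-mode yes\n"
cleaned = strip_config(sample)
print(cleaned)
```

Because the filter matches "#" anywhere in a line, a setting with a trailing comment (or a value that contains "#") disappears as well, so it is worth comparing the result against the original file before starting the server with it.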
requests: read timeouts and connect timeouts
Retry on timeout:
import requests

def gethtml(url):
    i = 0
    while i < 3:
        try:
            html = requests.get(url, timeout=5).text
            return html
        except requests.exceptions.RequestException:
            i += 1

if __name__==..
Original 2022-01-20 15:05:56 · 1805 views · 0 comments
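The retry loop above can be generalised a little. The sketch below is an illustration under stated assumptions: `fetch_with_retry`, `flaky_fetch` and the `retry_on` parameter are hypothetical names, and the fetcher is a stand-in that simulates timeouts so the example runs without network access.

```python
import time


def fetch_with_retry(fetch, url, retries=3, delay=0.0, retry_on=(Exception,)):
    """Call fetch(url) up to `retries` times; return None if every try fails."""
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except retry_on as exc:
            print(f"attempt {attempt}/{retries} failed: {exc!r}")
            if attempt < retries:
                time.sleep(delay)  # optional pause before the next try
    return None


# Stand-in fetcher (hypothetical): fails twice, then succeeds.
calls = {"n": 0}


def flaky_fetch(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return f"<html>{url}</html>"


html = fetch_with_retry(flaky_fetch, "https://example.com", retry_on=(TimeoutError,))
print(html)
```

With requests, the fetcher could be `lambda u: requests.get(u, timeout=(3.05, 10)).text` with `retry_on=(requests.exceptions.RequestException,)`. The `timeout` tuple sets the connect timeout and the read timeout separately, which is the distinction the post title refers to; the specific values here are only examples.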
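Navicat: importing into table B fails after exporting table A to Excel
1. Make a copy of the Excel file.
2. Delete the auto-increment unique id column from table B.
3. Delete the id column in the copied Excel file, then save it as a CSV file.
4. Select all and copy every row in the CSV.
5. Paste directly into Navicat.
6. Re-add id as an auto-increment, unique primary key.
Resolved...
Original 2022-01-19 18:08:25 · 216 views · 0 comments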
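Step 3 (dropping the id column before re-import) can also be done with a short script instead of by hand. This is a sketch only: the function name `drop_column` and the sample columns `name` and `phone` are assumptions, not taken from the original tables.

```python
import csv
import io


def drop_column(csv_text, column="id"):
    """Return csv_text with the named column removed from every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = [name for name in reader.fieldnames if name != column]
    out = io.StringIO()
    # extrasaction="ignore" lets the writer skip the dropped column silently.
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Hypothetical export from table A.
exported = "id,name,phone\n1,Alice,123\n2,Bob,456\n"
result = drop_column(exported)
print(result)
```

For real files the same function can be fed the contents of the exported CSV, and the output written to a new file before importing it into table B.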
(Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed
(Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:877)'),))
Environment: python3.6
Fix: pip install urllib3==1.25.11
To skip SSL certificate verification, add the parameter verify=False to the request call.
Resolved
Original 2022-01-19 14:33:32 · 2419 views · 0 comments
requests.exceptions.ProxyError: HTTPSConnectionPool(host='xxx', port=443): Max retries exc
requests.exceptions.ProxyError: HTTPSConnectionPool(host='', port=443): Max retries exceeded with url: /branddb/jsp/select.jsp (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x00000258
Original 2022-01-14 14:16:15 · 12809 views · 2 comments
curl: (52) Empty reply from server
Unresolved
Original 2022-01-13 09:58:10 · 1914 views · 0 comments
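request.exceptions.ProxyError:HTTPSConnectionPool(host='wwww.baidu.com',port=443):Max retries exceed
Cannot connect to the proxy.
Why is access still denied after switching to a proxy IP? Websites often restrict access because they limit the number of requests allowed from a single IP address, and that is when access gets denied. The usual response is to go through a proxy IP, expecting the restriction to go away. Sometimes, however, access is still denied even with a proxy IP. The reasons are as follows.
First, Internet settings. If access is still denied after switching proxy IPs several times, the proxy IP addresses are not the problem; the local computer is, usually Int...
Original 2022-01-13 09:52:21 · 2174 views · 0 comments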
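Error when extracting proxies via API after purchasing a proxy and adding a whitelist entry
After purchasing a proxy and adding a whitelist entry, API extraction returned an error. With a LAN IP added to the whitelist, the response was {"code":1004,"msg":"白名单不存在:xxx.xxx.xxx.xx","successs":false,"data":null} (the msg means "whitelist entry does not exist"). Removing the LAN IP from the whitelist and adding the public IP resolved it.
See: What are public IPs and private IPs? - Zhihu...
Original 2022-01-12 15:41:58 · 312 views · 0 comments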
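Whether an address is a LAN address or a public one can be checked before adding it to a whitelist. The sketch below uses the standard-library `ipaddress` module; the function name `is_public_ip` and the sample addresses are assumptions chosen for illustration.

```python
import ipaddress


def is_public_ip(address):
    """True if the address is globally routable, False for LAN, loopback, etc."""
    return ipaddress.ip_address(address).is_global


# Sample addresses (illustrative only).
for candidate in ("192.168.1.20", "10.0.0.5", "8.8.8.8"):
    kind = "public" if is_public_ip(candidate) else "private/LAN"
    print(candidate, "->", kind)
```

Addresses such as 192.168.x.x or 10.x.x.x are only meaningful inside the local network, so a proxy provider never sees them; the address to whitelist is the public one that the provider's servers see.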
AttributeError: 'NoneType' object has no attribute 'get'
{"error": "Could not process your request at the moment."}
Unresolved
Original 2022-01-12 14:08:05 · 445 views · 0 comments
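Recommended remote control software
1. ToDesk: ToDesk remote control software download (desktop client and mobile app), ToDesk official site
2. Sunlogin (向日葵): remote control software for computers and phones, remote desktop, remote work, gaming and operations, Oray Sunlogin official site
3. TeamViewer: TeamViewer official site, remote control software, remote connection software, remote desktop tool...
Original 2022-01-10 11:53:51 · 2656 views · 0 comments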
TypeError: 'ent_info' object does not support item assignment
Example:
sql = f'UPDATE into xxx SET url ="{url}",contactperson="{contactperson}",position="{position}",telephonenumber="{telephonenumber}",mobilenumber="{number_no}",create_time="{date}",update_time="{date}"'
Improved:
Original 2021-12-09 09:18:38 · 173 views · 0 comments
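The improved version is cut off in the excerpt, so the following is not the author's fix. It is a hedged sketch of one common way to write such a statement: standard `UPDATE ... SET ... WHERE ...` syntax (without `into`) and bound parameters instead of an f-string. It uses the standard-library `sqlite3` module as a stand-in database, and the table name `contacts` and the sample values are hypothetical.

```python
import sqlite3

# In-memory stand-in database; the table and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contacts ("
    "id INTEGER PRIMARY KEY, url TEXT, contactperson TEXT, telephonenumber TEXT)"
)
conn.execute("INSERT INTO contacts (id, url) VALUES (1, 'https://example.com/a')")

# Placeholders keep the values out of the SQL text, so quotes inside a
# value cannot break the statement.
sql = "UPDATE contacts SET contactperson = ?, telephonenumber = ? WHERE id = ?"
conn.execute(sql, ('O"Brien', "0769 0000 0000", 1))
conn.commit()

row = conn.execute(
    "SELECT contactperson, telephonenumber FROM contacts WHERE id = 1"
).fetchone()
print(row)
```

The value `O"Brien` would terminate the double-quoted string early in the f-string version. The placeholder style depends on the driver: `sqlite3` uses `?`, while MySQL drivers such as PyMySQL use `%s`.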
AttributeError: 'NoneType' object has no attribute 'nrows'
The installed xlrd was version 2.0.1, which only supports .xls files, so xlrd.open_workbook('xxx.xlsx') raises an error. An older xlrd can be installed instead; run in cmd:
pip uninstall xlrd
pip install xlrd==1.2.0
Resolved
Original 2021-12-06 16:07:41 · 1208 views · 0 comments
AttributeError: 'function' object has no attribute 'HTML'
Uninstall and reinstall:
1. pip uninstall lxml
2. pip install lxml
Installed
Original 2021-11-03 16:44:05 · 3704 views · 2 comments
ImportError: cannot import name 'MiddlewareManager' from 'scrapy.middleware'
Unresolved
Original 2021-10-11 15:16:26 · 236 views · 0 comments
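Chrome DevTools Network panel shows no request information: fixes for several cases
1. Right-click, choose Inspect, and find that the Network panel shows no request information.
2. Click Filter, as in the reference image below.
3. Click All or XHR.
4. Check whether the URL list is set to All or whether XHR (asynchronous requests) is selected. Selecting XHR filters out all synchronous URL entries, so if the current page makes no asynchronous requests, no URLs are shown.
5. In my case All was selected and I still could not see every URL, so this filter was not the problem.
Case 2: Check whether cookie blocking is enabled. Somehow it had been switched on for me; I had never run into this before, so it never occurred to me that this was the cause. If you have it enabled too,...
Original 2021-10-08 10:56:41 · 10232 views · 1 comment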
Crawler: errors when Scrapy is launched with python xx.py instead
Scrape url: [('https://www.aliexpress.com/store/3377003/search\n', 'ali', 'PY20210400011', 'AliExpress')]
Starting aliexpress store scrape
Scrape url: [('https://www.aliexpress.com/store/3377003/search\n', 'ali', 'PY20210400011', 'AliExpress')]
https://www.aliexpress.com/store/3377
Original 2021-08-16 21:31:36 · 573 views · 0 comments
bad magic number in 'scrapy_service.spiders.product_spider_object_type_html_1': b'\x03\xf3\r\n'
Unresolved
Original 2021-07-21 21:35:39 · 69 views · 0 comments
python ERROR: bitarray-0.8.3-cp37-cp37m-win_amd64.whl is not a supported wheel on this platform.
1. python -m pip install --upgrade pip
2. pip install bitarray-2.2.2-cp37-cp37m-win_amd64.whl
Resolved
Original 2021-07-21 20:39:26 · 620 views · 0 comments
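When pip rejects a wheel as unsupported, it can help to read the tags encoded in the filename and compare them with the running interpreter. The sketch below is an illustration only: `wheel_tags` is a hypothetical helper, and the comparison is done by eye rather than with pip's own compatibility check.

```python
import struct
import sys


def wheel_tags(filename):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag)."""
    stem = filename[: -len(".whl")]
    python_tag, abi_tag, platform_tag = stem.split("-")[-3:]
    return python_tag, abi_tag, platform_tag


tags = wheel_tags("bitarray-2.2.2-cp37-cp37m-win_amd64.whl")
print("wheel tags:", tags)

# Rough description of the running interpreter, for comparison by eye.
current = f"cp{sys.version_info.major}{sys.version_info.minor}"
bits = struct.calcsize("P") * 8  # pointer size in bits: 32 or 64
print("interpreter:", current, f"{bits}-bit", sys.platform)
```

A `cp37` wheel targets CPython 3.7 and `win_amd64` targets 64-bit Windows, so a different Python version or a 32-bit interpreter is a typical reason for this error.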
> 1265 - Data truncated for column 'telephonenumber' at row 1
Example:
insert into lm_product_supply_price (telephonenumber) values ("86 0769 83002690")
telephonenumber=telephonenumber.replace(' ','')
1264 - Out of range value for column 'telephonenumber' at row 1
Change the column type from int to bigint, with a length greater than 14.
Resolved...
Original 2021-07-14 17:45:43 · 126 views · 0 comments
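The two errors follow from the value itself, which a few lines of Python can show. This is a minimal sketch; the helper name `normalize_phone` is an assumption, and the bounds are those of signed 32-bit and 64-bit integers.

```python
INT_MAX = 2**31 - 1     # upper bound of a signed 32-bit INT
BIGINT_MAX = 2**63 - 1  # upper bound of a signed 64-bit BIGINT


def normalize_phone(raw):
    """Remove spaces so the value can be stored in a numeric column."""
    return raw.replace(" ", "")


number = int(normalize_phone("86 0769 83002690"))
print(number)
print("fits INT:", number <= INT_MAX)
print("fits BIGINT:", number <= BIGINT_MAX)
```

The spaces cause the truncation error (1265), and the 14-digit result exceeds the INT range, which causes the out-of-range error (1264) until the column becomes BIGINT. A numeric column also drops leading zeros, so a string column may be worth considering for phone numbers.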
TypeError: expected string or bytes-like object when using Python regular expressions
content = re.findall('<div class="contact-info" data-spm-protocol="i">(.*?)<div class="map-container" data-name="map">',result,re.S)
changed to
content = re.findall('<div class="contact-info" data-spm-protocol="i">(.*?)<div class="map-c...
Original 2021-07-14 17:12:47 · 416 views · 0 comments
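The corrected call is truncated in the excerpt, so the author's exact change is unknown. This error usually means the second argument to `re.findall` was not a `str` (for example `None` or a response object). The sketch below shows one defensive way to handle that; the simplified pattern and the helper name `find_contact_blocks` are assumptions.

```python
import re

# Simplified, hypothetical pattern; the original one targets a longer HTML span.
PATTERN = re.compile(r'<div class="contact-info">(.*?)</div>', re.S)


def find_contact_blocks(result):
    """Run the pattern on str input; decode bytes and treat None as empty."""
    if result is None:
        return []
    if isinstance(result, bytes):
        result = result.decode("utf-8", errors="replace")
    return PATTERN.findall(str(result))


print(find_contact_blocks('<div class="contact-info">Tel: 123</div>'))
print(find_contact_blocks(b'<div class="contact-info">Tel: 456</div>'))
print(find_contact_blocks(None))
```

Returning an empty list for `None` keeps the calling code simple, at the cost of hiding a failed request, so logging that case may be preferable in a real crawler.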
ERROR: Could not find a version that satisfies the requirement PIL (from versions: none
Try to run this command from the system terminal. Make sure that you use the correct version of 'pip' installed for your Python interpreter located at 'D:\CookiesPool\Scripts\python.exe'.
1. d:
2. cd CookiesPool\Scripts
3. python -m pip install Pillow
Original 2021-07-07 09:50:48 · 626 views · 0 comments
Fixing python execjs._exceptions.ProgramError
    return self._eval("{identifier}.apply(this, {args})".format(identifier=identifier, args=args))
  File "/usr/local/lib/python3.7/site-packages/execjs/_external_runtime.py", line 78, in _eval
    return self.exec_(code)
  File "/usr/local/lib/python3.7/site-packages
Original 2021-06-16 14:13:23 · 6127 views · 0 comments
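Adding a private proxy to a Python crawler (multithreaded)
# -*- coding: UTF-8 -*-
'''
Python 3.x
无忧代理IP (data5u proxy IP)
Created on 2018-05-11
Description: this demo shows how to request web pages through crawler (dynamic) proxy IPs, using multiple threads.
Logic: every 5 seconds, fetch IPs from the API; for each IP, start a thread that fetches the page source.
@author: www.data5u.com
'''
import requests
import time
import threading
import urllib3

ips = []

# Data-scraping thread...
Original 2021-06-15 20:14:07 · 1063 views · 4 comments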
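The demo code is cut off, so the following is not the original script. It is a hedged sketch of the logic the docstring describes (one thread per proxy IP, each fetching the page source), reduced to a single batch. The names `crawl`, `run_batch` and `fake_fetch` are hypothetical, and the fetcher is a stand-in so the example runs without network access.

```python
import threading

results = {}
lock = threading.Lock()


def crawl(proxy_ip, url, fetch):
    """Worker: fetch url through proxy_ip and store the page source."""
    html = fetch(url, proxy_ip)
    with lock:  # guard the shared dict across threads
        results[proxy_ip] = html


def run_batch(ips, url, fetch):
    """Start one thread per proxy IP and wait for all of them to finish."""
    threads = [threading.Thread(target=crawl, args=(ip, url, fetch)) for ip in ips]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


# Hypothetical stand-in: a real script would get `ips` from the provider's
# API and call requests.get(url, proxies=...) inside the fetcher.
def fake_fetch(url, proxy_ip):
    return f"source of {url} via {proxy_ip}"


run_batch(["1.2.3.4:8080", "5.6.7.8:3128"], "https://example.com", fake_fetch)
print(sorted(results))
```

To match the described behaviour more closely, `run_batch` could be called in a loop with `time.sleep(5)` between rounds, refreshing the IP list from the API each time.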
Fixing redis.exceptions.AuthenticationError: Client sent AUTH, but no password is set
Traceback (most recent call last):
  File "C:\Users\mypc\Desktop\proxy_pool\proxy_pool-master\db\redisClient.py", line 144, in test
    self.getCount()
  File "C:\Users\mypc\Desktop\proxy_pool\proxy_pool-master\db\redisClient.py", line 130, in getCount
    ...
Original 2021-06-09 16:24:57 · 1881 views · 4 comments
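Using a SOCKS proxy in a crawler
# -*- coding: UTF-8 -*-
'''
Python 3.x
无忧代理IP (data5u proxy IP)
Created on 2018-05-11
Description: this demo shows how to request web pages through crawler (dynamic) proxy IPs, using multiple threads.
Logic: every 5 seconds, fetch IPs from the API; for each IP, start a thread that fetches the page source.
Note: install the socks module first: pip3 install 'requests[socks]'
@author: www.data5u.com
'''
import requests;
import time;
im.
Repost 2021-06-09 11:04:30 · 318 views · 0 comments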
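Adding a proxy to a Python crawler (multithreaded)
# -*- coding: UTF-8 -*-
'''
Python 3.x
无忧代理IP (data5u proxy IP)
Created on 2018-05-11
Description: this demo shows how to request web pages through crawler (dynamic) proxy IPs, using multiple threads.
Logic: every 5 seconds, fetch IPs from the API; for each IP, start a thread that fetches the page source.
@author: www.data5u.com
'''
import requests;
import time;
import threading;
import urllib3;
ips = [];
# .
Repost 2021-06-09 11:01:56 · 227 views · 0 comments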
Fixing requests.exceptions.ProxyError
Traceback (most recent call last):
  File "gevent_cainiao_track.py", line 408, in <module>
    cainiao.main()
  File "gevent_cainiao_track.py", line 398, in main
    job = gevent.spawn(self.parse_page(url))
  File "gevent_cainiao_track.py", line 300,...
Original 2021-06-08 09:07:40 · 1139 views · 1 comment
python requests.exceptions.ProxyError:
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='t.17track.net', port=443): Max retries exceeded with url: /restapi/track (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x00000149AB632588>, 'Connection to 1
Original 2021-06-03 18:05:52 · 174 views · 0 comments
TypeError: 'NoneType' object is not subscriptable
Traceback (most recent call last):
  File "D:/track/17track_back/17track_spider.py", line 122, in <module>
    spider.run('VR417003364YP')
  File "D:/track/17track_back/17track_spider.py", line 70, in run
    self.analysis_data(track_id)
  File "D:/t...
Original 2021-06-03 10:56:10 · 318 views · 0 comments
python requests.exceptions.ConnectTimeout
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='t.17track.net', port=443): Max retries exceeded with url: /restapi/track (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x000001C3035D1D08>, 'Connection to 2
Original 2021-06-03 10:52:12 · 2186 views · 0 comments
Python crawler hits a requests.exceptions.ProxyError:...... error while scraping data
requests.exceptions.ProxyError: HTTPSConnectionPool(host='t.17track.net', port=443): Max retries exceeded with url: /restapi/track (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x...
Original 2021-06-02 18:00:44 · 917 views · 0 comments