urllib2.URLError: urlopen error [Errno 111] Connection refused

Copyright notice: this is an original post by the author and may not be reproduced without permission. https://blog.csdn.net/u014452812/article/details/84615320

A note on a problem that was unsolved at the time of writing. The crawler code below is valid, and it initially ran fine in my Ubuntu virtual machine, but after some change I can no longer pin down, neither this urllib2 crawler nor my Scrapy crawlers would run anymore!

import urllib2
import re

class Spider:
    def __init__(self):
        self.page = 1        # current page number
        self.switch = True   # keep crawling while True

    def loadPage(self):
        # Download one list page and extract the joke blocks
        print 'loadPage'
        url = "http://www.neihan8.com/article/list_5_" + str(self.page) + ".html"
        headers = {"User-Agent" : "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"}

        request = urllib2.Request(url, headers=headers)
        response = urllib2.urlopen(request)

        html = response.read()
        # The site serves GBK-encoded pages; re-encode to UTF-8 before matching
        gbk_html = html.decode('gbk').encode('utf-8')
        pattern = re.compile(r'<div\sclass="f18 mb20">(.*?)</div>', re.S)
        content_list = pattern.findall(gbk_html)
        self.dealPage(content_list)

    def dealPage(self, content_list):
        # Strip the HTML tags left inside each matched block
        for item in content_list:
            item = item.replace('<br />', '').replace('<p>', '').replace('</p>', '')
            self.writePage(item)

    def writePage(self, item):
        with open('duanzi.txt', 'a') as f:
            f.write(item)

    def startWork(self):
        while self.switch:
            self.loadPage()
            command = raw_input('Press Enter for the next page, q to quit: ')
            if command == 'q':
                self.switch = False
            self.page += 1
        print 'Thanks for using.'

if __name__ == '__main__':

    s = Spider()
    s.startWork()
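For readers on Python 3, where urllib2 was merged into urllib.request, the equivalent request-building calls look like this. This is only a sketch: the target site may no longer be reachable, so nothing is actually sent until urlopen() is called.

```python
# Hedged sketch: the Python 3 equivalent of the urllib2 request above.
# Only builds the Request object; no network traffic happens here.
import urllib.request

def build_request(page):
    """Build the list-page request the spider above would send."""
    url = "http://www.neihan8.com/article/list_5_%d.html" % page
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                             "AppleWebKit/537.36 (KHTML, like Gecko) "
                             "Chrome/54.0.2840.99 Safari/537.36"}
    return urllib.request.Request(url, headers=headers)

req = build_request(1)
print(req.full_url)  # http://www.neihan8.com/article/list_5_1.html
```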

[Screenshot of the crawler's output]

The error message in the terminal:

Traceback (most recent call last):
  File "01-neihan.py", line 44, in <module>
    s.startWork()
  File "01-neihan.py", line 34, in startWork
    self.loadPage()
  File "01-neihan.py", line 15, in loadPage
    response = urllib2.urlopen(request)
  File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 429, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 447, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1228, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1198, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 111] Connection refused>

Before this error appeared, I had configured a proxy to get around network restrictions; turning the proxy off afterwards didn't help. At that point the problem was unsolved, and I couldn't tell whether it was an Ubuntu environment issue or a Python issue.
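The proxy connection matters here because urllib2 (and urllib.request in Python 3) silently picks up proxies from the http_proxy / https_proxy environment variables. If a stale proxy address is still set, every request is routed to that dead proxy and fails with [Errno 111] Connection refused. A quick sketch to check what Python actually sees:

```python
# Sketch: inspect the proxy settings urllib will use.
# (In Python 2 the equivalent call is urllib.getproxies().)
import os
import urllib.request

def active_proxies():
    """Return the proxy mapping urllib resolves from the environment."""
    return urllib.request.getproxies()

if __name__ == '__main__':
    # Prints whatever mapping is active, e.g. {} when no proxy is configured
    print(active_proxies())
    print(os.environ.get('http_proxy'))
```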

Update: the problem has now been tracked down, and it was indeed the proxy settings.

Solution:

1. First check /etc/apt/apt.conf. Mine contained:

http_proxy="http://192.168.16.109:13128/"
https_proxy="https://192.168.16.109:13128/"

Your values may differ. I deleted this file and rebooted, but that alone did not fix the problem.

2. Next, check the environment file with cat /etc/environment. Mine contained:

http_proxy="http://192.168.16.109:13128/"
https_proxy="https://192.168.16.109:13128/"

Delete those proxy lines from the file. (Important: do NOT delete the line PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games", or Ubuntu will fail to boot into the desktop.)

3. Reboot, run the code again, and the problem is solved.
