Python HTTPConnectionPool failed to establish a new connection: [Errno 11004] getaddrinfo failed

I was wondering whether my requests are being blocked by the website and whether I need to set a proxy. I first tried to close the HTTP connection, but that failed. I also tried to test my code, but now it produces no output. Maybe if I use a proxy everything will be OK?

Here is the code.

import requests
from urllib.parse import urlencode
import json
from bs4 import BeautifulSoup
import re
from html.parser import HTMLParser
from multiprocessing import Pool
from requests.exceptions import RequestException
import time

def get_page_index(offset, keyword):
    #headers = {'User-Agent':'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50'}
    data = {
        'offset': offset,
        'format': 'json',
        'keyword': keyword,
        'autoload': 'true',
        'count': 20,
        'cur_tab': 1
    }
    url = 'http://www.toutiao.com/search_content/?' + urlencode(data)
    try:
        response = requests.get(url, headers={'Connection': 'close'})
        response.encoding = 'utf-8'
        if response.status_code == 200:
            return response.text
        return None
    except RequestException as e:
        print(e)

def parse_page_index(html):
    data = json.loads(html)
    if data and 'data' in data.keys():
        for item in data.get('data'):
            url = item.get('article_url')
            if url and len(url) < 100:
                yield url

def get_page_detail(url):
    try:
        response = requests.get(url, headers={'Connection': 'close'})
        response.encoding = 'utf-8'
        if response.status_code == 200:
            return response.text
        return None
    except RequestException as e:
        print(e)

def parse_page_detail(html):
    soup = BeautifulSoup(html, 'lxml')
    title = soup.select('title')[0].get_text()
    pattern = re.compile(r'articleInfo: (.*?)},', re.S)
    pattern_abstract = re.compile(r'abstract: (.*?)\.', re.S)
    res = re.search(pattern, html)
    res_abstract = re.search(pattern_abstract, html)
    if res and res_abstract:
        data = res.group(1).replace(r".replace(/|\n|\r/ig, '')", "") + '}'
        abstract = res_abstract.group(1).replace(r"'", "")
        content = re.search(r'content: (.*?),', data).group(1)
        source = re.search(r'source: (.*?),', data).group(1)
        time_pattern = re.compile(r'time: (.*?)}', re.S)
        date = re.search(time_pattern, data).group(1)
        date_today = time.strftime('%Y-%m-%d')
        img = re.findall(r'src="(.*?)&quot', content)
        if date[1:11] == date_today and len(content) > 50 and img:
            return {
                'title': title,
                'content': content,
                'source': source,
                'date': date,
                'abstract': abstract,
                'img': img[0]
            }

def main(offset):
    flag = 1
    html = get_page_index(offset, '光伏')
    for url in parse_page_index(html):
        html = get_page_detail(url)
        if html:
            data = parse_page_detail(html)
            if data:
                html_parser = HTMLParser()
                cwl = html_parser.unescape(data.get('content'))
                data['content'] = cwl
                print(data)
                print(data.get('img'))
                flag += 1
                if flag == 5:
                    break

if __name__ == '__main__':
    pool = Pool()
    pool.map(main, [i*20 for i in range(10)])

And here is the error:

HTTPConnectionPool(host='tech.jinghua.cn', port=80): Max retries exceeded with url: /zixun/20160720/f191549.shtml (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))

By the way, when I first tested my code, everything seemed OK!

Thanks in advance!

Solution

It seems to me you're hitting the connection limit of the HTTPConnectionPool, since you start 10 workers at the same time.

Try one of the following (a minimal sketch combining both changes follows the list):

Increase the request timeout (seconds): requests.get('url', timeout=5)

Close the response with Response.close(): instead of returning response.text, assign the response to a variable, close the response, and then return the variable.
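For instance, here is a minimal sketch of get_page_detail with both suggestions applied; the 5-second timeout is an arbitrary illustrative value, not something taken from the original code:

import requests
from requests.exceptions import RequestException

def get_page_detail(url):
    try:
        # bound how long we wait for the server instead of hanging indefinitely
        response = requests.get(url, timeout=5)
        response.encoding = 'utf-8'
        # read the body into a variable first...
        text = response.text if response.status_code == 200 else None
        # ...then explicitly release the connection before returning
        response.close()
        return text
    except RequestException as e:
        print(e)
        return None

Closing each response promptly matters here because several worker processes are making requests at once, so connections left open pile up quickly.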
