conda SSL error (SSLError) and web-scraping notes

业余游曳手

Article link: https://blog.csdn.net/Sillver_/article/details/113094737

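1. Fixing the conda SSL error: SSLError("Can't connect to HTTPS URL because the SSL module is not available.")
This error appeared when running a scraper inside a conda environment. Python could not use the SSL library on the machine, so it could not connect over HTTPS, which relies on SSL/TLS for encryption.
OpenSSL download page:
https://slproweb.com/products/Win32OpenSSL.html
After installing OpenSSL, the program ran correctly.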
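
A quick way to confirm whether the fix took effect is to check that the interpreter in the active environment can import the standard `ssl` module. This is only a minimal sketch; the helper name `ssl_status` is made up for illustration.

```python
import importlib


def ssl_status():
    """Return (available, detail) for the ssl module of this interpreter."""
    try:
        ssl = importlib.import_module("ssl")
    except ImportError as exc:
        # This is the situation behind "the SSL module is not available".
        return False, str(exc)
    # OPENSSL_VERSION names the OpenSSL build that Python is linked against.
    return True, ssl.OPENSSL_VERSION


available, detail = ssl_status()
print(available, detail)
```

If the first value printed is False, HTTPS requests made from that environment will most likely keep failing with the same SSLError.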

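2. File operations: a few open() modes. 'w' opens for writing (existing content is overwritten), 'a' appends, and 'wb' writes in binary mode.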
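
The difference between the three modes can be sketched as follows. The scratch file `demo.txt` in a temporary directory is an assumption made for the example and is not part of the original notes.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w", encoding="utf-8") as f:
    f.write("first\n")

# Opening with 'w' again truncates the file, so "first" is lost.
with open(path, "w", encoding="utf-8") as f:
    f.write("second\n")

# 'a' keeps the existing content and writes at the end.
with open(path, "a", encoding="utf-8") as f:
    f.write("third\n")

with open(path, "r", encoding="utf-8") as f:
    text = f.read()

# 'wb' expects bytes instead of str (and also truncates the file).
with open(path, "wb") as f:
    f.write(b"\x00\x01")

with open(path, "rb") as f:
    raw = f.read()

print(repr(text), raw)
```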

3. String formatting: f"{i}", where i is a value you supply.
"%d" % i also works,
and so does "{}".format(i).
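
For a plain integer the three styles produce the same string, which the short sketch below illustrates. The value 7 is arbitrary.

```python
i = 7  # arbitrary sample value

with_fstring = f"{i}"         # f-string, Python 3.6+
with_percent = "%d" % i       # printf-style formatting
with_format = "{}".format(i)  # str.format

print(with_fstring, with_percent, with_format)
```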

4. Extracting multi-line JSON data from one file: read the file with readlines(), then call json.loads() on each line to turn it into a list.

import json
import os

import requests

headers_ = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36",
}

# Start from a clean file, because the loop below opens it in append mode.
if os.path.exists("notebook1.json"):
    os.remove("notebook1.json")

# Fetch two pages of 20 films; each response is a JSON array stored on its own line.
for i in range(2):
    url_ = f"https://movie.douban.com/j/chart/top_list?type=24&interval_id=100%3A90&action=&start={i*20}&limit=20"
    response_ = requests.get(url=url_, headers=headers_)
    context = response_.text
    with open("notebook1.json", "a", encoding="utf-8") as f:
        f.write(context + '\n')
        # f.write('%s' % context + '\n')

# Parse the file line by line and print the film titles of each page.
with open("notebook1.json", "r", encoding="utf-8") as f:
    for page, line in enumerate(f.readlines(), start=1):
        films = json.loads(line)  # avoid naming this "list", which shadows the built-in
        print(page)
        print([film["title"] for film in films])
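
The parsing step can be tried without any network access. The sketch below replaces the downloaded file with an in-memory one; the film titles and the helper name `titles_per_line` are invented for the example.

```python
import io
import json

# Stand-in for notebook1.json: one JSON array per line (made-up data).
sample_file = io.StringIO(
    '[{"title": "Film A"}, {"title": "Film B"}]\n'
    '[{"title": "Film C"}]\n'
)


def titles_per_line(handle):
    """Parse each non-empty line as JSON and collect the titles it contains."""
    pages = []
    for line in handle.readlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        pages.append([film["title"] for film in json.loads(line)])
    return pages


pages = titles_per_line(sample_file)
print(pages)
```

Each line has to be a complete JSON document for this to work, which is why the download loop writes one response per line.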

5. conda HTTP 304 response error
Clear the cached channel index: conda clean -i
If the conda-forge mirror is the problem, remove that channel: conda config --remove channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge

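6. urllib2 (Python 2 only; the code below uses urllib.request, its Python 3 successor)

from urllib.request import urlopen


def download(url):
    # Return the whole response body as bytes.
    return urlopen(url).read()


def download2(url):
    # Return the response body as a list of lines (bytes).
    return urlopen(url).readlines()


def download3(url):
    # Print the response line by line; an empty read marks the end.
    response = urlopen(url)
    while True:
        line = response.readline()
        if len(line):
            print(line)
        else:
            break


# download3 prints as it reads and returns None, so it is called without print.
download3("https://github.com")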
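
The three reading styles can be compared without touching the network, because `urllib.request.urlopen` also accepts `file://` URLs. This is a sketch under that assumption; the temporary file `sample.txt` and the helper `read_three_ways` are made up for the example, and the code targets Python 3.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen


def read_three_ways(url):
    """Read the same URL with read(), readlines() and a readline() loop."""
    with urlopen(url) as resp:
        whole = resp.read()        # entire body as one bytes object
    with urlopen(url) as resp:
        lines = resp.readlines()   # list of lines, newlines kept
    one_by_one = []
    with urlopen(url) as resp:
        while True:
            line = resp.readline()
            if not line:           # an empty result signals the end
                break
            one_by_one.append(line)
    return whole, lines, one_by_one


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"      # hypothetical local test file
    sample.write_bytes(b"first\nsecond\n")
    whole, lines, one_by_one = read_three_ways(sample.as_uri())

print(whole, lines, one_by_one)
```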

7. Two ways to extract information with selenium and regular expressions (regex)

# coding:utf-8
import re

import selenium.webdriver

url = "https://search.51job.com/list/220600,000000,0000,00,9,99,%2B,2,1.html?lang=c&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&ord_field=0&dibiaoid=0&line=&welfare="
driver = selenium.webdriver.Firefox()
driver.get(url)
source = driver.page_source
# Sample of the fragment being matched ("共798条职位" means "798 positions in total").
res = """<div class="rt">
                    共798条职位
                </div>"""
# Way 1: capture the whole <div class="rt"> body, then pull the digits out of it
# with the commented-out print at the bottom.
# pattern = "<div class=\"rt\">([\s\S]*?)</div>"
# Way 2: capture the number directly.
pattern = r"共(\d+)条职位"
regex = re.compile(pattern, re.IGNORECASE)
mylist = regex.findall(source)
print(mylist[0])
# print(re.findall(r"(\d+)", mylist[0])[0])
driver.close()
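
Both regex approaches can be checked offline against the sample fragment from the listing, with no browser involved. The function names `count_direct` and `count_two_step` are invented for this sketch.

```python
import re

# Sample fragment copied from the listing above.
sample = """<div class="rt">
                    共798条职位
                </div>"""


def count_direct(html):
    """Approach 1: capture the number straight from the surrounding text."""
    match = re.search(r"共(\d+)条职位", html)
    return int(match.group(1)) if match else None


def count_two_step(html):
    """Approach 2: capture the div body first, then pull the digits out of it."""
    blocks = re.findall(r'<div class="rt">([\s\S]*?)</div>', html)
    if not blocks:
        return None
    digits = re.findall(r"\d+", blocks[0])
    return int(digits[0]) if digits else None


print(count_direct(sample), count_two_step(sample))
```

The direct pattern is shorter, while the two-step version depends only on the markup and not on the exact wording around the number.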
