Casual study notes: web crawlers

from urllib.request import urlopen

# Fetch a page and print the raw response body (bytes)
file = urlopen('https://www.cnki.net')
print(file.read())
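
urlopen() returns a response object whose read() gives raw bytes, so the print above shows a bytes literal. A minimal sketch of decoding it to text (the utf-8 encoding here is an assumption; the real charset can be taken from the response headers):

from urllib.request import urlopen

response = urlopen('https://www.cnki.net')
raw = response.read()                          # raw bytes
text = raw.decode('utf-8', errors='replace')   # assumed encoding
print(text[:200])                              # first 200 characters only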

List the existing virtual environments

conda info -e

Create a virtual environment

conda create -n py3.7 python=3.7

Use (activate) a virtual environment

conda activate py3.7

Deactivate the current virtual environment

conda deactivate
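
With the environment activated, the packages used in the notes below can be installed into it. A minimal sketch (package names only, versions left unpinned):

conda activate py3.7
pip install requests beautifulsoup4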

Create a virtual environment
python -m venv <env_name>
Enter the virtual environment directory
cd <env_name>
List its contents
dir
Go into the lib directory (installed packages live here)
cd lib
Go back up one level
cd ..
Activate the virtual environment (on Windows, via the script in Scripts)
Scripts\activate
Deactivate the virtual environment
deactivate
Then use Python as usual
python
import sys
from pprint import pprint
pprint(sys.path)
Exit the Python interpreter with Ctrl+Z followed by Enter
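
A quick way to confirm the interpreter really belongs to the virtual environment is to check sys.executable alongside sys.path. A minimal sketch:

import sys
from pprint import pprint

print(sys.executable)   # should point inside the virtual environment folder
pprint(sys.path)        # the venv's site-packages directory should show up here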

Errors that come up while crawling

1. HTTPError

from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup

try:
    # If the requested page does not exist, urlopen() raises HTTPError
    html = urlopen("https://blog.csdn.net/HUOXUANJIAN/article/details/991314131")
    bso = BeautifulSoup(html.read())

except HTTPError as e:
    print(e)

else:
    print("It worked!!!!!")
    print(bso.li)
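
HTTPError also carries the status code and reason phrase, which are often more useful than printing the exception object itself. A minimal sketch of inspecting them in the same try/except shape:

from urllib.request import urlopen
from urllib.error import HTTPError

try:
    html = urlopen("https://blog.csdn.net/HUOXUANJIAN/article/details/991314131")
except HTTPError as e:
    print(e.code)       # numeric status code, e.g. 404
    print(e.reason)     # reason phrase, e.g. "Not Found"
else:
    print(html.getcode())   # 200 when the request succeeded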


2. URLError (bad web address)

from urllib.request import urlopen
from urllib.error import HTTPError
from urllib.error import URLError
from bs4 import BeautifulSoup

try:
    # A made-up domain: if the host cannot be reached, urlopen() raises URLError
    html = urlopen("https://buzhidao2333.com")
    bso = BeautifulSoup(html.read())

except HTTPError as e:
    print(e)

except URLError as e:
    print(e)

else:
    print("It worked!!!!!")
    print(bso.h1)
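
Note that HTTPError is a subclass of URLError, so the more specific except clause has to come first; with the order reversed, URLError would catch HTTP errors as well. A minimal sketch of the ordering:

from urllib.request import urlopen
from urllib.error import HTTPError, URLError

try:
    html = urlopen("https://buzhidao2333.com")
except HTTPError as e:           # more specific: the server answered with an error status
    print("HTTP error:", e.code)
except URLError as e:            # more general: the server could not be reached at all
    print("URL error:", e.reason)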


3. Tag errors

from urllib.request import urlopen
from urllib.error import HTTPError
from urllib.error import URLError
from bs4 import BeautifulSoup

try:
    html = urlopen("https://www.so.com/s?ie=UTF-8&q=Bling")
    # No parser is specified here, which triggers the GuessedAtParserWarning shown below
    bso = BeautifulSoup(html.read())

except HTTPError as e:
    print(e)

except URLError as e:
    print(e)

else:
    print("It worked!!!!!")
    # BeautifulSoup finds no h1 tag in this page, so this prints None
    print(bso.h1)


Warning (from warnings module):
  File "C:/Users/Miss Lu/web/Scripts/web_edu/urlerror.py", line 8
    bso=BeautifulSoup(html.read())
GuessedAtParserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 8 of the file C:/Users/Miss Lu/web/Scripts/web_edu/urlerror.py. To get rid of this warning, pass the additional argument 'features="html.parser"' to the BeautifulSoup constructor.

It worked!!!
None
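
Following the warning's advice, the parser can be named explicitly so the behaviour does not depend on what happens to be installed on a given machine:

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("https://www.so.com/s?ie=UTF-8&q=Bling")
bso = BeautifulSoup(html.read(), 'html.parser')   # explicit built-in parser, no warning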

from urllib.request import urlopen
from urllib.error import HTTPError
from urllib.error import URLError
from bs4 import BeautifulSoup

try:
    html = urlopen("https://www.so.com/s?ie=UTF-8&q=Bling")
    bso = BeautifulSoup(html.read())

except HTTPError as e:
    print(e)

except URLError as e:
    print(e)

else:
    print("It worked!!!!!")
    # bso.h1 is None (see above), so asking None for another attribute raises AttributeError
    print(bso.h1.tag233)


It worked!!!
Traceback (most recent call last):
  File "C:/Users/Miss Lu/web/Scripts/web_edu/urlerror.py", line 19, in <module>
    print(bso.h1.tag233)
AttributeError: 'NoneType' object has no attribute 'tag233'
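
The usual defence is to check for None before chaining attribute access, rather than letting the AttributeError propagate. A minimal sketch of that pattern:

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("https://www.so.com/s?ie=UTF-8&q=Bling")
bso = BeautifulSoup(html.read(), 'html.parser')

tag = bso.find("h1")                 # find() returns None when the tag is missing
if tag is None:
    print("h1 tag was not found")
else:
    print(tag.get_text())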

4. A simple crawler program

import requests
from bs4 import BeautifulSoup

# Silence urllib3 warnings (e.g. about unverified HTTPS requests)
requests.packages.urllib3.disable_warnings()

URL = "https://detail.tmall.com/item.htm?id=597335415626&ali_refid=a3_430673_1006:1203720110:N:P8/nHWzLrqbR3w0OfZcNLQ==:c218e490851dd9f17b90accb26decb40&ali_trackid=1_c218e490851dd9f17b90accb26decb40&spm=a2e15.8261149.07626516002.1&sku_properties=5919063:6536025"

# The User-Agent header identifies the client, so the request looks like a normal browser visit
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'}

page = requests.get(URL, headers=headers)

soup_obj = BeautifulSoup(page.content, 'html.parser')

# Page-specific selectors for the product title and the price box
new_project = soup_obj.find("h1", {"data-spm": "1000983"}).get_text()
new_price = soup_obj.find(id="J_StrPriceModBox").get_text()

# print(soup_obj.prettify())
print(new_project)
print(new_price)
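
The two find() calls are tied to this particular Tmall page layout; if the layout changes or the request gets redirected (for example to a login page), they return None and the get_text() calls raise AttributeError. A hedged sketch of guarding the same lookups, reusing the URL and headers defined above:

page = requests.get(URL, headers=headers)
page.raise_for_status()                      # fail loudly on an HTTP error status

soup_obj = BeautifulSoup(page.content, 'html.parser')

title_tag = soup_obj.find("h1", {"data-spm": "1000983"})
price_tag = soup_obj.find(id="J_StrPriceModBox")

print(title_tag.get_text(strip=True) if title_tag else "title not found")
print(price_tag.get_text(strip=True) if price_tag else "price not found")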

This scrapes the product title and price from the item page.
