Casual study notes: web crawlers

from urllib.request import urlopen

# Fetch a page and print the raw response body (bytes)
file = urlopen('https://www.cnki.net')
print(file.read())
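
urlopen() returns a response object whose read() gives raw bytes, so the print above shows a bytes literal. A minimal sketch of decoding it to text (the utf-8 encoding here is an assumption; the real charset can be taken from the response headers):

from urllib.request import urlopen

response = urlopen('https://www.cnki.net')
raw = response.read()                          # raw bytes
text = raw.decode('utf-8', errors='replace')   # assumed encoding
print(text[:200])                              # first 200 characters only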

List the existing virtual environments

conda info -e

Create a virtual environment

conda create -n py3.7 python=3.7

Use (activate) a virtual environment

conda activate py3.7

Deactivate the current virtual environment

conda deactivate
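
With the environment activated, the packages used in the notes below can be installed into it. A minimal sketch (package names only, versions left unpinned):

conda activate py3.7
pip install requests beautifulsoup4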

Create a virtual environment
python -m venv <env_name>
Enter the virtual environment directory
cd <env_name>
List its contents
dir
Go into the lib directory (installed packages live here)
cd lib
Go back up one level
cd ..
Activate the virtual environment (on Windows, via the script in Scripts)
Scripts\activate
Deactivate the virtual environment
deactivate
Then use Python as usual
python
import sys
from pprint import pprint
pprint(sys.path)
Exit the Python interpreter with Ctrl+Z followed by Enter
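
A quick way to confirm the interpreter really belongs to the virtual environment is to check sys.executable alongside sys.path. A minimal sketch:

import sys
from pprint import pprint

print(sys.executable)   # should point inside the virtual environment folder
pprint(sys.path)        # the venv's site-packages directory should show up here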

Errors that come up while crawling

1. HTTPError

from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup

try:
    # If the requested page does not exist, urlopen() raises HTTPError
    html = urlopen("https://blog.csdn.net/HUOXUANJIAN/article/details/991314131")
    bso = BeautifulSoup(html.read())

except HTTPError as e:
    print(e)

else:
    print("It worked!!!!!")
    print(bso.li)
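
HTTPError also carries the status code and reason phrase, which are often more useful than printing the exception object itself. A minimal sketch of inspecting them in the same try/except shape:

from urllib.request import urlopen
from urllib.error import HTTPError

try:
    html = urlopen("https://blog.csdn.net/HUOXUANJIAN/article/details/991314131")
except HTTPError as e:
    print(e.code)       # numeric status code, e.g. 404
    print(e.reason)     # reason phrase, e.g. "Not Found"
else:
    print(html.getcode())   # 200 when the request succeeded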


2. URLError (bad web address)

from urllib.request import urlopen
from urllib.error import HTTPError
from urllib.error import URLError
from bs4 import BeautifulSoup

try:
    # A made-up domain: if the host cannot be reached, urlopen() raises URLError
    html = urlopen("https://buzhidao2333.com")
    bso = BeautifulSoup(html.read())

except HTTPError as e:
    print(e)

except URLError as e:
    print(e)

else:
    print("It worked!!!!!")
    print(bso.h1)
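
Note that HTTPError is a subclass of URLError, so the more specific except clause has to come first; with the order reversed, URLError would catch HTTP errors as well. A minimal sketch of the ordering:

from urllib.request import urlopen
from urllib.error import HTTPError, URLError

try:
    html = urlopen("https://buzhidao2333.com")
except HTTPError as e:           # more specific: the server answered with an error status
    print("HTTP error:", e.code)
except URLError as e:            # more general: the server could not be reached at all
    print("URL error:", e.reason)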


3. Tag errors

from urllib.request import urlopen
from urllib.error import HTTPError
from urllib.error import URLError
from bs4 import BeautifulSoup

try:
    html = urlopen("https://www.so.com/s?ie=UTF-8&q=Bling")
    # No parser is specified here, which triggers the GuessedAtParserWarning shown below
    bso = BeautifulSoup(html.read())

except HTTPError as e:
    print(e)

except URLError as e:
    print(e)

else:
    print("It worked!!!!!")
    # BeautifulSoup finds no h1 tag in this page, so this prints None
    print(bso.h1)


Warning (from warnings module):
  File "C:/Users/Miss Lu/web/Scripts/web_edu/urlerror.py", line 8
    bso=BeautifulSoup(html.read())
GuessedAtParserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 8 of the file C:/Users/Miss Lu/web/Scripts/web_edu/urlerror.py. To get rid of this warning, pass the additional argument 'features="html.parser"' to the BeautifulSoup constructor.

It worked!!!
None
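
Following the warning's advice, the parser can be named explicitly so the behaviour does not depend on what happens to be installed on a given machine:

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("https://www.so.com/s?ie=UTF-8&q=Bling")
bso = BeautifulSoup(html.read(), 'html.parser')   # explicit built-in parser, no warning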

from urllib.request import urlopen
from urllib.error import HTTPError
from urllib.error import URLError
from bs4 import BeautifulSoup

try:
    html = urlopen("https://www.so.com/s?ie=UTF-8&q=Bling")
    bso = BeautifulSoup(html.read())

except HTTPError as e:
    print(e)

except URLError as e:
    print(e)

else:
    print("It worked!!!!!")
    # bso.h1 is None (see above), so asking None for another attribute raises AttributeError
    print(bso.h1.tag233)


It worked!!!
Traceback (most recent call last):
  File "C:/Users/Miss Lu/web/Scripts/web_edu/urlerror.py", line 19, in <module>
    print(bso.h1.tag233)
AttributeError: 'NoneType' object has no attribute 'tag233'
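
The usual defence is to check for None before chaining attribute access, rather than letting the AttributeError propagate. A minimal sketch of that pattern:

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("https://www.so.com/s?ie=UTF-8&q=Bling")
bso = BeautifulSoup(html.read(), 'html.parser')

tag = bso.find("h1")                 # find() returns None when the tag is missing
if tag is None:
    print("h1 tag was not found")
else:
    print(tag.get_text())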

4. A simple crawler program

import requests
from bs4 import BeautifulSoup

# Silence urllib3 warnings (e.g. about unverified HTTPS requests)
requests.packages.urllib3.disable_warnings()

URL = "https://detail.tmall.com/item.htm?id=597335415626&ali_refid=a3_430673_1006:1203720110:N:P8/nHWzLrqbR3w0OfZcNLQ==:c218e490851dd9f17b90accb26decb40&ali_trackid=1_c218e490851dd9f17b90accb26decb40&spm=a2e15.8261149.07626516002.1&sku_properties=5919063:6536025"

# The User-Agent header identifies the client, so the request looks like a normal browser visit
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'}

page = requests.get(URL, headers=headers)

soup_obj = BeautifulSoup(page.content, 'html.parser')

# Page-specific selectors for the product title and the price box
new_project = soup_obj.find("h1", {"data-spm": "1000983"}).get_text()
new_price = soup_obj.find(id="J_StrPriceModBox").get_text()

# print(soup_obj.prettify())
print(new_project)
print(new_price)
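
The two find() calls are tied to this particular Tmall page layout; if the layout changes or the request gets redirected (for example to a login page), they return None and the get_text() calls raise AttributeError. A hedged sketch of guarding the same lookups, reusing the URL and headers defined above:

page = requests.get(URL, headers=headers)
page.raise_for_status()                      # fail loudly on an HTTP error status

soup_obj = BeautifulSoup(page.content, 'html.parser')

title_tag = soup_obj.find("h1", {"data-spm": "1000983"})
price_tag = soup_obj.find(id="J_StrPriceModBox")

print(title_tag.get_text(strip=True) if title_tag else "title not found")
print(price_tag.get_text(strip=True) if price_tag else "price not found")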

This scrapes the product title and price from the item page.
