Pitfalls Encountered When Getting Started with Python Web Scraping

1. Environment

- Python
  The Python that comes preinstalled with macOS

$ python -V  
Python 2.7.10
$ where python
/usr/bin/python
$ ls /System/Library/Frameworks/Python.framework/Versions
2.3     2.5     2.6     2.7     Current
$ ls /Library/Frameworks/Python.framework/Versions   # directory for user-installed Python versions

- IDE
  PyCharm
- Tools
  Install pip

sudo easy_install pip

- Python libraries

sudo pip install requests (installs requests 2.13.0 by default)
sudo pip install BeautifulSoup (installs BeautifulSoup 3.2.1 by default)
sudo pip install lxml (installs lxml 3.7.3 by default)
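
Because the problems in section 2 depend on which library versions are actually installed, it can help to check them from inside the interpreter. The following is a minimal sketch, not part of the original setup: the helper name installed_version is hypothetical, and it assumes a package exposes a __version__ attribute, which not every package does.

```python
from __future__ import print_function

import importlib
import sys


def installed_version(module_name):
    """Return the module's __version__ string, or None if unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None  # package is not installed for this interpreter
    # Not every package defines __version__, so fall back to None.
    return getattr(module, "__version__", None)


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("python:", sys.version.split()[0])
    # "BeautifulSoup" is the v3 module name, "bs4" is the v4 package name.
    for name in ("requests", "BeautifulSoup", "bs4", "lxml"):
        print(name, "->", installed_version(name))
```

If BeautifulSoup reports a 3.x version and bs4 reports None, only the v3 package is present, which is the situation behind both problems below.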

 

2. Problems

- Problem 1

Code:
soup = BeautifulSoup(html, 'lxml')
Error:
Traceback (most recent call last):
File "/Users/cuizhenyu/Documents/Codes/Python/DownloadMeitu/LibBeautifulSoupTest.py", line 15, in <module>
soup = BeautifulSoup(html) # soup = BeautifulSoup(html, 'lxml') raises an error
TypeError: 'module' object is not callable
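Fix:
The name BeautifulSoup refers to the module here, not the class inside it, which is typically the result of writing "import BeautifulSoup". Import the class from the module instead:
from BeautifulSoup import BeautifulSoup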
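
The same TypeError appears whenever a module and a class inside it share a name and the module itself gets called. As an illustration only, here is a small reproduction using the standard library's datetime module, which was chosen for this sketch and is not part of the original script; the alias DateTime is likewise just an illustrative name.

```python
import datetime  # binds the name `datetime` to the module, not the class

try:
    datetime(2017, 4, 1)  # calling the module itself raises TypeError
except TypeError as exc:
    message = str(exc)
    print(message)

# Importing the class from the module makes the call work.
from datetime import datetime as DateTime

stamp = DateTime(2017, 4, 1)
print(stamp.year)
```

The BeautifulSoup v3 package has the same layout: a module named BeautifulSoup that contains a class named BeautifulSoup.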

- Problem 2

Code:
soup = BeautifulSoup(html, 'lxml')
Error:
Traceback (most recent call last):
File "/Users/cuizhenyu/Documents/Codes/Python/DownloadMeitu/LibBeautifulSoupTest.py", line 15, in <module>
soup = BeautifulSoup(html, 'lxml') # soup = BeautifulSoup(html, 'lxml') raises an error
File "/Library/Python/2.7/site-packages/BeautifulSoup.py", line 1522, in __init__
BeautifulStoneSoup.__init__(self, *args, **kwargs)
File "/Library/Python/2.7/site-packages/BeautifulSoup.py", line 1147, in __init__
self._feed(isHTML=isHTML)
File "/Library/Python/2.7/site-packages/BeautifulSoup.py", line 1189, in _feed
SGMLParser.feed(self, markup)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 104, in feed
self.goahead(0)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 138, in goahead
k = self.parse_starttag(i)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 296, in parse_starttag
self.finish_starttag(tag, attrs)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 338, in finish_starttag
self.unknown_starttag(tag, attrs)
File "/Library/Python/2.7/site-packages/BeautifulSoup.py", line 1338, in unknown_starttag
self.endData()
File "/Library/Python/2.7/site-packages/BeautifulSoup.py", line 1251, in endData
(not self.parseOnlyThese.text or \
AttributeError: 'str' object has no attribute 'text'
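Fix:
The installed BeautifulSoup is version 3, which does not support lxml or other pluggable parsers. In v3 the second positional argument is parseOnlyThese, so the string 'lxml' ends up there and triggers the AttributeError shown above. Use version 4 instead, which is installed with "sudo pip install beautifulsoup4" and imported with "from bs4 import BeautifulSoup".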
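
As a rough sketch of what the v4 call could look like, assuming beautifulsoup4 is installed and lxml may or may not be: the helper name make_soup and the sample HTML are made up for this example, and the fallback to Python's built-in html.parser is a design choice of the sketch, not something the original script does.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html):
    """Parse html with lxml when available, else the built-in parser."""
    try:
        return BeautifulSoup(html, "lxml")
    except FeatureNotFound:
        # lxml is not installed; html.parser ships with Python itself.
        return BeautifulSoup(html, "html.parser")


# Hypothetical markup, used only to exercise the helper.
sample_html = "<html><body><p class='title'>hello</p></body></html>"
soup = make_soup(sample_html)
print(soup.find("p", class_="title").get_text())
```

In v4 the second argument names the parser, so passing 'lxml' there works as the original code intended.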

 

Reposted from: https://www.cnblogs.com/mulisheng/p/6665350.html
