This afternoon, while writing a Python crawler, I noticed something interesting.
Here are two code snippets and their output first.
1.
from bs4 import BeautifulSoup
import urllib2
# Fetch the first page of the book listing (Python 2.7)
req = urllib2.urlopen("https://www.qidian.com/all?orderId=&style=1&pageSize=20&siteid=1&pubflag=0&hiddenField=0&page=1")
html = req.read()
soup = BeautifulSoup(html, 'lxml')
# Walk every link inside each "book-img-box" div
for div in soup.find_all('div', class_="book-img-box"):
    for a in div.find_all('a'):
        print 'https:', a['href']  # this line is the key point
Output:
https: //book.qidian.com/info/1004608738
https: //book.qidian.com/info/1010468795
https: //book.qidian.com/info/1009265821
https: //book.qidian.com/info/1003694333
https: //book.qidian.com/info/1005238666
https: //book.qidian.com/info/1003723851
https: //book.qidian.com/info/1009704712
https: //book.qidian.com/info/1005986994
https: //book.qidian.com/info/1004595892
https: //book.qidian.com/info/1003354631
https: //book.qidian.com/info/1003578885
https: //book.qidian.com/info/1010136878
https: //book.qidian.com/info/1010734492
https: //book.qidian.com/info/1010734486
https: //book.qidian.com/info/1003307568
https: //book.qidian.com/info/1004142144
https: //book.qidian.com/info/1010422436
https: //book.qidian.com/info/1010298084
https: //book.qidian.com/info/3638453
https: //book.qidian.com/info/3676417
2. The second version
from bs4 import BeautifulSoup
import urllib2
# Fetch the first page of the book listing (Python 2.7)
req = urllib2.urlopen("https://www.qidian.com/all?orderId=&style=1&pageSize=20&siteid=1&pubflag=0&hiddenField=0&page=1")
html = req.read()
soup = BeautifulSoup(html, 'lxml')
# Walk every link inside each "book-img-box" div
for div in soup.find_all('div', class_="book-img-box"):
    for a in div.find_all('a'):
        print 'https:' + a['href']  # this line is the key point
Output:
https://book.qidian.com/info/1004608738
https://book.qidian.com/info/1010468795
https://book.qidian.com/info/1009265821
https://book.qidian.com/info/1003694333
https://book.qidian.com/info/1005238666
https://book.qidian.com/info/1003723851
https://book.qidian.com/info/1009704712
https://book.qidian.com/info/1005986994
https://book.qidian.com/info/1004595892
https://book.qidian.com/info/1003354631
https://book.qidian.com/info/1003578885
https://book.qidian.com/info/1010136878
https://book.qidian.com/info/1010734492
https://book.qidian.com/info/1010734486
https://book.qidian.com/info/1003307568
https://book.qidian.com/info/1004142144
https://book.qidian.com/info/1010422436
https://book.qidian.com/info/1010298084
https://book.qidian.com/info/3638453
https://book.qidian.com/info/3676417
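Compare the two outputs: in the first one there is a space between https: and //book…, while in the second one there is none. The reason is that in Python 2.7 the print statement writes a single space between items separated by ',', whereas '+' concatenates the two strings into one before it is printed, so no space is inserted.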
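To see the difference without hitting the network, here is a minimal sketch. The names href, spaced and joined are my own, and the href value is just the first link copied from the output above. It builds each string first and then prints a single argument, so it should behave the same on Python 2.7 and Python 3.

```python
# Sample href copied from the first line of the output above
href = "//book.qidian.com/info/1004608738"

# Mimics `print 'https:', href` in Python 2.7:
# the two items end up separated by one space.
spaced = " ".join(["https:", href])

# Same as `print 'https:' + href`:
# one string is built before anything is printed.
joined = "https:" + href

print(spaced)  # https: //book.qidian.com/info/1004608738
print(joined)  # https://book.qidian.com/info/1004608738
```

So when the goal is a usable URL, build the full string yourself (with + or string formatting) instead of relying on how print separates its items.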
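This is just a note from my own coding experience and may not be explained well. If anything is wrong, please point it out. Thanks.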