嗨,我正在阅读“使用
Python进行Web Scraping(2015)”.我看到了以下两种打开url的方法,使用和不使用.read().请参阅bs1和bs2
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('http://web.stanford.edu/~zlotnick/TextAsData/Web_Scraping_with_Beautiful_Soup.html')
bs1 = BeautifulSoup(html.read(), 'html.parser')
html = urlopen('http://web.stanford.edu/~zlotnick/TextAsData/Web_Scraping_with_Beautiful_Soup.html')
bs2 = BeautifulSoup(html, 'html.parser')
bs1 == bs2 # true
print(bs1.prettify()[0:100])
print(bs2.prettify()[0:100]) # prints same thing
那么.read()是多余的吗?谢谢
使用python进行Web scpraing的p7代码:(使用.read())
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/page1.html")
bsObj = BeautifulSoup(html.read())
第15页的代码(没有.read())
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/warandpeace.html")
bsObj = BeautifulSoup(html)