First, install the requests and beautifulsoup4 libraries (for example, with pip install requests beautifulsoup4).
import requests
from bs4 import BeautifulSoup
# A Douban URL found online; book titles and basic information can be scraped from it
res = requests.get(url="http://www.douban.com/tag/%E5%B0%8F%E8%AF%B4/?focus=book")
soup = BeautifulSoup(res.text,"html.parser")
book_div = soup.find(attrs={"id":"book"})
# find_all is the current name; findAll is the older alias for the same method
book_a = book_div.find_all(attrs={"class": "title"})
book_b = book_div.find_all(attrs={"class": "desc"})
for book in book_a:
    print(book.string)
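- First, find the element with id="book" to narrow the search to one region of the page, then scrape the data from that region.
- Next, book_a calls find_all on that region to collect every element with class="title".
- When printing, .string extracts the text inside each element.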
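
The three steps above can be tried without touching the network. The sketch below runs the same find, find_all, and .string calls against a small inline HTML string. The markup in SAMPLE_HTML and the helper name extract_books are made up for illustration, and the real Douban page may be structured differently. It also pairs each title with its class="desc" element, which the original snippet collects in book_b but never prints.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the Douban page; the real markup may differ.
SAMPLE_HTML = """
<div id="nav"><a class="title" href="#">Not a book</a></div>
<div id="book">
  <dl>
    <dd><a class="title" href="#">Book One</a><div class="desc">Author A / 2001</div></dd>
    <dd><a class="title" href="#">Book Two</a><div class="desc">Author B / 2002</div></dd>
  </dl>
</div>
"""


def extract_books(html):
    """Return (title, description) pairs found inside the element with id="book"."""
    soup = BeautifulSoup(html, "html.parser")

    # Step 1: narrow the search to the region with id="book".
    book_div = soup.find(attrs={"id": "book"})
    if book_div is None:  # e.g. the layout changed or the request was blocked
        return []

    # Step 2: collect every title and description inside that region only.
    titles = book_div.find_all(attrs={"class": "title"})
    descs = book_div.find_all(attrs={"class": "desc"})

    # Step 3: pull out the text. .string can be None (for example when a tag
    # has several children), so fall back to get_text() in that case.
    return [
        (t.string or t.get_text(strip=True), d.get_text(strip=True))
        for t, d in zip(titles, descs)
    ]


books = extract_books(SAMPLE_HTML)
for title, desc in books:
    print(title, "|", desc)
```

The link inside the id="nav" block is skipped because find_all is called on book_div rather than on the whole document, which is the reason for locating the region first. Pairing with zip assumes that titles and descriptions appear in the same order and in equal numbers, which may not hold on the live page.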