1. Goal: scrape the list of all primary schools in Beijing
Link: http://beijing.xuexiaodaquan.com/xiaoxue/pn30.html
Analysis:
Code:
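from urllib.request import urlopen
from bs4 import BeautifulSoup

# Assumes the listing pages are numbered 1..30, matching the pn30.html link above
for page in range(1, 31):
    # Use the loop variable so each iteration fetches a different page
    html = urlopen('http://beijing.xuexiaodaquan.com/xiaoxue/pn' + str(page) + '.html')
    bsObj = BeautifulSoup(html, 'lxml')
    # Each block of school entries is a <div class="list-xx clearfix">
    nameList = bsObj.findAll('div', {'class': 'list-xx clearfix'})
    for name in nameList:
        first = name.select('a')
        # Iterate over indices 0 .. len(first) - 1 and print only the odd-indexed links
        for j in range(len(first)):
            if j % 2 != 0:
                print(first[j].string)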
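The two mechanical steps in the loop above, building the per-page URL and keeping only the odd-indexed links, can be sketched separately from the network code. This is a minimal sketch, not part of the original script: the helper names `page_urls` and `odd_indexed` are hypothetical, and the 1..30 page range is an assumption taken from the pn30.html link.

```python
# Hypothetical helpers illustrating the URL pattern and the odd-index filter.
BASE_URL = 'http://beijing.xuexiaodaquan.com/xiaoxue/pn{}.html'


def page_urls(first_page=1, last_page=30):
    """Return the listing URLs for pages first_page..last_page, inclusive.

    The default range 1..30 is an assumption based on the pn30.html link.
    """
    return [BASE_URL.format(n) for n in range(first_page, last_page + 1)]


def odd_indexed(items):
    """Return the elements at indices 1, 3, 5, ...

    Equivalent to looping over range(len(items)) and keeping i % 2 != 0.
    """
    return items[1::2]


if __name__ == '__main__':
    urls = page_urls()
    print(len(urls), urls[0], urls[-1])
    # Placeholder strings stand in for the <a> tags returned by select('a')
    print(odd_indexed(['a0', 'a1', 'a2', 'a3']))
```

Slicing with `[1::2]` avoids the index loop entirely, so `for link in name.select('a')[1::2]: print(link.string)` would be one way to shorten the inner loop.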
Result: