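1. Environment setup: open cmd (or a terminal on macOS/Linux) and run:
   pip install lxml
   pip install selenium
2. Download the WebDriver for Firefox or Chrome (either works; I use Firefox). Firefox's driver, geckodriver, is available at https://github.com/mozilla/geckodriver/releases
3. Unzip the download to the desktop (the code expects it at /Users/apple/Desktop/diver/geckodriver).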
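Before running the scraper, it can help to confirm that steps 1 to 3 actually worked. The sketch below is one way to do that, under some assumptions. check_environment is a hypothetical helper name. The driver path is the one used in this post. If that file is missing, the helper falls back to searching the PATH. Recent Selenium 4 releases also ship Selenium Manager, which can fetch a matching driver automatically when none is given, so a missing geckodriver is not necessarily fatal.

```python
import importlib.util
import os
import shutil


def check_environment(driver_path):
    """Hypothetical helper: report whether lxml, selenium and geckodriver are available.

    driver_path is where geckodriver was unzipped; if no executable file
    exists there, the PATH is searched instead.
    """
    status = {
        # find_spec returns None (instead of raising) when a package is not installed
        "lxml": importlib.util.find_spec("lxml") is not None,
        "selenium": importlib.util.find_spec("selenium") is not None,
    }
    if os.path.isfile(driver_path) and os.access(driver_path, os.X_OK):
        status["geckodriver"] = driver_path
    else:
        # shutil.which returns the full path of an executable on PATH, or None
        status["geckodriver"] = shutil.which("geckodriver")
    return status


if __name__ == "__main__":
    print(check_environment("/Users/apple/Desktop/diver/geckodriver"))
```

If "geckodriver" comes back as None, the driver is neither at the given path nor on the PATH (or the file is not marked executable).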
A. Code (extracts the funny jokes from 30 pages and saves them to a file)
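from selenium import webdriver
from selenium.webdriver.firefox.service import Service
from lxml import etree

# Path to the unzipped geckodriver; change it to your own location
driver_path = r'/Users/apple/Desktop/diver/geckodriver'
url2 = 'http://www.lovehhy.net/Joke/Detail/QSBK/{page}'

if __name__ == '__main__':
    # Selenium 4 takes the driver path through a Service object
    # (the old executable_path argument has been removed)
    driver = webdriver.Firefox(service=Service(driver_path))
    try:
        with open('1.txt', 'w', encoding='utf-8') as fp:
            # range(1, 31) covers pages 1 to 30; range(1, 30) stops at page 29
            for i in range(1, 31):
                driver.get(url2.format(page=i))
                # Parse the source of the page just loaded; parsing only once
                # before the loop would extract the same page every time
                h5 = etree.HTML(driver.page_source)
                data = h5.xpath('//*[@id="endtext"]/text()')  # joke bodies
                data1 = h5.xpath('/html/body/div[4]/div/div[3]/div/div/h3//text()')  # joke titles
                # Pair each title with its body text
                for a, b in zip(data1, data):
                    k = a + b
                    print(k)
                    fp.write(k + '\n')
    finally:
        driver.quit()  # close the browser even if scraping fails

    # Post-processing: end every saved line with a period. Read the whole
    # file first instead of reading and appending to the same file at once.
    with open('1.txt', 'r', encoding='utf-8') as f1:
        lines = f1.read().splitlines()
    with open('1.txt', 'w', encoding='utf-8') as f2:
        f2.writelines(line + '.\n' for line in lines)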
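The loop does two pieces of bookkeeping per page: it builds the page URL, and it pairs each title with its body text. Both can be pulled out of the browser code and checked on their own. The sketch below is a minimal version. page_urls and pair_jokes are hypothetical helper names. The original joins with a + b, but this version trims whitespace and puts a space between title and body.

```python
def page_urls(template, first, last):
    """Build the URL of every page from first to last, inclusive."""
    # range() excludes its end value, hence last + 1
    return [template.format(page=i) for i in range(first, last + 1)]


def pair_jokes(titles, bodies):
    """Join each title with its body text, one joke per string.

    zip() stops at the shorter list, so an unmatched title or body
    on a page is silently dropped.
    """
    return [title.strip() + " " + body.strip() for title, body in zip(titles, bodies)]


if __name__ == "__main__":
    urls = page_urls('http://www.lovehhy.net/Joke/Detail/QSBK/{page}', 1, 30)
    print(len(urls))  # 30
    print(urls[0])    # http://www.lovehhy.net/Joke/Detail/QSBK/1
    print(pair_jokes(["Title A", "Title B"], ["Body A", "Body B", "Body C"]))
    # ['Title A Body A', 'Title B Body B']
```

Because zip() silently drops leftovers, comparing len(titles) and len(bodies) on each page is a cheap way to notice when the XPath expressions stop matching the site's layout.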
B. Result (a browser window opens automatically and pages through the site on its own)
More updates to come...