After publishing my post on scraping Zhihu Yanxuan (盐选) paid articles with Python, I didn't expect to be writing a follow-up this soon.
Truth is, while thinking through that first article in the afternoon, I had already worked out how to automatically crawl the other sections listed in a column's table of contents — I just hadn't planned to publish it this quickly, haha.
But after going back and forth, I figured I'd post it anyway, before I forget it someday.
from DecryptLogin import login
from bs4 import BeautifulSoup
import re
import base64

# Log in to Zhihu through DecryptLogin; the returned session carries the login cookies
lg = login.Login()
_, loginstatus = lg.zhihu(username='', password='', mode='pc')
headers = {
    'user-agent': "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"
}
# Two sections of the same paid column — only the trailing section id differs
url1 = "https://www.zhihu.com/market/paid_column/1178733193687175168/section/1178742737682350080"
url2 = "https://www.zhihu.com/market/paid_column/1178733193687175168/section/1178742849583083520"
# Fetch the section page with the logged-in session
r = loginstatus.get(url1, headers=headers)
wenzi = r.text
soup = BeautifulSoup(wenzi, 'lxml')
# The page embeds its data as JSON inside a <textarea>; grab it for regex matching
lianjie = soup.textarea
lianjie = str(lianjie)
pattern = re.compile('
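The regex above (cut off here) is where the section ids get pulled out of the JSON that the page embeds in its `<textarea>`, so the crawler can follow the column's table of contents on its own. Here is a minimal stdlib-only sketch of that idea — note that the `sample` string and the `"next_section"` key are my assumptions for illustration, not verified against Zhihu's actual markup:

```python
import re

# Hypothetical sample of the JSON a section page might embed in its <textarea>;
# the "next_section" key and its structure are assumptions for illustration.
sample = '{"title":"...","next_section":{"id":"1178742849583083520","is_end":false}}'

# Capture the id of the next section so the crawler can hop to it automatically.
pattern = re.compile(r'"next_section":{"id":"(\d+)"')
match = pattern.search(sample)
next_id = match.group(1) if match else None
print(next_id)  # -> 1178742849583083520
```

With `next_id` in hand, the next URL is just the column base path plus that id (the same shape as `url1`/`url2` above), and the loop stops once the page reports there is no further section.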