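import requests
from bs4 import BeautifulSoup

# `header` (the request headers dict) is assumed to be defined elsewhere.
def get_content(url):
    # try:
    resp = requests.get(url, headers=header, timeout=0.5)
    resp.encoding = 'utf-8'
    html = resp.text
    bs = BeautifulSoup(html, "html.parser")
    # except:
    #     bs = "死链"  # "dead link"
    # print(bs)
    # a = input("pause")
    return str(bs)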
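The code originally included the lines that are now commented out. The source data contains many dead links, so I set timeout to 0.5 seconds. As a result, most of the Baidu pages could no longer be crawled: with the try/except active, every failed request was recorded as a dead link. In the end I commented those lines out, and it worked.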
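One possible explanation is that 0.5 seconds was too short for slower pages, and the bare except turned every timeout into a dead link. Here is a minimal sketch of a middle ground, not the code I actually ran: it keeps the error handling but uses a longer timeout and catches only request errors. The name fetch_html, the DEAD_LINK constant, the timeout values, and the optional session parameter are all assumptions for illustration; treating HTTP error statuses as dead links is also an assumption.

```python
import requests

DEAD_LINK = "死链"  # same marker string as the original code ("dead link")


def fetch_html(url, headers=None, timeout=(3.05, 10), session=None):
    """Return the page HTML, or DEAD_LINK if the request fails.

    timeout is a (connect, read) pair in seconds; the values here are
    illustrative guesses, not tuned numbers.
    """
    http = session or requests  # a Session-like object can be passed in
    try:
        resp = http.get(url, headers=headers, timeout=timeout)
        resp.raise_for_status()  # assumption: 4xx/5xx also count as dead
    except requests.exceptions.RequestException:
        # Covers timeouts, connection errors, invalid URLs, and HTTP errors,
        # but not unrelated bugs, which a bare `except:` would also hide.
        return DEAD_LINK
    resp.encoding = "utf-8"
    return resp.text
```

The returned HTML can then be passed to BeautifulSoup(html, "html.parser") as before, after checking that the result is not DEAD_LINK.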