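I wrote this mainly to review BeautifulSoup (bs) and to learn CSS selectors. After not using bs for a while I found I had forgotten most of it, largely because lxml is so much nicer to use, and also because Scrapy supports XPath and CSS selectors by default. Here is the code: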
#!/usr/bin/env python
# -*- encoding: utf-8 -*-
"""
@File : test.py.py
@Time : 2019/8/24 13:41
@Author : Sound_of_Silence
"""
import requests
import re
import time
from lxml import etree
from bs4 import BeautifulSoup
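def get_text(url):
    """Fetch url and return the response body decoded as gb2312."""
    try:
        # Send a browser-like User-Agent header with the request.
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36'}
        r = requests.get(url, headers=headers)
        # Decode the body as gb2312 instead of the encoding requests guessed.
        r.encoding = 'gb2312'
        return r.text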
exc