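The User Agent String website collects the User-Agent (UA) request-header values sent by crawlers, browsers, consoles, email clients, and other clients. It lists nearly 10,000 UA strings for browsers alone, but the site is very slow to load.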
http://useragentstring.com/pages/useragentstring.php
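The script below downloads all of the browser UA strings, discards any string shorter than 80 characters, and saves the rest to a CSV file, which leaves 6,244 entries.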
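The filter-and-save step can be shown on its own, without any network access. This is only a minimal sketch: it uses the standard-library `csv` module instead of pandas so that it runs without extra dependencies, and the helper names `filter_user_agents` and `write_csv`, the `user_agent` column name, and the sample strings are all assumptions made for illustration.

```python
import csv
import io

MIN_LENGTH = 80  # strings shorter than this are discarded


def filter_user_agents(user_agents, min_length=MIN_LENGTH):
    """Keep only UA strings with at least `min_length` characters."""
    return [ua.strip() for ua in user_agents if len(ua.strip()) >= min_length]


def write_csv(user_agents, handle):
    """Write one UA string per row under a single (assumed) 'user_agent' column."""
    writer = csv.writer(handle)
    writer.writerow(['user_agent'])
    for ua in user_agents:
        writer.writerow([ua])


# Hypothetical sample data; the real input is the list scraped from the site.
sample = [
    'Mozilla/5.0 (compatible; ABrowse 0.4; Syllable)',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
]

kept = filter_user_agents(sample)

# An in-memory buffer stands in for the output file in this sketch.
buffer = io.StringIO()
write_csv(kept, buffer)
```

To write to disk instead of a buffer, pass a handle opened with `open(path, 'w', newline='', encoding='utf-8')`. The script used for the actual download comes next.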
# -*- coding: utf-8 -*-
import requests
import pandas as pd
from lxml import etree
url = 'http://useragentstring.com/pages/useragentstring.php?typ=Browser'
header = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'User-Agent': 'Mozilla/5.0 (compatible; ABrowse 0.4; Syllable)'
}
response = requests.get(url,
                        headers=header,
                        timeout=60