When using builtwith to identify the technologies behind a website, if you pass it the URL of one specific page, as below,
In cmd:
pip install --upgrade builtwith
then in the Python interpreter:
>>> import builtwith
>>> builtwith.parse('http://data.eastmoney.com/zjlx/300409.html')
you get the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "F:\TopQuant\zwPython\py35\python35\lib\site-packages\builtwith\__init__.py", line 65, in builtwith
if contains(html, snippet):
File "F:\TopQuant\zwPython\py35\python35\lib\site-packages\builtwith\__init__.py", line 110, in contains
v = v.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 1249: invalid start byte
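The last traceback line shows that the failure is in UTF-8 decoding. contains() calls v.decode() with no arguments, which in Python 3 means UTF-8. In UTF-8, a byte such as 0xb5 (binary 10110101) is a continuation byte and can never start a character. The sketch below reproduces this outside builtwith. It assumes that the page uses a Chinese encoding such as GBK, which the traceback does not confirm. try_decode is a hypothetical helper.

```python
from typing import Optional

# 0xb5 is the byte reported in the traceback. In UTF-8, bytes 0x80-0xBF are
# continuation bytes, so one of them can never start a character.
OFFENDING = b"\xb5"


def try_decode(raw: bytes, encoding: str) -> Optional[str]:
    """Return raw decoded with `encoding`, or None if it is not valid there."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    # bytes.decode() with no argument uses UTF-8, like the call in contains()
    try:
        OFFENDING.decode()
    except UnicodeDecodeError as exc:
        print("utf-8 fails:", exc.reason)  # utf-8 fails: invalid start byte

    # Chinese text stored as GBK bytes is not, in general, valid UTF-8
    sample = "东方财富".encode("gbk")
    print(try_decode(sample, "utf-8"))  # None
    print(try_decode(sample, "gbk"))    # 东方财富
```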
>>> help(builtwith.parse)
Help on function builtwith in module builtwith:
builtwith(url, headers=None, html=None, user_agent='builtwith')
Detect the technology used to build a website
>>> builtwith('http://wordpress.com')
{u'blogs': [u'PHP', u'WordPress'], u'font-scripts': [u'Google Font API'], u'web-servers': [u'Nginx'], u'javascript-frameworks': [u'Modernizr'], u'programming-languages': [u'PHP'], u'cms': [u'WordPress']}
>>> builtwith('http://webscraping.com')
{u'javascript-frameworks': [u'jQuery', u'Modernizr'], u'web-frameworks': [u'Twitter Bootstrap'], u'web-servers': [u'Nginx']}
>>> builtwith('http://microsoft.com')
{u'javascript-frameworks': [u'jQuery'], u'mobile-frameworks': [u'jQuery Mobile'], u'operating-systems': [u'Windows Server'], u'web-servers': [u'IIS']}
>>> builtwith('http://jquery.com')
{u'cdn': [u'CloudFlare'], u'web-servers': [u'Nginx'], u'javascript-frameworks': [u'jQuery', u'Modernizr'], u'programming-languages': [u'PHP'], u'cms': [u'WordPress'], u'blogs': [u'PHP', u'WordPress']}
>>> builtwith('http://joomla.org')
{u'font-scripts': [u'Google Font API'], u'miscellaneous': [u'Gravatar'], u'web-servers': [u'LiteSpeed'], u'javascript-frameworks': [u'jQuery'], u'programming-languages': [u'PHP'], u'web-frameworks': [u'Twitter Bootstrap'], u'cms': [u'Joomla'], u'video-players': [u'YouTube']}
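From the help we can see that the example URLs passed to builtwith.parse are not specific pages but whole sites. Since our goal is to identify the technologies used by the whole site, we can simply replace the URL above with the root URL of the site it belongs to. The traceback also shows where the page URL breaks: builtwith's contains() decodes the downloaded HTML with v.decode(), i.e. as UTF-8, and this page's bytes are not valid UTF-8. The site-level URL below parses without the error: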
>>> builtwith.parse('http://eastmoney.com')
{'analytics': ['TrackJs'], 'web-servers': ['Nginx'], 'javascript-frameworks': ['jQuery', 'RightJS']}
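The help text above also shows that builtwith accepts an `html` argument. If you need to analyse the page itself rather than the whole site, one possible workaround is to download and decode the page yourself and pass the resulting `str`. This sketch rests on several assumptions. First, the installed builtwith is assumed to call `.decode()` only when it receives `bytes`; check `contains()` in your copy of `builtwith/__init__.py`. Second, it assumes a non-UTF-8 page will decode as GB18030, a superset of GBK. The names `FALLBACK_ENCODINGS`, `decode_html`, `fetch_html` and `analyse_page` are hypothetical.

```python
import urllib.request
from typing import Sequence

# Hypothetical fallback order: UTF-8 first, then GB18030 (superset of GBK/GB2312)
FALLBACK_ENCODINGS = ("utf-8", "gb18030")


def decode_html(raw: bytes, encodings: Sequence[str] = FALLBACK_ENCODINGS) -> str:
    """Decode raw HTML bytes with the first encoding that works."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except (UnicodeDecodeError, LookupError):
            # LookupError: the server declared a charset Python does not know
            continue
    # Last resort: substitute U+FFFD for bad bytes instead of raising
    return raw.decode("utf-8", errors="replace")


def fetch_html(url: str, user_agent: str = "builtwith") -> str:
    """Download url and return its HTML as str, trying the declared charset first."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        raw = response.read()
        declared = response.headers.get_content_charset()
    encodings = ((declared,) if declared else ()) + FALLBACK_ENCODINGS
    return decode_html(raw, encodings)


def analyse_page(url: str) -> dict:
    """Run builtwith on one page using HTML we decoded ourselves."""
    import builtwith  # third-party: pip install builtwith
    return builtwith.parse(url, html=fetch_html(url))
```

Usage would then be analyse_page('http://data.eastmoney.com/zjlx/300409.html'). The sketch tries the charset from the HTTP Content-Type header first because the server's own declaration is usually the most reliable hint. The fixed fallback list only applies when that declaration is missing or wrong.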