Using builtwith in Python 3 (a detailed walkthrough)

1. First, install builtwith with pip install builtwith:

C:\Users\Administrator>pip install builtwith  
Collecting builtwith  
  Downloading builtwith-1.3.2.tar.gz  
Installing collected packages: builtwith  
  Running setup.py install for builtwith ... done  
Successfully installed builtwith-1.3.2  

2. Create a new project in PyCharm and enter the following test code:

import builtwith  
tech_used = builtwith.parse('http://www.baidu.com')  
print(tech_used)  

Running it produces the following error:

C:\Users\Administrator\AppData\Local\Programs\Python\Python36\python.exe F:/python/first/FirstPy  
Traceback (most recent call last):  
  File "F:/python/first/FirstPy", line 1, in <module>  
    import builtwith  
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36\lib\site-packages\builtwith\__init__.py", line 43  
    except Exception, e:  
                    ^  
SyntaxError: invalid syntax  
  
  
Process finished with exit code 1  

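The cause is that builtwith was written for Python 2.x, so a few places need to be changed. Double-click the offending file in PyCharm's error output to open it, then edit it. There are three main kinds of change:
1. The Python 2 form "except Exception, e" is no longer supported and must be changed to "except Exception as e".
2. In Python 3, print is a function, so the expression that follows each Python 2 print must be wrapped in parentheses.
3. builtwith uses the Python 2 urllib2 package, which does not exist in Python 3, so the urllib2-related code must be rewritten.
Changes 1 and 2 are simple to make; change 3 needs more work.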
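For changes 1 and 2, the Python 3 forms look roughly like the sketch below. The function safe_divide is a made-up example used only to show the syntax; it is not part of builtwith.

```python
# Python 3 forms of the two Python 2 constructs mentioned above.
# safe_divide is a hypothetical helper used only to illustrate the syntax.
def safe_divide(a, b):
    try:
        return a / b
    except Exception as e:   # Python 2 wrote: except Exception, e:
        print('Error:', e)   # Python 2 wrote: print 'Error:', e
        return None
```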
For change 3, first replace import urllib2 with the following code:

 
import urllib.request  
import urllib.error  

Then replace the urllib2 calls as follows:

request = urllib.request.Request(url, None, {'User-Agent': user_agent})  
response = urllib.request.urlopen(request)  
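As a self-contained illustration of the same two calls, here is a minimal sketch. The helper names build_request and fetch and the default user agent string are assumptions made for this example, not names taken from builtwith.

```python
import urllib.request
import urllib.error


def build_request(url, user_agent='builtwith'):
    """Python 3 counterpart of urllib2.Request(url, None, headers)."""
    # The default user agent string is a placeholder for this sketch.
    return urllib.request.Request(url, None, {'User-Agent': user_agent})


def fetch(url, user_agent='builtwith'):
    """Return the raw response body as bytes, or None if the request fails."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent)) as response:
            return response.read()
    except urllib.error.URLError as e:  # HTTPError is a subclass of URLError
        print('Error:', e)
        return None
```

Note that fetch returns bytes, not str, which matters for the next step.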

Run the project again, and the following error appears:

C:\Users\Administrator\AppData\Local\Programs\Python\Python36\python.exe F:/python/first/FirstPy  
Traceback (most recent call last):  
  File "F:/python/first/FirstPy", line 3, in <module>  
    builtwith.parse('http://www.baidu.com')  
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36\lib\site-packages\builtwith\__init__.py", line 62, in builtwith  
    if contains(html, snippet):  
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36\lib\site-packages\builtwith\__init__.py", line 105, in contains  
    return re.compile(regex.split('\\;')[0], flags=re.IGNORECASE).search(v)  
TypeError: cannot use a string pattern on a bytes-like object  
  
  
Process finished with exit code 1  

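This happens because, in Python 3, response.read() returns bytes rather than str, so the data has to be decoded before a string regular expression can search it. Change the following code: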

if html is None:  
    html = response.read()  

to

if html is None:  
    html = response.read()  
    html = html.decode('utf-8')  
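
The mismatch can be reproduced outside builtwith with a few lines. This is only a sketch: the pattern and the HTML snippet are made-up stand-ins for builtwith's regular expressions and a real page body.

```python
import re

# A str pattern, like the ones builtwith compiles (this one is made up).
pattern = re.compile('jquery', flags=re.IGNORECASE)
# A bytes value, like what response.read() returns in Python 3.
raw = b'<script src="jQuery.min.js"></script>'

try:
    pattern.search(raw)
    failed = False
except TypeError as e:
    failed = True
    print('Error:', e)

# After decoding, the same pattern can search the text.
matched = pattern.search(raw.decode('utf-8')) is not None
```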

Running it again gives the expected result:

C:\Users\Administrator\AppData\Local\Programs\Python\Python36\python.exe F:/python/first/FirstPy  
{'javascript-frameworks': ['jQuery']}  
  
  
Process finished with exit code 0  

However, if the site is changed to 'http://www.163.com', the run fails again:

C:\Users\Administrator\AppData\Local\Programs\Python\Python36\python.exe F:/python/first/FirstPy  
Error: 'utf-8' codec can't decode byte 0xcd in position 500: invalid continuation byte  
Traceback (most recent call last):  
  File "F:/python/first/FirstPy", line 2, in <module>  
    tech_used = builtwith.parse('http://www.163.com')  
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36\lib\site-packages\builtwith\__init__.py", line 63, in builtwith  
    if contains(html, snippet):  
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36\lib\site-packages\builtwith\__init__.py", line 106, in contains  
    return re.compile(regex.split('\\;')[0], flags=re.IGNORECASE).search(v)  
TypeError: cannot use a string pattern on a bytes-like object  
  
  
  
Process finished with exit code 1  

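The "Error:" line at the top of the output shows that the UTF-8 decode failed, which apparently left html as bytes when the regular expression ran. So this is still an encoding problem. Changing the encoding to 'GBK' makes the run succeed: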

C:\Users\Administrator\AppData\Local\Programs\Python\Python36\python.exe F:/python/first/FirstPy  
{'web-servers': ['Nginx']}  
  
  
Process finished with exit code 0  
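
The decode failure itself can be reproduced without any network access. In this sketch a short GBK-encoded string stands in for the page body; it is an illustrative value, not the actual content of the site.

```python
# A short GBK-encoded string stands in for a GBK page body (illustrative only).
raw = '网易'.encode('gbk')

try:
    text = raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError as e:
    utf8_ok = False
    print('Error:', e)

# Decoding with the encoding the bytes were written in succeeds.
text = raw.decode('gbk')
```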

So do different sites need different decodings? Here is one way to determine a site's encoding.
First install a package called chardet:

C:\Users\Administrator>pip install chardet  
Collecting chardet  
  Downloading chardet-2.3.0-py2.py3-none-any.whl (180kB)  
    100% |████████████████████████████████| 184kB 616kB/s  
Installing collected packages: chardet  
Successfully installed chardet-2.3.0  
  
  
C:\Users\Administrator>  

Pass the byte data to chardet's detect function and it returns a dict that, in this version, holds two values: the detected encoding and a confidence score.

{'encoding': 'utf-8', 'confidence': 0.99}  

Modify the corresponding builtwith code as follows:

encode_type = chardet.detect(html)  
if encode_type['encoding'] == 'utf-8':  
    html = html.decode('utf-8')  
else:  
    html = html.decode('gbk')  

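Remember to import chardet!
With chardet detecting the character encoding, the code decodes UTF-8 pages as UTF-8 and treats everything else as GBK, which covers the sites tested here.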
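
A somewhat more general variant might decode with whatever encoding chardet reports instead of hard-coding the two cases. The sketch below is an assumption-laden illustration, not builtwith's code: decode_html and its fallbacks parameter are hypothetical names, and the import is guarded so that the function still works, by simply trying the fallback encodings in order, when chardet is not installed.

```python
try:
    import chardet  # third-party package; optional in this sketch
except ImportError:
    chardet = None


def decode_html(raw, fallbacks=('utf-8', 'gbk')):
    """Decode a page body (bytes) to str. Hypothetical helper, not part of builtwith."""
    if chardet is not None:
        guess = chardet.detect(raw)['encoding']  # may be None if detection fails
        if guess:
            try:
                return raw.decode(guess)
            except (UnicodeDecodeError, LookupError):
                pass  # wrong or unknown guess: fall through to the fallbacks
    for encoding in fallbacks:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep going with replacement characters instead of raising.
    return raw.decode('utf-8', errors='replace')
```

Detection on very short inputs can be unreliable, so the result is best treated as a guess rather than a guarantee.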

 http://blog.csdn.net/fengzhizi76506/article/details/61617067