Which Python Libraries Do You Wish You'd Discovered Sooner?

Python程序员小泉

Last modified: 2022-12-28 11:47:32


First published: 2022-11-08 14:28:02

Article link: https://blog.csdn.net/m0_59162248/article/details/127749440


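Below are a few projects I have actually used and found especially handy, rather than just a pasted link to awesome-python. I suspect most people bookmark awesome-python after reading it and get little more out of it.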

1. yagmail

Here is the email-sending example from the official Python documentation

(Examples - Python 2.7.13 documentation)

Take a look. Personally, this wall of imports was enough to scare me off.

#!/usr/bin/env python

"""Send the contents of a directory as a MIME message."""

import os
import sys
import smtplib
# For guessing MIME type based on file name extension
import mimetypes

from optparse import OptionParser

from email import encoders
from email.message import Message
from email.mime.audio import MIMEAudio
from email.mime.base import MIMEBase
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

With yagmail, sending an email with attachments takes just two lines of code (after the import):

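import yagmail
# Placeholder credentials; depending on the yagmail version, plain SMTP on port 25 may also need smtp_ssl=False.
yag = yagmail.SMTP(user='joy_lmx@163.com', password='nicai?', host='smtp.163.com', port=25)
# The recipient was an undefined variable in the original; here the mail simply goes back to the sender's own address.
yag.send(to='joy_lmx@163.com', subject="I now can send an attachment", attachments=['a.txt', 'b.jpg'])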
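For comparison, here is a minimal sketch of the message-building work that yagmail spares you, written against the modern standard library (`email.message.EmailMessage`) rather than the Python 2.7 example quoted above. The function name `build_message` and the example addresses are my own placeholders, not anything taken from yagmail or the official docs.

```python
import mimetypes
from email.message import EmailMessage
from pathlib import Path


def build_message(sender, recipient, subject, body, attachment_paths):
    """Build an email with file attachments using only the standard library."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)

    for path in map(Path, attachment_paths):
        # Guess the MIME type from the file extension; fall back to a
        # generic binary type when it is unknown or the file is compressed.
        ctype, encoding = mimetypes.guess_type(path.name)
        if ctype is None or encoding is not None:
            ctype = "application/octet-stream"
        maintype, subtype = ctype.split("/", 1)
        msg.add_attachment(
            path.read_bytes(),
            maintype=maintype,
            subtype=subtype,
            filename=path.name,
        )
    return msg
```

Actually sending the result still needs an `smtplib` connection, a login, and a `send_message(msg)` call on top of this, which is exactly the part yagmail folds into its two lines.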

2. requests

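Plenty of people have already recommended requests, but some of you may not see what makes it so good. Let's use the example from its official site. Before requests, calling

https://api.github.com/user

with Python 2's urllib2 looked like this: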

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import urllib2

gh_url = 'https://api.github.com'

req = urllib2.Request(gh_url)

password_manager = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_manager.add_password(None, gh_url, 'user', 'pass')

auth_manager = urllib2.HTTPBasicAuthHandler(password_manager)
opener = urllib2.build_opener(auth_manager)

urllib2.install_opener(opener)

handler = urllib2.urlopen(req)

print handler.getcode()
print handler.headers.getheader('content-type')

# ------
# 200
# 'application/json'

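With requests, the same job looks like this (note that the first three lines after the import do everything the block above does):

>>> import requests
>>> r = requests.get('https://api.github.com/user', auth=('user', 'pass'))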
>>> r.status_code
200
>>> r.headers['content-type']
'application/json; charset=utf8'
>>> r.encoding
'utf-8'
>>> r.text
u'{"type":"User"...'
>>> r.json()
{u'private_gists': 419, u'total_private_repos': 77, ...}
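The urllib2 listing above only runs on Python 2; in Python 3 the module became `urllib.request`. As a rough sketch of what the `auth=('user', 'pass')` tuple boils down to, the snippet below builds the HTTP Basic `Authorization` header by hand with the standard library and does not send anything. The helper name `basic_auth_request` is made up for illustration, and requests does a good deal more than this (sessions, redirects, decoding, and so on).

```python
import base64
import urllib.request


def basic_auth_request(url, username, password):
    """Return an unsent Request carrying an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    req = urllib.request.Request(url)
    req.add_header("Authorization", f"Basic {token}")
    return req


req = basic_auth_request("https://api.github.com/user", "user", "pass")
print(req.get_header("Authorization"))  # Basic dXNlcjpwYXNz

# To actually send it (needs network access and real credentials):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.headers.get("content-type"))
```

Even in this stripped-down form you have to know about base64 and header names, which is the kind of detail the one-line `requests.get(..., auth=...)` call hides.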

3. psutil

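psutil retrieves system monitoring information and manages processes. If you are writing a monitoring system (or script), go try it. Put it this way: I once used psutil to refactor an internal monitoring module at NetEase from 1,000+ lines down to 100+.

Of the libraries I recommend here, yagmail is probably useful to the most people, while psutil is most useful to specialists. Without psutil, a monitoring system has to either read the relevant files under /proc and do the math itself, or run Linux commands such as iostat, vmstat, and df and parse their output. Either way there are a lot of tedious details to handle. psutil makes this much easier. Here is some code to give you a feel for it: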

import psutil

# Methods excerpted from a monitoring class, hence the self parameter.
def get_network_info(self):
    """
    psutil.net_io_counters()
    snetio(bytes_sent=12541464, bytes_recv=21459989, packets_sent=80164, packets_recv=88134, errin=0, errout=0,
     dropin=0, dropout=0)
    """
    return psutil.net_io_counters()

def get_memory_used(self):
    """
    psutil.virtual_memory()
    svmem(total=4159041536, available=3723980800, percent=10.5, used=1599082496,
     free=2559959040, active=587403264, inactive=897105920, buffers=95989760, cached=1068032000)
    """
    memory_info = psutil.virtual_memory()
    # total * percent / 100 is the used byte count; dividing by 1024 twice converts it to MB
    memory_used = (memory_info.total * memory_info.percent / 100) / 1024 / 1024
    return memory_used
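To give a sense of the /proc route mentioned above, here is a small sketch of parsing /proc/meminfo-style text by hand. The function names and the sample numbers are invented for illustration, and "used = total minus available" is only one possible definition, chosen because it roughly mirrors the total-times-percent calculation in the psutil code.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (bytes for kB entries)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if not fields:
            continue
        value = int(fields[0])
        # Most entries are reported in kB; a few are plain counts with no unit.
        if len(fields) > 1 and fields[1] == "kB":
            value *= 1024
        info[key.strip()] = value
    return info


def memory_used_mb(info):
    """Approximate used memory in MB as total minus available."""
    return (info["MemTotal"] - info["MemAvailable"]) / 1024 / 1024


# Invented sample values, in the format Linux uses for /proc/meminfo.
SAMPLE = """\
MemTotal:        4061564 kB
MemFree:         2499960 kB
MemAvailable:    3636700 kB
Buffers:           93740 kB
Cached:          1043000 kB
"""

info = parse_meminfo(SAMPLE)
print(round(memory_used_mb(info), 1))  # 414.9
```

On a real Linux machine you would feed it `open("/proc/meminfo").read()` instead of `SAMPLE`. That covers memory on one operating system only, and network, disk, and per-process statistics each need their own parser.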

In addition, glances, an increasingly popular monitoring tool (if you haven't used it, why not try it now?), collects its data with psutil.

4. BeautifulSoup

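If you write crawlers and still parse HTML with XPath, give BeautifulSoup a try; I find it a hundred times easier to use. And if you are still pulling content out of HTML with regular expressions, BeautifulSoup will feel so good you could cry. (Update: the commenters all say XPath is easier to use. Maybe I just think differently from everyone else?)

BeautifulSoup parses HTML, and its defining feature is ease of use. Some people complain that it is slow. I don't care how much slower BeautifulSoup is than XPath; all I know is that my time is worth more than the machine's.

For example, to find all the links in a page: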

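from bs4 import BeautifulSoup
# Name the parser explicitly so bs4 does not have to pick one (and warn about it)
with open("index.html") as f:
    soup = BeautifulSoup(f, "html.parser")
for link in soup.find_all('a'):
    print(link.get('href'))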

For example, when I was writing a crawler for Zhihu, each followee on a user's "Following" page had a tag like this:

<div class="zm-profile-card zm-profile-section-item zg-clear no-hovercard">
    .......
    <a title="天雨白" data-hovercard="p$t$tian-yu-bai" class="zm-item-link-avatar" href="/people/tian-yu-bai">
    </a>
    .......
</div>

So the code to parse each followed user looks like this:

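# text holds the HTML source of the "Following" page
soup = BeautifulSoup(text, "html.parser")
# Find the divs by their class attribute; each user corresponds to one such div
items = soup.find_all('div', class_="zm-profile-card zm-profile-section-item zg-clear no-hovercard")
for item in items:
    # Get the title attribute of the <a> tag inside this div
    name = item.a.attrs['title']
    # Get the src attribute of the <img> tag inside that <a> tag
    avatar = item.a.img.attrs['src']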
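If you want to try this without crawling anything, the following self-contained sketch runs the same idea on an inline string. The markup is modeled on the snippet above, but the `<img>` tag, its `src`, and the second card are invented for the demo, and `parse_followees` is a name I made up. It also guards against cards that have no `<a>` or `<img>`, where the attribute-chaining version would raise an error.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the snippet above; the <img> and the
# second card are invented so there is something to parse.
HTML = """
<div class="zm-profile-card zm-profile-section-item zg-clear no-hovercard">
  <a title="天雨白" class="zm-item-link-avatar" href="/people/tian-yu-bai">
    <img src="https://example.com/avatar.jpg">
  </a>
</div>
<div class="zm-profile-card zm-profile-section-item zg-clear no-hovercard">
  <a title="no-avatar-user" class="zm-item-link-avatar" href="/people/no-avatar-user"></a>
</div>
"""


def parse_followees(html):
    """Return one dict (name, url, avatar) per profile card found in html."""
    soup = BeautifulSoup(html, "html.parser")
    users = []
    # A single class name is enough: class_ matches any element that has it.
    for item in soup.find_all("div", class_="zm-profile-card"):
        link = item.a
        if link is None:
            continue
        img = link.img  # None when the card has no <img>
        users.append({
            "name": link.get("title"),
            "url": link.get("href"),
            "avatar": img.get("src") if img is not None else None,
        })
    return users


for user in parse_followees(HTML):
    print(user)
```

Using `.get()` instead of `.attrs[...]` returns `None` for a missing attribute rather than raising `KeyError`, which tends to be more forgiving on real pages.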

With BeautifulSoup, crawling becomes remarkably simple. Someone else has already done the dirty work for you.