7. Introduction to Common Python Open-Source Libraries

极端~

Category: Python入门笔记 (Python Beginner Notes)

Article link: https://blog.csdn.net/han_xiao_xiao/article/details/111191452

1. requests: Python's HTTP library for web scraping

requests (http://docs.python-requests.org)
Requests is an elegant and simple HTTP library for Python, built for human beings.

Use cases

Web scraping (together with the BeautifulSoup library)
Testing or monitoring live API endpoints

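# pip install requests
import requests
# Send a GET request and fetch the response for the URL
r = requests.get('http://www.baidu.com')
# Send a POST request and fetch the response for the URL
r = requests.post(
    'http://xxx.org/post',
    data={'key': 'value'}
)
# Check the returned status code; 200 means the request succeeded
print(r.status_code)
# Get the content of the returned page as text
print(r.text)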
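
For the API monitoring use case mentioned above, a single check might look roughly like the sketch below. The function name check_endpoint and the 5-second default timeout are my own choices for illustration, not part of requests; the library calls it relies on (requests.get with timeout, requests.RequestException, status_code, text) are standard.

```python
# pip install requests
import requests


def check_endpoint(url, timeout=5.0):
    """Return (ok, detail) for one GET request to url.

    check_endpoint is a hypothetical helper written for this example.
    """
    try:
        # Without a timeout, a hung server could block the monitor indefinitely
        r = requests.get(url, timeout=timeout)
    except requests.RequestException as exc:
        # Base class for connection errors, timeouts, invalid URLs and so on
        return False, f"request failed: {exc.__class__.__name__}"
    if r.status_code != 200:
        return False, f"unexpected status {r.status_code}"
    return True, f"{len(r.text)} characters received"
```

Calling check_endpoint('http://www.baidu.com') returns a pair such as (True, ...) when the site answers with 200, and (False, ...) with a short reason otherwise, so a monitoring script can log or alert on the second element.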

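2. PyMySQL: accessing MySQL from Python

MySQL (https://dev.mysql.com/downloads/mysql/)

MySQL is one of the most popular relational database management systems and a common choice for data storage at companies of all sizes.
Features:

Data is organized into tables
Data is queried with SQL
Transactions are supported, which keeps data consistent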

# pip install PyMySQL
import pymysql
# Create the database connection
conn = pymysql.connect(
    host='127.0.0.1',
    user='root',
    password='root',
    database='mydb')
# Create a cursor object
cursor = conn.cursor()
# Execute an SQL query
cursor.execute("select * from sgrade")
# Fetch the returned rows with fetchall (or one row at a time with fetchone)
datas = cursor.fetchall()
for data in datas:
    print(data)
# For insert, update and delete statements, this call is required
conn.commit()
# Close the database connection
conn.close()
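
Since insert, update and delete statements need a commit, here is a minimal sketch of a write with parameters. The helper name add_grade and the column names sno and grade are assumptions made up for this example (the post never shows the schema of sgrade); conn is expected to be a connection created with pymysql.connect as above.

```python
def add_grade(conn, student_no, grade):
    """Insert one row into sgrade and commit; roll back if anything fails.

    The columns sno and grade are hypothetical, adjust them to the real table.
    """
    try:
        with conn.cursor() as cursor:
            # %s placeholders let the driver escape the values,
            # which avoids SQL injection from string concatenation
            cursor.execute(
                "insert into sgrade (sno, grade) values (%s, %s)",
                (student_no, grade),
            )
        # Writes only become permanent after commit()
        conn.commit()
    except Exception:
        # Undo the partial transaction, then let the caller see the error
        conn.rollback()
        raise
```

With an open connection, add_grade(conn, '2020001', 95) would add one row; the connection still has to be closed afterwards with conn.close().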

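3. Generating Excel files with the xlwt module

Modules for reading and writing Excel files

xlrd: reads Excel files
xlwt: generates Excel files (.xls format)
Whole workbook: xlwt.Workbook

Sheets: workbook.add_sheet
Cells: located by row and col, both counted from 0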

# pip install xlwt
import xlwt
# Create a workbook
workbook = xlwt.Workbook(encoding='utf-8')
# Add a sheet
worksheet = workbook.add_sheet('Student grades')
# Write single cells; the arguments are (row, col, data)
worksheet.write(0, 0, 'Student ID 01')
worksheet.write(0, 1, 'Grade 01')
worksheet.write(1, 0, 'Student ID 02')
worksheet.write(1, 1, 'Grade 02')
# Save the workbook; the argument is the file name
workbook.save("result.xls")
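
Writing cell by cell gets tedious once there are more than a few rows. A small loop over a list of rows does the same job; write_table below is a hypothetical helper of my own, and it only relies on the worksheet.write(row, col, data) call shown above.

```python
def write_table(worksheet, rows):
    """Write a list of rows to a sheet, starting at cell (0, 0).

    worksheet is the object returned by workbook.add_sheet;
    rows is a list of lists, one inner list per spreadsheet row.
    """
    for row_index, row in enumerate(rows):
        for col_index, value in enumerate(row):
            # Both indices start at 0, matching xlwt's cell addressing
            worksheet.write(row_index, col_index, value)
```

For instance, write_table(worksheet, [['Student ID', 'Grade'], ['01', 90], ['02', 85]]) followed by workbook.save("result.xls") would produce a sheet with one header row and two data rows.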

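4. Writing multi-process programs in Python

Multiprocessing

Without modification, a program runs its tasks serially, which is slow; processing in parallel on multiple cores makes full use of a single machine's computing power.

multiprocessing is Python's multi-process module and can compute in parallel on multiple cores.
threading is Python's multi-threading module; in CPython the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so CPU-bound computation effectively uses a single core.

The total processing capacity of a single machine is limited; to process more than a few tens of GB of data, consider a big-data framework such as Spark.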

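# Standard-library module, no installation needed
import multiprocessing

def process(d):
    """Only a single element needs to be handled here."""
    return d * d

# The guard keeps worker processes from re-running this block on import
if __name__ == '__main__':
    # The Pool argument is the number of worker processes (CPU cores to use);
    # if it is omitted, all cores are used
    with multiprocessing.Pool(3) as pool:
        # The second argument is the collection of items to process,
        # for example a list of URLs to crawl or of input files to handle
        results = pool.map(process, [1, 2, 3, 4])
        print(results)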
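
For comparison, the threading module is still a good fit when tasks mostly wait (on the network or on disk) rather than compute, because a waiting thread lets the others run. Below is a minimal sketch under that assumption; time.sleep stands in for a blocking I/O call, and the names fetch and run_in_threads are made up for this example.

```python
# Standard-library modules, no installation needed
import threading
import time


def fetch(item, results, lock):
    """Pretend to wait on the network, then record a result."""
    time.sleep(0.1)  # stand-in for a blocking I/O call such as an HTTP request
    with lock:
        # The lock protects the shared dict while a thread updates it
        results[item] = item * item


def run_in_threads(items):
    """Start one thread per item and return the results in input order."""
    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=fetch, args=(item, results, lock))
        for item in items
    ]
    for t in threads:
        t.start()
    for t in threads:
        # Wait until every thread has finished
        t.join()
    return [results[item] for item in items]
```

run_in_threads([1, 2, 3, 4]) returns [1, 4, 9, 16]. Because the four simulated waits overlap, the call takes roughly as long as one wait instead of four.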
