Python3 Notes

Latest recommended article published on 2024-05-31 21:55:57

weixin_33722405

Article tags: database, python

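Ubuntu 18.04 Python3 environment

python3 is already installed by default, probably pulled in as a dependency when other applications were installed.

To install pip3, first run sudo apt update, then run apt-cache search python3-pip to check whether the package is available. If it is not, check that /etc/apt/sources.list is correct; the following sources.list can be used as a reference: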

deb http://cn.archive.ubuntu.com/ubuntu/ bionic main restricted
deb http://cn.archive.ubuntu.com/ubuntu/ bionic-updates main restricted
deb http://cn.archive.ubuntu.com/ubuntu/ bionic universe
deb http://cn.archive.ubuntu.com/ubuntu/ bionic-updates universe
deb http://cn.archive.ubuntu.com/ubuntu/ bionic multiverse
deb http://cn.archive.ubuntu.com/ubuntu/ bionic-updates multiverse
deb http://cn.archive.ubuntu.com/ubuntu/ bionic-backports main restricted universe multiverse
deb http://security.ubuntu.com/ubuntu bionic-security main restricted
deb http://security.ubuntu.com/ubuntu bionic-security universe
deb http://security.ubuntu.com/ubuntu bionic-security multiverse

Then install it with sudo apt install python3-pip

Packages commonly installed with pip3: pymongo

Common syntax

Merging two dictionaries

x = {'aa':1, 'bb':2, 'cc':3}
y = {'aa':5, 'xx':6, 'yy':7}
# Method 1: unpack both into a new dictionary
z = {**x, **y}
print(z)
# Method 2: update x in place
x.update(y)
print(x)
# Output
# {'aa': 5, 'bb': 2, 'cc': 3, 'xx': 6, 'yy': 7}
# {'aa': 5, 'bb': 2, 'cc': 3, 'xx': 6, 'yy': 7}

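Both methods merge the dictionaries. The former leaves the two original dictionaries unchanged, while update() modifies x in place. For duplicate keys, the value from y wins in both cases.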
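
As a small check of the difference between the two methods, the sketch below verifies that unpacking leaves x untouched while update() changes it. The variable x_unchanged is only there for illustration.

```python
x = {'aa': 1, 'bb': 2, 'cc': 3}
y = {'aa': 5, 'xx': 6, 'yy': 7}

# Unpacking builds a new dict; for duplicate keys the right-hand value wins
z = {**x, **y}
x_unchanged = (x == {'aa': 1, 'bb': 2, 'cc': 3})

# update() modifies x in place and returns None
result = x.update(y)

print(x_unchanged)     # True
print(result is None)  # True
print(x == z)          # True
```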

Logical tests

# None, not None
if x is None: ...
if x is not None: ...

# Whether a dictionary contains a given key
if 'name' in x: ...
if 'name' not in x: ...
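
A test against None is not the same as a test for emptiness: an empty dictionary or string is not None, but it still evaluates as false. A minimal sketch with made-up values (user and empty are illustrative names only):

```python
user = {'name': 'alice', 'nickname': ''}
empty = {}

print(empty is None)              # False: an empty dict is still an object
print(not empty)                  # True: but it is falsy
print('name' in user)             # True: membership is tested on the keys
print('email' not in user)        # True
print(user.get('email') is None)  # True: get() returns None for a missing key
print(user['nickname'] is None)   # False: '' is not None
```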

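String operations

# Substring: take the characters from index 0 up to (not including) 100
# (s is any string; avoid naming a variable str, which shadows the built-in type)
s[0:100]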
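
Slicing does not raise an error when the end index is larger than the string length; the result is simply clipped. A short sketch with an example string:

```python
s = 'hello world'

head = s[0:5]       # characters at index 0..4
same = s[:5]        # the start index defaults to 0
clipped = s[0:100]  # an end index past the length is clipped, no IndexError
tail = s[-5:]       # a negative index counts from the end

print(head)     # hello
print(clipped)  # hello world
print(tail)     # world
```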

Common modules

HTTP requests: the Requests module

Reference manual: http://docs.python-requests.org/zh_CN/latest/api.html

Usage example

import requests

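def requestGet(url, encoding='UTF-8', tout=20, retries=10):
    """Fetch url and return the body as text, or None after `retries` failed attempts."""
    count = 0
    while True:
        count += 1
        if count > retries:
            print('Exceed retry limit')
            return None
        try:
            response = requests.get(url, timeout=tout)
            # Set the encoding used to decode response.text
            response.encoding = encoding
            return response.text
        except requests.ReadTimeout:
            print('ReadTimeout')
            continue
        except requests.ConnectionError:
            # requests raises its own ConnectionError, not the built-in one
            print('ConnectionError')
            continue
        except requests.RequestException:
            # Base class of all requests exceptions, so it must come last
            print('RequestException')
            continue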

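Note: when fetching GB2312 pages, it is better to pass GB18030 as the encoding. GB18030 is a superset of GB2312, so this avoids garbled output for some special characters.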
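
The effect can be reproduced without any network access. The sketch below uses a sample string chosen for illustration: the character '镕' exists in GBK and GB18030 but not in GB2312, so a page that declares GB2312 but contains it cannot be decoded strictly as GB2312.

```python
text = '朱镕基'  # '镕' is outside the GB2312 character set

# Bytes as a page labelled GB2312 (but really GBK/GB18030) might send them
raw = text.encode('gb18030')

# GB18030 is a superset of GB2312 and GBK, so decoding succeeds
decoded_ok = raw.decode('gb18030')

# Strict GB2312 decoding fails on the character outside its repertoire
try:
    raw.decode('gb2312')
    gb2312_failed = False
except UnicodeDecodeError:
    gb2312_failed = True

print(decoded_ok == text)  # True
print(gb2312_failed)       # True
```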

MongoDB operations: the PyMongo module

Usage example

import pymongo

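# Connect to MongoDB
client = pymongo.MongoClient('172.17.0.2', 27017)
# Select the database
db = client.db_1
# Select the collection
tb_user = db['user']
# Count all documents
total = tb_user.count_documents({})
# Select one document by _id (name is a value defined elsewhere)
dummy = tb_user.find_one({'_id': name})
# Select all, sorted by 'posts' in descending order
allusers = tb_user.find().sort('posts', pymongo.DESCENDING)
# Insert or update (user is a dict with an '_id' key, defined elsewhere).
# save() was deprecated in PyMongo 3.0 and removed in 4.0; an upsert replaces it
tb_user.replace_one({'_id': user['_id']}, user, upsert=True)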

Collection-level operations

# Create an index
collection_demo.create_index([('field1', pymongo.ASCENDING)])

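To run a mongo shell command directly from Python, use eval(). For example, the following statements merge several collections with the same structure into a single collection. Note that eval() only works with older versions: it was removed in PyMongo 4.0, and the server-side eval command was removed in MongoDB 4.2.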

for board in boards:
    tb_current = rbcommon.db['deb_' + str(board['_id'])]
    if tb_current is not None:
        # Cursor.count() is deprecated; count_documents() replaces it
        current_total = tb_current.count_documents({})
        print(str(board['_id']) + ', ' + board['name'] + ', ' + board['name2'] + ', ' + str(current_total))
        # Copy every document of this collection into deb_all on the server side
        rbcommon.db.eval('db.deb_'+ str(board['_id']) +'.find({}).forEach(function(u) {db.deb_all.save(u);})')
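
Since eval() was removed in PyMongo 4.0 and MongoDB 4.2, the same merge can also be done on the client side. The following is a rough sketch, not the method used in these notes: merge_collections is a hypothetical helper, and db is assumed to be a pymongo Database (any object offering db[name], find() and replace_one() would work).

```python
def merge_collections(db, source_names, target_name):
    """Copy every document from the source collections into the target.

    Documents are upserted by _id, so a later document with the same _id
    overwrites an earlier one (the same effect as the shell's save()).
    Returns the number of documents processed.
    """
    target = db[target_name]
    copied = 0
    for name in source_names:
        for doc in db[name].find():
            # Insert the document, or replace the one with the same _id
            target.replace_one({'_id': doc['_id']}, doc, upsert=True)
            copied += 1
    return copied
```

With the names from the loop above, a call might look like merge_collections(rbcommon.db, ['deb_' + str(b['_id']) for b in boards], 'deb_all'). This moves all documents through the client, so it is slower than a server-side copy for large collections.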

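The count_documents() pitfall

For very large collections, avoid this method where possible: it actually has to scan all the documents and is very slow. Use estimated_document_count() instead, which reads the cached document count from the collection's metadata. If the collection is being written to at that moment, the result may differ from the actual count.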
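
One way to apply this rule is a small wrapper that uses the estimate whenever no filter is needed. This is only a sketch: count_fast is a hypothetical helper name, and collection is assumed to be a pymongo Collection (PyMongo 3.7 or later, where both methods exist).

```python
def count_fast(collection, query=None):
    """Return a document count, preferring the cheap estimate.

    Without a filter, use estimated_document_count(), which is read from
    metadata and may be slightly off during writes. With a filter, fall
    back to the exact but slower count_documents().
    """
    if not query:
        return collection.estimated_document_count()
    return collection.count_documents(query)
```

For example, count_fast(tb_user) would return the estimate, while count_fast(tb_user, {'posts': {'$gt': 10}}) would run an exact filtered count.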

Reading a YAML configuration file

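import os

import pymongo
import yaml

# Get the directory of the current file. Note: this is the directory of common.py
# (without a trailing slash), not the directory of the file that imports common.
# You can check this with os.path.realpath(__file__)
rootPath = os.path.dirname(__file__)
print(rootPath)
configPath = os.path.join(rootPath,'config.yml')
print(configPath)

with open(configPath, 'r') as ymlfile:
    # safe_load() avoids executing arbitrary tags; plain yaml.load() without a
    # Loader argument is deprecated and fails in PyYAML 6
    cfg = yaml.safe_load(ymlfile)

# yaml automatically converts values that look like numbers into numeric types.
# The configuration values are used as follows
mongoclient = pymongo.MongoClient(cfg['mongo']['host'], cfg['mongo']['port'])
db = mongoclient[cfg['mongo']['db']]
tb_section = db['section']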
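
Depending on the Python version and on how the script is started, __file__ can be a relative path, in which case os.path.dirname(__file__) may be an empty string. A slightly more defensive sketch resolves the path first; config_path is a hypothetical helper, and the file name config.yml is taken from the example above.

```python
import os

def config_path(module_file, name='config.yml'):
    """Return the absolute path of `name` located next to the given module file."""
    root = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(root, name)

# dirname() of a bare relative file name is an empty string
print(repr(os.path.dirname('common.py')))       # ''
# After abspath() the directory is always absolute
print(os.path.isabs(config_path('common.py')))  # True
```

Inside common.py the call would be config_path(__file__).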