Python: storing dictionary contents in a database_An introduction to Python modules

Latest recommended article published on 2024-08-09 04:11:34

mizore

Article tags: python storing dictionary contents in a database, python modules

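Why modules exist

Python ships with its own modules (known as the standard library), and there is also a huge number of third-party modules, with developers continually contributing new ones. This rich module ecosystem is a big part of why so many people favor Python. Other languages have increasingly adopted the same approach, to the point that it has become nearly universal practice.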

A first look at modules

The math module

import math

math.pow(3,2)

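Here math, part of the Python standard library, is a module. Bring it in with import, and you can then use the functions it contains, such as pow(). There is no need to write the function yourself; our job is simply to use it. That is the benefit of modules: use what already exists instead of rewriting it.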

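What a module is

"A module is a program" sums up what a module really is: a Python program stored in a file with the .py extension.
We can import it whenever we need it, which saves effort and avoids rewriting the same code.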

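How packages, libraries, and modules relate

As the names suggest, packages and libraries are both bigger than a module. Generally, a package contains several modules, and "library" is a looser, broader term: the Python standard library, for example, is made up of many packages and modules. Because a package consists of several modules, that is, several .py files, a package is really just a directory.

How do we import a module that lives in some directory?

The usual answer is to put an __init__.py file in that directory. __init__.py can be an empty file; once it is in a directory, the other .py files in that directory can be imported as modules of that package.

For example, create a directory named package_qi, put two files, pm.py and pp.py, in it, and then create an empty __init__.py file. Next, import a module from this package (package_qi):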

>>> import package_qi.pm

>>> package_qi.pm.lang()

'python'

Or

>>> from package_qi import pm

>>> pm.lang()

'python'
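The package layout described above can be reproduced end to end in a temporary directory. This is a minimal sketch: the body of lang() is an assumption (the article only shows that it returns 'python'), and the temporary directory stands in for wherever package_qi really lives.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk: package_qi/__init__.py and package_qi/pm.py
base_dir = tempfile.mkdtemp()
pkg_dir = os.path.join(base_dir, "package_qi")
os.makedirs(pkg_dir)

with open(os.path.join(pkg_dir, "__init__.py"), "w", encoding="utf-8") as f:
    f.write("")  # an empty file is enough to mark the directory as a package

with open(os.path.join(pkg_dir, "pm.py"), "w", encoding="utf-8") as f:
    # Assumed implementation of lang(); the article does not show pm.py itself
    f.write("def lang():\n    return 'python'\n")

# Make the parent directory importable, then import the module from the package
sys.path.insert(0, base_dir)
importlib.invalidate_caches()
pm = importlib.import_module("package_qi.pm")

print(pm.lang())
```

The parent directory of the package, not the package directory itself, is what has to be on the search path.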

To summarize, there are several ways to import a module:

import module
from module.xx.xx import xx
from module.xx.xx import xx as rename 
from module.xx.xx import *

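Importing a module essentially tells the Python interpreter to execute that .py file:

When you import a .py file, the interpreter executes that file
When you import a package, the interpreter executes the package's __init__.py file [py2.7]

That raises a question: which paths does Python search when importing a module? The answer is sys.path: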

import sys

print(sys.path)

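Kinds of modules

Custom modules
Built-in modules
Third-party modules

Custom modules

A custom module is simply a .py file that you create yourself and then import wherever you need it.

If the sys.path list does not contain the path you need, add it with sys.path.append('path'):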

import sys
import os
project_path = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.append(project_path)

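Built-in modules

1. sys

A module for interacting with the Python interpreter at the system level

sys.argv           list of command-line arguments; the first element is the path of the script itself
sys.exit(n)        exit the program; use exit(0) for a normal exit
sys.version        version information of the Python interpreter
sys.path           module search path; initialized from the PYTHONPATH environment variable, among other sources
sys.platform       name of the operating system platform
sys.stdin          standard input
sys.stdout         standard output
sys.stderr         standard error

Progress bar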

import sys,time

# Progress bar demo
for i in range(20):
    sys.stdout.write("#")
    sys.stdout.flush()
    time.sleep(0.2)
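The loop above only appends '#' characters. A possible extension that also shows a percentage, redrawing the same line with a carriage return, might look like the sketch below; render_bar and show_progress are hypothetical helper names, not part of the sys module.

```python
import sys
import time


def render_bar(done, total, width=20):
    """Return a text progress bar such as '[#####.....]  50%'."""
    filled = width * done // total
    percent = 100 * done // total
    return "[" + "#" * filled + "." * (width - filled) + "] " + "%3d%%" % percent


def show_progress(total=20, delay=0.0):
    for i in range(1, total + 1):
        # '\r' moves the cursor back to the start of the line so the bar redraws in place
        sys.stdout.write("\r" + render_bar(i, total))
        sys.stdout.flush()
        time.sleep(delay)
    sys.stdout.write("\n")


show_progress()
```

Pass a delay such as 0.2 to watch the bar fill up gradually in a terminal.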

2. os

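os.getcwd()                 return the current working directory, i.e. the directory the Python script is working in
os.chdir("dirname")         change the current working directory; like cd in a shell
os.curdir                   the string for the current directory: ('.')
os.pardir                   the string for the parent directory: ('..')
os.makedirs('dir1/dir2')    create nested directories recursively
os.removedirs('dirname1')   remove the directory if it is empty, then move up to the parent and remove it too if empty, and so on
os.mkdir('dirname')         create a single directory; like mkdir dirname in a shell
os.rmdir('dirname')         remove a single empty directory; raises an error if it is not empty; like rmdir dirname in a shell
os.listdir('dirname')       return a list of all files and subdirectories in the directory, including hidden files
os.remove()                 delete a file
os.rename("oldname","new")  rename a file or directory
os.stat('path/filename')    get file or directory information
os.sep                      OS-specific path separator: "\\" on Windows, "/" on Linux
os.linesep                  line terminator of the current platform: "\r\n" on Windows, "\n" on Linux
os.pathsep                  separator between entries of a search path such as PATH: ";" on Windows, ":" on Linux
os.name                     string naming the current platform. Windows -> 'nt'; Linux -> 'posix'
os.system("bash command")   run a shell command; its output goes straight to the terminal
os.environ                  the system environment variables
os.path.abspath(path)       return the normalized absolute version of path
os.path.split(path)         split path into a (directory, file name) pair
os.path.dirname(path)       return the directory part of path, i.e. the first element of os.path.split(path)
os.path.basename(path)      return the final file name in path; if path ends with / or \, the result is an empty string. This is the second element of os.path.split(path)
os.path.exists(path)        return True if path exists, False otherwise
os.path.isabs(path)         return True if path is an absolute path
os.path.isfile(path)        return True if path is an existing file, False otherwise
os.path.isdir(path)         return True if path is an existing directory, False otherwise
os.path.join(path1[, path2[, ...]])  join several path components and return the result; an absolute component discards every component before it
os.path.getatime(path)      return the last access time of the file or directory path points to
os.path.getmtime(path)      return the last modification time of the file or directory path points to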
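A few of the os.path entries above are easier to see in action. The sketch below uses a made-up relative path (project/logs/app.log) and only string-level functions, so nothing has to exist on disk.

```python
import os

path = os.path.join("project", "logs", "app.log")   # hypothetical relative path

directory, filename = os.path.split(path)
print(directory)                 # same value as os.path.dirname(path)
print(filename)                  # same value as os.path.basename(path)

# A path that ends with a separator has an empty basename
trailing = os.path.join("project", "logs") + os.sep
print(repr(os.path.basename(trailing)))

# An absolute component discards everything before it
absolute = os.path.abspath(os.sep)          # the filesystem root of the current drive
joined = os.path.join("ignored", absolute, "etc")
print(joined)
```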

3. random

import random
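print(random.random())         # float in [0.0, 1.0)
print(random.randint(1, 2))    # integer in [1, 2], both ends included
print(random.randrange(1, 10)) # integer in [1, 10), 10 excluded

Generating a random verification code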

import random
checkcode = ''
for i in range(4):
    current = random.randrange(0,4)
    if current != i:
        temp = chr(random.randint(65,90))
    else:
        temp = random.randint(0,9)
    checkcode += str(temp)
print(checkcode)
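The same idea can be written more compactly by drawing every character from one alphabet. This is an alternative sketch, not the article's method; make_code and ALPHABET are names introduced here for illustration.

```python
import random
import string

ALPHABET = string.ascii_uppercase + string.digits


def make_code(length=4, rng=random):
    """Return a code of `length` characters drawn from A-Z and 0-9."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


code = make_code()
print(code)
```

The random module is fine for a demo like this one; for tokens that must resist guessing, the standard library's secrets module is the intended tool.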

4. Serialization

Suppose we need to store the dictionary below on disk. How can we do it?

info = {
    "name":'zhangsan',
    "age":25
}

The obvious approach is to open a file and write the dictionary into it.

That gives the following code, but it has a problem:

f = open('data.json',"w")
f.write(str(info))
f.close()

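What ends up in the file is a string, so we read the data back as follows (eval executes whatever the file contains, so only use it on files you trust):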

f = open("data.json","r")
data = eval( f.read() )
print(data)
f.close()

Now name and age are easy to get back.
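If the str()/eval() round trip is kept at all, ast.literal_eval is a safer stand-in for eval, since it only accepts Python literals. A minimal sketch, with the file step left out so it runs in memory:

```python
import ast

info = {"name": "zhangsan", "age": 25}

text = str(info)                  # what f.write(str(info)) stores
data = ast.literal_eval(text)     # parses Python literals only, never runs code

print(data["name"], data["age"])
```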

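But there is a catch. Suppose we use Python to write an interface for a PHP program, and that interface stores the dictionary in a file using the method above. When PHP tries to read the values, it has no eval function that understands Python literals, so how would PHP read them?

So this approach no longer meets our needs. What should we do?

We store the data with the json module instead,

which gives the following code: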

import json

# Call json's dumps function to turn the dictionary into a JSON string
data = json.dumps(info)

with open("data2.json","w") as f:
    f.write(data)

Now what we store is standard JSON.

To read it back, use the following code:

with open("data2.json","r") as f:
    # Use the loads function
    data = json.loads(f.read())
    print(data)
    print(data['name'])

Besides dumps and loads, there are two more functions:

dump and load

Using dump:

with open("data3.json", "w") as f:
    # dump writes straight to the file object and returns None
    json.dump(info, f)

Using load:

with open('data3.json', "r") as f:
    data = json.load(f)
print(data['age'])

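The process above is called:

Serialization: converting an in-memory object into a string
Deserialization: converting a string back into the corresponding in-memory object

Why do we need these operations?

To persist in-memory objects:
Sometimes we need to save state
A running task may be interrupted
We may need to restore the environment from the last run
So we serialize an in-memory object and store it on disk, because in-memory objects are hard to put into a database directly
Examples: restoring a saved game, suspending a virtual machine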

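Besides json, Python has a dedicated module, pickle, that can also serialize and deserialize.

For basic data, json and pickle both round-trip to the same object, so why have pickle at all?

pickle exists to serialize and deserialize Python-specific types, such as functions (which are pickled by reference to their module and name).

How do pickle and json differ?

1. pickle can serialize most Python data types, while json only handles basic ones
2. pickle is specific to Python; it cannot be used to exchange data with other languages

The two serialization modules in Python

json converts between [strings] and [basic Python data types]
pickle converts between [bytes] and [Python objects, including Python-specific types]

The json module provides four functions: dumps, dump, loads, load
The pickle module provides four functions: dumps, dump, loads, load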
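The difference between the two modules shows up as soon as a value has no JSON equivalent. The sketch below uses a made-up record containing a set: json refuses it, while pickle round-trips it as bytes.

```python
import json
import pickle

record = {"name": "zhangsan", "tags": {"a", "b"}}   # a set is not a JSON type

try:
    json.dumps(record)
    json_ok = True
except TypeError:
    json_ok = False            # json refuses the set

blob = pickle.dumps(record)     # bytes, readable only by Python
restored = pickle.loads(blob)   # only unpickle data you trust

print(json_ok, isinstance(blob, bytes), restored == record)
```

Unpickling can run arbitrary code, so pickle suits data your own program wrote, and json suits anything exchanged with other parties or languages.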

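5. The logging module

Mainly used to record errors and runtime information from a program, which makes later debugging easier

Logging to a single file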

import logging

logging.basicConfig(filename='test.log',
                    format='%(asctime)s - %(name)s - %(levelname)s -%(module)s:  %(message)s',
                    datefmt='%Y-%m-%d %H:%M:%S %p',
                    level=10)

logging.debug('debug')
logging.info('info')
logging.warning('warning')
logging.error('error')
logging.critical('critical')
logging.log(10,'log')

Log levels

CRITICAL = 50
FATAL = CRITICAL
ERROR = 40
WARNING = 30
WARN = WARNING
INFO = 20
DEBUG = 10
NOTSET = 0

Note: a message is recorded only when its level is greater than or equal to the configured log level
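The threshold rule can be checked without touching the disk by logging into an in-memory stream. In this sketch the logger name level_demo is arbitrary, and a StreamHandler replaces the file handler purely to make the result easy to inspect.

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))

logger = logging.getLogger("level_demo")   # hypothetical logger name
logger.setLevel(logging.WARNING)           # threshold = 30
logger.propagate = False                   # keep the demo output out of the root logger
logger.addHandler(handler)

logger.info("dropped")       # 20 < 30, ignored
logger.warning("kept")       # 30 >= 30, recorded
logger.error("kept too")     # 40 >= 30, recorded

print(stream.getvalue())
```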

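Log formatting

Logging to multiple files

The approach above can only write to a single file. To write to several log files, logging.basicConfig is not enough; you need to define the file handlers and logger objects yourself.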

import logging

# Define the file handlers
file_1_1 = logging.FileHandler('l1_1.log', 'a', encoding='utf-8')
fmt = logging.Formatter(fmt="%(asctime)s - %(name)s - %(levelname)s -%(module)s:  %(message)s")
file_1_1.setFormatter(fmt)

file_1_2 = logging.FileHandler('l1_2.log', 'a', encoding='utf-8')
fmt = logging.Formatter()
file_1_2.setFormatter(fmt)

# Define the logger
logger1 = logging.Logger('s1', level=logging.ERROR)
logger1.addHandler(file_1_1)
logger1.addHandler(file_1_2)


# Write a log entry
logger1.critical('1111')

# Define the file handler
file_2_1 = logging.FileHandler('l2_1.log', 'a')
fmt = logging.Formatter()
file_2_1.setFormatter(fmt)

# Define the logger
logger2 = logging.Logger('s2', level=logging.INFO)
logger2.addHandler(file_2_1)

# Write a log entry (through logger2, so it lands in l2_1.log)
logger2.critical('2222')

With the two logger objects created above:

Writing through [logger1] puts the entry in both l1_1.log and l1_2.log
Writing through [logger2] puts the entry in l2_1.log

6. time

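Time-related operations. Time has three representations:
Timestamp: seconds since January 1, 1970, i.e. time.time()
Formatted string: for example 2014-11-11 11:11, i.e. time.strftime('%Y-%m-%d')
Structured time: a time.struct_time tuple holding the year, day, weekday, and so on, i.e. time.localtime()

Common operations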

import time

print(time.altzone)  # offset of the local DST timezone, in seconds west of UTC
print(time.asctime()) # returns a string such as "Fri Aug 19 11:14:16 2016"
print(time.localtime()) # returns local time as a struct_time object
print(time.gmtime(time.time()-800000)) # returns UTC time as a struct_time object

print(time.asctime(time.localtime())) # returns a string such as "Fri Aug 19 11:14:16 2016"
print(time.ctime()) # returns a string such as "Fri Aug 19 12:38:29 2016", same as above


# Date string -> timestamp
string_to_struct = time.strptime("2016/05/22","%Y/%m/%d") # parse a date string into a struct_time object
print(string_to_struct)

struct_to_stamp = time.mktime(string_to_struct) # convert the struct_time (read as local time) to a timestamp
print(struct_to_stamp)


# Timestamp -> date string
print(time.gmtime(time.time()-86640)) # convert a timestamp to a UTC struct_time
print(time.strftime("%Y-%m-%d %H:%M:%S",time.gmtime()) ) # format a UTC struct_time as a string


# Date arithmetic
import datetime

print(datetime.datetime.now()) # e.g. 2016-08-19 12:47:03.941925
print(datetime.date.fromtimestamp(time.time()) )  # timestamp straight to a date, e.g. 2016-08-19
print(datetime.datetime.now() )
print(datetime.datetime.now() + datetime.timedelta(3)) # now + 3 days
print(datetime.datetime.now() + datetime.timedelta(-3)) # now - 3 days
print(datetime.datetime.now() + datetime.timedelta(hours=3)) # now + 3 hours
print(datetime.datetime.now() + datetime.timedelta(minutes=30)) # now + 30 minutes

Format directives

%Y  Year with century as a decimal number.
%m  Month as a decimal number [01,12].
%d  Day of the month as a decimal number [01,31].
%H  Hour (24-hour clock) as a decimal number [00,23].
%M  Minute as a decimal number [00,59].
%S  Second as a decimal number [00,61].
%z  Time zone offset from UTC.
%a  Locale's abbreviated weekday name.
%A  Locale's full weekday name.
%b  Locale's abbreviated month name.
%B  Locale's full month name.
%c  Locale's appropriate date and time representation.
%I  Hour (12-hour clock) as a decimal number [01,12].
%p  Locale's equivalent of either AM or PM.
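The directives above drive both parsing and formatting. The sketch below walks one fixed date through string, struct_time, timestamp, and back. It uses calendar.timegm instead of time.mktime so the result does not depend on the local timezone, which is a substitution made here for reproducibility.

```python
import calendar
import datetime
import time

# String -> struct_time -> UTC timestamp -> string again
parsed = time.strptime("2016-08-19 12:47:03", "%Y-%m-%d %H:%M:%S")
stamp = calendar.timegm(parsed)             # treats the struct_time as UTC
text = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(stamp))
print(text)

# Date arithmetic with datetime and timedelta
start = datetime.datetime(2016, 8, 19, 12, 47, 3)
later = start + datetime.timedelta(days=3, hours=3)
print(later.strftime("%Y/%m/%d %H:%M"))
```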

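7. Hashing with hashlib

Used for hashing (often loosely called encryption). It replaces the old md5 and sha modules and mainly provides the SHA1, SHA224, SHA256, SHA384, SHA512, and MD5 algorithms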

import hashlib

# ######## md5 ########
hash = hashlib.md5()
# help(hash.update)
hash.update(bytes('admin', encoding='utf-8'))
print(hash.hexdigest())
print(hash.digest())


######## sha1 ########

hash = hashlib.sha1()
hash.update(bytes('admin', encoding='utf-8'))
print(hash.hexdigest())

# ######## sha256 ########

hash = hashlib.sha256()
hash.update(bytes('admin', encoding='utf-8'))
print(hash.hexdigest())


# ######## sha384 ########

hash = hashlib.sha384()
hash.update(bytes('admin', encoding='utf-8'))
print(hash.hexdigest())

# ######## sha512 ########

hash = hashlib.sha512()
hash.update(bytes('admin', encoding='utf-8'))
print(hash.hexdigest())

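Strong as these algorithms are, they have a weakness: a precomputed lookup table of common inputs and their hashes can be used to reverse them. It is therefore worth mixing a custom key into the hash.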

import hashlib

# ######## md5 ########

hash = hashlib.md5(bytes('898oaFs09f',encoding="utf-8"))
hash.update(bytes('admin',encoding="utf-8"))
print(hash.hexdigest())

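Python also has a built-in hmac module, which further processes the key and the content we supply before hashing them

import hmac

# digestmod is required since Python 3.8; older versions defaulted to MD5
h = hmac.new(bytes('898oaFs09f',encoding="utf-8"), digestmod="md5")
h.update(bytes('admin',encoding="utf-8"))
print(h.hexdigest())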
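A keyed hash is usually paired with a verification step. The sketch below swaps MD5 for SHA-256 and compares tags with hmac.compare_digest; verify is a hypothetical helper name, and the key is the sample value from the text rather than a real secret.

```python
import hashlib
import hmac

key = b"898oaFs09f"          # the sample key from the text; use a secret value in practice
message = b"admin"

tag = hmac.new(key, message, digestmod=hashlib.sha256).hexdigest()


def verify(key, message, expected_tag):
    """Recompute the tag and compare it in constant time."""
    actual = hmac.new(key, message, digestmod=hashlib.sha256).hexdigest()
    return hmac.compare_digest(actual, expected_tag)


print(len(tag))
print(verify(key, message, tag))
print(verify(b"wrong-key", message, tag))
```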

8. configparser

Below is a configuration file in a format used by much common software

[DEFAULT]
ServerAliveInterval = 45
Compression = yes
CompressionLevel = 9
ForwardX11 = yes

[bitbucket.org]
User = hg

[topsecret.server.com]
Port = 50022
ForwardX11 = no

1) Creating a configuration file

import configparser

config = configparser.ConfigParser()

# Option 1: assign a whole dictionary
config["DEFAULT"] = {'ServerAliveInterval': '45',
                     'Compression': 'yes',
                     'CompressionLevel': '9',
                     'ForwardX11': 'yes'}

# Option 2: create an empty section, then set keys on it
config['bitbucket.org'] = {}
config['bitbucket.org']['User'] = 'hg'

# Option 3: keep a reference to the section and set keys through it
config['topsecret.server.com'] = {}
topsecret = config['topsecret.server.com']
topsecret['Port'] = '50022'
topsecret['ForwardX11'] = 'no'

with open('example.ini', 'w') as configfile:
    config.write(configfile)

2) Reading all sections

>>> import configparser
>>> config = configparser.ConfigParser()
>>> config.sections()
[]
>>> config.read('example.ini', encoding='utf-8')
['example.ini']
>>> config.sections()
['bitbucket.org', 'topsecret.server.com']

3) Getting all key-value pairs in a section ----- items

import configparser

config = configparser.ConfigParser()
config.read('example.ini', encoding='utf-8')
ret = config.items('DEFAULT')
print(ret)

4) Getting all keys in a section ----- options

import configparser

config = configparser.ConfigParser()
config.read('example.ini', encoding='utf-8')
ret = config.options('DEFAULT')
print(ret)

5) Getting the value of a given key in a section ----- get

import configparser

config = configparser.ConfigParser()
config.read('example.ini', encoding='utf-8')


v = config.get('DEFAULT', 'CompressionLevel')
# v = config.getint('DEFAULT', 'CompressionLevel')
# v = config.getfloat('DEFAULT', 'CompressionLevel')
# v = config.getboolean('DEFAULT', 'Compression')
print(v)

6) Another way to read values from the configuration file

>>> import configparser
>>> config = configparser.ConfigParser()
>>> config.read('example.ini', encoding='utf-8')
['example.ini']
>>> config['bitbucket.org']['User']
'hg'
>>> config['DEFAULT']['Compression']
'yes'
>>> topsecret = config['topsecret.server.com']
>>> topsecret['ForwardX11']
'no'
>>> topsecret['Port']
'50022'
>>> for key in config['bitbucket.org']:
...     print(key)
...
user
compressionlevel
serveraliveinterval
compression
forwardx11
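The loop output above includes keys from [DEFAULT] because every section falls back to it. The sketch below shows that fallback together with the typed getters; it loads the sample configuration from a string with read_string so that no example.ini file is needed, and the key name Missing is made up.

```python
import configparser

INI_TEXT = """
[DEFAULT]
ServerAliveInterval = 45
Compression = yes
CompressionLevel = 9
ForwardX11 = yes

[bitbucket.org]
User = hg

[topsecret.server.com]
Port = 50022
ForwardX11 = no
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)

bitbucket = config["bitbucket.org"]
topsecret = config["topsecret.server.com"]

print(bitbucket.getint("CompressionLevel"))      # inherited from DEFAULT
print(bitbucket.getboolean("ForwardX11"))        # inherited from DEFAULT
print(topsecret.getboolean("ForwardX11"))        # overridden in the section
print(topsecret.get("Missing", fallback="n/a"))  # fallback for an absent key
```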

7) Checking, adding, and removing sections

import configparser

config = configparser.ConfigParser()
config.read('example.ini', encoding='utf-8')

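# Check (DEFAULT is not counted as a section, so this prints False)
has_sec = config.has_section('DEFAULT')
print(has_sec)

# Add a section
config.add_section("SEC_1")
with open('example.ini', 'w') as f:
    config.write(f)

# Remove the section
config.remove_section("SEC_1")
with open('example.ini', 'w') as f:
    config.write(f)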

8) Checking, removing, and setting key-value pairs in a section

import configparser

config = configparser.ConfigParser()
config.read('example.ini', encoding='utf-8')

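# Check (the section must exist in example.ini; bitbucket.org is used here)
has_opt = config.has_option('bitbucket.org', 'user')
print(has_opt)

# Remove
config.remove_option('bitbucket.org', 'user')
with open('example.ini', 'w') as f:
    config.write(f)

# Set
config.set('bitbucket.org', 'k10', "123")
with open('example.ini', 'w') as f:
    config.write(f)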

9. xml

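XML is a format for exchanging data between different languages or programs, much like JSON, though JSON is simpler to use. Long ago, before JSON existed, XML was the only choice. Even today, many interfaces in traditional industries such as finance are still mainly XML, and most configuration files in Java programs use XML as well.

Below is a simple configuration file in XML format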

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

1). Querying data in an XML document

import xml.etree.ElementTree as ET

tree = ET.parse("xmltest.xml")
root = tree.getroot()
print(root.tag)

# Walk the XML document
for child in root:
    print(child.tag, child.attrib)
    for i in child:
        print(i.tag,i.text)

# Iterate over the year nodes only
for node in root.iter('year'):
    print(node.tag,node.text)
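Besides walking the whole tree, ElementTree can look up elements with a small subset of XPath. The sketch below parses a trimmed copy of the sample document from a string, so it runs without xmltest.xml on disk.

```python
import xml.etree.ElementTree as ET

XML_TEXT = """<data>
    <country name="Liechtenstein"><rank>2</rank><year>2008</year></country>
    <country name="Singapore"><rank>5</rank><year>2011</year></country>
    <country name="Panama"><rank>69</rank><year>2011</year></country>
</data>"""

root = ET.fromstring(XML_TEXT)

# Look up one element by attribute with the limited XPath syntax ElementTree supports
singapore = root.find("country[@name='Singapore']")
print(singapore.find("rank").text)

# Collect attribute values from every matching child
names = [country.get("name") for country in root.findall("country")]
print(names)
```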

2). Modifying and deleting XML document content

import xml.etree.ElementTree as ET

tree = ET.parse("xmltest.xml")
root = tree.getroot()

# Modify
for node in root.iter('year'):
    new_year = int(node.text) + 1
    node.text = str(new_year)
    node.set("updated","yes")

# Write the modified content back to the file
tree.write("xmltest.xml")

# Remove nodes
for country in root.findall('country'):
    rank = int(country.find('rank').text)
    if rank > 50:
        root.remove(country)

tree.write('output.xml')

3). Creating an XML document

import xml.etree.ElementTree as ET

new_xml = ET.Element("namelist")
name = ET.SubElement(new_xml, "name", attrib = {"enrolled":"yes"})
age = ET.SubElement(name, "age", attrib={"checked":"no"})
sex = ET.SubElement(name, "sex")
sex.text = '33'

name2 = ET.SubElement(new_xml,"name",attrib={"enrolled":"no"})
age = ET.SubElement(name2,"age")
age.text = '19'

et = ET.ElementTree(new_xml) # build the document object
et.write("test.xml", encoding="utf-8",xml_declaration=True)

ET.dump(new_xml) # print the generated XML

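10. urllib and requests

urllib: the Python standard library provides urllib and related modules for HTTP requests, but the API is clumsy. Even simple tasks take a lot of work, sometimes including overriding methods. Below is a simple way to request data with urllib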

import urllib.request

f = urllib.request.urlopen('http://www.webxml.com.cn//webservices/qqOnlineWebService.asmx/qqCheckOnline?qqCode=424662508')
result = f.read().decode('utf-8')

import urllib.request

req = urllib.request.Request('http://www.example.com/')
req.add_header('Referer', 'http://www.python.org/')
r = urllib.request.urlopen(req)

# Read from r, the response to the request that carries the header
result = r.read().decode('utf-8')

Note: for more, see the official Python documentation: https://docs.python.org/3.5/library/urllib.request.html#module-urllib.request
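Much of the manual work with urllib is in assembling the request. The sketch below builds a query string and a Request object and inspects them without sending anything, so it runs offline; the URL and parameter names are placeholders.

```python
import urllib.parse
import urllib.request

params = {"key1": "value1", "key2": "value2"}
query = urllib.parse.urlencode(params)            # percent-encodes and joins with '&'
url = "http://www.example.com/?" + query

req = urllib.request.Request(url)
req.add_header("Referer", "http://www.python.org/")

print(req.full_url)
print(req.get_method())            # GET, because no data was attached
print(req.get_header("Referer"))

# Sending is left commented out so the sketch runs without network access:
# with urllib.request.urlopen(req, timeout=10) as response:
#     body = response.read().decode("utf-8")
```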

requests

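Requests is an HTTP library written in Python and released under the Apache2 License. It wraps Python's built-in modules at a high level, which makes network requests far more pleasant: with Requests you can easily do anything a browser can do

1) Installing the module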

pip3 install requests

2) Using the module - GET requests

# 1. Example without parameters

import requests
ret = requests.get('https://github.com/timeline.json')

print(ret.url)
print(ret.text)


# 2. Example with parameters
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
ret = requests.get("http://httpbin.org/get", params=payload)

print(ret.url)
print(ret.text)

POST requests

# 1. Basic POST example

import requests

payload = {'key1': 'value1', 'key2': 'value2'}
ret = requests.post("http://httpbin.org/post", data=payload)

print(ret.text)


# 2. Example that sends request headers and data

import requests
import json

url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}
headers = {'content-type': 'application/json'}

ret = requests.post(url, data=json.dumps(payload), headers=headers)

print(ret.text)
print(ret.cookies)

Other requests

requests.get(url, params=None, **kwargs)
requests.post(url, data=None, json=None, **kwargs)
requests.put(url, data=None, **kwargs)
requests.head(url, **kwargs)
requests.delete(url, **kwargs)
requests.patch(url, data=None, **kwargs)
requests.options(url, **kwargs)

# All of the functions above are built on this one
requests.request(method, url, **kwargs)

Common examples

1) Checking whether a QQ user is online

import urllib.request
import requests
from xml.etree import ElementTree as ET

# Send the HTTP request with the built-in urllib module and get the XML response
"""
f = urllib.request.urlopen('http://www.webxml.com.cn//webservices/qqOnlineWebService.asmx/qqCheckOnline?qqCode=424662508')
result = f.read().decode('utf-8')
"""


# Send the HTTP request with the third-party requests module and get the XML response
r = requests.get('http://www.webxml.com.cn//webservices/qqOnlineWebService.asmx/qqCheckOnline?qqCode=424662508')
result = r.text

# Parse the XML content
node = ET.XML(result)

# Read the result
if node.text == "Y":
    print("online")
else:
    print("offline")

2) Looking up train stop information

import urllib.request
import requests
from xml.etree import ElementTree as ET

# Send the HTTP request with the built-in urllib module and get the XML response
"""
f = urllib.request.urlopen('http://www.webxml.com.cn/WebServices/TrainTimeWebService.asmx/getDetailInfoByTrainCode?TrainCode=G666&UserID=')
result = f.read().decode('utf-8')
"""

# Send the HTTP request with the third-party requests module and get the XML response
r = requests.get('http://www.webxml.com.cn/WebServices/TrainTimeWebService.asmx/getDetailInfoByTrainCode?TrainCode=G666&UserID=')
result = r.text

# Parse the XML content
root = ET.XML(result)
for node in root.iter('TrainDetailInfo'):
    print(node.find('TrainStation').text,node.find('StartTime').text,node.tag,node.attrib)

Beyond these examples, we can also use requests to fetch web pages and then analyze the scraped data with Python's regular expressions. This is the web crawling we will cover later

11. shutil

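A high-level module for working with files, directories, and archives

1). shutil.copyfileobj(fsrc, fdst[, length])

Copy the contents of one file object into another

import shutil

with open('old.xml', 'r') as src, open('new.xml', 'w') as dst:
    shutil.copyfileobj(src, dst)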

2). shutil.copyfile(src, dst)

Copy a file

shutil.copyfile('f1.log', 'f2.log')

3). shutil.copymode(src, dst)

Copy the permission bits only. Contents, group, and owner are unchanged

shutil.copymode('f1.log', 'f2.log')

4). shutil.copystat(src, dst)

Copy the status information only, including: mode bits, atime, mtime, flags

shutil.copystat('f1.log', 'f2.log')

5). shutil.copy(src, dst)

Copy a file along with its permissions

import shutil

shutil.copy('f1.log', 'f2.log')

6). shutil.copy2(src, dst)

Copy a file along with its status information

import shutil

shutil.copy2('f1.log', 'f2.log')

7). shutil.rmtree(path[, ignore_errors[, onerror]])

Recursively delete a directory and everything in it

import shutil

shutil.rmtree('folder1')

8). shutil.move(src, dst)

Recursively move a file or directory, like the mv command; within the same filesystem this is simply a rename.

import shutil

shutil.move('folder1', 'folder3')

9). shutil.make_archive(base_name, format,...)

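Create an archive, such as zip or tar, and return its path

base_name: file name of the archive, optionally with a path. A bare file name saves to the current directory; otherwise the archive is saved to the given path
e.g. www => saved in the current directory
e.g. /Users/wupeiqi/www => saved in /Users/wupeiqi/
format: archive type: "zip", "tar", "bztar", "gztar"
root_dir: path of the directory to archive (defaults to the current directory)
owner: user, defaults to the current user
group: group, defaults to the current group
logger: used for logging, usually a logging.Logger object

# Archive the files under /Users/wupeiqi/Downloads/test into the program's current directory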
import shutil
ret = shutil.make_archive("wwwwwwwwww", 'gztar', root_dir='/Users/wupeiqi/Downloads/test')

shutil handles archives by calling the zipfile and tarfile modules. In detail:

import zipfile

# Compress
z = zipfile.ZipFile('laxi.zip', 'w')
z.write('a.log')
z.write('data.data')
z.close()

# Extract
z = zipfile.ZipFile('laxi.zip', 'r')
z.extractall()
z.close()

import tarfile

# Compress
tar = tarfile.open('your.tar','w')
tar.add('/Users/wupeiqi/PycharmProjects/bbs2.log', arcname='bbs2.log')
tar.add('/Users/wupeiqi/PycharmProjects/cmdb.log', arcname='cmdb.log')
tar.close()

# Extract
tar = tarfile.open('your.tar','r')
tar.extractall()  # a target directory can be given
tar.close()
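For a whole directory, shutil.make_archive and shutil.unpack_archive cover the round trip in two calls. The sketch below works inside a temporary directory; the file a.log and the archive name backup are placeholders.

```python
import os
import shutil
import tempfile

work_dir = tempfile.mkdtemp()
src_dir = os.path.join(work_dir, "src")
out_dir = os.path.join(work_dir, "out")
os.makedirs(src_dir)

with open(os.path.join(src_dir, "a.log"), "w", encoding="utf-8") as f:
    f.write("hello")

# Pack everything under src_dir into <work_dir>/backup.zip
archive_path = shutil.make_archive(os.path.join(work_dir, "backup"), "zip", root_dir=src_dir)

# Unpack into a fresh directory and read the file back
shutil.unpack_archive(archive_path, out_dir)
with open(os.path.join(out_dir, "a.log"), encoding="utf-8") as f:
    content = f.read()

print(os.path.basename(archive_path), content)
shutil.rmtree(work_dir)   # clean up the temporary files
```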

12. subprocess

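This module is mainly meant to replace os.system() and the os.spawn*() functions
The main job of subprocess is to run external commands and programs. For example, to download a file with wget, I call the wget program from Python. In that sense, subprocess plays a role similar to a shell

Common subprocess functions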

import subprocess

# Run the command and return its exit code; a non-zero code does not raise
>>> res = subprocess.call(["ls", "-l"])

# Run the command; return 0 on success, otherwise raise CalledProcessError
>>> res = subprocess.check_call(["ls", "-l"])

# Run the command and return its output (returned, not printed); here it is stored in res
>>> res=subprocess.check_output(['ls','-l'])
>>> res
b'total 0\ndrwxr-xr-x 12 shangzekai staff 402 Nov 3 12:25 dd.txt\n'

# Let a shell interpret an entire command string
out = subprocess.call("ls -l", shell=True)
out = subprocess.call("cd ..", shell=True)

# With shell=True we pass a single string instead of a list. Python first starts a shell, then has that shell interpret the whole string

All of the functions above call Popen() under the hood

>>> p = subprocess.Popen("df -h|grep disk",stdin=subprocess.PIPE,stdout=subprocess.PIPE,shell=True)
>>> p.stdout.read()
b'/dev/disk1 465Gi 64Gi 400Gi 14% 16901472 104938142 14% /\n'

stdin: standard input
stdout: standard output
stderr: standard error

Besides the functions above, another one is strongly recommended: subprocess.run(...)

p = subprocess.Popen(r"find / -size +1000000 -exec ls -shl {} \;",shell=True,stdout=subprocess.PIPE)
print(p.stdout.read())
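The text recommends subprocess.run, so here is a minimal sketch of it. To stay portable it launches a child Python interpreter via sys.executable instead of ls or df; capture_output and text require Python 3.7 or later.

```python
import subprocess
import sys

# Run a child Python process so the sketch does not depend on ls, df, or a shell
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"],
    input="hello",          # text sent to the child's stdin
    capture_output=True,    # collect stdout and stderr instead of printing them
    text=True,              # decode bytes to str
    check=True,             # raise CalledProcessError on a non-zero exit code
)

print(result.returncode)
print(result.stdout.strip())
```

The returned CompletedProcess object bundles the exit code and both output streams, which replaces most manual Popen and pipe handling.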

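Available parameters

args: the command to run, either a string or a sequence (such as a list or tuple)
bufsize: buffering. 0 unbuffered, 1 line buffered, other positive values set the buffer size, negative values use the system default
stdin, stdout, stderr: the program's standard input, output, and error handles
preexec_fn: Unix only; a callable object that is called in the child process just before the command runs
close_fds: on Windows, if close_fds is True the new child process does not inherit the parent's input, output, and error pipes,
so, in older Python versions, close_fds cannot be True while the child's standard input, output, or error (stdin, stdout, stderr) is redirected.
shell: as above
cwd: sets the child process's current directory
env: sets the child process's environment variables. If env = None, the child inherits the parent's environment.
universal_newlines: line endings differ across systems; True -> use \n uniformly
startupinfo and creationflags: Windows only;
they are passed to the underlying CreateProcess() function to set child process properties such as the main window's appearance and the process priority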

Using subprocess to enter a sudo password automatically

import subprocess

def mypass():
    mypass = '123' #or get the password from anywhere
    return mypass

echo = subprocess.Popen(['echo',mypass()],
                        stdout=subprocess.PIPE,
                        )

sudo = subprocess.Popen(['sudo','-S','iptables','-L'],
                        stdin=echo.stdout,
                        stdout=subprocess.PIPE,
                        )

end_of_pipe = sudo.stdout

print("Password ok \n Iptables Chains %s" % end_of_pipe.read())

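13. Regular expressions

Common regular expression symbols

'.'     by default matches any single character except \n; with the DOTALL flag it matches any character, including newlines
'^'     matches the start of the string; with the MULTILINE flag it also matches at the start of each line, e.g. re.search(r"^a","\nabc\neee",flags=re.MULTILINE)
'$'     matches the end of the string; re.search("foo$","bfoo\nsdfsf",flags=re.MULTILINE).group() also works
'*'     matches the preceding character 0 or more times, re.findall("ab*","cabb3abcbbac") gives ['abb', 'ab', 'a']
'+'     matches the preceding character 1 or more times, re.findall("ab+","ab+cd+abb+bba") gives ['ab', 'abb']
'?'     matches the preceding character 1 or 0 times
'{m}'   matches the preceding character m times
'{n,m}' matches the preceding character n to m times, re.findall("ab{1,3}","abb abc abbcbbb") gives ['abb', 'ab', 'abb']
'|'     matches the pattern on the left or the right of |, re.search("abc|ABC","ABCBabcCD").group() gives 'ABC'
'(...)' grouping, re.search("(abc){2}a(123|456)c", "abcabca456c").group() gives abcabca456c


'\A'    matches only at the start of the string, re.search(r"\Aabc","alexabc") finds nothing
'\Z'    matches at the end of the string, like $
'\d'    matches digits 0-9
'\D'    matches non-digits
'\w'    matches [A-Za-z0-9_]
'\W'    matches anything other than [A-Za-z0-9_]
'\s'    matches whitespace such as \t, \n, \r, re.search(r"\s+","ab\tc1\n3").group() gives '\t'

'(?P<name>...)' named group, re.search("(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})","371481199306143242").groupdict() gives {'province': '3714', 'city': '81', 'birthday': '1993'}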
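Escapes such as \d, \w, and \s only reach the regex engine intact when the backslash survives, which is why patterns are normally written as raw strings (r"..."). The sketch below repeats the named-group example in that style; the variable names are illustrative.

```python
import re

# Raw strings keep the backslashes in \d and \s exactly as typed
id_pattern = re.compile(r"(?P<province>\d{4})(?P<city>\d{2})(?P<birthday>\d{4})")

match = id_pattern.search("371481199306143242")
fields = match.groupdict()
print(fields)

whitespace = re.search(r"\s+", "ab\tc1\n3").group()
print(repr(whitespace))
```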

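Common functions

re.match matches from the start of the string
match(pattern, string, flags=0)
pattern: the regular expression
string: the string to match against
flags: matching mode
re.I (re.IGNORECASE): ignore case (full name in parentheses, same below)
re.M (re.MULTILINE): multi-line mode, changes the behavior of '^' and '$'
re.S (re.DOTALL): dot-matches-all mode, changes the behavior of '.'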

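import re

origin = "hello alex bcd abcd lge acd 19"

# Without groups
r = re.match(r"h\w+", origin)
print(r.group())     # the whole match
print(r.groups())    # the captured groups
print(r.groupdict()) # the named groups

# With groups

# Why use groups? To extract specific parts of a successful match (the whole pattern must match first, then the parts are pulled out)

r = re.match(r"h(\w+).*(?P<name>\d)$", origin)
print(r.group())     # the whole match
print(r.groups())    # all captured groups
print(r.groupdict()) # only the groups that were given a name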

re.search scans the whole string for the first match and returns None if nothing matches
search(pattern, string, flags=0)

# Without groups
r = re.search(r"a\w+", origin)
print(r.group())     # the whole match
print(r.groups())    # the captured groups
print(r.groupdict()) # the named groups

# With groups
r = re.search(r"a(\w+).*(?P<name>\d)$", origin)
print(r.group())     # the whole match
print(r.groups())    # all captured groups
print(r.groupdict()) # only the groups that were given a name

re.findall returns every match as an element of a list
findall(pattern, string, flags=0)

# Without groups
r = re.findall(r"a\w+",origin)
print(r)

# With groups
origin = "hello alex bcd abcd lge acd 19"
r = re.findall(r"a((\w*)c)(d)", origin)
print(r)

re.split uses the matched text as the separator for splitting into a list
split(pattern, string, maxsplit=0, flags=0)
pattern: the regular expression
string: the string to match against
maxsplit: maximum number of splits
flags: matching mode

# Without groups
origin = "hello alex bcd alex lge alex acd 19"
r = re.split("alex", origin, 1)
print(r)

# With groups

origin = "hello alex bcd alex lge alex acd 19"
r1 = re.split("(alex)", origin, 1)
print(r1)
r2 = re.split("(al(ex))", origin, 1)
print(r2)

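re.sub finds matches and replaces them
sub(pattern, repl, string, count=0, flags=0)
pattern: the regular expression
repl: the replacement string or a callable
string: the string to match against
count: maximum number of replacements
flags: matching mode

origin = "hello alex bcd alex lge alex acd 19"
r = re.sub(r"a\w+", "999", origin, 2)
print(r)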

Common regular expressions

IP:
^(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}$
Mobile number:
^1[3|4|5|8][0-9]\d{8}$
Email:
[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+
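As a usage sketch, the IP pattern can be compiled once and wrapped in a small checker. The escapes (\d and \.) are written out here on the assumption that the pattern was meant to contain them; is_ipv4 and OCTET are names introduced for illustration.

```python
import re

OCTET = r"(25[0-5]|2[0-4]\d|[0-1]?\d?\d)"
IP_PATTERN = re.compile(r"^" + OCTET + r"(\." + OCTET + r"){3}$")


def is_ipv4(text):
    """Return True when text looks like a dotted-quad IPv4 address."""
    return IP_PATTERN.match(text) is not None


for candidate in ["192.168.1.1", "256.1.1.1", "10.0.0"]:
    print(candidate, is_ipv4(candidate))
```

For real validation, the standard library's ipaddress module is usually a more robust choice than a hand-written pattern.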
