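Python Learning Journey: Basics (Part 6), Modules

Key points in this part:
Modules
 
I. Modules
A module is a collection of code that implements some piece of functionality.
In function-based programming, a function implements one feature and other code simply calls it, which improves code reuse and reduces coupling between pieces of code. A complex feature may need several functions (which can live in different .py files). A collection of code made up of one or more .py files is called a module (strictly speaking, each .py file is a module, and a directory of them is a package).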
 
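Kinds of modules
There are three kinds of modules:
    Built-in standard modules (the standard library)
    Third-party modules (open-source modules)
    Custom modules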
 
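1. Custom modules & importing modules
A module must be imported before it can be used. Ways to import a module:
# A single module in the same directory: a plain import is recommended
# Define a module file named common.py in the current directory
def f2():
    print("F2")
 
# Import the module (from another file in the same directory)
import common
common.f2()
 
# The custom module lives in another folder, e.g. lib/common2.py
# Option 1:
import lib.common2
lib.common2.f2()
# Option 2 (recommended):
from lib import common2 as lib_common2  # import the common2 module from the lib directory under the name lib_common2
lib_common2.f2()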
 
Search path rules used when importing modules
>>> import sys
>>> print(sys.path)
['', '/Library/Frameworks/Python.framework/Versions/3.5/lib/python35.zip', '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5', '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/plat-darwin', '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/lib-dynload', '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages']
If the sys.path list does not contain the path you need, add it with sys.path.append('path').
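As a minimal sketch of the sys.path.append step, the snippet below adds a directory to the search path only if it is not already there. The directory name lib is an assumption chosen for illustration; it does not need to exist for the snippet to run.

```python
import os
import sys

# Hypothetical directory that holds extra modules; adjust to your own layout.
extra_dir = os.path.join(os.getcwd(), "lib")

# Avoid adding the same entry twice when the script is run repeatedly.
if extra_dir not in sys.path:
    sys.path.append(extra_dir)

print(extra_dir in sys.path)
```

Appending puts the directory at the end of the list, so modules with the same name found earlier in sys.path still win.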
 
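2. Installing third-party modules
There are two ways: install through a package manager such as yum, pip, or apt-get, or build and install from source. After a successful install, the module is placed in one of the directories listed in sys.path.
 The requests third-party module
# Install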
$ pip3 install requests
    Collecting requests
      Downloading requests-2.10.0-py2.py3-none-any.whl (506kB)
        100% |████████████████████████████████| 512kB 229kB/s
    Installing collected packages: requests
    Successfully installed requests-2.10.0
 
# Import and use
import requests
ret = requests.get("http://www.baidu.com")
print(ret)
 
<Response [200]>
 
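3. Standard modules
No installation is needed. They are available as soon as the Python environment is installed, and they are imported the same way as above.
 
a. json & pickle
These two modules are used for serialization.
 
Differences between json and pickle
  • json: better suited to exchanging data across languages; it produces text (str) and supports only the basic Python data types (dict, list, str, int, float, bool, None)
  • pickle: Python only; it produces bytes and supports most Python types. Only unpickle data you trust, because loading a pickle can execute arbitrary code
Both modules provide four functions: loads, load, dumps, dump
  • loads  works in memory: converts a string (bytes for pickle) into the corresponding Python data type
  • dumps  the opposite of loads: converts a Python object into a string (bytes for pickle) and returns it
  • load   reads the contents of a file and converts them into Python data types
  • dump   serializes an object and writes it to a file
# pickle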
import pickle

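li = [11, 22, 33]
# Convert in memory
s = pickle.dumps(li)
print(s, type(s))
 
ret = pickle.loads(s)
print(ret)
 
# Read and write through a file
with open("db", "wb") as f:  # pickle data is bytes, so the file must be opened in binary mode
    pickle.dump(li, f)
with open("db", "rb") as f:
    res = pickle.load(f)
print(res, type(res))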
 
import json
# Convert a dict into a dict-like string
dic = {'k1': 'v1', 'k2': 'v2'}
res = json.dumps(dic)
print(res, type(res))
 
# Convert a dict-like string back into a dict
s1 = '{"k2": "v2", "k1": "v1"}'
d1 = json.loads(s1)
print(d1, type(d1))
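To make the difference between the two modules concrete, here is a small sketch that serializes a dict containing a set. The sample data is made up for illustration. pickle round-trips it unchanged, while json rejects the set until it is converted to a list.

```python
import json
import pickle

data = {"name": "demo", "tags": {"a", "b"}}  # a set is not a JSON type

# pickle: bytes out, an equal object back in.
restored = pickle.loads(pickle.dumps(data))

# json: raises TypeError for the set.
try:
    json.dumps(data)
    json_failed = False
except TypeError:
    json_failed = True

# Converting the set to a sorted list makes the data JSON-friendly.
as_json = json.dumps({"name": data["name"], "tags": sorted(data["tags"])})

print(restored == data, json_failed)
print(as_json)
```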
 
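A small exercise with requests and json:
import requests
import json
 
r = requests.get("http://wthrcdn.etouch.cn/weather_mini?city=北京")
r.encoding = "utf-8"
s1 = r.text  # read the response body into memory as a string
res = json.loads(s1)  # convert the string into a Python dict
print(type(res), type(r))
 
Result:
<class 'dict'> <class 'requests.models.Response'>
 
with open("msg.data", "w") as f:
    json.dump(res, f)  # write the dict res to the file as JSON (dumping the string s1 would store one quoted JSON string instead)
 
with open("msg.data", "r") as f:
    res = json.load(f)  # read the file contents and convert them into the matching Python type (dict, list, ...)
print(res, type(res))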
 
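b. time & datetime
 
time
Time-related operations. Time can be represented in three ways:
Timestamp            seconds since January 1, 1970 (the Epoch), i.e. time.time()
Formatted string     e.g. 2014-11-11 11:11, i.e. time.strftime('%Y-%m-%d %H:%M')
Structured time      a tuple holding year, month, day, weekday, etc. (time.struct_time), i.e. time.localtime()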
 
Stub source of the functions in the time module
"""
Functions:
 
time() -- return current time in seconds since the Epoch as a float  # returns the timestamp of the current time
clock() -- return CPU time since process start as a float  # returns CPU time used by the program (removed in Python 3.8; use perf_counter() or process_time())
sleep() -- delay for a number of seconds given as a float  
gmtime() -- convert seconds since Epoch to UTC tuple  # returns UTC time as a struct_time tuple; accepts an optional timestamp
localtime() -- convert seconds since Epoch to local time tuple
asctime() -- convert time tuple to string
ctime() -- convert time in seconds to string
mktime() -- convert local time tuple to seconds since Epoch
strftime() -- convert time tuple to string according to format specification
strptime() -- parse string to time tuple according to format specification
tzset() -- change the local timezone
"""
# no imports
 
# Variables with simple values
 
altzone = -28800
 
daylight = 0
 
timezone = -28800
 
_STRUCT_TM_ITEMS = 11
 
# functions
 
def asctime(p_tuple=None): # real signature unknown; restored from __doc__
    """
    asctime([tuple]) -> string
 
    Convert a time tuple to a string, e.g. 'Sat Jun 06 16:26:11 1998'.
    When the time tuple is not present, current time as returned by localtime()
    is used.
    """
    return ""
 
def clock(): # real signature unknown; restored from __doc__
    """
    clock() -> floating point number
 
    Return the CPU time or real time since the start of the process or since
    the first call to clock().  This has as much precision as the system
    records.
    """
    return 0.0
 
def ctime(seconds=None): # known case of time.ctime
    """
    ctime(seconds) -> string
 
    Convert a time in seconds since the Epoch to a string in local time.
    This is equivalent to asctime(localtime(seconds)). When the time tuple is
    not present, current time as returned by localtime() is used.
    """
    return ""
 
def get_clock_info(name): # real signature unknown; restored from __doc__
    """
    get_clock_info(name: str) -> dict
 
    Get information of the specified clock.
    """
    return {}
 
def gmtime(seconds=None): # real signature unknown; restored from __doc__
    """
    gmtime([seconds]) -> (tm_year, tm_mon, tm_mday, tm_hour, tm_min,
                           tm_sec, tm_wday, tm_yday, tm_isdst)
 
    Convert seconds since the Epoch to a time tuple expressing UTC (a.k.a.
    GMT).  When 'seconds' is not passed in, convert the current time instead.
 
    If the platform supports the tm_gmtoff and tm_zone, they are available as
    attributes only.
    """
    pass
 
def localtime(seconds=None): # real signature unknown; restored from __doc__
    """
    localtime([seconds]) -> (tm_year,tm_mon,tm_mday,tm_hour,tm_min,
                              tm_sec,tm_wday,tm_yday,tm_isdst)
 
    Convert seconds since the Epoch to a time tuple expressing local time.
    When 'seconds' is not passed in, convert the current time instead.
    """
    pass
 
def mktime(p_tuple): # real signature unknown; restored from __doc__
    """
    mktime(tuple) -> floating point number
 
    Convert a time tuple in local time to seconds since the Epoch.
    Note that mktime(gmtime(0)) will not generally return zero for most
    time zones; instead the returned value will either be equal to that
    of the timezone or altzone attributes on the time module.
    """
    return 0.0
 
def monotonic(): # real signature unknown; restored from __doc__
    """
    monotonic() -> float
 
    Monotonic clock, cannot go backward.
    """
    return 0.0
 
def perf_counter(): # real signature unknown; restored from __doc__
    """
    perf_counter() -> float
 
    Performance counter for benchmarking.
    """
    return 0.0
 
def process_time(): # real signature unknown; restored from __doc__
    """
    process_time() -> float
 
    Process time for profiling: sum of the kernel and user-space CPU time.
    """
    return 0.0
 
def sleep(seconds): # real signature unknown; restored from __doc__
    """
    sleep(seconds)
 
    Delay execution for a given number of seconds.  The argument may be
    a floating point number for subsecond precision.
    """
    pass
 
def strftime(format, p_tuple=None): # real signature unknown; restored from __doc__
    """
    strftime(format[, tuple]) -> string
 
    Convert a time tuple to a string according to a format specification.
    See the library reference manual for formatting codes. When the time tuple
    is not present, current time as returned by localtime() is used.
 
    Commonly used format codes:
 
    %Y  Year with century as a decimal number.
    %m  Month as a decimal number [01,12].
    %d  Day of the month as a decimal number [01,31].
    %H  Hour (24-hour clock) as a decimal number [00,23].
    %M  Minute as a decimal number [00,59].
    %S  Second as a decimal number [00,61].
    %z  Time zone offset from UTC.
    %a  Locale's abbreviated weekday name.
    %A  Locale's full weekday name.
    %b  Locale's abbreviated month name.
    %B  Locale's full month name.
    %c  Locale's appropriate date and time representation.
    %I  Hour (12-hour clock) as a decimal number [01,12].
    %p  Locale's equivalent of either AM or PM.
 
    Other codes may be available on your platform.  See documentation for
    the C library strftime function.
    """
    return ""
 
def strptime(string, format): # real signature unknown; restored from __doc__
    """
    strptime(string, format) -> struct_time
 
    Parse a string to a time tuple according to a format specification.
    See the library reference manual for formatting codes (same as
    strftime()).
 
    Commonly used format codes:
 
    %Y  Year with century as a decimal number.
    %m  Month as a decimal number [01,12].
    %d  Day of the month as a decimal number [01,31].
    %H  Hour (24-hour clock) as a decimal number [00,23].
    %M  Minute as a decimal number [00,59].
    %S  Second as a decimal number [00,61].
    %z  Time zone offset from UTC.
    %a  Locale's abbreviated weekday name.
    %A  Locale's full weekday name.
    %b  Locale's abbreviated month name.
    %B  Locale's full month name.
    %c  Locale's appropriate date and time representation.
    %I  Hour (12-hour clock) as a decimal number [01,12].
    %p  Locale's equivalent of either AM or PM.
 
    Other codes may be available on your platform.  See documentation for
    the C library strftime function.
    """
    return struct_time
 
def time(): # real signature unknown; restored from __doc__
    """
    time() -> floating point number
 
    Return the current time in seconds since the Epoch.
    Fractions of a second may be present if the system clock provides them.
    """
    return 0.0
 
def tzset(): # real signature unknown; restored from __doc__
    """
    tzset()
 
    Initialize, or reinitialize, the local timezone to the value stored in
    os.environ['TZ']. The TZ environment variable should be specified in
    standard Unix timezone format as documented in the tzset man page
    (eg. 'US/Eastern', 'Europe/Amsterdam'). Unknown timezones will silently
    fall back to UTC. If the TZ environment variable is not set, the local
    timezone is set to the systems best guess of wallclock time.
    Changing the TZ environment variable without calling tzset *may* change
    the local timezone used by methods such as localtime, but this behaviour
    should not be relied on.
    """
    pass
 
# classes
 
class struct_time(tuple):
    """
    The time value as returned by gmtime(), localtime(), and strptime(), and
     accepted by asctime(), mktime() and strftime().  May be considered as a
     sequence of 9 integers.
 
     Note that several fields' values are not the same as those defined by
     the C language standard for struct tm.  For example, the value of the
     field tm_year is the actual year, not year - 1900.  See individual
     fields' descriptions for details.
    """
    def __init__(self, *args, **kwargs): # real signature unknown
        pass
 
    @staticmethod # known case of __new__
    def __new__(*args, **kwargs): # real signature unknown
        """ Create and return a new object.  See help(type) for accurate signature. """
        pass
 
    def __reduce__(self, *args, **kwargs): # real signature unknown
        pass
 
    def __repr__(self, *args, **kwargs): # real signature unknown
        """ Return repr(self). """
        pass
 
    tm_gmtoff = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
    """offset from UTC in seconds"""
 
    tm_hour = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
    """hours, range [0, 23]"""
 
    tm_isdst = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
    """1 if summer time is in effect, 0 if not, and -1 if unknown"""
 
    tm_mday = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
    """day of month, range [1, 31]"""
 
    tm_min = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
    """minutes, range [0, 59]"""
 
    tm_mon = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
    """month of year, range [1, 12]"""
 
    tm_sec = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
    """seconds, range [0, 61])"""
 
    tm_wday = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
    """day of week, range [0, 6], Monday is 0"""
 
    tm_yday = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
    """day of year, range [1, 366]"""
 
    tm_year = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
    """year, for example, 1993"""
 
    tm_zone = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default
    """abbreviation of timezone name"""
 
 
    n_fields = 11
    n_sequence_fields = 9
    n_unnamed_fields = 0

 
import time
 
# time()
print(time.time())
Result:
1465536842.780465
 
# gmtime()  # returns structured time (a struct_time tuple) in UTC (Greenwich time); accepts an optional timestamp
print(time.gmtime(time.time()))
Result:
time.struct_time(tm_year=2016, tm_mon=6, tm_mday=10, tm_hour=5, tm_min=44, tm_sec=34, tm_wday=4, tm_yday=162, tm_isdst=0)
 
# localtime()  # returns structured local time; accepts an optional timestamp
print(time.localtime())
Result:
time.struct_time(tm_year=2016, tm_mon=6, tm_mday=10, tm_hour=13, tm_min=46, tm_sec=28, tm_wday=4, tm_yday=162, tm_isdst=0)
 
# strptime()  # parses a string into structured time
print(time.strptime('1986-10-31', '%Y-%m-%d'))
Result:
time.struct_time(tm_year=1986, tm_mon=10, tm_mday=31, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=4, tm_yday=304, tm_isdst=-1)
 
# strftime()  # formats the current time as a string in the given format
print(time.strftime('%Y-%m-%d %H:%M:%S'))
Result:
2016-06-10 13:56:08
 
# asctime()  # converts structured time into a readable string
print(time.asctime())
print(time.asctime(time.localtime()))  # a time tuple can be passed as p_tuple
 
# ctime()  # defaults to the current time; accepts an optional timestamp
print(time.ctime(time.time()))
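The three representations convert into one another. The sketch below walks one value through all three and back. It uses timestamp 0 and UTC so the output does not depend on the current time or local time zone; calendar.timegm is used as the UTC counterpart of time.mktime.

```python
import calendar
import time

ts = 0  # the Epoch itself

# timestamp -> structured time (UTC)
st = time.gmtime(ts)

# structured time -> formatted string
text = time.strftime("%Y-%m-%d %H:%M:%S", st)

# formatted string -> structured time
parsed = time.strptime(text, "%Y-%m-%d %H:%M:%S")

# structured time (UTC) -> timestamp
back = calendar.timegm(parsed)

print(st.tm_year, text, back)  # 1970 1970-01-01 00:00:00 0
```

For local time the last step would use time.mktime instead, which interprets the tuple in the local time zone.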
 
The datetime module offers a higher-level interface than time and is more convenient to use
 
import time
import datetime
 
print(datetime.date.today())  # prints a date such as 2016-06-10
# Convert a timestamp into a date; add or subtract seconds to shift the result (864400 seconds is roughly 10 days)
print(datetime.date.fromtimestamp(time.time() - 864400))
current_time = datetime.datetime.now()  # current time
print(current_time)  # prints e.g. 2016-06-10 16:11:56.832662
print(current_time.timetuple())  # returns a struct_time
 
# datetime.replace([year[, month[, day[, hour[, minute[, second[, microsecond[, tzinfo]]]]]]]])
print(current_time.replace(1986, 10, 31))  # prints 1986-10-31 followed by the current time of day; only the given fields (year, month, day) are replaced
 
str_to_date = datetime.datetime.strptime("21/11/06 16:30", "%d/%m/%y %H:%M")  # parse a string into a datetime
new_date = datetime.datetime.now() + datetime.timedelta(days=10)  # 10 days from now
new_date = datetime.datetime.now() + datetime.timedelta(days=-10)  # 10 days ago
new_date = datetime.datetime.now() + datetime.timedelta(hours=-10)  # 10 hours ago
new_date = datetime.datetime.now() + datetime.timedelta(seconds=120)  # 120 seconds from now
print(new_date)
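timedelta also works in the other direction: subtracting two datetime objects yields a timedelta. The following sketch uses fixed, made-up dates so the output is reproducible.

```python
import datetime

start = datetime.datetime(2016, 6, 10, 16, 30)
end = start + datetime.timedelta(days=10, hours=2)

delta = end - start               # datetime - datetime gives a timedelta

print(end)                        # 2016-06-20 18:30:00
print(delta.days, delta.seconds)  # 10 7200
print(delta.total_seconds())      # 871200.0
```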
 
c. logging
A thread-safe module for convenient logging.
Many programs need to write logs, and logs contain both normal access records and output such as errors and warnings. Python's logging module provides a standard logging interface that can store logs in various formats. Log messages fall into five levels: debug(), info(), warning(), error(), and critical().
 
Example:
import logging
logging.warning("user [Pesen] attempted wrong password more than 3 times")
logging.critical("host is down")
Result:
WARNING:root:user [Pesen] attempted wrong password more than 3 times
CRITICAL:root:host is down

 

 
Log levels:
Level       When it's used
DEBUG       Detailed information, typically of interest only when diagnosing problems.
INFO        Confirmation that things are working as expected.
WARNING     An indication that something unexpected happened, or indicative of some problem in the near future (e.g. 'disk space low'). The software is still working as expected.
ERROR       Due to a more serious problem, the software has not been able to perform some function.
CRITICAL    A serious error, indicating that the program itself may be unable to continue running.
 
Numeric values (only messages at or above the current log level are recorded):
CRITICAL  =  50
FATAL  =  CRITICAL
ERROR  =  40
WARNING  =  30
WARN  =  WARNING
INFO  =  20
DEBUG  =  10
NOTSET  =  0
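The numeric values decide what gets through. As a small sketch (the logger name level-demo is made up for this example), isEnabledFor shows that a level equal to the threshold passes, not only higher ones.

```python
import logging

logger = logging.getLogger("level-demo")  # hypothetical logger name
logger.setLevel(logging.WARNING)

print(logging.WARNING, logging.getLevelName(30))  # 30 WARNING
print(logger.isEnabledFor(logging.INFO))          # False: 20 < 30
print(logger.isEnabledFor(logging.WARNING))       # True: an equal level is enabled
print(logger.isEnabledFor(logging.ERROR))         # True: 40 > 30
```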
 
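# Write logs to a file
# (run this in a fresh interpreter: basicConfig has no effect once the root logger already has handlers)
logging.basicConfig(filename='./example.log', level=logging.INFO)
logging.debug('DEBUG')  # the configured level is INFO, so the lower-level debug message is not written
logging.info('Info')
logging.warning('Warning')

Result: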
$ cat example.log
    INFO:root:Info
    WARNING:root:Warning
 
# Add a timestamp to the log format
>>> import logging
>>> logging.basicConfig(format='%(asctime)s %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p')
>>> logging.warning('is when this event was logged.')
06/10/2016 04:42:21 PM is when this event was logged.
 

 

To send logs to both the screen and a file (or to the screen and several files), use the components of the logging module described here:
The logging library takes a modular approach and offers several categories of components: loggers, handlers, filters, and formatters.
    • Loggers expose the interface that application code directly uses.
    • Handlers send the log records (created by loggers) to the appropriate destination.
    • Filters provide a finer grained facility for determining which log records to output.
    • Formatters specify the layout of log records in the final output.
 
import logging

# Create the logger
logger = logging.getLogger('TEST-LOG')  # name of the logger
logger.setLevel(logging.DEBUG)  # log level: only messages at or above this level are passed on to the handlers

# Create a handler that writes to the terminal, with level DEBUG
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)

# Create a handler that writes to a file, with level WARNING
fh = logging.FileHandler("access.log")
fh.setLevel(logging.WARNING)

# Create the formatter
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')

# Assign the formatter to the handlers
ch.setFormatter(formatter)
fh.setFormatter(formatter)

# Attach the handlers to the logger
logger.addHandler(ch)
logger.addHandler(fh)

# Use the configured logger to emit messages
logger.debug('debug message')
logger.info('info message')
logger.warning('warn message')  # logger.warn() is a deprecated alias of warning()
logger.error('error message')
logger.critical('critical message')
 
Result:
Screen output:
2016-06-10 16:55:47,628 - TEST-LOG - DEBUG - debug message
2016-06-10 16:55:47,629 - TEST-LOG - INFO - info message
2016-06-10 16:55:47,629 - TEST-LOG - WARNING - warn message
2016-06-10 16:55:47,629 - TEST-LOG - ERROR - error message
2016-06-10 16:55:47,629 - TEST-LOG - CRITICAL - critical message
 
File output:
2016-06-10 16:55:47,629 - TEST-LOG - WARNING - warn message
2016-06-10 16:55:47,629 - TEST-LOG - ERROR - error message
2016-06-10 16:55:47,629 - TEST-LOG - CRITICAL - critical message
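The same two-handler setup can be tried without touching the terminal or the disk. In this sketch two in-memory streams stand in for the screen and the log file; the logger name handler-demo and the shortened format string are assumptions made for the example.

```python
import io
import logging

logger = logging.getLogger("handler-demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep the demo output away from the root logger

# In-memory streams stand in for the screen and the log file.
screen, logfile = io.StringIO(), io.StringIO()
ch = logging.StreamHandler(screen)
ch.setLevel(logging.DEBUG)
fh = logging.StreamHandler(logfile)
fh.setLevel(logging.WARNING)

formatter = logging.Formatter("%(levelname)s - %(message)s")
for handler in (ch, fh):
    handler.setFormatter(formatter)
    logger.addHandler(handler)

logger.info("info message")
logger.error("error message")

screen_lines = screen.getvalue().splitlines()
file_lines = logfile.getvalue().splitlines()
print(screen_lines)  # ['INFO - info message', 'ERROR - error message']
print(file_lines)    # ['ERROR - error message']
```

Each record passes the logger's level first and then each handler's own level, which is why the INFO record reaches only the DEBUG-level handler.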

 

 
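Commonly used formatter attributes:
%(name)s        name of the logger
%(levelno)s     numeric level of the message
%(levelname)s   text level of the message
%(pathname)s    full path of the source file that issued the logging call
%(filename)s    file name portion of pathname
%(module)s      module name
%(funcName)s    name of the function containing the logging call
%(lineno)d      source line number of the logging call
%(asctime)s     human-readable time at which the record was created
%(thread)d      thread ID
%(threadName)s  thread name
%(process)d     process ID
%(message)s     the logged message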
 
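d. sys
Provides operations related to the Python interpreter:
sys.argv           list of command-line arguments; the first element is the path of the program itself
sys.exit(n)        exit the program; use exit(0) for a normal exit
sys.version        version information of the Python interpreter
sys.maxsize        largest value a native Py_ssize_t can hold (Python 2's sys.maxint does not exist in Python 3)
sys.path           module search path, initialized from the PYTHONPATH environment variable plus installation defaults  # important
sys.platform       name of the operating system platform
sys.stdin          standard input
sys.stdout         standard output
sys.stderr         standard error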
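A quick way to get familiar with these attributes is to print a few of them. This is only an illustrative sketch; the exact values depend on the interpreter and platform it runs on.

```python
import sys

script = sys.argv[:1]               # the script path, if any (a list slice avoids IndexError)
is_py3 = sys.version_info >= (3, 0)

print(script)
print(is_py3)                       # True on any Python 3 interpreter
print(sys.platform)                 # e.g. 'linux', 'darwin', 'win32'
print(sys.maxsize >= 2**31 - 1)     # True on both 32-bit and 64-bit builds

# Normal output goes to stdout; diagnostics go to stderr.
sys.stderr.write("warning: this line goes to stderr\n")
```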
 
A handy use of sys.stdout is implementing a progress bar:
import sys
import time


def view_bar(num, total):
    rate = num / total

    r1 = "\r{}>{:.0%}".format("="*num, rate)
    sys.stdout.write(r1)
    sys.stdout.flush()


if __name__ == '__main__':
    for i in range(0, 101):
        time.sleep(0.1)
        view_bar(i, 100)
 
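e. os
Provides operating-system-level operations:
os.getcwd()                 get the current working directory, i.e. the directory the Python script is running in
os.chdir("dirname")         change the current working directory; like cd in a shell
os.curdir                   the current directory string: ('.')
os.pardir                   the parent directory string: ('..')
os.makedirs('dir1/dir2')    create nested directories recursively
os.removedirs('dirname1')   remove the directory if it is empty, then move up to the parent and remove it too if it is empty, and so on
os.mkdir('dirname')         create a single directory; like mkdir dirname in a shell
os.rmdir('dirname')         remove a single empty directory; raises an error if it is not empty; like rmdir dirname in a shell
os.listdir('dirname')       return a list of all files and subdirectories in the given directory, including hidden files
os.remove()                 delete a file
os.rename("oldname","new")  rename a file or directory
os.stat('path/filename')    get file or directory information
os.sep                      OS-specific path separator: "\\" on Windows, "/" on Linux
os.linesep                  line terminator of the current platform: "\r\n" on Windows, "\n" on Linux
os.pathsep                  separator between entries of a search path such as PATH: ";" on Windows, ":" on Linux
os.name                     string naming the current platform: Windows -> 'nt'; Linux -> 'posix'
os.system("bash command")   run a shell command; its output is shown directly
os.environ                  the system environment variables
os.path.abspath(path)       return the normalized absolute path of path  # important
os.path.split(path)         split path into a (directory, file name) tuple
os.path.dirname(path)       return the directory of path, i.e. the first element of os.path.split(path)  # important
os.path.basename(path)      return the final file name of path; if path ends with / or \, an empty string is returned. This is the second element of os.path.split(path)
os.path.exists(path)        True if path exists, False otherwise
os.path.isabs(path)         True if path is an absolute path
os.path.isfile(path)        True if path is an existing file, False otherwise
os.path.isdir(path)         True if path is an existing directory, False otherwise
os.path.join(path1[, path2[, ...]])  join several path components; components before the last absolute one are discarded  # important
os.path.getatime(path)      last access time of the file or directory that path points to
os.path.getmtime(path)      last modification time of the file or directory that path points to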
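The os.path helpers marked as important fit together as shown in this sketch. The path project/lib/common.py is a made-up relative path; nothing is read from or written to disk.

```python
import os

path = os.path.join("project", "lib", "common.py")  # hypothetical relative path

head, tail = os.path.split(path)
print(head == os.path.dirname(path))   # True
print(tail == os.path.basename(path))  # True
print(tail)                            # common.py

# A trailing separator leaves an empty final component.
trailing = os.path.basename(path + os.sep)
print(repr(trailing))                  # ''

# An absolute component discards everything that came before it.
root = os.path.abspath(os.sep)
joined = os.path.join("ignored", root, "etc")
print(joined == os.path.join(root, "etc"))   # True
print(os.path.isabs(os.path.abspath(path)))  # True
```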
 
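f. hashlib
Used for hashing (message digest) operations. Hashing is one-way, so it is not encryption in the strict sense. The module replaces the old md5 and sha modules and mainly provides the SHA1, SHA224, SHA256, SHA384, SHA512, and MD5 algorithms.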
import hashlib

# md5
hash = hashlib.md5()
hash.update(bytes('test', encoding='utf-8'))
print(hash.hexdigest())
 
Result:
098f6bcd4621d373cade4e832627b4f6

# md5 with a custom key (salt) hashed first, to make lookups in precomputed hash tables harder
obj = hashlib.md5(bytes("sadfasfa", encoding="utf-8"))
obj.update(bytes("test", encoding="utf-8"))
result = obj.hexdigest()
print(result)
 
Result:
5d614e1a6f4d669f5e1cb1c2f1c38b09
 
# sha1
hash = hashlib.sha1()
hash.update(bytes('test', encoding='utf-8'))
print(hash.hexdigest())
 
Result:
a94a8fe5ccb19ba61c4c0873d391e987982fbbd3
 
# sha256
hash = hashlib.sha256()
hash.update(bytes('test', encoding='utf-8'))
print(hash.hexdigest())
 
Result:
9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08
 
# sha384
hash = hashlib.sha384()
hash.update(bytes('test', encoding='utf-8'))
print(hash.hexdigest())
 
Result:
768412320f7b0aa5812fce428dc4706b3cae50e02a64caa16a782249bfe8efc4b7ef1ccb126255d196047dfedf17a0a9
 
# sha512
hash = hashlib.sha512()
hash.update(bytes('test', encoding='utf-8'))
print(hash.hexdigest())
 
Result:
ee26b0dd4af7e749aa1a8ee3c10ae9923f618980772e473f8819a5d4940e0db27ac185f8a0e1d5f84f88bc887fd67b143732c304cc5fa9ad8e6f57f50028a8ff
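It may help to see what the custom-key variant does. Data passed to the constructor and data passed to update() are simply hashed in sequence, so the key acts as a prefix (a simple salt). The sketch below checks that equivalence using the same key and message as the md5 example.

```python
import hashlib

salt = b"sadfasfa"  # the custom key used in the md5 example
message = b"test"

# Constructor data followed by update() ...
h1 = hashlib.md5(salt)
h1.update(message)

# ... hashes the same bytes as one call on the concatenation.
h2 = hashlib.md5(salt + message)

same = h1.hexdigest() == h2.hexdigest()
unsalted_differs = hashlib.md5(message).hexdigest() != h1.hexdigest()

print(same)              # True
print(unsalted_differs)  # True: the salt changes the digest
print(len(hashlib.sha256(message).hexdigest()))  # 64 hex characters
```

For keyed hashing in real applications the standard library also offers the hmac module, and for password storage a deliberately slow function such as hashlib.pbkdf2_hmac is the usual choice.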

 

 
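g. Regular expressions: re
  • Introduction:
In essence, a regular expression (RE) is a small, highly specialized programming language
embedded inside Python and made available through the re module. Regular expression patterns
are compiled into a series of bytecodes, which are then executed by a matching engine written in C.
 
  • Character matching (ordinary characters and metacharacters)
Ordinary characters: most letters and characters simply match themselves
        >>> import re
        >>> re.findall('alex','yuanaleSxalexwupeiqi')
           ['alex']
 
Metacharacters:  .   ^   $   *   +   ?   { }   [ ]   |   ( )  \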
 

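Characters:

  .  matches any character except a newline
  \w matches a letter, digit, or underscore (for str patterns this includes non-ASCII letters such as Chinese characters)
  \s matches any whitespace character
  \d matches a digit
  \b matches the start or end of a word
  ^  matches the start of the string
  $  matches the end of the string

 

Repetition:

  * repeat zero or more times
  + repeat one or more times
  ? repeat zero or one time
  {n} repeat exactly n times
  {n,} repeat n or more times
  {n,m} repeat n to m times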

 
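     The first metacharacters to look at are "[" and "]". They specify a character class, i.e. a set of characters you want to match. Characters can be listed individually, or a range can be given as two characters separated by "-". For example, [abc] matches any one of "a", "b", or "c"; the range [a-c] expresses the same set. To match only lowercase letters, write the RE as [a-z].
     Metacharacters are not active inside a class. For example, [akm$] matches any one of "a", "k", "m", or "$"; "$" is normally a metacharacter, but inside a character class it loses its special meaning and becomes an ordinary character.
     Inside a character class, only the characters ^, -, ] and \ have a special meaning.
     \ still escapes, - defines a character range, and ^ placed first means negation.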
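The character class rules can be checked quickly with re.findall. The sample string in this sketch is made up so that every rule has something to match.

```python
import re

text = "a1-b2_c3$ k^m"

listed = re.findall(r"[abc]", text)         # characters listed one by one
ranged = re.findall(r"[a-c]", text)         # the same set written as a range
dollar = re.findall(r"[akm$]", text)        # $ is an ordinary character inside a class
negated = re.findall(r"[^a-z0-9 ]", text)   # a leading ^ negates the class
escaped = re.findall(r"[\^\-]", text)       # \ escapes ^ and - inside a class

print(listed)   # ['a', 'b', 'c']
print(ranged)   # ['a', 'b', 'c']
print(dollar)   # ['a', '$', 'k', 'm']
print(negated)  # ['-', '_', '$', '^']
print(escaped)  # ['-', '^']
```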
 
 Note on *?, +?, ??, {m,n}?: the quantifiers '*', '+' and '?' are greedy, meaning they match as much as possible.
That may not be what you want, so you can append a question mark to switch to
non-greedy (lazy) matching, which matches as little as possible. Compare the two:
>>> re.findall(r"a(\d+?)","a23b") # non-greedy mode
        ['2']
>>> re.findall(r"a(\d+)","a23b")
        ['23']
>>> re.search('<(.*)>', '<H1>title</H1>').group()
'<H1>title</H1>'
>>> re.search('<(.*?)>', '<H1>title</H1>').group()
'<H1>'
 
Compare with this case:
>>> re.findall(r"a(\d+)b","a23b")
        ['23']
>>> re.findall(r"a(\d+?)b","a23b") # when the pattern is constrained on both sides, non-greedy mode makes no difference
        ['23']
 
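\:
A backslash followed by a metacharacter removes its special meaning.
A backslash followed by certain ordinary characters gives them a special meaning (e.g. \d, \w).
A backslash followed by a number refers back to the string matched by the group with that number:
re.search(r"(alex)(eric)com\2","alexericcomeric")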
 
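\d  matches any decimal digit; equivalent to the class [0-9].
\D  matches any non-digit character; equivalent to the class [^0-9].
\s  matches any whitespace character; equivalent to the class [ \t\n\r\f\v].
\S  matches any non-whitespace character; equivalent to the class [^ \t\n\r\f\v].
\w  matches any alphanumeric character; equivalent to the class [a-zA-Z0-9_].
\W  matches any non-alphanumeric character; equivalent to the class [^a-zA-Z0-9_].
(For str patterns in Python 3 these classes are Unicode-aware by default; the ASCII classes listed here apply exactly when the re.ASCII flag is set.)
\b: matches a word boundary, i.e. the position between a word and a space.
    It matches word boundaries (both the start and the end). A "word" here is a run of letters,
    digits and underscores. Note that \b is defined as the boundary between \w and \W.
    It is a zero-width assertion that only matches at the beginning or end of a word.
    Because a word is a sequence of alphanumeric characters, the end of a word is marked by
    whitespace or a non-alphanumeric character.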
>>> re.findall(r"abc\b","dzx &abc sdsadasabcasdsadasdabcasdsa")
['abc']
>>> re.findall(r"\babc\b","dzx &abc sdsadasabcasdsadasdabcasdsa")
['abc']
>>> re.findall(r"\babc\b","dzx sabc sdsadasabcasdsadasdabcasdsa")
[]
 
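For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb".
\b only matches positions (the start or end of the string, or the boundary next to a space, newline, etc.); it does not consume the space itself.
For example, in "abc sdsadasabcasdsadasdabcasdsa",
\sabc\s finds no match, while \babc\b matches "abc"
>>> re.findall(r"\sabc\s","abc sdsadasabcasdsadasdabcasdsa")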
[]
>>> re.findall(r"\babc\b","abc sdsadasabcasdsadasdabcasdsa")
['abc']
 
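\b is what you use to match a whole word; anything that is not a complete word does not match. Suppose you
want to match I: many words contain the letter I, but you only want the standalone word "I". In that case
use \bI\b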
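The whole-word case can be tried directly. In this sketch the sentence is made up; it contains the word I twice and one longer word that starts with I.

```python
import re

sentence = "I think IDE is fine, and I agree"

loose = re.findall(r"I", sentence)       # also hits the I in 'IDE'
strict = re.findall(r"\bI\b", sentence)  # only the standalone word
replaced = re.sub(r"\bI\b", "we", sentence)

print(len(loose), len(strict))  # 3 2
print(replaced)                 # we think IDE is fine, and we agree
```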
 
 
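  • Functions:
match: re.match(pattern, string, flags=0)
# match tries to match from the start of the string; it returns a match object on success and None on failure
 
 match(pattern, string, flags=0)
 # pattern: the regular expression
 # string : the string to match against
 # flags  : the matching mode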
     X  VERBOSE     Ignore whitespace and comments for nicer looking RE's.
     I  IGNORECASE  Perform case-insensitive matching.
     M  MULTILINE   "^" matches the beginning of lines (after a newline)
                    as well as the string.
                    "$" matches the end of lines (before a newline) as well
                    as the end of the string.
     S  DOTALL      "." matches any character at all, including the newline.
 
     A  ASCII       For string patterns, make \w, \W, \b, \B, \d, \D
                    match the corresponding ASCII character categories
                    (rather than the whole Unicode categories, which is the
                    default).
                    For bytes patterns, this flag is the only available
                    behaviour and needn't be specified.
      
     L  LOCALE      Make \w, \W, \b, \B, dependent on the current locale.
     U  UNICODE     For compatibility only. Ignored for string patterns (it
                    is the default), and forbidden for bytes patterns.
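# Without groups
origin = "hello alex bcd abcd lge acd 19"
r = re.match(r"h\w+", origin)
print(r.group())     # the whole match
print(r.groups())    # all captured groups as a tuple (empty here)
print(r.groupdict()) # the named groups as a dict (empty here)

# With groups

# Why use groups? To extract specific parts of a successful match (the whole pattern must match first, then the grouped parts are extracted)

r = re.match(r"h(\w+).*(?P<name>\d)$", origin)
print(r.group())     # the whole match
print(r.groups())    # all captured groups
print(r.groupdict()) # only the groups that were given a name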

 

flags    compilation flags that change how the regular expression matches, e.g. case-insensitive
matching, multi-line matching, and so on.
re.match('com', 'comwww.runcomoob').group()
re.match('com', 'Comwww.runComoob',re.I).group()
 
search: re.search(pattern, string, flags=0)
re.search(r'\dcom', 'www.4comrunoob.5com').group()
 
 
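Note:
re.match('com', 'comwww.runcomoob')
re.search(r'\dcom', 'www.4comrunoob.5com')
On success, both return a match object, which has the following methods:
group()    returns the string matched by the RE
start()    returns the position where the match starts
end()      returns the position where the match ends
span()     returns a tuple containing the (start, end) positions of the match
group() can also take one or more group numbers and returns the strings matched by those groups:
1. group() returns the whole matched string;
2. group(n, m) returns a tuple of the strings matched by groups n and m; a group number that does not exist raises IndexError;
3. groups() returns a tuple containing the strings of all groups in the pattern, from group 1 up to the last group. It is normally called without arguments.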

 

import re
a = "123abc456"
re.search("([0-9]*)([a-z]*)([0-9]*)",a).group(0)   # 123abc456, the whole match
re.search("([0-9]*)([a-z]*)([0-9]*)",a).group(1)   # 123
re.search("([0-9]*)([a-z]*)([0-9]*)",a).group(2)   # abc
re.search("([0-9]*)([a-z]*)([0-9]*)",a).group(3)   # 456
 
 group(1) returns the part matched by the first pair of parentheses, group(2) the part matched by the second pair, and group(3) the part matched by the third pair.
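Numbered and named groups can be mixed in one pattern. The following sketch parses a made-up log line (the line and the group names date and level are assumptions for illustration) and shows what each match-object method returns.

```python
import re

line = "2016-06-10 WARNING disk space low"  # hypothetical log line

m = re.match(r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<level>\w+) (.*)", line)

print(m.group(0))        # the whole match
print(m.group("level"))  # WARNING
print(m.group(1, 2))     # ('2016-06-10', 'WARNING')
print(m.groups())        # ('2016-06-10', 'WARNING', 'disk space low')
print(m.groupdict())     # {'date': '2016-06-10', 'level': 'WARNING'}
print(m.span("level"))   # (11, 18)
```

Named groups still count as numbered groups, which is why groups() lists all three while groupdict() lists only the two that have names.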
 
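search
# search scans the whole string and returns the first match; it returns None if nothing matches
# search(pattern, string, flags=0)
# Without groups

origin = "hello alex bcd abcd lge acd 19"
r = re.search(r"a\w+", origin)
print(r.group())     # the whole match
print(r.groups())    # all captured groups as a tuple (empty here)
print(r.groupdict()) # the named groups as a dict (empty here)

# With groups

r = re.search(r"a(\w+).*(?P<name>\d)$", origin)
print(r.group())     # the whole match
print(r.groups())    # all captured groups
print(r.groupdict()) # only the groups that were given a name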

 

findall:
# findall returns a list of all non-overlapping matches. With no group, each item is the whole matched string; with one group, each item is that group's string; with several groups, each item is a tuple.
# Empty matches are included in the result
# findall(pattern, string, flags=0)
re.findall returns all matching strings as a list. For example:
p = re.compile(r'\d+')
print(p.findall('one1two2three3four4'))
 
    
# Without groups
origin = "hello alex bcd abcd lge acd 19"
r = re.findall(r"a\w+", origin)
print(r)

# With groups
r = re.findall(r"a((\w*)c)(d)", origin)
print(r)
 
re.findall(r'\w*oo\w*', text) returns all words in the string that contain 'oo'.
 
import re
text = "JGood is a handsome boy,he is handsome and cool,clever,and so on ...."
print(re.findall(r'\w*oo\w*',text))  # result: ['JGood', 'cool']
#print(re.findall(r'(\w)*oo(\w)*',text))  # () marks a group; result: [('G', 'd'), ('c', 'l')]
 
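finditer(): returns an iterator that yields a match object for each non-overlapping match. When the same pattern is applied many times, compile it first and then iterate over the matches.
>>> p = re.compile(r'\d+')
>>> iterator = p.finditer('12 drumm44ers drumming, 11 ... 10 ...')
>>> for match in iterator:
...     print(match.group(), match.span())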

 

sub, subn: substitution
re.sub(pattern, repl, string, count=0, flags=0)
re.sub("g.t","have",'I get A,  I got B ,I gut C')

Result:
I have A,  I have B ,I have C
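sub has a few options worth knowing. This sketch reuses the sentence from the example and shows the count argument, the (new_string, number_of_substitutions) tuple returned by subn, and a function used as the replacement.

```python
import re

text = "I get A,  I got B ,I gut C"

limited = re.sub(r"g.t", "have", text, count=2)   # replace only the first two matches
counted = re.subn(r"g.t", "have", text)           # also report how many were replaced
shouted = re.sub(r"g.t", lambda m: m.group().upper(), text)  # repl can be a function

print(limited)  # I have A,  I have B ,I gut C
print(counted)  # ('I have A,  I have B ,I have C', 3)
print(shouted)  # I GET A,  I GOT B ,I GUT C
```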
 
split: splitting a string
p = re.compile(r'\d+')
p.split('one1two2three3four4')
 
re.split(r'\d+','one1two2three3four4')

Result:
['one', 'two', 'three', 'four', '']
['one', 'two', 'three', 'four', '']

 

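re.compile(strPattern[, flag]):
    This is a factory function for Pattern objects: it compiles a regular expression given as a
    string into a Pattern object. The second argument, flag, is the matching mode; several flags
    can be combined with the bitwise OR operator '|', for example re.I | re.M.
    Regular expressions that are used often can be compiled once into a pattern object and
    reused, which saves some overhead. Here is an example of a pattern
    object:
import re
text = "JGood is a handsome boy, he is cool, clever, and so on..."
regex = re.compile(r'\w*oo\w*')
print(regex.findall(text))  # find all words containing 'oo'

Result: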
['JGood', 'cool']
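Combining flags with '|' can be seen in a short sketch. The three-line sample text is made up; the pattern looks for a name field at the start of a line, ignoring case.

```python
import re

text = "Name: alex\nname: Eric\nNAME: tony"

# re.I ignores case; re.M lets ^ match at the start of every line.
pattern = re.compile(r"^name: (\w+)", re.I | re.M)
both = pattern.findall(text)

# Without re.M, ^ only matches at the very start of the string.
only_ignorecase = re.findall(r"^name: (\w+)", text, re.I)

# Without either flag, 'Name' at position 0 does not match 'name'.
no_flags = re.findall(r"^name: (\w+)", text)

print(both)             # ['alex', 'Eric', 'tony']
print(only_ignorecase)  # ['alex']
print(no_flags)         # []
```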
 
Commonly used regular expressions:
IP address:
^(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}$
Mobile phone number (11 digits):
^1[3458][0-9]\d{8}$
Email:
[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+
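As a usage sketch, the IP pattern can be wrapped in a small helper. The function name is_ipv4 and the test strings are assumptions made for this example.

```python
import re

IP_RE = re.compile(
    r"^(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}$"
)

def is_ipv4(text):
    """Return True when text looks like a dotted IPv4 address."""
    return IP_RE.match(text) is not None

results = {c: is_ipv4(c) for c in ("192.168.1.1", "255.255.255.255", "256.1.1.1", "1.2.3")}
for candidate, ok in results.items():
    print(candidate, ok)
```

For production code the standard library's ipaddress module is usually a more robust way to validate addresses than a hand-written pattern.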

 

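 h. configparser
 
configparser handles files in a specific (INI-style) format; under the hood it uses open to work with the file. The format looks like this:
# comment 1
; comment 2
 
# a section
[section1]
k1 = v1
k2:v2
 
# another section
[section2]
k1 = v1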

 

[section1]
k2 = 200
k10 = 123

[section2]
k1 = 200
(the example file, saved under the name section)
import configparser

# Get all sections
config = configparser.ConfigParser()
config.read('section', encoding='utf-8')
ret = config.sections()
print(ret)

# Get all key/value pairs of a section
config = configparser.ConfigParser()
config.read('section', encoding='utf-8')
ret = config.items('section1')
print(ret)

# Get all keys of a section
config = configparser.ConfigParser()
config.read('section', encoding='utf-8')
ret = config.options('section1')
print(ret)

# Get the value of a given key
config = configparser.ConfigParser()
config.read('section', encoding='utf-8')
v = config.get('section1', 'k1')
# v = config.getint('section1', 'k1')  # convert to int
# v = config.getfloat('section1', 'k1')  # convert to float
# v = config.getboolean('section1', 'k1')  # convert to bool
print(v)

# Check, add, and remove sections
config = configparser.ConfigParser()
config.read('section', encoding='utf-8')

# Check
has_sec = config.has_section('section2')
print(has_sec)

# Add a section
config.add_section("section3")
with open('section', 'w') as f:
    config.write(f)

# Remove a section
config.remove_section("section3")
with open('section', 'w') as f:
    config.write(f)


# Check, remove, and set key/value pairs inside a section
config = configparser.ConfigParser()
config.read('section', encoding='utf-8')

# Check
has_opt = config.has_option('section1', 'k1')
print(has_opt)

# Remove
config.remove_option('section1', 'k1')
with open('section', 'w') as f:
    config.write(f)

# Set
config.set('section1', 'k10', "123")
with open('section', 'w') as f:
    config.write(f)
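The same operations can be tried without creating a file. This sketch feeds the parser from a string with read_string and writes the result into an in-memory buffer; the section contents are made up for the example.

```python
import configparser
import io

config = configparser.ConfigParser()
config.read_string("""
[section1]
k1 = 10
k2 : v2

[section2]
k1 = v1
""")

print(config.sections())                 # ['section1', 'section2']
print(config.items("section1"))          # [('k1', '10'), ('k2', 'v2')]
print(config.getint("section1", "k1"))   # 10
print(config.get("section1", "missing", fallback="default"))  # default

config.set("section1", "k10", "123")
config.remove_section("section2")

buffer = io.StringIO()
config.write(buffer)  # write() accepts any text file-like object
output = buffer.getvalue()
print(output)
```

Values are always stored as strings; getint, getfloat, and getboolean convert on read, and the fallback argument avoids an exception when a key is missing.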

 

i. XML

XML is a format for exchanging data between different languages or programs. An XML file looks like this:
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2023</year>
        <gdppc>141100</gdppc>
        <neighbor direction="E" name="Austria" />
        <neighbor direction="W" name="Switzerland" />
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2026</year>
        <gdppc>59900</gdppc>
        <neighbor direction="N" name="Malaysia" />
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2026</year>
        <gdppc>13600</gdppc>
        <neighbor direction="W" name="Costa Rica" />
        <neighbor direction="E" name="Colombia" />
    </country>
</data>
(the XML example file, saved as xo.xml)
# Option 1: use ElementTree.XML to parse a string into an XML object
from xml.etree import ElementTree as ET


# Open the file and read the XML content
with open('xo.xml', 'r') as f:
    str_xml = f.read()

# Parse the string into an Element object; root refers to the root node of the XML document
root = ET.XML(str_xml)


# Option 2: use ElementTree.parse to parse the file directly
from xml.etree import ElementTree as ET

# Parse the XML file directly
tree = ET.parse("xo.xml")

# Get the root node of the XML document
root = tree.getroot()
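Once a root node is available, the tree can be walked with iter, find, and findall. The sketch below parses a shortened copy of the sample document from a string, so it runs without the xo.xml file.

```python
from xml.etree import ElementTree as ET

xml_text = """
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <neighbor direction="E" name="Austria" />
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <neighbor direction="N" name="Malaysia" />
    </country>
</data>
"""

root = ET.XML(xml_text)
print(root.tag)  # data

summary = []
for country in root.iter("country"):
    rank = country.find("rank")  # first child element named rank
    neighbors = [n.attrib["name"] for n in country.findall("neighbor")]
    summary.append((country.attrib["name"], rank.text, neighbors))

print(summary)
```

Element text is always a string, so rank.text needs int() before it can be used as a number.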
# XML consists of nodes nested inside nodes. Every node offers the following features for working with it:


class Element:
    """An XML element.

    This class is the reference implementation of the Element interface.

    An element's length is its number of subelements.  That means if you
    want to check if an element is truly empty, you should check BOTH
    its length AND its text attribute.

    The element tag, attribute names, and attribute values can be either
    bytes or strings.

    *tag* is the element name.  *attrib* is an optional dictionary containing
    element attributes. *extra* are additional element attributes given as
    keyword arguments.

    Example form:
        <tag attrib>text<child/>...</tag>tail

    """

    # Tag name of the current node
    tag = None
    """The element's name."""

    # Attributes of the current node

    attrib = None
    """Dictionary of the element's attributes."""

    # Text content of the current node
    text = None
    """
    Text before first subelement. This is either a string or the value None.
    Note that if there is no text, this attribute may be either
    None or the empty string, depending on the parser.

    """

    tail = None
    """
    Text after this element's end tag, but before the next sibling element's
    start tag.  This is either a string or the value None.  Note that if there
    was no text, this attribute may be either None or an empty string,
    depending on the parser.

    """

    def __init__(self, tag, attrib={}, **extra):
        if not isinstance(attrib, dict):
            raise TypeError("attrib must be dict, not %s" % (
                attrib.__class__.__name__,))
        attrib = attrib.copy()
        attrib.update(extra)
        self.tag = tag
        self.attrib = attrib
        self._children = []

    def __repr__(self):
        return "<%s %r at %#x>" % (self.__class__.__name__, self.tag, id(self))

    def makeelement(self, tag, attrib):
        # Create a new node
        """Create a new element with the same type.

        *tag* is a string containing the element name.
        *attrib* is a dictionary containing the element attributes.

        Do not call this method, use the SubElement factory function instead.

        """
        return self.__class__(tag, attrib)

    def copy(self):
        """Return copy of current element.

        This creates a shallow copy. Subelements will be shared with the
        original tree.

        """
        elem = self.makeelement(self.tag, self.attrib)
        elem.text = self.text
        elem.tail = self.tail
        elem[:] = self
        return elem

    def __len__(self):
        return len(self._children)

    def __bool__(self):
        warnings.warn(
            "The behavior of this method will change in future versions.  "
            "Use specific 'len(elem)' or 'elem is not None' test instead.",
            FutureWarning, stacklevel=2
            )
        return len(self._children) != 0 # emulate old behaviour, for now

    def __getitem__(self, index):
        return self._children[index]

    def __setitem__(self, index, element):
        # if isinstance(index, slice):
        #     for elt in element:
        #         assert iselement(elt)
        # else:
        #     assert iselement(element)
        self._children[index] = element

    def __delitem__(self, index):
        del self._children[index]

    def append(self, subelement):
        # Append one child node to the current node
        """Add *subelement* to the end of this element.

        The new element will appear in document order after the last existing
        subelement (or directly after the text, if it's the first subelement),
        but before the end tag for this element.

        """
        self._assert_is_element(subelement)
        self._children.append(subelement)

    def extend(self, elements):
        # Extend the current node with n child nodes
        """Append subelements from a sequence.

        *elements* is a sequence with zero or more elements.

        """
        for element in elements:
            self._assert_is_element(element)
        self._children.extend(elements)

    def insert(self, index, subelement):
        # Insert a node among the current node's children at the given position
        """Insert *subelement* at position *index*."""
        self._assert_is_element(subelement)
        self._children.insert(index, subelement)

    def _assert_is_element(self, e):
        # Need to refer to the actual Python implementation, not the
        # shadowing C implementation.
        if not isinstance(e, _Element_Py):
            raise TypeError('expected an Element, not %s' % type(e).__name__)

    def remove(self, subelement):
        # Remove a node from the current node's children
        """Remove matching subelement.

        Unlike the find methods, this method compares elements based on
        identity, NOT ON tag value or contents.  To remove subelements by
        other means, the easiest way is to use a list comprehension to
        select what elements to keep, and then use slice assignment to update
        the parent element.

        ValueError is raised if a matching element could not be found.

        """
        # assert iselement(element)
        self._children.remove(subelement)

    def getchildren(self):
        # Get all child nodes (deprecated)
        """(Deprecated) Return all subelements.

        Elements are returned in document order.

        """
        warnings.warn(
            "This method will be removed in future versions.  "
            "Use 'list(elem)' or iteration over elem instead.",
            DeprecationWarning, stacklevel=2
            )
        return self._children

    def find(self, path, namespaces=None):
        # Get the first matching child node
        """Find first matching element by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return the first matching element, or None if no element was found.

        """
        return ElementPath.find(self, path, namespaces)

    def findtext(self, path, default=None, namespaces=None):
        # Get the text of the first matching child node
        """Find text for first matching element by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *default* is the value to return if the element was not found,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return text content of first matching element, or default value if
        none was found.  Note that if an element is found having no text
        content, the empty string is returned.

        """
        return ElementPath.findtext(self, path, default, namespaces)

    def findall(self, path, namespaces=None):
        # Get all child nodes that match the path
        """Find all matching subelements by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Returns list containing all matching elements in document order.

        """
        return ElementPath.findall(self, path, namespaces)

    def iterfind(self, path, namespaces=None):
        # Find all matching nodes and return an iterator (usable in a for loop)
        """Find all matching subelements by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return an iterable yielding all matching elements in document order.

        """
        return ElementPath.iterfind(self, path, namespaces)

    def clear(self):
        # Clear the node
        """Reset element.

        This function removes all subelements, clears all attributes, and sets
        the text and tail attributes to None.

        """
        self.attrib.clear()
        self._children = []
        self.text = self.tail = None

    def get(self, key, default=None):
        # Get an attribute value of the current node
        """Get element attribute.

        Equivalent to attrib.get, but some implementations may handle this a
        bit more efficiently.  *key* is what attribute to look for, and
        *default* is what to return if the attribute was not found.

        Returns a string containing the attribute value, or the default if
        attribute was not found.

        """
        return self.attrib.get(key, default)

    def set(self, key, value):
        # Set an attribute value on the current node
        """Set element attribute.

        Equivalent to attrib[key] = value, but some implementations may handle
        this a bit more efficiently.  *key* is what attribute to set, and
        *value* is the attribute value to set it to.

        """
        self.attrib[key] = value

    def keys(self):
        # Get the keys of all attributes of the current node

        """Get list of attribute names.

        Names are returned in an arbitrary order, just like an ordinary
        Python dict.  Equivalent to attrib.keys()

        """
        return self.attrib.keys()

    def items(self):
        # Get all attributes of the current node as (key, value) pairs
        """Get element attributes as a sequence.

        The attributes are returned in arbitrary order.  Equivalent to
        attrib.items().

        Return a list of (name, value) tuples.

        """
        return self.attrib.items()

    def iter(self, tag=None):
        # Search the current node and its descendants for all nodes with the given tag name and return an iterator (usable in a for loop).
        """Create tree iterator.

        The iterator loops over the element and all subelements in document
        order, returning all elements with a matching tag.

        If the tree structure is modified during iteration, new or removed
        elements may or may not be included.  To get a stable set, use the
        list() function on the iterator, and loop over the resulting list.

        *tag* is what tags to look for (default is to return all elements)

        Return an iterator containing all the matching elements.

        """
        if tag == "*":
            tag = None
        if tag is None or self.tag == tag:
            yield self
        for e in self._children:
            yield from e.iter(tag)

    # compatibility
    def getiterator(self, tag=None):
        # Change for a DeprecationWarning in 1.4
        warnings.warn(
            "This method will be removed in future versions.  "
            "Use 'elem.iter()' or 'list(elem.iter())' instead.",
            PendingDeprecationWarning, stacklevel=2
        )
        return list(self.iter(tag))

    def itertext(self):
        # Collect the text of the current node and all of its descendants and return an iterator (usable in a for loop).
        """Create text iterator.

        The iterator loops over the element and all subelements in document
        order, returning all inner text.

        """
        tag = self.tag
        if not isinstance(tag, str) and tag is not None:
            return
        if self.text:
            yield self.text
        for e in self:
            yield from e.itertext()
            if e.tail:
                yield e.tail
Working with the file: source code of the node methods

Every node has the methods above, and the parsing step already gave us root (the root node of the XML file), so these methods can be used to work with the XML file.
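
Before turning to the file-based examples, here is a minimal sketch of the most commonly used node methods on a small in-memory document. The XML string is a hypothetical sample, and the "continent" attribute is invented purely for illustration.

```python
from xml.etree import ElementTree as ET

root = ET.XML(
    '<data>'
    '<country name="Singapore"><rank updated="yes">5</rank></country>'
    '<country name="Panama"><rank updated="yes">69</rank></country>'
    '</data>'
)

# find: the first matching direct child
first = root.find("country")

# findall + get: every matching direct child and one of its attributes
names = [c.get("name") for c in root.findall("country")]

# findtext: text of the first match, with a default when nothing matches
rank_text = root.findtext("country/rank")
missing = root.findtext("country/gdp", default="n/a")

# set / keys: write an attribute, then list the attribute names
first.set("continent", "Asia")
attr_keys = sorted(first.keys())

# iter: searches the node itself and all of its descendants
all_ranks = [node.text for node in root.iter("rank")]

print(names, rank_text, missing, attr_keys, all_ranks)
```

Note that find and findall only look at direct children unless a path such as "country/rank" is given, whereas iter walks the whole subtree.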

from xml.etree import ElementTree as ET

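############ Parsing method 1 ############
"""
# Open the file and read the XML content
str_xml = open('xo.xml', 'r').read()

# Parse the string into an Element object; root is the root node of the XML file
root = ET.XML(str_xml)
"""
############ Parsing method 2 ############

# Parse the XML file directly
tree = ET.parse("xo.xml")

# Get the root node of the XML file
root = tree.getroot()


### Operations

# Top-level tag
print(root.tag)


# Iterate over the second level of the XML document
for child in root:
    # Tag name and attributes of the second-level node
    print(child.tag, child.attrib)
    # Iterate over the third level of the XML document
    for i in child:
        # Tag name and text of the third-level node
        print(i.tag, i.text)
Traversing all content of the XML document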
from xml.etree import ElementTree as ET

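############ Parsing method 1 ############
"""
# Open the file and read the XML content
str_xml = open('xo.xml', 'r').read()

# Parse the string into an Element object; root is the root node of the XML file
root = ET.XML(str_xml)
"""
############ Parsing method 2 ############

# Parse the XML file directly
tree = ET.parse("xo.xml")

# Get the root node of the XML file
root = tree.getroot()


### Operations

# Top-level tag
print(root.tag)


# Iterate over all year nodes in the XML
for node in root.iter('year'):
    # Tag name and text of the node
    print(node.tag, node.text)
Traversing specific nodes in the XML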
from xml.etree import ElementTree as ET

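############ Parsing method 1 ############

# Open the file and read the XML content
with open('xo.xml', 'r') as f:
    str_xml = f.read()

# Parse the string into an Element object; root is the root node of the XML file
root = ET.XML(str_xml)

############ Operations ############

# Top-level tag
print(root.tag)

# Loop over all year nodes
for node in root.iter('year'):
    # Increment the content of the year node by one
    new_year = int(node.text) + 1
    node.text = str(new_year)

    # Set attributes
    node.set('name', 'alex')
    node.set('age', '18')
    # Delete an attribute
    del node.attrib['name']


############ Save the file ############
# A root parsed from a string must be wrapped in an ElementTree before writing
tree = ET.ElementTree(root)
tree.write("newnew.xml", encoding='utf-8')
Modifying node content: working from a string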
from xml.etree import ElementTree as ET

############ Parsing method 2 ############

# Parse the XML file directly
tree = ET.parse("xo.xml")

# Get the root node of the XML file
root = tree.getroot()

############ Operations ############

# Top-level tag
print(root.tag)

# Loop over all year nodes
for node in root.iter('year'):
    # Increment the content of the year node by one
    new_year = int(node.text) + 1
    node.text = str(new_year)

    # Set attributes
    node.set('name', 'alex')
    node.set('age', '18')
    # Delete an attribute
    del node.attrib['name']


############ Save the file ############
tree.write("newnew.xml", encoding='utf-8')
Modifying node content: working from a file
from xml.etree import ElementTree as ET

############ Parsing from a string ############

# Open the file and read the XML content
with open('xo.xml', 'r') as f:
    str_xml = f.read()

# Parse the string into an Element object; root is the root node of the XML file
root = ET.XML(str_xml)

############ Operations ############

# Top-level tag
print(root.tag)

# Iterate over all country nodes under data
for country in root.findall('country'):
    # Get the content of the rank node under each country node
    rank = int(country.find('rank').text)

    if rank > 50:
        # Remove this country node
        root.remove(country)

############ Save the file ############
tree = ET.ElementTree(root)
tree.write("newnew.xml", encoding='utf-8')
Deleting nodes: working from a string
from xml.etree import ElementTree as ET

############ Parsing from a file ############

# Parse the XML file directly
tree = ET.parse("xo.xml")

# Get the root node of the XML file
root = tree.getroot()

############ Operations ############

# Top-level tag
print(root.tag)

# Iterate over all country nodes under data
for country in root.findall('country'):
    # Get the content of the rank node under each country node
    rank = int(country.find('rank').text)

    if rank > 50:
        # Remove this country node
        root.remove(country)

############ Save the file ############
tree.write("newnew.xml", encoding='utf-8')
Deleting nodes: working from a file
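
The loops above are safe because findall returns a list, so removing children does not disturb the iteration; looping over root directly while removing from it can skip nodes. The remove docstring quoted earlier suggests another route: build the list of children to keep and assign it back with a slice. A minimal sketch of that idea, assuming a made-up document with three countries:

```python
from xml.etree import ElementTree as ET

# Hypothetical sample data, used only for this illustration.
root = ET.XML(
    "<data>"
    "<country name='A'><rank>2</rank></country>"
    "<country name='B'><rank>69</rank></country>"
    "<country name='C'><rank>70</rank></country>"
    "</data>"
)

# Keep only the countries whose rank is 50 or lower, then replace
# all children of root in one step via slice assignment.
root[:] = [c for c in root if int(c.findtext("rank")) <= 50]

remaining = [c.get("name") for c in root]
print(remaining)
```

This variant filters on any condition in a single pass and never mutates the child list while it is being walked.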
from xml.etree import ElementTree as ET

# Create the root node
root = ET.Element("family")

# Create the son nodes
son1 = ET.Element("son", {"name": "大儿子"})
son2 = ET.Element("son", {"name": "二儿子"})

# Create the grandson nodes
grandson1 = ET.Element("grandson", {"name": "大儿子的儿子"})
grandson2 = ET.Element("grandson", {"name": "二儿子的儿子"})

# Add each grandson to the corresponding son node
son1.append(grandson1)
son2.append(grandson2)

# Add the sons to the root node
root.append(son1)
root.append(son2)

tree = ET.ElementTree(root)
tree.write("family.xml", encoding="utf-8")
Creating an XML document
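
The makeelement docstring quoted earlier recommends the SubElement factory function. As a sketch of the same family tree built that way (the attribute values here are simplified placeholders, not the ones used above), SubElement creates a child and attaches it to its parent in one call:

```python
from xml.etree import ElementTree as ET

root = ET.Element("family")

# SubElement creates the child and appends it to the parent in one call.
son1 = ET.SubElement(root, "son", {"name": "son1"})
son2 = ET.SubElement(root, "son", name="son2")  # attributes as keywords
ET.SubElement(son1, "grandson", {"name": "grandson1"})
ET.SubElement(son2, "grandson", {"name": "grandson2"})

# encoding="unicode" makes tostring return str instead of bytes.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

This removes the separate append step, so a node cannot be created and then forgotten.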
# Approach with indentation

from xml.etree import ElementTree as ET
from xml.dom import minidom


def prettify(elem):
    """
    Convert a node to a string and add indentation.
    """
    rough_string = ET.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="\t")

# Create the root node
root = ET.Element("family")

# Create the son nodes
son1 = ET.Element("son", {"name": "大儿子"})
son2 = ET.Element("son", {"name": "二儿子"})

# Create the grandson nodes
grandson1 = ET.Element("grandson", {"name": "大儿子的儿子"})
grandson2 = ET.Element("grandson", {"name": "二儿子的儿子"})

# Add each grandson to the corresponding son node
son1.append(grandson1)
son2.append(grandson2)

# Add the sons to the root node
root.append(son1)
root.append(son2)

raw_str = prettify(root)
with open("family_new.xml", "w", encoding="utf-8") as f:
    f.write(raw_str)
Creating an XML document: with indentation
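
On Python 3.9 and later, the standard library also offers ET.indent, which adds the whitespace in place and avoids the round trip through minidom. This is an alternative to the prettify helper, not what the original code uses; a minimal sketch with placeholder attribute values:

```python
from xml.etree import ElementTree as ET

root = ET.Element("family")
son = ET.SubElement(root, "son", {"name": "son1"})
ET.SubElement(son, "grandson", {"name": "grandson1"})

# ET.indent (Python 3.9+) inserts whitespace into text/tail in place.
ET.indent(root, space="\t")
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

Because the tree itself is modified, a later tree.write() produces the indented layout as well.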

 

G.subprocess

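The modules and functions that can run shell commands include:

  • os.system
  • os.spawn*
  • os.popen*          -- deprecated (os.popen2/3/4 were removed in 3.x; os.popen itself remains)
  • popen2.*           -- deprecated, removed in 3.x
  • commands.*      -- deprecated, removed in 3.x

The subprocess module covers the functionality of all of the above and offers richer features.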

import subprocess

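# call: run a command and return its exit status
ret = subprocess.call(["ls", "-l"], shell=False)
print(ret)
ret = subprocess.call("ls -l", shell=True)
print(ret)

# check_call: run a command; return 0 if the exit status is 0, otherwise raise an exception
subprocess.check_call("ls -l", shell=True)


# check_output: run a command; return its output if the exit status is 0, otherwise raise an exception
ret = subprocess.check_output("ls -l", shell=True)
print(str(ret, encoding="utf-8"))

# Popen: start a command without waiting for it to finish
ret1 = subprocess.Popen(["mkdir","t1"])
ret2 = subprocess.Popen("mkdir t2", shell=True)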
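
Since Python 3.5 the module also provides subprocess.run, which covers the call, check_call, and check_output cases through parameters. The sketch below is an illustration rather than part of the original examples; it launches the current interpreter (sys.executable) instead of ls so that it does not depend on a particular shell.

```python
import subprocess
import sys

# Capture stdout/stderr as text and inspect the exit status.
result = subprocess.run(
    [sys.executable, "-c", "print('hello')"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,
)
print(result.returncode, result.stdout.strip())

# check=True turns a non-zero exit status into an exception, like check_call.
try:
    subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"], check=True)
    failed_code = None
except subprocess.CalledProcessError as exc:
    failed_code = exc.returncode
print(failed_code)
```

Passing the command as a list with shell=False (the default) avoids shell quoting problems when arguments come from user input.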

 

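subprocess.Popen(...)

Used to run more complex system commands.

Parameters:

    • args: the shell command, either a string or a sequence (e.g. list or tuple)
    • bufsize: buffering policy. 0 = unbuffered, 1 = line buffered, any other positive value = buffer size, negative value = system default
    • stdin, stdout, stderr: the program's standard input, output, and error handles
    • preexec_fn: Unix only; a callable object that is invoked in the child process just before the command runs
    • close_fds: on Windows, if close_fds is set to True, the newly created child process does not inherit the parent's input, output, and error pipes,
      so close_fds could not be set to True while also redirecting the child's standard input, output, and error (stdin, stdout, stderr). Python 3.7 lifted this restriction.
    • shell: same as above
    • cwd: sets the child process's working directory
    • env: sets the child process's environment variables. If env = None, the child inherits the parent's environment.
    • universal_newlines: line endings differ across systems; True -> use \n uniformly (the streams are opened in text mode)
    • startupinfo and creationflags: Windows only;
      they are passed to the underlying CreateProcess() function to set child process properties such as the main window's appearance and the process priority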
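
As an illustration of the cwd and env parameters, the following sketch starts a child interpreter in a temporary directory with one extra environment variable. DEMO_VAR and the child script are hypothetical names chosen for the example.

```python
import os
import subprocess
import sys
import tempfile

# The child prints its working directory and the value of DEMO_VAR.
child_code = "import os; print(os.getcwd()); print(os.environ['DEMO_VAR'])"

with tempfile.TemporaryDirectory() as workdir:
    # Copy the parent's environment and add one hypothetical variable.
    child_env = dict(os.environ, DEMO_VAR="42")
    proc = subprocess.Popen(
        [sys.executable, "-c", child_code],
        cwd=workdir,
        env=child_env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    out, _ = proc.communicate()
    # Compare resolved paths, since temp directories may sit behind symlinks.
    same_dir = os.path.realpath(out.splitlines()[0]) == os.path.realpath(workdir)

lines = out.splitlines()
print(lines[1], same_dir)
```

Building env from a copy of os.environ keeps variables such as PATH available; passing a dict with only the new variable would replace the whole environment.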
 
import subprocess

# Run ordinary commands
ret1 = subprocess.Popen(["mkdir","t1"])
ret2 = subprocess.Popen("mkdir t2", shell=True)

# Run a command that needs further input, for example python3
obj = subprocess.Popen(["python3"],
                       stdin=subprocess.PIPE,
                       stdout=subprocess.PIPE,
                       stderr=subprocess.PIPE,
                       universal_newlines=True)

obj.stdin.write("print(1)\n")
# Python 2 style print; python3 reports this as a SyntaxError on stderr
obj.stdin.write("print 2")
obj.stdin.close()

cmd_out = obj.stdout.read()
obj.stdout.close()
cmd_error = obj.stderr.read()
obj.stderr.close()

print(cmd_out)
print(cmd_error)
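
Reading stdout to the end before touching stderr can block if the child fills the stderr pipe first. Popen.communicate handles both pipes together. A minimal sketch of the same interaction, using sys.executable in place of "python3" (an assumption made so the example runs wherever the current interpreter does) and valid Python 3 input:

```python
import subprocess
import sys

obj = subprocess.Popen(
    [sys.executable],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,
)

# communicate() sends the input, closes stdin, reads both pipes
# to the end, and waits for the child to exit.
cmd_out, cmd_error = obj.communicate("print(1)\nprint(2)\n")

print(cmd_out)
print(cmd_error)
```

After communicate returns, obj.returncode holds the child's exit status, so no separate wait() call is needed.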

 

H.zipfile & tarfile

Compression and decompression, archiving and unarchiving.

import zipfile
import tarfile

# zip
# Compress
z = zipfile.ZipFile("test.zip", "w")
z.write("test.xml")
z.close()

# Decompress
z = zipfile.ZipFile("test.zip", "r")
# List all files in the zip archive
for i in z.namelist():
    print(i)

z.extract("test.xml")  # extract a single file
z.extractall()  # extract the whole zip archive
z.close()


# tar
# Pack
tar = tarfile.TarFile("test.tar", "w")
tar.add("test.xml")
tar.close()

# Unpack
tar = tarfile.TarFile("test.tar", "r")
# List all members of the tar archive
for i in tar.getmembers():
    print(i)

tar.extractall()  # extract the whole tar archive
tar.close()
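
Both archive classes work as context managers, which closes the archive even if an error occurs. The sketch below is self-contained: it creates its own sample file in a temporary directory (all paths are made up for the illustration) and uses tarfile.open, which also allows a compression mode such as "w:gz".

```python
import os
import tarfile
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # Create a small sample file to archive.
    src = os.path.join(tmp, "test.xml")
    with open(src, "w", encoding="utf-8") as f:
        f.write("<data />")

    # zip: write, then list the contents
    zip_path = os.path.join(tmp, "test.zip")
    with zipfile.ZipFile(zip_path, "w") as z:
        z.write(src, arcname="test.xml")
    with zipfile.ZipFile(zip_path, "r") as z:
        zip_names = z.namelist()

    # tar: "w:gz" packs and compresses with gzip in one step
    tar_path = os.path.join(tmp, "test.tar.gz")
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(src, arcname="test.xml")
    with tarfile.open(tar_path, "r:gz") as tar:
        tar_names = tar.getnames()

print(zip_names, tar_names)
```

The arcname argument controls the name stored inside the archive, so the temporary directory path does not end up in it.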

 

 

Reposted from: https://www.cnblogs.com/pesen/p/5573938.html
