The Road to Python: Python Basics 2

weixin_33747129

Tags: python, web crawler, data structures and algorithms

Original link: https://segmentfault.com/a/1190000016516921

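Day 2 contents
1. List and tuple operations
2. String operations
3. Dictionary operations
4. Set operations
5. File operations
6. Character encoding and transcoding
7. Built-in functions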

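In Python, the most basic data structure is the sequence. Each element of a sequence is assigned a number, its position, also called the index. The first index is 0, the second is 1, and so on. Python 2 has six built-in sequence types: lists, tuples, strings, Unicode strings, buffer objects and xrange objects. In Python 3, which the examples below use, the built-in sequence types include list, tuple, range, str, bytes and bytearray.
All sequences support indexing, slicing, addition, multiplication and membership tests.
In addition, Python has built-in functions for finding the length of a sequence and its largest and smallest elements.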
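
The common sequence operations listed above can be tried on any sequence type. The following is a minimal sketch; the variable names and sample values are illustrative assumptions, not part of the original text.

```python
# Operations shared by all sequences, shown here on a list and a tuple.
numbers = [3, 1, 4, 1, 5]          # sample data chosen for illustration

first = numbers[0]                 # indexing: the first index is 0
middle = numbers[1:4]              # slicing: start included, end excluded
joined = numbers + [9, 2]          # addition concatenates two sequences
repeated = (0, 1) * 2              # multiplication repeats a sequence
has_four = 4 in numbers            # membership test
size = len(numbers)                # built-in length
smallest, largest = min(numbers), max(numbers)

print(first, middle, joined, repeated, has_four, size, smallest, largest)
```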

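I. Lists (list)

The list is one of the most commonly used data types in Python. It makes storing and modifying data convenient.

1. Basic list operations

Defining a list

names = ["dabric","devil","Tom"]

Accessing list elements

Elements are accessed by index, counting from 0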

>>> names
['dabric', 'devil', 'Tom']
>>> names[0]
'dabric'
>>> names[1]
'devil'
>>> names[-1]   # negative indexes count from the end
'Tom'
>>> names[-2]
'devil'

Changing a list: item assignment

Assign a value to the element at a given index

>>> names
['dabric', 'devil', 'Tom']
>>> names[1] = 'Alice'
>>> names
['dabric', 'Alice', 'Tom']

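Slicing (slice)

Python uses slicing to take several elements from a sequence. A slice includes the element at the start index but excludes the element at the end index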

>>> names
['dabric', 'Alice', 'Tom', 'Beth', 'Cecil']
>>> names[0:3]             # when slicing from the first element the 0 can be omitted; equivalent to names[:3]
['dabric', 'Alice', 'Tom'] 

>>> names
['dabric', 'Alice', 'Tom', 'Beth', 'Cecil']
>>> names[1:]              # to take every element from a given index onward, omit the end index
['Alice', 'Tom', 'Beth', 'Cecil']

# slicing with indexes counted from the end of the list
>>> names
['dabric', 'Alice', 'Tom', 'Beth', 'Cecil']
>>> names[-3:-1]
['Tom', 'Beth']

>>> names
['dabric', 'Alice', 'Tom', 'Beth', 'Cecil']
>>> names[:-1]
['dabric', 'Alice', 'Tom', 'Beth']
>>> names[-4:]
['Alice', 'Tom', 'Beth', 'Cecil']
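
As a small supplement to the slices above, the sketch below shows the "start included, end excluded" rule once more, plus the optional third slice value (the step), which the original text does not cover. The variable names are assumptions made for this example.

```python
names = ['dabric', 'Alice', 'Tom', 'Beth', 'Cecil']

head = names[:3]            # indexes 0, 1 and 2; index 3 is excluded
every_other = names[::2]    # a step of 2 takes every second element
backwards = names[::-1]     # a step of -1 gives a reversed copy
beyond = names[3:100]       # an end index past the list is simply clipped

print(head, every_other, backwards, beyond)
```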

2. List methods

append

The append method adds a new object to the end of the list:

>>> names
['dabric', 'Alice', 'Tom']
>>> names.append('Beth')
>>> names
['dabric', 'Alice', 'Tom', 'Beth']

count

The count method counts how many times an element occurs in the list:

>>> names
['dabric', 'Alice', 'Tom', 'Beth', 'Cecil', 'Cecil']
>>> names.count('Cecil')
2

extend

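The extend method appends several values from another sequence to the end of the list in one call. In other words, it extends the original list with a new list: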

>>> names
['dabric', 'Alice', 'Tom', 'Beth', 'Cecil', 'Cecil']
>>> name
['Dee-Dee', 'Earl']
>>> names.extend(name)
>>> names
['dabric', 'Alice', 'Tom', 'Beth', 'Cecil', 'Cecil', 'Dee-Dee', 'Earl']

index

The index method returns the index of the first element in the list that matches a value:

>>> names
['dabric', 'Alice', 'Tom', 'Beth', 'Cecil', 'Cecil', 'Dee-Dee', 'Earl']
>>> names.index('dabric')
0
>>> names.index('Tom')
2
>>> names.index('Jerry')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: 'Jerry' is not in list

insert

The insert method inserts an object into the list at a given position:

>>> names
['dabric', 'Alice', 'Tom', 'Beth', 'Cecil', 'Cecil', 'Dee-Dee', 'Earl']
>>> names.insert(3,'贝斯')
>>> names
['dabric', 'Alice', 'Tom', '贝斯', 'Beth', 'Cecil', 'Cecil', 'Dee-Dee', 'Earl']

pop

The pop method removes one element from the list (the last one by default) and returns its value:

>>> names
['dabric', 'Alice', 'Tom', '贝斯', 'Beth', 'Cecil', 'Cecil', 'Dee-Dee', 'Earl']
>>> names.pop()
'Earl'
>>> names
['dabric', 'Alice', 'Tom', '贝斯', 'Beth', 'Cecil', 'Cecil', 'Dee-Dee']
>>> names.pop(3)         # remove the element at the given index and return its value
'贝斯'
>>> names
['dabric', 'Alice', 'Tom', 'Beth', 'Cecil', 'Cecil', 'Dee-Dee']

remove

The remove method removes the first element in the list that matches a value (remove modifies the list but returns nothing, unlike pop):

>>> names
['dabric', 'Alice', 'Tom', 'Beth', 'Cecil', 'Cecil', 'Dee-Dee', 'Tom']
>>> names.remove('Tom')
>>> names
['dabric', 'Alice', 'Beth', 'Cecil', 'Cecil', 'Dee-Dee', 'Tom']

reverse

The reverse method reverses the order of the elements in place:

>>> names
['dabric', 'Alice', 'Beth', 'Cecil', 'Cecil', 'Dee-Dee', 'Tom']
>>> names.reverse()
>>> names
['Tom', 'Dee-Dee', 'Cecil', 'Cecil', 'Beth', 'Alice', 'dabric']

sort

The sort method sorts the list in place. Sorting "in place" means the original list itself is changed; sort modifies the original list and returns None:

>>> names
['Tom', 'Dee-Dee', 'Cecil', 'Cecil', 'Beth', 'Alice', 'dabric']
>>> name = names.sort()
>>> names
['Alice', 'Beth', 'Cecil', 'Cecil', 'Dee-Dee', 'Tom', 'dabric']
>>> name
>>> print(name)
None
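
Because sort returns None, a sorted copy is normally obtained with the built-in sorted function instead. The sketch below contrasts the two; the sample list and variable names are assumptions for illustration.

```python
names = ['Tom', 'Dee-Dee', 'Cecil', 'Beth', 'Alice', 'dabric']

ordered = sorted(names)                        # returns a new list; names is untouched
ignoring_case = sorted(names, key=str.lower)   # compare the lowercased strings
result = names.sort(reverse=True)              # sorts in place, descending; returns None

print(ordered)
print(ignoring_case)
print(names, result)
```

Uppercase letters sort before lowercase ones by default, which is why 'dabric' ends up last in `ordered` but not in `ignoring_case`.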

copy

There are three ways to copy a list:

>>> names
['Alice', 'Beth', 'Cecil', 'Cecil', 'Dee-Dee', 'Tom']
>>> name = names.copy()    # method 1 (shallow copy)
>>> names
['Alice', 'Beth', 'Cecil', 'Cecil', 'Dee-Dee', 'Tom']
>>> name
['Alice', 'Beth', 'Cecil', 'Cecil', 'Dee-Dee', 'Tom']

>>> names.pop()    # remove the last element of names
'Tom'
>>> name
['Alice', 'Beth', 'Cecil', 'Cecil', 'Dee-Dee', 'Tom']
>>> names
['Alice', 'Beth', 'Cecil', 'Cecil', 'Dee-Dee']

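>>> name.clear()    # remove every element from the list
>>> name
[]
>>> # Method 2: names[:] is a slice containing every element of names, an efficient way to copy the whole list.
>>> # Plain assignment (name = names) copies nothing: both variables would then refer to the same list.
>>> name = names[:]
>>> name
['Alice', 'Beth', 'Cecil', 'Cecil', 'Dee-Dee']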

>>> names
['Alice', 'Beth', 'Cecil', 'Cecil', 'Dee-Dee']
>>> name.clear()
>>> name
[]
>>> import copy     # method 3: import the copy module
>>> name = copy.copy(names)      # works like the list's copy method; both make a shallow copy
>>> name
['Alice', 'Beth', 'Cecil', 'Cecil', 'Dee-Dee']

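Deep copy and shallow copy

# Shallow copy: only the first level of the list is copied. The new list holds references to the same elements, so a nested list is shared by the original and the copy, and a change made inside the nested list through one of them shows up in the other.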
>>> names
['Alice', 'Beth', 'Cecil', 'Cecil', 'Dee-Dee']
>>> names.append([1,2,3])
>>> names
['Alice', 'Beth', 'Cecil', 'Cecil', 'Dee-Dee', [1, 2, 3]]
>>> name
[]
>>> name = copy.copy(names)
>>> name
['Alice', 'Beth', 'Cecil', 'Cecil', 'Dee-Dee', [1, 2, 3]]
>>> names
['Alice', 'Beth', 'Cecil', 'Cecil', 'Dee-Dee', [1, 2, 3]]
>>> names[-1][0] = 10
>>> name
['Alice', 'Beth', 'Cecil', 'Cecil', 'Dee-Dee', [10, 2, 3]]
>>> names
['Alice', 'Beth', 'Cecil', 'Cecil', 'Dee-Dee', [10, 2, 3]]

# Deep copy: nested lists are copied as well, so the copy is fully independent
>>> names
['Alice', 'Beth', 'Cecil', 'Cecil', 'Dee-Dee', [1, 2, 3]]
>>> name
[]
>>> name = copy.deepcopy(names)
>>> name
['Alice', 'Beth', 'Cecil', 'Cecil', 'Dee-Dee', [1, 2, 3]]
>>> names[-1][0] = 10
>>> names
['Alice', 'Beth', 'Cecil', 'Cecil', 'Dee-Dee', [10, 2, 3]]
>>> name
['Alice', 'Beth', 'Cecil', 'Cecil', 'Dee-Dee', [1, 2, 3]]
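
The difference between the two kinds of copy can also be checked with the is operator, which tests whether two names refer to the same object. This is a minimal sketch with a shortened sample list; the variable names are assumptions.

```python
import copy

names = ['Alice', 'Beth', [1, 2, 3]]
shallow = copy.copy(names)
deep = copy.deepcopy(names)

shares_inner = shallow[-1] is names[-1]   # the nested list is the same object
deep_is_own = deep[-1] is not names[-1]   # deepcopy made a separate nested list

names[-1][0] = 10      # visible through the shallow copy, not through the deep copy
names[0] = 'ALICE'     # replacing a top-level item affects only names itself

print(names, shallow, deep)
```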

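II. Tuples (tuple)

Like lists, tuples are sequences. The main difference is that a tuple cannot be modified.

1. Defining a tuple

Separating some values with commas automatically creates a tuple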

>>> 1,2,3
(1, 2, 3)
>>> (1,2,3)
(1, 2, 3)

A tuple containing a single value needs a trailing comma

>>> 42,
(42,)
>>> (42)
42
>>> (42,)
(42,)

2. Tuple operations

Tuples also support slicing. They have only two methods: count and index.
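
A short sketch of these tuple operations, including what happens on an attempt to modify a tuple. The sample values and variable names are assumptions for illustration.

```python
point = (1, 2, 3, 2)

tail = point[1:]          # slicing a tuple gives a new tuple
twos = point.count(2)     # how many times 2 occurs
where = point.index(3)    # index of the first 3

try:
    point[0] = 10         # tuples do not support item assignment
except TypeError as exc:
    error = str(exc)

print(tail, twos, where, error)
```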

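Exercise

Program: shopping cart

Requirements:
(1) After the program starts, ask the user for their salary, then print the product list
(2) Let the user buy a product by its number
(3) After the user picks a product, check whether the balance is sufficient; if it is, deduct the price, otherwise warn the user
(4) The user can quit at any time; on exit, print the purchased products and the remaining balance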
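
One possible way to approach the exercise is sketched below. It is not the original author's solution: the product list, the function name run_cart and the report callback are all assumptions made for this example. The choices are passed in as a sequence so the logic can be run without typing; in an interactive program they would come from input().

```python
# Hypothetical product list of (name, price) pairs; names and prices are made up.
PRODUCTS = [("Book", 80), ("Coffee", 30), ("Bike", 800), ("Phone", 3000)]


def run_cart(salary, choices, products=PRODUCTS, report=print):
    """Process product numbers (as strings) until 'q'; return (cart, balance)."""
    balance = salary
    cart = []
    for number, (name, price) in enumerate(products):
        report("%d. %s %d" % (number, name, price))   # requirement (1): show the list
    for choice in choices:
        if choice == 'q':                              # requirement (4): quit any time
            break
        try:
            index = int(choice)
        except ValueError:
            index = -1
        if not 0 <= index < len(products):
            report("invalid product number: %s" % choice)
            continue
        name, price = products[index]                  # requirement (2): buy by number
        if price <= balance:                           # requirement (3): check balance
            balance -= price
            cart.append(name)
            report("added %s, balance is now %d" % (name, balance))
        else:
            report("cannot afford %s, balance is %d" % (name, balance))
    report("bought %s, balance %d" % (cart, balance))
    return cart, balance


# Scripted demo: collect the messages in a list instead of printing them.
messages = []
cart, balance = run_cart(1000, ['0', '2', '3', '9', 'q'], report=messages.append)
print(cart, balance)
```

For interactive use, the salary could be read with int(input(...)) and the choices supplied by a loop that calls input() until the user types q.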

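III. String operations

1. Basic string operations

All the standard sequence operations (indexing, slicing, multiplication, membership tests, length, minimum and maximum) work on strings too. Strings, however, are immutable, for example: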

>>> website = "http://www.python.org"
>>> website[-3:] = 'com'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment

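2. String formatting

String formatting uses the string formatting operator, the percent sign %. Put a string (the format string) on the left of % and the value to be formatted on the right. The right side can be a single value such as a string or a number, a tuple of several values, or a dictionary.
The right operand of the formatting operator can be of any type. If it is a tuple, each of its elements is formatted separately, and each value needs its own conversion specifier.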

>>> '%s plus %s equals %s' % (1,1,2)
'1 plus 1 equals 2'

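Basic conversion specifiers

The % character: marks the start of the conversion specifier.
Conversion flags (optional): - means left-align; + means a sign is placed before the converted value; " " (a space) means a space is left before positive numbers; 0 means the value is padded with zeros if it is not wide enough.
Minimum field width (optional): the converted string is at least this wide. If it is *, the width is read from the value tuple.
A dot (.) followed by the precision (optional): for a real number, the precision is the number of digits after the decimal point. For a string, it is the maximum field width. If it is *, the precision is read from the tuple.
Conversion type: a single character such as d (signed decimal integer), f (decimal floating point), s (string, using str) or r (string, using repr).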
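
The parts of a conversion specifier described above can be combined as in the following sketch. The values and variable names are assumptions chosen for illustration.

```python
padded = '%10.2f' % 3.14159              # minimum width 10, 2 digits after the point
left = '%-10.2f|' % 3.14159              # the - flag left-aligns inside the field
signed = '%+d' % 5                       # the + flag always shows the sign
spaced = '% d' % 5                       # the space flag leaves a blank before positives
zero_filled = '%05d' % 42                # the 0 flag pads with zeros
clipped = '%.3s' % 'python'              # precision on a string is a maximum width
from_tuple = '%*.*f' % (8, 3, 3.14159)   # width 8 and precision 3 read from the tuple
by_key = '%(name)s is %(age)d' % {'name': 'Gumby', 'age': 42}   # dictionary operand

for text in (padded, left, signed, spaced, zero_filled, clipped, from_tuple, by_key):
    print(repr(text))
```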

3. String methods

Basic methods

center

The center method centers a string by padding both sides with a fill character (a space by default).

>>> name = "My name is Dabric"
>>> name.center(39)
'           My name is Dabric           '
>>> name.center(39,'*')
'***********My name is Dabric***********'


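>>> name.ljust(40,'*')   # left-align the string; if it is shorter than the given width, pad with the given character
'My name is Dabric***********************'
>>> name.rjust(40,'*')   # right-align the string; if it is shorter than the given width, pad with the given character
'***********************My name is Dabric'

>>> name.zfill(40)       # right-align the string; if it is shorter than the given width, pad with 0
'00000000000000000000000My name is Dabric'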

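find

The find method searches a string for a substring. If the substring is found, it returns the index of its first character; otherwise it returns -1.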

>>> name.find('e')
6
>>> name.find('u')
-1
An optional start and end for the search can also be given:
>>> subject = '$$$ Get rich now!!! $$$'
>>> subject.find('$$$')
0
>>> subject.find('$$$', 1) # only the start is given
20
>>> subject.find('!!!')
16
>>> subject.find('!!!', 0, 16) # both start and end are given (the range includes the start but not the end)
-1

join

join is a very important string method. It is the opposite of split and joins the elements of a sequence into one string.

>>> seq = [1, 2, 3, 4, 5]
>>> sep = '+'
>>> sep.join(seq) # try to join a list of numbers
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence item 0: expected str instance, int found
>>> seq = ['1', '2', '3', '4', '5']
>>> sep.join(seq) # join a list of strings
'1+2+3+4+5'
>>> dirs = '', 'usr', 'bin', 'env'
>>> '/'.join(dirs)
'/usr/bin/env'
>>> print('C:' + '\\'.join(dirs))
C:\usr\bin\env

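replace

The replace method replaces every occurrence of a substring with another string and returns the result. An optional third argument limits the number of replacements.

>>> "dabric tian, chinese name is tianchenguang".replace("tian","TIAN",1)
'dabric TIAN, chinese name is tianchenguang'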

split

split is a very important string method. It is the opposite of join and splits a string into a list.

>>> '1+2+3+4+5'.split('+')
['1', '2', '3', '4', '5']
>>> '/usr/bin/env'.split('/')
['', 'usr', 'bin', 'env']
>>> 'Using the default'.split()
['Using', 'the', 'default']
Note that if no separator is given, the string is split at runs of one or more whitespace characters (spaces, tabs, newlines and so on).

strip

The strip method removes whitespace from the beginning and end of a string (but not from the middle) and returns the result.

>>> ' internal whitespace is kept '.strip()
'internal whitespace is kept'

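Testing whether a string meets a condition

Many string methods start with is, such as isspace, isdigit and isupper. They test whether a string has a particular property (for example that all its characters are whitespace, digits or uppercase). These methods return True if the string has the property and False otherwise.

isalnum, isalpha, isdecimal, isdigit, isidentifier, islower, isnumeric,
isprintable, isspace, istitle, isupper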
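
A few of these tests in action, as a minimal sketch. The sample strings and variable names are assumptions; the last lines illustrate that an empty string fails these tests and that isdigit and isdecimal differ for characters such as the superscript two.

```python
checks = {
    'isspace': ' \t\n'.isspace(),
    'isdigit': '2018'.isdigit(),
    'isupper': 'PYTHON'.isupper(),
    'isalpha': 'abc'.isalpha(),
    'isalnum': 'abc123'.isalnum(),
    'istitle': 'Hello World'.istitle(),
    'isidentifier': 'my_var'.isidentifier(),
}
not_identifier = '1st'.isidentifier()   # identifiers cannot start with a digit
empty = ''.isdigit()                    # an empty string has no digits at all

superscript = '\u00b2'                  # the character SUPERSCRIPT TWO
as_digit = superscript.isdigit()        # counted as a digit
as_decimal = superscript.isdecimal()    # but not as a decimal character

print(checks, not_identifier, empty, as_digit, as_decimal)
```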

Other methods

>>> name = 'My name is Dabric'
>>> name
'My name is Dabric'
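>>> name.capitalize()       # uppercase the first character and lowercase the rest
'My name is dabric'

>>> name.casefold()         # convert every uppercase letter to lowercase
'my name is dabric'

>>> name.count('a')         # count the occurrences of a character or substring
2
>>> name.count('is')
1

>>> name.encode()           # encode the string as bytes
b'My name is Dabric'

>>> name.endswith('dabric')   # check whether the string ends with the given suffix
False
>>> name.endswith('Dabric')
True

>>> "dabric\ttian".expandtabs(10)   # expand \t to spaces, with a tab stop every 10 columns
'dabric    tian'
>>> "dabric\ttian".expandtabs()
'dabric  tian'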

format

    >>> msg = "my name is {}, and age is {}"
    >>> msg.format("dabric",22)
    'my name is dabric, and age is 22'
    >>> msg = "my name is {1}, and age is {0}"
    >>> msg.format("dabric",22)
    'my name is 22, and age is dabric'
    >>> msg = "my name is {name}, and age is {age}"
    >>> msg.format(age=22,name="dabric")
    'my name is dabric, and age is 22'

format_map

    >>> msg.format_map({'name':'dabric','age':22})
    'my name is dabric, and age is 22'


msg.index('a')  returns the index of the first 'a' in the string (like find, but raises ValueError if it is not found)

maketrans
    >>> intab = "aeiou"  # the characters to be replaced
    >>> outtab = "12345" # the corresponding replacement characters
    >>> trantab = str.maketrans(intab, outtab)
    >>> text = "this is string example....wow!!!"   # named text so the built-in str is not shadowed
    >>> text.translate(trantab)
    'th3s 3s str3ng 2x1mpl2....w4w!!!'

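 msg.partition('is')   returns ('my name ', 'is', ' {name}, and age is {age}')

 msg.swapcase()   swaps uppercase and lowercase

>>> b="ddefdsdff_哈哈" 
>>> b.isidentifier() # checks whether the string can be used as an identifier, that is, whether it follows the variable naming rules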
True

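IV. Dictionary operations

The data structure of a dictionary is called a mapping, and the dictionary is the only built-in mapping type in Python. Values in a dictionary are not looked up by position; each one is stored under a specific key. (Since Python 3.7, dictionaries do preserve insertion order.) Keys can be numbers, strings, or even tuples.

1. Creating a dictionary

A dictionary consists of key-value pairs, each made of a key and its value. A key and its value are separated by a colon (:), items are separated by commas (,), and the whole dictionary is enclosed in braces. An empty dictionary (with no items) is written as two braces ({}). For example: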

phonebook = {'Alice':'2341','Beth':'9102','Cecil':'3258'}

The dict function

The dict function builds a dictionary from another mapping (such as another dictionary) or from a sequence of (key, value) pairs.

>>> items  = [('name','Gumby'),('age',42)]
>>> d = dict(items)
>>> d
{'name': 'Gumby', 'age': 42}

The dict function can also create a dictionary from keyword arguments, as shown here:

>>> d = dict(name='Gumby',age=42)
>>> d
{'name': 'Gumby', 'age': 42}

2. Dictionary operations

Adding

>>> info
{'student1': 'Alice', 'student2': 'Beth', 'student': 'Cecil'}
>>> info["student4"] = 'Dabric'
>>> info
{'student1': 'Alice', 'student2': 'Beth', 'student': 'Cecil', 'student4': 'Dabric'}

Modifying

>>> info
{'student1': 'Alice', 'student2': 'Beth', 'student': 'Cecil', 'student4': 'Dabric'}
>>> info['student1'] = 'ALICE'
>>> info
{'student1': 'ALICE', 'student2': 'Beth', 'student': 'Cecil', 'student4': 'Dabric'}

Deleting

The standard way to delete (pop)

>>> info
{'student1': 'ALICE', 'student2': 'Beth', 'student': 'Cecil', 'student4': 'Dabric'}
>>> info.pop('student')                # remove the value under the given key and return it
'Cecil'
>>> info
{'student1': 'ALICE', 'student2': 'Beth', 'student4': 'Dabric'}

The general way to delete (del)

>>> info
{'student1': 'ALICE', 'student2': 'Beth', 'student4': 'Dabric'}
>>> del info['student1']
>>> info
{'student2': 'Beth', 'student4': 'Dabric'}

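Deleting an arbitrary item (popitem)

>>> info
{'student2': 'Beth', 'student4': 'Dabric', 'student1': 'Alice', 'student3': 'Cecil'}
>>> info.popitem()     # remove one item and return it as a (key, value) tuple; since Python 3.7 this is the last item inserted, before that the choice was arbitrary
('student3', 'Cecil')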

Looking up

>>> info
{'student2': 'Beth', 'student4': 'Dabric', 'student1': 'Alice', 'student3': 'Cecil'}
>>> 'student2' in  info    # membership test, the standard usage
True

>>> info.get('student1')   # get the value under the given key; returns None if the key does not exist
'Alice'
>>> info.get('student5')
>>>

>>> info['student1']    # same as above, but raises an error when the key does not exist
'Alice'
>>> info['student5']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'student5'

Nested dictionaries and operations on them

av_catalog = {
    "欧美":{
        "www.youporn.com": ["很多免费的,世界最大的","质量一般"],
        "www.pornhub.com": ["很多免费的,也很大","质量比yourporn高点"],
        "letmedothistoyou.com": ["多是自拍,高质量图片很多","资源不多,更新慢"],
        "x-art.com":["质量很高,真的很高","全部收费,屌比请绕过"]
    },
    "日韩":{
        "tokyo-hot":["质量怎样不清楚,个人已经不喜欢日韩范了","听说是收费的"]
    },
    "大陆":{
        "1024":["全部免费,真好,好人一生平安","服务器在国外,慢"]
    }
}

av_catalog["大陆"]["1024"][1] += ",可以用爬虫爬下来"
print(av_catalog["大陆"]["1024"])
# output
['全部免费,真好,好人一生平安', '服务器在国外,慢,可以用爬虫爬下来']

Other operations

#values
>>> info
{'student2': 'Beth', 'student4': 'Dabric', 'student1': 'Alice', 'student3': 'Cecil'}
>>> info.values()
dict_values(['Beth', 'Dabric', 'Alice', 'Cecil'])


#keys
>>> info.keys()
dict_keys(['student2', 'student4', 'student1', 'student3'])

#setdefault  The setdefault method is somewhat like get: it returns the value stored under the given key. In addition, if the dictionary does not contain the key, setdefault stores the given value under it.
>>> info.setdefault('student5','Devil')
'Devil'
>>> info
{'student2': 'Beth', 'student4': 'Dabric', 'student1': 'Alice', 'student3': 'Cecil', 'student5': 'Devil'}
>>> info.setdefault('student2','Devil')
'Beth'

#update   The update method updates one dictionary with the items of another
>>> info
{'student2': 'Beth', 'student4': 'Dabric', 'student1': 'Alice', 'student3': 'Cecil', 'student5': 'Devil'}
>>> b = {1:2,3:4,'student0':'ALICE'}
>>> info.update(b)
>>> info
{'student2': 'Beth', 'student4': 'Dabric', 'student1': 'Alice', 'student3': 'Cecil', 'student5': 'Devil', 1: 2, 3: 4, 'student0': 'ALICE'}

#items  The items method returns all the items of the dictionary
>>> info.items()
dict_items([('student2', 'Beth'), ('student4', 'Dabric'), ('student1', 'Alice'), ('student3', 'Cecil'), ('student5', 'Devil'), (1, 2), (3, 4), ('student0', 'ALICE')])

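# Build a dict with a default value from a list of keys. Pitfall: every key refers to the same value object, so use this with care when the default is mutable (such as a list)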
>>> dict.fromkeys([1,2,3],'testd')
{1: 'testd', 2: 'testd', 3: 'testd'}
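
The pitfall with fromkeys shows up when the default value is mutable. The sketch below demonstrates it and one common alternative, a dict comprehension; the variable names are assumptions for illustration.

```python
# fromkeys stores the SAME list object under every key.
shared = dict.fromkeys([1, 2, 3], [])
shared[1].append('x')                        # the change appears under all three keys

# A dict comprehension evaluates [] once per key, giving separate lists.
separate = {key: [] for key in [1, 2, 3]}
separate[1].append('x')                      # only key 1 is affected

print(shared)
print(separate)
```

With an immutable default such as the string 'testd' the sharing is harmless, because the value can only be replaced, not changed in place.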

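Iterating over a dictionary

# Method 1
for key in info:
    print(key,info[key])

# Method 2
for k,v in info.items(): # in Python 2, items() first built a full list, which was costly for large dicts; in Python 3 it returns a lightweight view
    print(k,v)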

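Exercise

Program: three-level menu
Requirements:

Print a three-level menu of province, city and county
Allow going back to the previous level
Allow quitting the program at any time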

menu = {
    '北京':{
        '海淀':{
            '五道口':{
                'soho':{},
                '网易':{},
                'google':{}
            },
            '中关村':{
                '爱奇艺':{},
                '汽车之家':{},
                'youku':{},
            },
            '上地':{
                '百度':{},
            },
        },
        '昌平':{
            '沙河':{
                '老男孩':{},
                '北航':{},
            },
            '天通苑':{},
            '回龙观':{},
        },
        '朝阳':{},
        '东城':{},
    },
    '上海':{
        '闵行':{
            "人民广场":{
                '炸鸡店':{}
            }
        },
        '闸北':{
            '火车战':{
                '携程':{}
            }
        },
        '浦东':{},
    },
    '山东':{},
}

flag = True

while flag:
    for i1 in menu:
        print(i1)

    choice1 = input("Please choose the province1>>:")
    if choice1 in menu:
        while flag:
            for i2 in menu[choice1]:
                print("\t",i2)
            choice2 = input("Please choose the province2>>:")
            if choice2 in menu[choice1]:
                while flag:
                    for i3 in menu[choice1][choice2]:
                        print("\t\t",i3)
                    choice3 = input("Please choose the province3>>:")
                    if choice3 in menu[choice1][choice2]:
                        while flag:
                            for i4 in menu[choice1][choice2][choice3]:
                                print("\t\t\t",i4)
                            choice4 = input("Last level: press b to go back, q to quit>>:")
                            if choice4 == 'b':
                                break
                            elif choice4 == 'q':
                                flag = False
                    if choice3 == 'b':
                        break
                    elif choice3 == 'q':
                        flag = False
            if choice2 == 'b':
                break
            elif choice2 == 'q':
                flag = False
    if choice1 == 'b':
        break
    elif choice1 == 'q':
        flag = False
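
The nested loops above need one level of code per menu level. An alternative, sketched below, keeps the parent levels on a stack so that any depth is handled by a single loop. This is not the original author's solution; the function name navigate, the small sample menu and the scripted choices are assumptions made so the example runs without keyboard input.

```python
def navigate(menu, choices):
    """Walk a nested dict following choices; return the path after each step."""
    stack = []          # parent levels, so that 'b' can go back up
    current = menu      # the level currently displayed
    path = []           # names chosen so far
    visited = []
    for choice in choices:
        if choice == 'q':               # quit at any time
            break
        if choice == 'b':               # go back one level, if there is one
            if stack:
                current = stack.pop()
                path.pop()
        elif choice in current:         # go down into the chosen entry
            stack.append(current)
            current = current[choice]
            path.append(choice)
        visited.append(tuple(path))     # unknown choices leave the path unchanged
    return visited


# Hypothetical sample menu, much smaller than the one above.
sample_menu = {'Beijing': {'Haidian': {'Wudaokou': {}}, 'Changping': {}}, 'Shanghai': {}}
steps = navigate(sample_menu, ['Beijing', 'Haidian', 'b', 'Changping', 'x', 'q', 'Shanghai'])
print(steps)
```

In an interactive version, each choice would come from input() and the keys of the current level would be printed before asking.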

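V. Set operations

A set is an unordered collection of unique items. Its main uses are:

Deduplication: converting a list to a set removes the duplicates automatically

list_1 = [1,3,5,7,9,5]
list_1 = set(list_1)

print(list_1,type(list_1))
{1, 3, 5, 7, 9} <class 'set'>

Relationship tests: testing the intersection, union, difference and other relationships between two groups of data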

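1. Basic operations

t.add('x')            # add one item
s.update([10,37,42])  # add several items to s
t.remove('H')         # remove one item
len(s)                # the size of the set
x in s                # test whether x is a member of s
x not in s            # test whether x is not a member of s
s.copy()              # return a shallow copy of the set s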
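
The operations above assume that sets s and t already exist. The following runnable sketch supplies sample sets (an assumption) and also shows discard, which, unlike remove, does not raise an error when the item is missing.

```python
s = set([1, 3, 5])
t = set('Hello')          # a set of the distinct characters: H, e, l, o

t.add('x')                # add one item
s.update([10, 37, 42])    # add several items
t.remove('H')             # remove one item; raises KeyError if it is missing
t.discard('H')            # discard ignores a missing item
size = len(s)
clone = s.copy()
clone.add(99)             # changing the copy does not change s

print(s, t, size, 3 in s, 99 not in s)
```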

2. Relationship tests (method form and operator form)

Intersection

>>> list_1 = [1,3,5,7,9,5]
>>> list_1 = set(list_1)
>>> list_2 = set([6,2,4,7,8,9])
>>> print(list_1,list_2)
{1, 3, 5, 7, 9} {2, 4, 6, 7, 8, 9}
>>> list_3 = list_1.intersection(list_2)   # or list_3 = list_1 & list_2
>>> print(list_3)
{9, 7}

Union

>>> list_3 = list_1.union(list_2)  # or list_3 = list_1 | list_2
>>> print(list_3)
{1, 2, 3, 4, 5, 6, 7, 8, 9}

Difference

>>> list_3 = list_1.difference(list_2)  # or list_3 = list_1 - list_2
>>> print(list_3)
{1, 3, 5}

Symmetric difference

>>> list_3 = list_1.symmetric_difference(list_2)    # or list_3 = list_1 ^ list_2
>>> print(list_3)
{1, 2, 3, 4, 5, 6, 8}

Subset

>>> a = list_1.issubset(list_2)  # is list_1 a subset of list_2? Returns True if it is, False otherwise
>>> print(a)
False

Superset

>>> a = list_1.issuperset(list_2)   # does list_1 contain every element of list_2? Returns True if it does, False otherwise
>>> print(a)
False

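VI. File operations

The workflow for working with a file

1. Open the file, get a file handle and assign it to a variable
2. Work on the file through the handle
3. Close the file

1. Opening a file

To open a file, use the function open, which is a built-in (it is also available as io.open). open takes the file name as its only required argument and returns a file object. If the current directory contains a text file named somefile.txt (perhaps created with a text editor), it can be opened like this: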

>>> f = open('somefile.txt')

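File modes:

r: read-only mode (the default)
w: write-only mode (not readable; the file is created if it does not exist; if it exists, its contents are deleted and writing starts at the beginning)
x: exclusive creation mode (raises FileExistsError if the file already exists)
a: append mode (not readable; the file is created if it does not exist; if it exists, writes are appended at the end)

"+" means the file can be both read and written
r+: read and write (the file must exist; reading and writing start at the beginning, so writes overwrite existing content in place)
w+: write and read (the file is created if it does not exist; an existing file is emptied first)
a+: append and read

The default mode is 'rt', which means the file is treated as encoded Unicode text, so decoding and encoding happen automatically. If no encoding argument is given, the default encoding depends on the platform (it is often UTF-8), so passing encoding='utf-8' explicitly is safer. By default, lines end with '\n'. On reading, other line endings ('\r' or '\r\n') are replaced with '\n' automatically; on writing, '\n' is replaced with the system's default line ending (os.linesep). Setting the newline parameter to '\r' or '\r\n' makes only that string count as a line ending; in that case line endings are not translated on reading, and on writing '\n' is replaced with the specified string.

"U" means that '\r', '\n' and '\r\n' are all converted to '\n' on reading (used together with r or r+). This mode is deprecated in Python 3, where this behavior is already the default for text files, and it was removed in Python 3.11.
rU
r+U

"b" means binary mode (for example for an ISO image uploaded over FTP). In Python 3 a file opened in binary mode reads and writes bytes, with no decoding and no line-ending translation, on every platform.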
rb
wb
ab
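
The effect of the main modes can be observed with a throwaway file. This is a minimal sketch: the temporary directory, the file name and the variable names are assumptions made for the example.

```python
import os
import tempfile

# Hypothetical scratch file inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'somefile.txt')

with open(path, 'w', encoding='utf-8') as f:    # w: create the file (or empty it)
    f.write('first\n')
with open(path, 'a', encoding='utf-8') as f:    # a: writes go to the end
    f.write('second\n')
with open(path, 'r+', encoding='utf-8') as f:   # r+: starts at the beginning
    f.write('FIRST')                            # overwrites the first five characters
with open(path, encoding='utf-8') as f:         # default mode 'rt'
    text = f.read()
with open(path, 'rb') as f:                     # rb: raw bytes, no decoding
    raw = f.read()

exclusive_failed = False
try:
    open(path, 'x')                             # x: refuses to reuse an existing file
except FileExistsError:
    exclusive_failed = True

print(repr(text), type(raw), exclusive_failed)
```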

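2. Basic file methods

Reading and writing

Writing to a file

Each call to f.write(string) writes the given string after whatever has already been written to the file, and returns the number of characters written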

>>> f = open('somefile.txt', 'w')
>>> f.write('Hello, ')
7
>>> f.write('World!')
6
>>> f.close()

Reading from a file

>>> f = open('somefile.txt', 'r')
>>> f.read(4)
'Hell'
>>> f.read()
'o, World!'
>>> f.close()
# First, the number of characters to read (4) is given. Then the rest of the file is read (no count is given).

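Reading and writing lines

Reading lines

readline
Returns one line by default; an optional argument limits the number of characters returned
readlines
Reads all the lines of the file and returns them as a list

Writing lines

writelines
The opposite of readlines: it accepts a list of strings (in fact any sequence or iterable) and writes all of them to the file (or stream).
Note: no newline characters are added when writing, so you must add them yourself. There is also no writeline method, because write can be used for that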
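
A minimal sketch of these three methods working together, using a temporary file. The path, the sample lines and the variable names are assumptions for illustration.

```python
import os
import tempfile

# Hypothetical scratch file inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'lines.txt')

lines = ['first\n', 'second\n', 'third\n']   # the newlines must be supplied by the caller
with open(path, 'w', encoding='utf-8') as f:
    f.writelines(lines)

with open(path, encoding='utf-8') as f:
    one = f.readline()       # a whole line, including its newline
    part = f.readline(3)     # at most 3 characters of the next line
    rest = f.readlines()     # everything that is left, as a list of lines

print(repr(one), repr(part), rest)
```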

The with statement

To avoid forgetting to close a file after opening it, use a context manager:

with open("somefile.txt") as somefile:
    do_something(somefile)

3. Other methods

flush

Writes the data buffered in memory out to the file on disk

A progress bar implementation:
import sys,time

for i in range(50):
    sys.stdout.write('#')
    sys.stdout.flush()
    time.sleep(0.1)

tell and seek
The tell method returns the position of the file pointer, and the seek method moves the file pointer. The two are usually used together.
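
A small sketch of tell and seek working together. It opens the file in binary mode, an assumption made here so that positions are plain byte offsets (in text mode the value returned by tell should be treated as opaque). The path and variable names are also assumptions.

```python
import os
import tempfile

# Hypothetical scratch file inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'digits.bin')
with open(path, 'wb') as f:
    f.write(b'0123456789')

with open(path, 'rb') as f:
    start = f.tell()        # the pointer starts at offset 0
    f.read(4)
    after_read = f.tell()   # reading 4 bytes moved the pointer to offset 4
    f.seek(2)               # jump to offset 2
    chunk = f.read(3)
    f.seek(0)               # rewind to the beginning
    again = f.read(2)

print(start, after_read, chunk, again)
```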

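encoding
The encoding the file was opened with

fileno
Returns the file descriptor, an integer. The operating system keeps track of every open file through this number.

name
The name of the file

truncate
Called without an argument, this method cuts the file off at the current file pointer position (so it empties the file only when the pointer is at the start). Called with an argument such as 10, it resizes the file to 10 bytes measured from the start of the file, wherever the pointer is.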
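
Both forms of truncate can be tried as in the sketch below. Binary mode is used so that sizes are plain byte counts; the path and variable names are assumptions for illustration.

```python
import os
import tempfile

# Hypothetical scratch file inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'data.bin')
with open(path, 'wb') as f:
    f.write(b'0123456789abcdef')

with open(path, 'rb+') as f:
    f.truncate(10)            # keep only the first 10 bytes
    kept = f.read()           # the pointer is still at the start

with open(path, 'rb+') as f:
    f.seek(4)
    f.truncate()              # no argument: cut at the current position (offset 4)
    f.seek(0)
    kept_again = f.read()

print(kept, kept_again, os.path.getsize(path))
```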

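4. Iterating over file contents

A common file operation is to iterate over the contents and do something repeatedly during the iteration.

One character (or byte) at a time

Iterating over characters with read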

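import sys

with open('yesterday',encoding='utf-8') as f:
    char = f.read(1)
    while char:
        sys.stdout.write(char)
        char = f.read(1)

# the same loop written with while True and break
with open('yesterday', encoding='utf-8') as f:
    while True:
        char = f.read(1)
        if not char: break
        sys.stdout.write(char)

This program works because read returns an empty string at the end of the file, while every string returned before that contains exactly one character (and so counts as True).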

One line at a time

with open(filename) as f:
    while True:
        line = f.readline()
        if not line: break
        print(line)

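Reading the whole file

If the file is not too large, all of it can be read at once, using either of the two approaches below

import sys

with open(filename) as f:
    for char in f.read():
        sys.stdout.write(char)

with open(filename) as f:
    for line in f.readlines():
        print(line)

The difference is that read, called without an argument, reads the whole file into a single string, while readlines reads the file into a list of strings, one per line.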

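Lazy line iteration with fileinput

This approach is lazy because it reads only the part of the text that is actually needed

import fileinput
for line in fileinput.input(filename):
    print(line)

# The fileinput module takes care of opening the file; it only needs to be given a file name.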

File iterators (the most common approach)

Files are themselves iterable (sys.stdin is iterable too), so the contents of a file can be read like this:

with open(filename) as f:
    for line in f:
        print(line)

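VII. Character encoding and transcoding

1. In Python 2 the default encoding is ASCII; in Python 3 strings are Unicode and the default source encoding is UTF-8

2. Unicode text can be encoded as UTF-32 (4 bytes per character), UTF-16 (2 or 4 bytes per character) or UTF-8 (1 to 4 bytes per character). UTF-16 is common as an in-memory representation on some platforms, but files are usually stored as UTF-8 because it saves space

3. In Python 3, encode converts a string to the bytes type while transcoding, and decode converts bytes back to a string while decoding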
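
The byte counts mentioned above can be checked directly with encode. This is a minimal sketch; the sample characters and variable names are assumptions, and the little-endian codec names are used so that no byte-order mark is added to the result.

```python
text = '我'                                     # one Chinese character

utf8 = text.encode('utf-8')                     # 3 bytes in UTF-8
utf16 = text.encode('utf-16-le')                # 2 bytes in UTF-16
utf32 = text.encode('utf-32-le')                # 4 bytes in UTF-32
gb = text.encode('gb2312')                      # 2 bytes in GB2312
ascii_utf8 = 'a'.encode('utf-8')                # ASCII letters take 1 byte in UTF-8
emoji_utf16 = '\U0001F600'.encode('utf-16-le')  # outside the basic plane: 4 bytes

round_trip = utf8.decode('utf-8')               # decode turns bytes back into str

print(len(utf8), len(utf16), len(utf32), len(gb), len(ascii_utf8), len(emoji_utf16))
print(type(utf8), type(round_trip))
```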

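[Figure: the transcoding process, in which text is decoded from the source encoding to Unicode and then encoded to the target encoding]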

In Python 2

#-*-coding:utf-8-*-

import sys
print(sys.getdefaultencoding())


msg = "我爱北京天安门"
msg_gb2312 = msg.decode("utf-8").encode("gb2312")
gb2312_to_gbk = msg_gb2312.decode("gbk").encode("gbk")

print(msg)
print(msg_gb2312)
print(gb2312_to_gbk)

In Python 3

#-*-coding:gb2312 -*-   # this line can also be removed

import sys
print(sys.getdefaultencoding())


msg = "我爱北京天安门"
#msg_gb2312 = msg.decode("utf-8").encode("gb2312")
msg_gb2312 = msg.encode("gb2312") # str is already Unicode, so no decode is needed first
gb2312_to_unicode = msg_gb2312.decode("gb2312")
gb2312_to_utf8 = msg_gb2312.decode("gb2312").encode("utf-8")

print(msg)
print(msg_gb2312)
print(gb2312_to_unicode)
print(gb2312_to_utf8)