Learning Python: Study Notes 2 -- Introducing Python Object Types

Latest recommended article published on 2024-07-13 21:34:41

weixin_34234721

Tags: python, data structures and algorithms, c/c++

Original link: https://my.oschina.net/voler/blog/377468

1. Why Use Built-in Types?

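Built-in objects make programs easy to write

For simple tasks, built-in types are often all you need to represent the data: for example, lists represent ordered collections and dictionaries represent lookup tables.

Built-in objects are components of extensions

More complex data structures can be built on top of built-in objects. For example, a stack can be implemented with a list.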
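
As a rough illustration of building on a built-in type, here is a minimal sketch of a stack that delegates its storage to a list. The class name Stack and its methods are my own choices for this example, not something from the book.

```python
class Stack:
    """A minimal last-in, first-out stack backed by a built-in list."""

    def __init__(self):
        self._items = []              # the list does all the storage work

    def push(self, item):
        self._items.append(item)      # add to the top of the stack

    def pop(self):
        return self._items.pop()      # remove and return the top item

    def __len__(self):
        return len(self._items)


stack = Stack()
stack.push('spam')
stack.push('eggs')
top = stack.pop()                     # the most recently pushed item
print(top, len(stack))
```

The list already provides append and pop at the end, so the wrapper only restricts the interface to stack operations.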

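Built-in objects are often more efficient than custom data structures

This is because Python's built-in types are implemented in C (in the standard CPython interpreter).

Built-in objects are a standard part of the language

Unlike the STL, which is a library added to C++, and unlike LISP, which relies entirely on built-in tools, Python's built-in objects are part of the language itself.

Overall, built-in objects make programming not only easier but also more powerful and efficient.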

2. Python's Core Data Types

The following table lists Python's core data types:

Object type	Example
Numbers	1234, 3.456
Strings	'spam', "hello world"
Lists	[1, [2, 'three'], 4]
Dictionaries	{'food' : 'spam', 'taste' : 'yum'}
Tuples	(1, 'spam', 4, 'U')
Files	myfile = open('eggs', 'r')
Sets	set('abc'),{'a', 'b', 'c'}
Other core types	Booleans, types, None
Program unit types	Functions, modules, classes
Implementation-related types	Compiled code, stack tracebacks

3. Numbers

Numeric types include integers, floating-point numbers, and more exotic types (such as complex numbers).

>>> 123 + 222
345
>>> 1.5 * 4
6.0
>>> 2 ** 100
1267650600228229401496703205376
>>> len(str(2 ** 1000000))
301030

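Python 3 integers have arbitrary precision, so there is no fixed maximum: 2 ** 1000000 above has 301,030 decimal digits, and the only limit is available memory. Note that Python 3.11 and later cap int-to-str conversion at 4300 digits by default, so the last call above raises ValueError there unless the limit is raised with sys.set_int_max_str_digits().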
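
The following sketch shows one way to inspect a very large integer. It uses a smaller exponent than the transcript above to keep the conversion fast, and it assumes that lifting the string conversion limit is acceptable in your environment (the limit only exists on newer interpreters, hence the hasattr check).

```python
import sys

big = 2 ** 100000
bits = big.bit_length()               # size in bits, no string conversion needed

# Newer interpreters limit int-to-str conversion; 0 disables the limit.
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(0)

digits = len(str(big))                # number of decimal digits
print(bits, digits)
```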

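There are two ways to display a value: repr (full precision, as code) and str (user-friendly). The example below is from the book; in my Python 3.4 both forms display 6.283, because since Python 3.1 the repr of a float is the shortest string that converts back to the same value:

>>> 3.1415 * 2          # repr: as code (output from the book, older Python)
6.28300000000000004
>>> print(3.1415 * 2)   # str: user-friendly
6.283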
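
On current Python versions the difference between the two display forms is easier to see with strings than with floats. A small sketch (the variable names are only for illustration):

```python
value = 3.1415 * 2
same = repr(value) == str(value)      # floats: both forms agree on modern Python

text = 'spam'
as_code = repr(text)                  # keeps the quotes, as you would type it in code
friendly = str(text)                  # plain text for the user
print(same, as_code, friendly)
```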

In some cases the math module is handy:

>>> import math
>>> math.pi
3.141592653589793
>>> math.sqrt(85)
9.219544457292887

The random module provides basic operations on random numbers:

>>> import random
>>> random.random()
0.46429998196669087
>>> random.choice([1, 2, 3, 4])
1
>>> random.choice([1, 2, 3, 4])
2

4. Strings

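Strings are sequences: a sequence is a positionally ordered collection of items, stored and fetched by their relative position (index) from left to right.

Sequence operations

Items in a sequence are fetched by index: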

>>> S = 'Spam'
>>> len(S)
4
>>> S[0]
'S'

Negative indexes are also allowed: -1 refers to the last character, -2 to the second-to-last, and so on:

>>> S[-1]
'm'
>>> S[-2]
'a'

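Slicing extracts a range of items in one step. The general form is [Start:End:Step]: it starts at index Start, stops just before index End, and advances by Step. A slice creates a new object and leaves the original unchanged: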

>>> S = "hello world"
>>> S[1:]
'ello world'
>>> S[:5]
'hello'
>>> S[1:5]
'ello'
>>> S[::2]
'hlowrd'
>>> S[:-1]
'hello worl'
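
A few more slice forms, as a sketch that follows the same [Start:End:Step] rule (the variable names are mine):

```python
S = "hello world"

last_five = S[-5:]        # negative Start counts from the end
reversed_s = S[::-1]      # negative Step walks right to left
tail_step = S[6::2]       # from index 6 to the end, every second item
whole_copy = S[:]         # omitted Start and End cover the whole sequence
print(last_five, reversed_s, tail_step)
```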

We can use '+' to concatenate strings and '*' to repeat them:

>>> S = 'Spam'
>>> S + 'xyz'
'Spamxyz'
>>> S * 3
'SpamSpamSpam'

Immutability

Strings are immutable; the usual operations build a new string instead of changing the existing one:

>>> S
'Spam'
>>> id(S)
60272344
>>> S[0] = 'z'
Traceback (most recent call last):
  File "<pyshell#27>", line 1, in <module>
    S[0] = 'z'
TypeError: 'str' object does not support item assignment
>>> S = 'z' + S[1:]
>>> id(S)
60272008

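The assignment did not change the original string object. It built a new string and rebound the name S to it, which is why id(S) changed.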
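
When many character-level changes are needed, one common approach is to expand the string into a list, edit the list in place, and join it back. A minimal sketch:

```python
S = 'Spam'

chars = list(S)               # ['S', 'p', 'a', 'm'], a mutable copy
chars[0] = 'z'                # item assignment works on the list
changed = ''.join(chars)      # build a new string from the pieces
print(S, changed)             # the original string is untouched
```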

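String-specific methods

In addition to the generic sequence operations shared with lists and tuples, strings have methods of their own.

The find method searches for a substring and returns its index (-1 means not found), while replace substitutes a substring and returns a new string: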

>>> S.find("pa")
1
>>> S
'zpam'
>>> S.replace('pa', 'XYZ')
'zXYZm'

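split breaks a string into a list of substrings, upper converts it to uppercase, isalpha tests its content, and rstrip removes trailing whitespace such as a newline: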

>>> line = 'aaa,bbb,ccccc,dd'
>>> line.split(',')
['aaa', 'bbb', 'ccccc', 'dd']
>>> S = 'spam'
>>> S.upper()
'SPAM'
>>> S.isalpha()
True
>>> line = 'aaa,bbb,ccccc,dd\n'
>>> line = line.rstrip()
>>> line
'aaa,bbb,ccccc,dd'

Strings can be formatted with the % operator and, in Python 2.6, 3.0, and later, with the format method:

>>> '%s, eggs, and %s' % ('spam', 'SPAM!')
'spam, eggs, and SPAM!'
>>> '{0}, eggs, and {1}'.format('spam', 'SPAM!')
'spam, eggs, and SPAM!'
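
Both formatting styles also accept names and numeric format codes. A short sketch with made-up values:

```python
# Keyword substitution with the format method
by_name = '{food}, eggs, and {extra}'.format(food='spam', extra='SPAM!')

# Thousands separator and two decimal places
grouped = '{:,.2f}'.format(296999.2567)

# The % operator with precision, sign, and zero padding
padded = '%.2f | %+05d' % (3.14159, -42)
print(by_name, grouped, padded, sep='\n')
```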

Getting help

The dir function returns a list of all the attributes of an object:

>>> dir(S)
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', 
'__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', 
'__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', 
'__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 
'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']

The help function describes what a given method does:

>>> help(S.replace)
Help on built-in function replace:

replace(...) method of builtins.str instance
    S.replace(old, new[, count]) -> str
    
    Return a copy of S with all occurrences of substring
    old replaced by new.  If the optional argument count is
    given, only the first count occurrences are replaced.

Pattern matching

Pattern matching is done with the re module:

>>> import re
>>> match = re.match('hello[ \t]*(.*)world', 'hello    Python world')
>>> match.group(1)
'Python '
>>> match = re.match('/(.*)/(.*)/(.*)', '/usr/home/lumberjack')
>>> match.groups()
('usr', 'home', 'lumberjack')
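
re.match returns None when the pattern does not match at the start of the string, so it is worth checking the result before calling group or groups. A sketch, with the pattern compiled once for reuse:

```python
import re

pattern = re.compile(r'/(.*)/(.*)/(.*)')        # raw string avoids escape surprises

match = pattern.match('/usr/home/lumberjack')
parts = match.groups() if match else ()         # guard against a failed match

no_match = re.match('hello', 'goodbye world')   # None: text does not start with 'hello'

pieces = re.split('[/:]', '/usr/home/lumberjack')   # split on either delimiter
print(parts, no_match, pieces)
```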

5. Lists

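A list is a mutable sequence that can hold objects of any type and has no fixed size.

Sequence operations

The sequence operations that work on strings generally work on lists too: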

>>> L = [123, 'spam', 1.23]
>>> len(L)
3
>>> L[0]
123
>>> L + [4, 5, 6]
[123, 'spam', 1.23, 4, 5, 6]

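List-specific operations

Lists resemble arrays in other languages, but their items can be of any type (numbers, strings, lists, tuples, and so on). Lists also have no fixed size, so we can grow and shrink them: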

>>> L
[123, 'spam', 1.23]
>>> L.append('NI')
>>> L
[123, 'spam', 1.23, 'NI']
>>> L.pop(2)
1.23
>>> L
[123, 'spam', 'NI']

Because lists are mutable, some methods modify the list in place:

>>> M = ['bb', 'aa', 'cc']
>>> M.sort()
>>> M
['aa', 'bb', 'cc']
>>> M.reverse()
>>> M
['cc', 'bb', 'aa']
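
In-place methods such as sort return None, which is a frequent source of confusion. The built-in sorted function returns a new list instead. A small sketch contrasting the two:

```python
M = ['bb', 'aa', 'cc']

ordered = sorted(M)                  # new list; M is left alone
snapshot = list(M)                   # copy of M before the in-place sort
result = M.sort()                    # sorts M itself and returns None
descending = sorted(M, reverse=True)
print(snapshot, ordered, result, descending)
```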

Bounds checking

Although lists have no fixed size, indexing or assigning beyond the end of a list is an error:

>>> L
[123, 'spam', 'NI']
>>> L[99]
Traceback (most recent call last):
  File "<pyshell#66>", line 1, in <module>
    L[99]
IndexError: list index out of range
>>> L[99] = 'what'
Traceback (most recent call last):
  File "<pyshell#67>", line 1, in <module>
    L[99] = 'what'
IndexError: list assignment index out of range
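
To grow a list, use append or extend rather than assigning past the end. To read safely, catch IndexError. The helper get_item below is a hypothetical name used only for this sketch:

```python
def get_item(seq, index, default=None):
    """Return seq[index], or default when the index is out of range."""
    try:
        return seq[index]
    except IndexError:
        return default


L = [123, 'spam', 'NI']
missing = get_item(L, 99, 'n/a')     # out of range, so the default comes back
L.append('what')                     # growing the list is done with append
print(missing, L)
```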

Comprehensions

Comprehensions are not limited to lists; they also work for sets and dictionaries:

>>> [ord(x) for x in 'spaam']
[115, 112, 97, 97, 109]
>>> {ord(x) for x in 'spaam'}
{112, 97, 115, 109}
>>> {x: ord(x) for x in 'spaam'}
{'s': 115, 'p': 112, 'a': 97, 'm': 109}
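
All three comprehension forms also accept an optional if clause that filters items. A brief sketch:

```python
# List comprehension that skips the letter 'a'
codes = [ord(x) for x in 'spaam' if x != 'a']

# Dictionary comprehension over even numbers only
even_squares = {n: n ** 2 for n in range(6) if n % 2 == 0}

# Set comprehension: duplicates collapse automatically
unique = {c.upper() for c in 'spaam'}
print(codes, even_squares, sorted(unique))
```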

6. Dictionaries

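A dictionary is a mapping (an associative container): it stores values by key rather than by position. It is not sorted by key. Before Python 3.7 the order of keys is arbitrary, which is why some of the outputs below look shuffled; from 3.7 on, dictionaries preserve insertion order.

Mapping operations

Dictionary values are fetched with the same indexing syntax as sequences, but the index is a key rather than a numeric position: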

>>> D = {'food': 'Spam', 'quantity': 4, 'color': 'pink'}
>>> D['food']
'Spam'
>>> D['quantity'] += 1
>>> D
{'quantity': 5, 'food': 'Spam', 'color': 'pink'}

Unlike out-of-bounds assignment in a list, which is an error, assigning to a new dictionary key creates that key:

>>> D = {}
>>> D['name'] = 'Bob'
>>> D['job'] = 'dev'
>>> D
{'name': 'Bob', 'job': 'dev'}

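We can also create a dictionary with the built-in dict function, passing either keyword arguments (name=value) or the result of zipping a sequence of keys with a sequence of values: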

>>> bob1 = dict(name='Bob', job='dev', age=40)
>>> bob1
{'age': 40, 'name': 'Bob', 'job': 'dev'}
>>> bob2 = dict(zip(['name', 'job', 'age'],['Bob', 'dev', 40]))
>>> bob2
{'age': 40, 'name': 'Bob', 'job': 'dev'}

Testing whether a key exists

The in operator tests whether a key exists:

>>> D
{'name': 'Bob', 'job': 'dev'}
>>> "name" in D
True
>>> "error" in D
False

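The get method fetches a value and accepts a default to return when the key is missing (it returns None if no default is given, which the interactive prompt does not echo):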

>>> D.get("error")
>>> D.get("error", 1)
1
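
Fetching a missing key with D[key] raises KeyError, so there are a few equivalent ways to handle absent keys. This sketch shows three of them side by side, using 0 as an arbitrary default:

```python
D = {'name': 'Bob', 'job': 'dev'}

with_get = D.get('error', 0)                        # get with a default

with_test = D['error'] if 'error' in D else 0       # test first, then index

try:                                                # index and catch the error
    with_try = D['error']
except KeyError:
    with_try = 0

print(with_get, with_test, with_try)
```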

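Sorting keys: for loops

Iterating over a dictionary iterates over its keys, and those keys are not kept in sorted order. To step through a dictionary in order, sort the keys first: sorted(D) returns a sorted list of the keys: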

>>> D = {"a": 1, "b": 2, "c": 3}
>>> D
{'a': 1, 'c': 3, 'b': 2}
>>> for key in sorted(D):
	print(key, "=>", D[key])

	
a => 1
b => 2
c => 3
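
sorted also works on the key/value pairs and accepts a key function, which allows ordering by value instead. A sketch with the same dictionary:

```python
D = {"a": 1, "b": 2, "c": 3}

pairs = sorted(D.items())                       # (key, value) tuples in key order
for key, value in pairs:
    print(key, "=>", value)

by_value = sorted(D, key=D.get, reverse=True)   # keys ordered by descending value
print(by_value)
```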

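Iteration and optimization

An object is iterable if it is either a sequence physically stored in memory or an object that produces one item at a time in the context of an iteration operation: a sort of "virtual" sequence.

The built-in iter function returns an iterator for an iterable object, and the built-in next function (which calls the iterator's __next__ method) fetches the next item: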

>>> A = [1, 2, 3, 4]
>>> i = iter(A)
>>> i.__next__()
1
>>> i.__next__()
2
>>> with open("test.txt") as fobj:
	for line in fobj:
		print(line, end="")   # each line already ends with '\n'

		
hello world
i love this world
and i love python too

Here the file object is iterable: it is not a list. It reads on demand, one line at a time, as the loop asks for the next item.
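
A generator function is another example of such a "virtual" sequence: it produces items on demand instead of storing them. The function countdown below is a made-up example, not from the book:

```python
def countdown(n):
    """Yield n, n-1, ..., 1 one value at a time."""
    while n > 0:
        yield n
        n -= 1


it = iter(countdown(3))   # a generator is its own iterator
first = next(it)          # fetch one item on demand
rest = list(it)           # consume whatever is left
print(first, rest)
```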

A list comprehension relies on the same iteration protocol:

>>> squares = [x ** 2 for x in [1, 2, 3, 4, 5]]
>>> squares
[1, 4, 9, 16, 25]

The list comprehension above is roughly equivalent to the following manual loop:

>>> i = iter([1, 2, 3, 4, 5])
>>> squares = []
>>> while True:
	try:
		value = i.__next__()
	except StopIteration:
		break
	squares.append(value ** 2)

	
>>> squares
[1, 4, 9, 16, 25]
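
In everyday code the for statement performs the iter/next/StopIteration steps automatically. A sketch of the same computation with a plain for loop, plus a generator expression that produces the squares lazily:

```python
squares = []
for x in [1, 2, 3, 4, 5]:             # for calls iter() and next() behind the scenes
    squares.append(x ** 2)

lazy = (x ** 2 for x in [1, 2, 3, 4, 5])   # nothing is computed yet
total = sum(lazy)                          # items are produced one at a time
print(squares, total)
```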

7. Tuples

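A tuple can be thought of as an immutable list. Note that T += (5, 6) below does not change the tuple in place; it builds a new tuple and rebinds T, so id(T) changes: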

>>> T = (1, 2, 3, 4)
>>> len(T)
4
>>> id(T)
53984696
>>> T += (5, 6)
>>> T
(1, 2, 3, 4, 5, 6)
>>> id(T)
53974888

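The tuple's index method returns the index of the first occurrence of its argument, and count returns the number of times an item appears: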

>>> T = (1, 2, 3, 4, 5, 4, 3, 2)
>>> T.index(4)
3
>>> T.count(4)
2

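Why have tuples when lists already exist? Think of the const qualifier in C/C++, which guarantees that an argument passed to a function cannot be modified. In Python, passing a tuple gives a similar guarantee: the function cannot change the tuple itself and must first convert it with the built-in list function to get a modifiable copy. The guarantee is shallow, though: mutable objects stored inside a tuple can still be changed.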
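
A minimal sketch of this idea. The function name normalized is hypothetical; it copies its argument into a list before sorting, so the caller's tuple is never touched:

```python
def normalized(values):
    """Return a sorted list built from values without modifying the argument."""
    working = list(values)    # modifiable copy of the tuple
    working.sort()
    return working


data = (3, 1, 2)
result = normalized(data)     # data itself cannot be changed by the function

nested = ([1, 2], 'x')
nested[0].append(3)           # allowed: the inner list is still mutable
print(data, result, nested)
```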

8. Files

Plain files

File objects are used to read from and write to files:

>>> f = open("test.txt")
>>> text = f.read()
>>> text
'hello world\ni love this world\nand i love python too\n\n'
>>> with open("test.txt") as f1:
	with open("output.txt", "w") as f2:
		for line in f1:
			f2.write(line)

	
12
18
21
>>> f = open("output.txt")
>>> f.read()
'hello world\ni love this world\nand i love python too'

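The write method returns the number of characters written; the interactive shell echoes that value for each call, which is where the numbers above come from.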
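
In a script nothing is echoed, so the return values have to be captured explicitly. The sketch below writes to a temporary directory to avoid touching real files; the file name output.txt and the sample lines are placeholders:

```python
import os
import tempfile

lines = ['hello world\n', 'i love this world\n', 'and i love python too']

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'output.txt')          # scratch file for the demo
    with open(path, 'w') as out:
        counts = [out.write(line) for line in lines]   # characters written per call
    with open(path) as copy:
        text = copy.read()                             # read everything back

print(counts)
```

Using with for both files also guarantees they are closed, even if an error occurs in between.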

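Binary files

The struct module packs values into binary data and unpacks them again, which is useful when working with binary files: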

>>> import struct
>>> packed = struct.pack('>i4sh', 7, b'spam', 8)
>>> packed
b'\x00\x00\x00\x07spam\x00\x08'
>>> file = open("data.bin", "wb")
>>> file.write(packed)
10
>>> file.close()
>>> data = open("data.bin", "rb").read()
>>> data
b'\x00\x00\x00\x07spam\x00\x08'
>>> data[4:8]
b'spam'
>>> list(data)
[0, 0, 0, 7, 115, 112, 97, 109, 0, 8]
>>> type(data)
<class 'bytes'>
>>> struct.unpack('>i4sh', data)
(7, b'spam', 8)
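
The format string '>i4sh' means big-endian byte order, a 4-byte integer, a 4-byte string, and a 2-byte short. A small sketch that checks the packed size and shows the effect of byte order:

```python
import struct

fmt = '>i4sh'                          # big-endian: int, 4-byte string, short
size = struct.calcsize(fmt)            # total bytes the format occupies

packed = struct.pack(fmt, 7, b'spam', 8)
number, text, short = struct.unpack(fmt, packed)   # round trip

little = struct.pack('<i', 7)          # same integer, little-endian byte order
print(size, number, text, short, little)
```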

Reposted from: https://my.oschina.net/voler/blog/377468