Notes on "Python for Data Analysis": Python Basics

Latest recommended article published on 2022-04-19 20:36:07

chnn gege

Article link: https://blog.csdn.net/Eli_Young/article/details/100999808

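These notes were written after reading the articles posted by the blogger SeanCheney. They will be removed on request if they infringe any rights.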

Chapter 2: Python Language Basics, IPython, and Jupyter Notebooks

1. Initializing a dictionary in a loop (dict comprehension)

import numpy as np

data = {i : np.random.randn() for i in range(7)}

Out[7]: 
{0: -0.20470765948471295,
 1: 0.47894333805754824,
 2: -0.5194387150567381,
 3: -0.55573030434749,
 4: 1.9657805725027142,
 5: 1.3934058329729904,
 6: 0.09290787674371767}
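A dict comprehension like the one above is shorthand for filling the dictionary in an explicit loop. The sketch below is only an illustration: it swaps np.random.randn for the standard library's random.gauss so that it runs without NumPy, and the seed value 0 is an arbitrary choice that makes both versions draw the same numbers.

```python
import random

# Comprehension form, seeded so the run is repeatable.
random.seed(0)
data = {i: random.gauss(0, 1) for i in range(7)}

# Equivalent explicit loop, re-seeded to draw the same values.
random.seed(0)
data_loop = {}
for i in range(7):
    data_loop[i] = random.gauss(0, 1)

print(data == data_loop)  # True
print(list(data.keys()))  # [0, 1, 2, 3, 4, 5, 6]
```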

2. Jupyter Notebooks

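Tab completion works for common methods such as append, and also for module names when importing.
?: Putting a question mark before or after a variable displays information about the object. Appending ? to a function displays its signature and docstring.
??: Displays the function's source code, when it is available.
%run ipython_script_test.py runs the file; the global variables and functions defined in the file can then be accessed from the interactive session.
%load ipython_script_test.py loads the contents of the script into the current input cell.
%paste and %cpaste paste code that was just copied to the clipboard. %cpaste shows a prompt where the pasted code can be checked before it runs, which guards against pasting the wrong thing.
Magic commands (commands prefixed with %)
- %timeit np.dot(a, a) measures how long the statement takes to run
- %pwd shows the current working directory (the result can also be assigned to a variable: foo = %pwd)
- %matplotlib inline displays matplotlib plots inside the notebook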

3. Python language basics

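Python statements do not need a terminating semicolon; indent blocks with 4 spaces.
b = a does not copy anything: both names are bound to the same object at the same address, i.e. two references to one object.
Function arguments are passed as references, so in-place modifications made to a mutable argument inside the function are visible to the caller.
print('a is {0}, b is {1}'.format(type(a), type(b))) uses type to print the types of the variables.
isinstance(a, int) checks the type of a variable -> True when a is an int. A tuple of types can be passed to check whether the object is an instance of any of them: isinstance(a, (int, float))
Iterable can be used to test whether an object is iterable (list, tuple, dict, set, str); isinstance tests whether an object is an instance of a known type.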

if not isinstance(x, list) and isiterable(x):
    x = list(x)
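isiterable is not a Python built-in, so the snippet above needs a helper with that name. A minimal sketch, assuming the helper simply tries to call iter on the object and treats a TypeError as "not iterable":

```python
def isiterable(obj):
    """Return True if obj can be iterated over (assumed helper, not a built-in)."""
    try:
        iter(obj)
        return True
    except TypeError:  # iter() raises TypeError for non-iterable objects
        return False

x = (1, 2, 3)
if not isinstance(x, list) and isiterable(x):
    x = list(x)

print(x)                       # [1, 2, 3]
print(isiterable(5))           # False
print(isiterable('a string'))  # True
```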

import: to use another file (module) located in the same directory:

import some_module as sm
from some_module import PI as pi, g as gf

To test whether two names refer to the same object, use a is b and a is not b. This is different from ==, which compares values.

In [35]: a = [1, 2, 3]

In [36]: b = a

In [37]: c = list(a)

In [38]: a is b
Out[38]: True

In [39]: a is not c
Out[39]: True

In [40]: a == c
Out[40]: True
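The same reference semantics explain what happens when a list is passed to a function. The sketch below is only an illustration; the names append_element and rebind are hypothetical helpers made up for this note.

```python
def append_element(some_list, element):
    # Mutates the list object that the caller passed in.
    some_list.append(element)

def rebind(some_list):
    # Rebinding the local name leaves the caller's object untouched.
    some_list = [0]

a = [1, 2, 3]
b = a            # a second reference to the same list
c = list(a)      # a new list with equal contents

append_element(a, 4)
rebind(a)

print(b)               # [1, 2, 3, 4]
print(c)               # [1, 2, 3]
print(a is b, a == c)  # True False
```

In-place changes show up through every reference to the object, while assigning a new value to a name only affects that name.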

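Lists, dicts, and NumPy arrays are mutable objects; strings and tuples are immutable.
/ is true division and keeps the fractional part; // is floor division
Strings can be written with either single quotes or double quotes
Many Python objects can be converted to a string with the str function: s = str(a)
A string can be converted to a list of characters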

In [64]: s = 'python'

In [65]: list(s)
Out[65]: ['p', 'y', 't', 'h', 'o', 'n']

In [66]: s[:3]
Out[66]: 'pyt'

s = r'this\has\no\special\characters' is a raw string: backslashes are taken literally instead of starting escape sequences.
String formatting with format

In [74]: template = '{0:.2f} {1:s} are worth US${2:d}'
In [75]: template.format(4.5560, 'Argentine Pesos', 1)
Out[75]: '4.56 Argentine Pesos are worth US$1'

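str, bool, int, and float are also functions and can be used to convert between types: b = int(a)
None is a common default value for function parameters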

def add_and_maybe_multiply(a, b, c=None):
    result = a + b

    if c is not None:
        result = result * c

    return result
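A short usage sketch for the function above. The argument values are arbitrary; the last call shows why the test is written as c is not None rather than a plain truthiness check.

```python
def add_and_maybe_multiply(a, b, c=None):
    result = a + b

    if c is not None:
        result = result * c

    return result

print(add_and_maybe_multiply(1, 2))       # 3
print(add_and_maybe_multiply(1, 2, 10))   # 30
# c=0 is falsy but is still a real argument, so it must be applied.
print(add_and_maybe_multiply(1, 2, c=0))  # 0
```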

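pass is a no-op statement that serves as a placeholder where a block is required but there is no code to run, e.g. in an if branch.
Ternary expressions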

In [126]: x = 5

In [127]: 'Non-negative' if x >= 0 else 'Negative'
Out[127]: 'Non-negative'

Chapter 3: Built-in Data Structures, Functions, and Files

Data structures and sequences

1. Tuples

Defining tuples:

tup = 4, 5, 6
nested_tup = (4, 5, 6), (7, 8)

The tuple function converts any sequence or iterator to a tuple

tuple([4, 0, 2])
tup = tuple(['foo', [1, 2], True])

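tup[0] accesses an element with square brackets. tup[1].append(3) appends to the list stored inside the tuple: the tuple itself is immutable, but a mutable object inside it can still be modified in place.
Tuples can be concatenated with +: (4, None, 'foo') + (6, 0) + ('bar',), and multiplied by an integer: ('foo', 'bar') * 4, which repeats the contents 4 times
Unpacking tuples: this feature makes it easy to swap values, e.g. b, a = a, b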

tup = (4, 5, 6) 
a, b, c = tup

tup = 4, 5, (6, 7)
a, b, (c, d) = tup
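The swap mentioned above can be sketched in a couple of lines, together with the nested unpacking; the values are arbitrary.

```python
a, b = 1, 2
b, a = a, b          # swap without a temporary variable
print(a, b)          # 2 1

tup = 4, 5, (6, 7)
x, y, (c, d) = tup   # nested tuples can be unpacked too
print(d)             # 7
```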

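Unpacking also works when iterating over a list of tuples, which is extremely handy because several variables are assigned at once on each iteration: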

In [27]: seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

In [28]: for a, b, c in seq:
   ....:     print('a={0}, b={1}, c={2}'.format(a, b, c))
a=1, b=2, c=3
a=4, b=5, c=6
a=7, b=8, c=9

*rest can also be used to capture the remaining elements, however many there are, as a list

In [29]: values = 1, 2, 3, 4, 5

In [30]: a, b, *rest = values

In [31]: a, b
Out[31]: (1, 2)

In [32]: rest
Out[32]: [3, 4, 5]

The count method counts how many times a value occurs in a tuple

a = (1, 2, 2, 2, 3, 4, 2) 
a.count(2)
4

2. Lists

Defining lists: a list and a tuple can each be converted into the other with the list and tuple functions

a_list = [2, 3, 7, None]

tup = ('foo', 'bar', 'baz')
b_list = list(tup)

range(0, 10)

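Adding and removing elements
- append: b_list.append('dwarf') adds an element at the end
- insert: b_list.insert(1, 'red') inserts an element at the given index
- pop: b_list.pop(2) removes and returns the element at the given index
- remove: b_list.remove('foo') removes the first element equal to the given value
Concatenating and combining
- Concatenation with +: [4, None, 'foo'] + [7, 8, (2, 3)]
- extend appends in place and is faster than concatenation with +. x.extend([7, 8, (2, 3)]) appends three elements: 7, 8, and the tuple (2, 3)
Sorting
a.sort() sorts in place, in ascending order by default. b.sort(key=len) sorts strings by their length
Maintaining a sorted list
The bisect module implements binary search and insertion into a sorted list. bisect.bisect finds the position where a value can be inserted while keeping the list sorted, and bisect.insort inserts the value at that position: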

In [67]: import bisect

In [68]: c = [1, 2, 2, 2, 3, 4, 7]

In [69]: bisect.bisect(c, 2)
Out[69]: 4

In [70]: bisect.bisect(c, 5)
Out[70]: 6

In [71]: bisect.insort(c, 6)

In [72]: c
Out[72]: [1, 2, 2, 2, 3, 4, 6, 7]
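The list methods listed above can be tried out in one short run. The sample lists below are only illustrative values, and the comments show the state after each call.

```python
b_list = ['foo', 'bar', 'baz']
b_list.append('dwarf')      # ['foo', 'bar', 'baz', 'dwarf']
b_list.insert(1, 'red')     # ['foo', 'red', 'bar', 'baz', 'dwarf']
popped = b_list.pop(2)      # removes and returns 'bar'
b_list.remove('foo')        # removes the first 'foo'
print(popped, b_list)       # bar ['red', 'baz', 'dwarf']

x = [4, None, 'foo']
x.extend([7, 8, (2, 3)])    # three elements are added in place
print(x)                    # [4, None, 'foo', 7, 8, (2, 3)]

b = ['saw', 'small', 'He', 'foxes', 'six']
b.sort(key=len)             # stable sort: ties keep their original order
print(b)                    # ['He', 'saw', 'six', 'small', 'foxes']
```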

Slicing

seq[3:4] = [6, 3]  # assign a sequence to a slice
seq[3:]            # from index 3 to the end
seq[::2]           # take every other element
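The slicing lines above do not define seq, so here is a small runnable sketch. The starting list is an arbitrary sample chosen for illustration.

```python
seq = [7, 2, 3, 7, 5, 6, 0, 1]

print(seq[1:5])    # [2, 3, 7, 5]  (the stop index is excluded)

seq[3:4] = [6, 3]  # replace one element with two
print(seq)         # [7, 2, 3, 6, 3, 5, 6, 0, 1]

print(seq[3:])     # [6, 3, 5, 6, 0, 1]
print(seq[::2])    # [7, 3, 3, 6, 1]
print(seq[::-1])   # [1, 0, 6, 5, 3, 6, 3, 2, 7]
```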

The enumerate function keeps track of the index of the current item while iterating.

In [83]: some_list = ['foo', 'bar', 'baz']

In [84]: mapping = {}

In [85]: for i, v in enumerate(some_list):
   ....:     mapping[v] = i

In [86]: mapping
Out[86]: {'bar': 1, 'baz': 2, 'foo': 0}

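The zip function pairs up the elements of several sequences into tuples; wrapping the result in list gives a list of tuples. It occurred to me that this can be used to combine an independent variable and a dependent variable into coordinate points.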

In [89]: seq1 = ['foo', 'bar', 'baz']

In [90]: seq2 = ['one', 'two', 'three']

In [91]: zipped = zip(seq1, seq2)

In [92]: list(zipped)
Out[92]: [('foo', 'one'), ('bar', 'two'), ('baz', 'three')]

Conversely, given a "zipped" sequence, zip can be used to unzip it.

In [96]: pitchers = [('Nolan', 'Ryan'), ('Roger', 'Clemens'),
   ....:             ('Schilling', 'Curt')]

In [97]: first_names, last_names = zip(*pitchers)

In [98]: first_names
Out[98]: ('Nolan', 'Roger', 'Schilling')

In [99]: last_names
Out[99]: ('Ryan', 'Clemens', 'Curt')
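The coordinate-point idea can be sketched as follows. The xs and ys values are made-up sample data.

```python
xs = [0.0, 1.0, 2.0]   # independent variable (sample values)
ys = [1.0, 3.0, 5.0]   # dependent variable (sample values)

points = list(zip(xs, ys))
print(points)          # [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]

# zip(*points) turns the list of pairs back into two tuples.
xs_back, ys_back = zip(*points)
print(xs_back)         # (0.0, 1.0, 2.0)

# zip stops at the shortest input.
print(list(zip('abc', [1, 2])))  # [('a', 1), ('b', 2)]
```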

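3. Dictionaries

Defining dictionaries

d1 = {'a' : 'some value', 'b' : [1, 2, 3, 4]}
d1[7] = 'an integer' # tuples, lists, and dicts are all accessed with square brackets
d1[5] = 'some value' # assigning to a new key adds an entry; this does not work for lists (IndexError for an out-of-range index) or tuples (no item assignment at all)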

mapping = dict(zip(range(5), reversed(range(5))))
mapping
{0: 4, 1: 3, 2: 2, 3: 1, 4: 0}

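Deleting
- del: del d1[5]
- pop: ret = d1.pop('dummy') removes the key, and ret receives the value that was stored under it
Converting to a list
list(d1.keys())
Merging / combining
d1.update({'b' : 'foo', 'c' : 12})
The setdefault method
It can be used to group the elements of a list, for example by the first letter of each element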

for word in words:
    letter = word[0]
    by_letter.setdefault(letter, []).append(word)

This does the same thing as the following code:

In [123]: words = ['apple', 'bat', 'bar', 'atom', 'book']

In [124]: by_letter = {}

In [125]: for word in words:
   .....:     letter = word[0]
   .....:     if letter not in by_letter:
   .....:         by_letter[letter] = [word]
   .....:     else:
   .....:         by_letter[letter].append(word)
   .....:

In [126]: by_letter
Out[126]: {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}
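As a side note that goes slightly beyond these notes, collections.defaultdict from the standard library offers another way to write the same grouping. A minimal sketch comparing it with setdefault:

```python
from collections import defaultdict

words = ['apple', 'bat', 'bar', 'atom', 'book']

# Grouping with setdefault, as in the notes above.
by_letter = {}
for word in words:
    by_letter.setdefault(word[0], []).append(word)

# Grouping with defaultdict: a missing key gets an empty list automatically.
by_letter_dd = defaultdict(list)
for word in words:
    by_letter_dd[word[0]].append(word)

print(by_letter)                        # {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}
print(dict(by_letter_dd) == by_letter)  # True
```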

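Dictionary keys must be immutable, hashable objects such as integers, floats, and strings; a list cannot be used as a key. To use the contents of a list as a key, convert the list to a tuple first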

In [130]: d = {}

In [131]: d[tuple([1, 2, 3])] = 5

In [132]: d
Out[132]: {(1, 2, 3): 5}
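Whether an object can serve as a key can be checked with the built-in hash function. A small sketch with arbitrary sample values:

```python
# Strings and tuples of hashable items can be hashed.
print(isinstance(hash('string'), int))        # True
print(isinstance(hash((1, 2, (2, 3))), int))  # True

# A tuple that contains a list is not hashable.
try:
    hash((1, 2, [2, 3]))
    hashable = True
except TypeError:
    hashable = False
print(hashable)  # False

d = {}
d[tuple([1, 2, 3])] = 5
print(d)         # {(1, 2, 3): 5}
```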

4. Sets

Defining sets

set([2, 2, 2, 1, 3, 3])
{1, 2, 3}

{2, 2, 2, 1, 3, 3}
{1, 2, 3} # elements of a set cannot be repeated

Union and intersection
- a.union(b) and a | b
- a.intersection(b) and a & b

Subsets and supersets

In [150]: a_set = {1, 2, 3, 4, 5}

In [151]: {1, 2, 3}.issubset(a_set)
Out[151]: True

In [152]: a_set.issuperset({1, 2, 3})
Out[152]: True

Equality: {1, 2, 3} == {3, 2, 1} is True. Two sets are equal if and only if their contents are equal
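The set operations above in one runnable sketch. The two sets are arbitrary sample values, and sorted is used only to make the printed order predictable.

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

print(sorted(a | b))              # [1, 2, 3, 4, 5, 6, 7, 8]
print(a.union(b) == a | b)        # True
print(sorted(a & b))              # [3, 4, 5]
print(a.intersection(b) == a & b) # True

print({1, 2, 3}.issubset(a))      # True
print(a.issuperset({1, 2, 3}))    # True
print({1, 2, 3} == {3, 2, 1})     # True
```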

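5. Comprehensions: list, set, and dict

List: [x.upper() for x in strings if len(x) > 2]
Dict: loc_mapping = {val : index for index, val in enumerate(strings)}
Set: set_comp = {len(x) for x in strings}
Note that there is no tuple comprehension, because the parenthesized form is used for generator expressions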
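The three comprehension forms, run on a sample list. The contents of strings are an assumed example, since the notes do not define it.

```python
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']

upper = [x.upper() for x in strings if len(x) > 2]
print(upper)                # ['BAT', 'CAR', 'DOVE', 'PYTHON']

loc_mapping = {val: index for index, val in enumerate(strings)}
print(loc_mapping['dove'])  # 4

set_comp = {len(x) for x in strings}
print(sorted(set_comp))     # [1, 2, 3, 4, 6]

# Parentheses give a generator expression, not a tuple.
gen = (len(x) for x in strings)
print(tuple(gen))           # (1, 2, 3, 3, 4, 6)
```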

6. Nested comprehensions

[x for tup in some_tuples for x in tup]
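The for clauses in a nested comprehension are read in the same order as the equivalent nested loops. A sketch with an assumed sample value for some_tuples:

```python
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

flattened = [x for tup in some_tuples for x in tup]
print(flattened)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# The same thing written as nested loops.
flattened_loop = []
for tup in some_tuples:
    for x in tup:
        flattened_loop.append(x)

print(flattened == flattened_loop)  # True
```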

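Functions

1. Parameters

my_function(x=5, y=6, z=7): keyword arguments must come after positional arguments, but the keyword arguments themselves can be given in any order.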
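The notes do not show the body of my_function, so the definition below is only an assumed example with two positional parameters and one keyword parameter with a default.

```python
def my_function(x, y, z=1.5):
    # Assumed example body; only the calling conventions matter here.
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

print(my_function(5, 6, z=7))      # 77
print(my_function(y=6, x=5, z=7))  # 77  (keyword order does not matter)
print(my_function(10, 20))         # 45.0  (z falls back to its default)
# my_function(x=5, 6) would be a SyntaxError: positional argument follows keyword argument
```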

2. Global variables

To assign to a global variable inside a function, the variable must be declared with the global keyword

In [168]: a = None

In [169]: def bind_a_variable():
   .....:     global a
   .....:     a = []
   .....: bind_a_variable()
   .....:

In [170]: print(a)
[]

3. Return values: tuples and dicts

return a, b, c  # returns a tuple
a, b, c = f()  # uses tuple unpacking
return_value = f()  # or keep the whole tuple in one variable

return {'a' : a, 'b' : b, 'c' : c}
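The fragments above can be assembled into two small functions. The names f and f_dict and the values 5, 6, 7 are chosen here purely for illustration.

```python
def f():
    a = 5
    b = 6
    c = 7
    return a, b, c   # packed into a tuple

def f_dict():
    a, b, c = 5, 6, 7
    return {'a': a, 'b': b, 'c': c}

a, b, c = f()        # unpack the returned tuple
return_value = f()   # or keep it as one object

print(a, b, c)        # 5 6 7
print(return_value)   # (5, 6, 7)
print(f_dict()['b'])  # 6
```

Returning a dict gives each value a name at the call site, at the cost of a little more typing.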

4. Functions as objects: a list of functions

import re

def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

# strip leading and trailing whitespace, remove punctuation, capitalize each word
clean_ops = [str.strip, remove_punctuation, str.title]

def clean_strings(strings, ops):
    result = []
    for value in strings:
        for function in ops:
            value = function(value)
        result.append(value)
    return result

In [175]: clean_strings(states, clean_ops)
Out[175]: 
['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South   Carolina',
 'West Virginia']
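The notes never show the input list states. The sketch below uses an assumed input that is consistent with the output above, so the whole example can be run end to end.

```python
import re

def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

# Each operation is applied in order to every string.
clean_ops = [str.strip, remove_punctuation, str.title]

def clean_strings(strings, ops):
    result = []
    for value in strings:
        for function in ops:
            value = function(value)
        result.append(value)
    return result

# Assumed sample input with stray whitespace, punctuation, and odd casing.
states = ['   Alabama ', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda',
          'south   carolina##', 'West virginia?']

cleaned = clean_strings(states, clean_ops)
print(cleaned)
```

Because the operations live in a plain list, adding or reordering a cleaning step only means editing clean_ops.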

5. lambda functions

f = lambda x, y, z: x + y + z
print(f(1, 2, 3))  # 6

L = [lambda x: x+2, lambda x: x*2, lambda x: x**2]
print("L=", L[0](1), L[1](2), L[2](3))  # L= 3 4 9

6. Currying (partial argument application)

def add_numbers(x, y):
    return x + y

add_five = lambda y: add_numbers(5, y)  # in effect, add_numbers wrapped in a lambda with the first argument fixed
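The standard library's functools.partial does the same job as the lambda. A minimal sketch:

```python
from functools import partial

def add_numbers(x, y):
    return x + y

# Fix the first positional argument to 5.
add_five = partial(add_numbers, 5)

print(add_five(10))                             # 15
print(add_five(10) == (lambda y: add_numbers(5, y))(10))  # True
```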

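7. Generators: lazily produced sequences of values

gen = (x ** 2 for x in range(100))  # looks like a comprehension, but the parenthesized form is a generator expression, not a tuple

for x in gen:  # the values are computed only when the generator is iterated
    print(x, end=' ')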
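A generator can also be written as a function that uses yield. The sketch below uses a made-up function name, squares, and records when its body actually starts running.

```python
started = []

def squares(n=10):
    started.append(True)        # runs only once iteration begins
    for i in range(1, n + 1):
        yield i ** 2

gen = squares(3)
print(started)        # []  (calling the function runs none of its body)

values = list(gen)    # iteration drives the generator
print(started)        # [True]
print(values)         # [1, 4, 9]
print(list(gen))      # []  (a generator can be consumed only once)

# Generator expressions can be passed straight to functions such as sum.
print(sum(x ** 2 for x in range(100)))  # 328350
```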

8. Errors and exception handling

float('something') raises an error (a ValueError); in this situation the call can be wrapped in a try block

def attempt_float(x):
    try:
        return float(x)
    except:
        return x

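If the error message tells you the exception type, it can be named directly in the except clause:

def attempt_float(x):
    try:
        return float(x)
    except ValueError:
        return x

Several exception types can be caught at once by giving a tuple: except (TypeError, ValueError):
Sometimes a piece of code should run whether or not an exception was raised; use a finally block for that: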

f = open(path, 'w')

try:
    write_to_file(f)
finally:
    f.close()
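A small runnable sketch of both ideas: catching a tuple of exception types, and the order in which a finally block runs. The helper risky and the events list are made up for this illustration.

```python
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x

print(attempt_float('1.2345'))     # 1.2345
print(attempt_float('something'))  # something  (ValueError caught)
print(attempt_float((1, 2)))       # (1, 2)     (TypeError caught)

events = []

def risky():
    try:
        events.append('try')
        raise ValueError('boom')
    finally:
        # Runs even though the exception is still propagating.
        events.append('finally')

try:
    risky()
except ValueError:
    events.append('handled')

print(events)  # ['try', 'finally', 'handled']
```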

Files and the operating system

1. Opening, reading, and writing

path = 'examples/segismundo.txt'
f = open(path)
for line in f:
    pass

# a comprehension can be used as well
lines = [x.rstrip() for x in open(path)]

f.close()

2. The with statement makes it easier to clean up open files

with open(path) as f:
    lines = [x.rstrip() for x in f]
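To try this without the examples/segismundo.txt file, the sketch below writes a throwaway file first. The temporary directory, the file name example.txt, and its contents are all assumptions made for the illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'example.txt')

    # Write two lines; the file is closed when the with block ends.
    with open(path, 'w', encoding='utf-8') as f:
        f.write('first line\nsecond line\n')

    # Read them back, stripping the trailing newline from each line.
    with open(path, encoding='utf-8') as f:
        lines = [x.rstrip() for x in f]

    closed_after = f.closed

print(lines)         # ['first line', 'second line']
print(closed_after)  # True
```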
