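Python Data Analysis 01

Article link: https://blog.csdn.net/m0_58044245/article/details/128740936

This article introduces NumPy and pandas, two important Python libraries for data analysis, and highlights IPython introspection with `?` and `??` for viewing object information. It also covers the iterator protocol, creating and working with tuples, and slicing sequences. Finally, it mentions how enumerate makes it easy to track the index while iterating over a sequence, and describes dictionaries as a hash-map data structure with operations such as the `update` method.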


Reference book:

GitHub - iamseancheney/python_for_data_analysis_2nd_chinese_version: Python for Data Analysis, 2nd Edition (Chinese translation)

Two important Python libraries to learn:

NumPy
pandas

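By default IPython numbers its prompts, in the form In [2]:, unlike the standard >>> prompt.

Introspection

Placing a question mark (?) before or after a variable displays general information about the object. Using ?? also shows the source code of a function when it is available: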

In [8]: b = [1, 2, 3]

In [9]: b?
Type:       list
String Form:[1, 2, 3]
Length:     3
Docstring:
list() -> new empty list
list(iterable) -> new list initialized from iterable's items

In [10]: print?
Docstring:
print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)

Prints the values to a stream, or to sys.stdout by default.
Optional keyword arguments:
file:  a file-like object (stream); defaults to the current sys.stdout.
sep:   string inserted between values, default a space.
end:   string appended after the last value, default a newline.
flush: whether to forcibly flush the stream.
Type:      builtin_function_or_method
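
The `?` syntax only works inside IPython. As a rough sketch, assuming a plain Python interpreter, similar information can be gathered with the built-ins `type`, `repr`, and `len` plus `inspect.getdoc` from the standard library. The variable names below are illustrative only.

```python
import inspect

b = [1, 2, 3]

# Roughly the fields that `b?` reports in IPython
type_name = type(b).__name__    # name of the object's type
string_form = repr(b)           # printable representation
length = len(b)                 # number of elements
docstring = inspect.getdoc(b)   # cleaned-up docstring of the list type

print(type_name, string_form, length)
print(inspect.getdoc(print))    # docstring of the built-in print function
```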

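You can run any Python program file with the %run command.

To check whether an object follows the iterator protocol, that is, whether it is iterable, use the iter function: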

def isiterable(obj):
    try:
        iter(obj)
        return True
    except TypeError: # not iterable
        return False

This function returns True for strings and for most Python collection types.
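
A short check of that behavior, repeating the isiterable definition so the snippet runs on its own. The sample inputs are illustrative.

```python
def isiterable(obj):
    """Return True if obj can be passed to iter() without error."""
    try:
        iter(obj)
        return True
    except TypeError:  # not iterable
        return False


print(isiterable("a string"))  # True: strings are iterable
print(isiterable([1, 2, 3]))   # True: lists are iterable
print(isiterable(5))           # False: an int is not iterable
```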

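I use this functionality all the time to write functions that can accept multiple kinds of input. A common case is writing a function that can accept any kind of sequence (list, tuple, ndarray) or even an iterator. You can first check whether the object is a list (or a NumPy array) and, if it is not, convert it to a list: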

if not isinstance(x, list) and isiterable(x):
    x = list(x)
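
The check above can be wrapped in a small function. This is a minimal sketch: as_list is a hypothetical helper name, not something from the reference book, and isiterable is repeated so the snippet is self-contained.

```python
def isiterable(obj):
    """Return True if obj can be passed to iter() without error."""
    try:
        iter(obj)
        return True
    except TypeError:  # not iterable
        return False


def as_list(x):
    """Hypothetical helper: convert any non-list iterable to a list."""
    if not isinstance(x, list) and isiterable(x):
        x = list(x)
    return x


print(as_list((1, 2, 3)))   # tuple becomes [1, 2, 3]
print(as_list(range(3)))    # range becomes [0, 1, 2]
print(as_list(5))           # non-iterable input is returned unchanged: 5
```

Note that a string is iterable too, so as_list('ab') gives ['a', 'b']. Add an explicit isinstance(x, str) check if that is not what you want.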

A tuple is a fixed-length, immutable Python sequence object.

The simplest way to create a tuple:

tup = 4, 5, 6

tuple can convert any sequence or iterator to a tuple:

In [5]: tuple([4, 0, 2])
Out[5]: (4, 0, 2)

In [6]: tup = tuple('string')

In [7]: tup
Out[7]: ('s', 't', 'r', 'i', 'n', 'g')
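
To illustrate what immutable means for a tuple, here is a small sketch. The try/except wrapper and the variable names are for demonstration only.

```python
tup = 4, 5, 6

error = None
try:
    tup[0] = 10  # tuples do not support item assignment
except TypeError as exc:
    error = exc
    print("cannot modify a tuple:", exc)

# The tuple still has its original length and contents
print(len(tup), tup)
```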

Slicing

Slice notation selects a section of most sequence types. Its basic form is start:stop inside square brackets:

In [73]: seq = [7, 2, 3, 7, 5, 6, 0, 1]

In [74]: seq[1:5]
Out[74]: [2, 3, 7, 5]

A sequence can also be assigned to a slice:

In [75]: seq[3:4] = [6, 3]

In [76]: seq
Out[76]: [7, 2, 3, 6, 3, 5, 6, 0, 1]

Negative indices slice relative to the end of the sequence:

In [79]: seq[-4:]
Out[79]: [5, 6, 0, 1]

In [80]: seq[-6:-2]
Out[80]: [6, 3, 5, 6]

A step can be given after a second colon, for example to take every other element:

In [81]: seq[::2]
Out[81]: [7, 3, 3, 6, 1]

(A step of -1 reverses a list or tuple.)

In [82]: seq[::-1]
Out[82]: [1, 0, 6, 5, 3, 6, 3, 2, 7]
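
A few further properties of slicing, as a minimal sketch on the original eight-element list. The element at start is included and the one at stop is not, either bound can be omitted, and the same notation works on tuples and strings. The variable names are illustrative.

```python
seq = [7, 2, 3, 7, 5, 6, 0, 1]

# start is included, stop is excluded, so the result has stop - start items
part = seq[1:5]
print(part, len(part))

# Omitting start or stop defaults to the beginning or end of the sequence
head = seq[:5]
tail = seq[3:]
print(head, tail)

# The same notation applies to other sequence types
tuple_part = (4, 5, 6, 7)[1:3]
reversed_text = "string"[::-1]
print(tuple_part, reversed_text)
```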

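Some handy Python sequence functions

The enumerate function

When iterating over a sequence, you often want to keep track of the index of the current item. A do-it-yourself approach might look like this: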

i = 0
for value in collection:
   # do something with value
   i += 1

Because this is so common, Python has a built-in function, enumerate, which returns a sequence of (i, value) tuples:

for i, value in enumerate(collection):
   # do something with value
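
The two loops above are templates with an empty body. A runnable sketch, assuming a small sample list named collection, shows that both produce the same index/value pairs. It also uses the optional start argument of enumerate.

```python
collection = ["a", "b", "c"]  # illustrative data

# Manual counter
manual = []
i = 0
for value in collection:
    manual.append((i, value))
    i += 1

# Same result with enumerate
with_enumerate = []
for i, value in enumerate(collection):
    with_enumerate.append((i, value))

print(manual == with_enumerate)

# enumerate can start counting from a number other than 0
one_based = list(enumerate(collection, start=1))
print(one_based)
```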

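When indexing data, a useful pattern with enumerate is to build a dict that maps the values of a sequence (assumed to be unique) to their positions in the sequence: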

In [83]: some_list = ['foo', 'bar', 'baz']

In [84]: mapping = {}

In [85]: for i, v in enumerate(some_list):
   ....:     mapping[v] = i

In [86]: mapping
Out[86]: {'bar': 1, 'baz': 2, 'foo': 0}
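
The same mapping can be written more compactly as a dict comprehension. This is an alternative sketch, not the form used in the reference book. If the values are not unique, the last position of each repeated value wins.

```python
some_list = ["foo", "bar", "baz"]

# Map each value to its position in the list
mapping = {v: i for i, v in enumerate(some_list)}
print(mapping)

# The mapping gives a direct lookup from value to index
position = mapping["baz"]
print(position, some_list[position])
```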

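Dictionaries

The dict is probably the most important built-in Python data structure. A more common name for it is hash map or associative array. It is a flexibly sized collection of key-value pairs, where both keys and values are Python objects. One way to create a dict is to use curly braces {} with colons separating keys and values: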

In [101]: empty_dict = {}

In [102]: d1 = {'a' : 'some value', 'b' : [1, 2, 3, 4]}

In [103]: d1
Out[103]: {'a': 'some value', 'b': [1, 2, 3, 4]}
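
A brief sketch of the basic operations that make a dict a variable-size mapping: inserting, looking up, testing membership, and deleting. The integer key 7 is chosen only to show that keys need not be strings.

```python
d1 = {"a": "some value", "b": [1, 2, 3, 4]}

d1[7] = "an integer"       # insert a new key-value pair
looked_up = d1["b"]        # look up a value by key
has_b = "b" in d1          # membership test on keys
size_before = len(d1)

del d1[7]                  # remove a key-value pair
size_after = len(d1)

print(looked_up, has_b, size_before, size_after)
```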

The update method merges one dict into another:

In [119]: d1.update({'b' : 'foo', 'c' : 12})

In [120]: d1
Out[120]: {'a': 'some value', 'b': 'foo', 'c': 12}
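
Two details of update are worth seeing in a runnable form: it changes the dict in place (and returns None), and any existing key gets its old value replaced. The sketch below also shows one way to merge without touching the original, by copying first. The names original and merged are illustrative.

```python
d1 = {"a": "some value", "b": [1, 2, 3, 4]}

result = d1.update({"b": "foo", "c": 12})
print(result)  # None, because update modifies d1 in place
print(d1)      # the old value for 'b' has been replaced

# To keep the original unchanged, update a copy instead
original = {"a": 1}
merged = dict(original)
merged.update({"b": 2})
print(original, merged)
```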