1 Basic properties
| | list | tuple | set | array | series | dict |
|---|---|---|---|---|---|---|
| literal / constructor | [ ] | ( ) | { } | np.array([ ]) | pd.Series([ ]) | {k_i: v_i} |
| ordered | ✓ | ✓ | × | ✓ | ✓ | insertion order (Python 3.7+) |
| mutable | ✓ | × | ✓ | ✓ | ✓ | ✓ |
| duplicates allowed | ✓ | ✓ | × | ✓ | ✓ | × (keys are unique) |
| indexed | ✓ | ✓ | × | ✓ | ✓ | ✓ (by key) |
| slicing | ✓ | ✓ | × | ✓ | ✓ | × |
| convert | list(X) | tuple(X) | set(X) | np.array(X) | pd.Series(X) | dict(X) |
| add element | x.append(a) | × | x.add(a) | x = np.append(x, a) | x = pd.concat([x, pd.Series([a])]) | x[newK] = newV |
| delete element | del x[index] or x.remove(value) | × | x.discard(value) | np.delete(x, 1, axis=1) returns a copy of x without its second column | x.drop(indexName) returns a new Series | del x['keyDel'] |
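A few rows of the table can be checked directly with the built-in containers. The snippet below is only a minimal sketch; `items` and the `as_*` names are made-up sample data, not part of the original notes.

```python
# Sketch: checking a few table rows with built-in containers only.
items = [3, 1, 3, 2]                 # hypothetical sample data

as_list = list(items)                # keeps order and duplicates
as_tuple = tuple(items)              # keeps order and duplicates, but is immutable
as_set = set(items)                  # drops duplicates, no positional indexing
as_dict = dict(zip("abcd", items))   # dict() needs key/value pairs, not bare values

print(as_list)         # [3, 1, 3, 2]
print(as_tuple)        # (3, 1, 3, 2)
print(sorted(as_set))  # [1, 2, 3]
print(as_dict)         # {'a': 3, 'b': 1, 'c': 3, 'd': 2}

# Slicing works on list and tuple, but a set cannot be indexed or sliced.
print(as_list[1:3])    # [1, 3]
try:
    as_set[0]
except TypeError as err:
    print("set is not indexable:", err)
```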
2 Examples
2.1 List
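x_lst = [1, 2, 4, 6]
y_lst = [1, 3, 6, 9]  # lists are written with []
### Adding elements ###
x_lst.append('ABC')  # append at the end: x_lst = [1, 2, 4, 6, 'ABC']
x_lst.insert(1, 'ABC')  # insert at index 1: x_lst = [1, 'ABC', 2, 4, 6, 'ABC']
### Removing elements ###
del x_lst[1]  # delete by index (here the second element): x_lst = [1, 2, 4, 6, 'ABC']; modifies x_lst in place, returns nothing
x_lst.remove('ABC')  # delete by value: removes the first occurrence of 'ABC' (ValueError if absent); modifies x_lst in place: [1, 2, 4, 6]
Xx = x_lst.pop()  # removes the last element (not a random one) and returns it: Xx = 6, x_lst = [1, 2, 4]
### + and * on lists (there is no - or /) ###
x_concat_y = x_lst + y_lst  # x_concat_y = [1, 2, 4, 1, 3, 6, 9]
x_double = x_lst * 2  # x_double = [1, 2, 4, 1, 2, 4]
### Other operations ###
x_lst.sort()  # returns None, sorts x_lst in place in ascending order
x_lst.sort(reverse=True)  # returns None, sorts x_lst in place in descending order
sortedX = sorted(x_lst)  # returns a new sorted list in ascending order, x_lst is unchanged
sortedX = sorted(x_lst, reverse=True)  # returns a new sorted list in descending order, x_lst is unchanged
x_lst.reverse()  # reverses the order of x_lst in place
print(min(x_lst), max(x_lst), sum(x_lst))  # min, max, sum
### Copying a list ###
copied_lst = x_lst[:]
# With copied_lst = x_lst instead, both names refer to the same list, so changing copied_lst also changes x_lst
### Values and indices ###
dogs = ['border collie', 'australian cattle dog', 'labrador retriever']
for index, dog in enumerate(dogs):
    print(index, dog)  # each value in the list together with its position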
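The difference between assigning a list and copying it is easiest to see with a nested list. The following is a minimal sketch; `original`, `alias`, `shallow` and `deep` are hypothetical names, and `copy.deepcopy` is brought in only to contrast with the slice copy used above.

```python
import copy

original = [1, 2, [10, 20]]     # hypothetical list containing a nested list

alias = original                # same object, not a copy
shallow = original[:]           # new outer list, but the nested list is shared
deep = copy.deepcopy(original)  # fully independent copy

alias.append(99)                # visible through `original`
shallow[0] = -1                 # only changes the copy's outer list
shallow[2].append(30)           # changes the shared nested list
deep[2].append(40)              # changes nothing in `original`

print(original)  # [1, 2, [10, 20, 30], 99]
print(shallow)   # [-1, 2, [10, 20, 30]]
print(deep)      # [1, 2, [10, 20, 40]]
```

For a flat list of numbers or strings the slice copy is enough; a deep copy only matters once the list holds other mutable objects.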
2.2 Tuple
x_tpl = (1,2,4,6)
y_tpl = (1,3,6,9)
### A tuple is immutable: elements cannot be added or removed ###
### + and * on tuples (there is no - or /) ###
x_double = x_tpl * 2 # x_double = (1,2,4,6,1,2,4,6)
x_plus_y = x_tpl + y_tpl # x_plus_y = (1, 2, 4, 6, 1, 3, 6, 9)
print(min(x_tpl), max(x_tpl), sum(x_tpl))  # min, max, sum
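Immutability means that item assignment fails and that any "change" has to build a new tuple. A small sketch, with `point`, `extended` and `without_second` as made-up names:

```python
point = (1, 2, 4, 6)

# Item assignment is rejected because tuples are immutable.
try:
    point[0] = 100
except TypeError as err:
    print("cannot assign:", err)

# "Changing" a tuple means building a new one from pieces of the old one.
extended = point + (8,)                 # the trailing comma makes a one-element tuple
without_second = point[:1] + point[2:]  # drop the element at index 1

print(extended)        # (1, 2, 4, 6, 8)
print(without_second)  # (1, 4, 6)
print(point)           # (1, 2, 4, 6), the original is untouched
```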
2.3 Set
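x_set = {1, 3, 5, 6, 9, 100, 111, 323}  # sets are written with {}
y_set = set([3, 7, 8, 44, 68])
### Adding elements ###
x_set.add(100)  # 100 is already in x_set, so nothing changes
### Removing elements ###
x_set.remove(1)  # returns None; raises KeyError if the element is not in the set
x_set.discard(999)  # returns None; no error if the element is not in the set
popedX = x_set.pop()  # removes and returns an arbitrary element; raises KeyError if the set is empty
x_set.clear()  # returns None; removes every element from the set
x_set = {1, 3, 5, 6, 9, 100, 111, 323}  # rebuild x_set so the operations below have data to work on
### Set operations ###
x_union_y = x_set.union(y_set)  # union in the mathematical sense: returns a new set, x_set is unchanged
x_minus_y = x_set - y_set  # difference: elements of x_set that are not in y_set; x_minus_y = {1, 5, 6, 9, 100, 111, 323}
x_inter_y = x_set.intersection(y_set)  # intersection of the two sets; x_inter_y = {3}
x_set.update(y_set)  # in-place union: x_set is modified, returns None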
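The named methods above also have operator forms. The sketch below uses two small hypothetical sets `a` and `b` and sorts the results before printing, because a set has no guaranteed display order.

```python
a = {1, 3, 5, 6}   # hypothetical sample sets
b = {3, 6, 7}

union = a | b          # same as a.union(b)
inter = a & b          # same as a.intersection(b)
diff = a - b           # same as a.difference(b)
sym_diff = a ^ b       # elements in exactly one of the two sets

print(sorted(union))     # [1, 3, 5, 6, 7]
print(sorted(inter))     # [3, 6]
print(sorted(diff))      # [1, 5]
print(sorted(sym_diff))  # [1, 5, 7]

# The operators above return new sets; the augmented form changes `a` in place.
a |= b                 # same effect as a.update(b)
print(sorted(a))       # [1, 3, 5, 6, 7]
```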
2.4 Dict
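This section has no example in the original notes. As a minimal sketch of the dict rows in the table in section 1, the snippet below adds, overwrites, deletes and looks up entries; `x_dict` and its keys are made-up sample data.

```python
x_dict = {'one': 1, 'two': 2}        # hypothetical sample data

### Adding and updating elements ###
x_dict['three'] = 3                  # new key: adds an entry
x_dict['one'] = 10                   # existing key: overwrites the value

### Removing elements ###
del x_dict['two']                    # raises KeyError if the key is missing
gone = x_dict.pop('missing', None)   # returns the default instead of raising

### Lookup and iteration ###
print(x_dict['three'])               # 3
print(x_dict.get('zzz', 0))          # 0, because get() falls back to the default
for key, value in x_dict.items():    # iterates in insertion order (Python 3.7+)
    print(key, value)

print(x_dict)                        # {'one': 10, 'three': 3}
```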
2.5 Numpy.array
2.5.1 Creating an array
import numpy as np
### Creating an array given a list
x_array = np.array([1,2,4,6]) # Type: <class 'numpy.ndarray'>
y_array = np.array([1,3,6,9])
### Besides a list, a tuple or a Series can be turned into an array; a set cannot be converted directly ###
set_array = np.array({1,2,4,6}) # Type: <class 'numpy.ndarray'>, but a 0-d object array wrapping the whole set; use np.array(list(s)) instead
tpl_array = np.array((1,2,4,6)) # Type: <class 'numpy.ndarray'>
set_array * 2 # ⇒ TypeError: unsupported operand type(s) for *: 'set' and 'int'
tpl_array * 2 # ⇒ array([ 2,  4,  8, 12])
2.5.2 Adding/removing elements and arithmetic
### Element-wise +, -, *, / ###
x_array / 2 # ⇒ array([0.5, 1. , 2. , 3. ])
x_array * 2 # ⇒ array([ 2,  4,  8, 12])
x_array + y_array # ⇒ array([ 2, 5, 10, 15])
x_array - y_array # ⇒ array([ 0, -1, -2, -3])
x_array * y_array # element-wise product, form 1 ⇒ array([ 1, 6, 24, 54])
np.multiply(x_array,y_array) # element-wise product, form 2 ⇒ array([ 1, 6, 24, 54])
x_array / y_array # ⇒ array([1. , 0.66666667, 0.66666667, 0.66666667])
np.dot(x_array,y_array) # dot product / inner product ⇒ 1*1 + 2*3 + 4*6 + 6*9 = 85
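Adding and removing elements works differently from lists: `np.append` and `np.delete` return new arrays and leave the input untouched. A minimal sketch follows; the 2-D array `m` is made-up sample data, used only to show the `axis` argument.

```python
import numpy as np

x_array = np.array([1, 2, 4, 6])

appended = np.append(x_array, 8)   # new array with 8 at the end
removed = np.delete(x_array, 1)    # new array without the element at index 1

print(appended)   # [1 2 4 6 8]
print(removed)    # [1 4 6]
print(x_array)    # [1 2 4 6], the original array is unchanged

# With a 2-D array, axis=1 removes a column and axis=0 removes a row.
m = np.array([[1, 2, 3],
              [4, 5, 6]])                  # hypothetical 2x3 example
no_second_col = np.delete(m, 1, axis=1)    # drops the second column
print(no_second_col.tolist())              # [[1, 3], [4, 6]]
```

Because every call allocates a new array, building a large array by repeated `np.append` is slow; collecting values in a list and converting once is usually preferable.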
2.6 Pandas.Series
2.6.1 Creating a Series
import pandas as pd
#Creating a Series with Automatic Index
print("1. CREATING a Series with Automatic Index\n")
d = {'one':1,'two':2,'three':3}
s = pd.Series(d) # Creating a Series from a dict (with Automatic Index)
print(s['one'],s.two) #Indexing a Series with []; use the label like an attribute
# index the series
print('\n2. INDEX the series\n')
dates = pd.date_range('1/1/2020',periods = 5)
z = np.random.normal(size=5)
time_series = pd.Series(z,index = dates) # when index is date, Series is time series
print(time_series)
# creating a Dict of Series
print('\n3. CREATING a Dict of Series\n')
series_dict = {
    'x': pd.Series([1., 2., 3.], index=['a', 'b', 'd']),
    'y': pd.Series([4., 5., 6., 7.], index=['a', 'b', 'c', 'd']),
    'z': pd.Series([0.1, 0.2, 0.3, 0.4], index=['a', 'b', 'c', 'd'])
}
print(series_dict, type(series_dict))
df = pd.DataFrame(series_dict)  # Converting the dict to a DataFrame; 'x' has no label 'c', so that cell becomes NaN
print(df, type(df))
import matplotlib.pyplot as plt  # needed for plt.show()
df.plot()  # plot the DataFrame, one line per column
plt.show()
2.6.2 Vectorised operations on Series
- NumPy-style vectorisation works for Series objects too: +, -, * and / are applied element by element.
- Functions such as cumsum and cumprod are implemented for Series as methods.
# Arithmetic and vectorised functions
## Features of pandas - 2. vectorisation is implemented
d = {'one':1,'two':2,'three':3}
s = pd.Series(d)
s_squared = s ** 2  # ** 2 squares every value in the Series
s2 = pd.Series({'two':2,'three':3, 'four':4})
s3 = s + s_squared
s4 = s + s2  # labels are aligned first; a label present in only one Series ('one', 'four') gives NaN (not a number)
print('s is: \n',s,'\n\ns_squared is: \n',s_squared,'\n\ns3 is: \n',s3,'\n\ns4 is: \n',s4)
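The NaN values come from label alignment, and `Series.add` with `fill_value` is one way to avoid them. The sketch below reuses the two Series from above; `total`, `filled` and `running` are made-up names.

```python
import pandas as pd

s = pd.Series({'one': 1, 'two': 2, 'three': 3})
s2 = pd.Series({'two': 2, 'three': 3, 'four': 4})

# Plain + aligns on labels; labels missing on either side produce NaN.
total = s + s2
print(total['two'], total['three'])   # 4.0 6.0
print(int(total.isna().sum()))        # 2, for the labels 'one' and 'four'

# add() with fill_value treats a missing label as 0 instead of NaN.
filled = s.add(s2, fill_value=0)
print(filled['one'], filled['four'])  # 1.0 4.0

# cumsum is available as a Series method and keeps the labels.
running = s.cumsum()
print(running.tolist())               # [1, 3, 6]
```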
2.7 Pandas.df
2.7.1 Creating a df and setting row and column names
import pandas as pd
df = pd.DataFrame([[1.0, 4.0, 0.1],[2.0,5.0,0.2],[None,6.0,0.3],[3.0,7.0,0.4]],index = ['a','b','c','d'])
df.columns = ['x','y','zzz']
df.rename(columns={'zzz': 'z'}, inplace=True)  # rename a column
The resulting DataFrame df:
$$\begin{matrix} & x & y & z\\ a & 1.0 & 4.0 & 0.1\\ b & 2.0 & 5.0 & 0.2\\ c & NaN & 6.0 & 0.3\\ d & 3.0 & 7.0 & 0.4 \end{matrix}$$
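2.7.2 Using indexes
### Usage 1 locating by row and column names ###
df['x'] is equal to df.x
df['x']['b'] = df.x.b = df['x'].b, the result is: 2.0
df['x'][['b','c']] = df.x[['b','c']]
df.x.['b','c'] # error: invalid syntax (no dot is allowed before the brackets)
### Usage 2 slice ###
df['x'][['b','c']]  # column first, then rows
slice using loc: rows first, then columns
df.loc[['b','d']][['x','y']] is equal to df.loc[['b','d'],['x','y']]
df.loc['b':'d',['x','y']]  # a label slice includes both endpoints
Syntax: df.loc[row_indexer, column_indexer] (rows first, then columns)
df.iloc[1:,:2]  # position-based: rows from position 1 onwards, first two columns
### Usage 3 Logical Indexing ###
use logical indexing to retrieve a subset of the data
df[df['x'] >= 2]
### Finding the row/column label of the maximum ###
df.idxmax()  # axis=0 by default: for each column, the row label of its maximum
df.idxmax(axis=1)  # axis=1: for each row, the column label of its maximum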
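The indexing forms above can be run side by side on the df from section 2.7.1. This is a minimal sketch; `by_label`, `by_position`, `mask` and `subset` are made-up names, and the DataFrame is rebuilt here so the snippet is self-contained.

```python
import pandas as pd

df = pd.DataFrame(
    [[1.0, 4.0, 0.1], [2.0, 5.0, 0.2], [None, 6.0, 0.3], [3.0, 7.0, 0.4]],
    index=['a', 'b', 'c', 'd'],
    columns=['x', 'y', 'z'],
)

# loc is label-based and its slices include both endpoints: rows b, c, d.
by_label = df.loc['b':'d', ['x', 'y']]
# iloc is position-based and excludes the stop position, as in a Python slice.
by_position = df.iloc[1:, :2]
print(by_label.equals(by_position))    # True

# A boolean mask keeps the rows where the condition holds; NaN >= 2 is False.
mask = df['x'] >= 2
subset = df[mask]
print(subset.index.tolist())           # ['b', 'd']

# idxmax skips NaN by default.
print(df.idxmax().tolist())            # ['d', 'd', 'd']
print(df.idxmax(axis=1).tolist())      # ['y', 'y', 'y', 'y']
```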
2.7.3 Other df syntax
df.describe()  # prints count, mean, std, min, max and the three quartiles
df.describe()['x']['mean']  # pick a column, then look up a statistic by its row label
df.index  # row labels
df.columns  # column names