This article works through examples based on the Pandas Chinese documentation:
1. pandas can build a Series directly from a list or a NumPy array: pd.Series()
import pandas as pd
import numpy as np
# named "values" rather than "list" so the built-in list() is not shadowed
values = [11, 23, 35, 35, 64, 58]
s = pd.Series(values)
print(s)
num = np.array([12.3,34,56,88])
ss = pd.Series(num)
print(ss)
2. Generate a date index: pd.date_range('20220718', periods=10)
import pandas as pd
dates = pd.date_range('20220718', periods=10)
print(dates)
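As a small check of what the call above produces, the sketch below inspects the first and last entries. It relies on pd.date_range defaulting to a daily frequency when only a start and periods are given; the variable names first and last are illustrative.

```python
import pandas as pd

# Ten consecutive days starting on 2022-07-18 (the default frequency is daily).
dates = pd.date_range('20220718', periods=10)
first, last = dates[0], dates[-1]

print(len(dates))
print(first.date(), last.date())
```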
3. Build a DataFrame from a NumPy array, with a datetime index and column labels:
import pandas as pd
import numpy as np
dates = pd.date_range('20220718', periods=10)
df = pd.DataFrame(np.random.randn(10, 6), index=dates, columns=list('ABCDEF'))
print(df)
4. Build a DataFrame from a dict:
import pandas as pd
import numpy as np
zd = {'A': 1.,
'B': pd.Timestamp('20130102'),
'C': pd.Series(1, index=list(range(4)), dtype='float32'),
'D': np.array([3] * 4, dtype='int32'),
'E': pd.Categorical(["test", "train", "test", "train"]),
'F': 'foo'}
df2 = pd.DataFrame(zd)
print(df2.dtypes)
print(df2)
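The dict above mixes scalars with length-4 values. A minimal sketch of what that implies: the scalars are repeated to fill every row, and the columns with an explicit dtype keep it. The same dict is rebuilt here so the block runs on its own.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'A': 1.,
    'B': pd.Timestamp('20130102'),
    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
    'D': np.array([3] * 4, dtype='int32'),
    'E': pd.Categorical(["test", "train", "test", "train"]),
    'F': 'foo',
})

# Scalars ('A', 'B', 'F') are broadcast to the length of the other columns.
print(df2.shape)
print(df2['A'].tolist())
print(df2['F'].tolist())
```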
5. View the first and last rows of a DataFrame: df.head(), df.tail()
import pandas as pd
import numpy as np
dates = pd.date_range('20220718', periods=10)
df = pd.DataFrame(np.random.randn(10, 5), index=dates, columns=list('ABCDE'))
# show the first 2 rows
print(df.head(2))
# show the last 4 rows
print(df.tail(4))
6. Show the index: df.index
import pandas as pd
import numpy as np
dates = pd.date_range('20220718', periods=10)
df = pd.DataFrame(np.random.randn(10, 5), index=dates, columns=list('ABCDE'))
print(df)
print('\nIndex:', df.index)
7. Show the column names: df.columns
import pandas as pd
import numpy as np
dates = pd.date_range('20220718', periods=10)
df = pd.DataFrame(np.random.randn(10, 5), index=dates, columns=list('ABCDE'))
print(df)
print('\nColumns:', df.columns)
8. Convert to NumPy: df.to_numpy()
The output of DataFrame.to_numpy() does not include the row index or the column labels.
import pandas as pd
import numpy as np
dates = pd.date_range('20220718', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print(df.to_numpy())
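To make the point about lost labels concrete, here is a sketch that uses fixed values (np.arange, an assumption made so the result is reproducible) instead of random ones. The result is a plain ndarray holding only the cell values.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20220718', periods=6)
df = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list('ABCD'))

arr = df.to_numpy()

# Only the 6 x 4 block of values survives; the dates and 'A'..'D' are gone.
print(type(arr).__name__, arr.shape)
print(arr[0])
```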
9. Statistical summary: df.describe()
describe() gives a quick statistical summary of the data:
import pandas as pd
import numpy as np
dates = pd.date_range('20220718', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print(df)
print(df.describe())
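With random input the summary changes on every run, so the sketch below swaps in fixed values (an assumption for reproducibility) to show which rows describe() returns for numeric columns.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20220718', periods=6)
df = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list('ABCD'))

summary = df.describe()

# One row per statistic, one column per numeric column of df.
print(list(summary.index))
print(summary.loc['mean', 'A'])  # column A holds 0, 4, 8, 12, 16, 20
```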
10. Transpose the data: df.T
import pandas as pd
import numpy as np
dates = pd.date_range('20220718', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print(df)
print(df.T)
11. Sort by axis: df.sort_index(axis=1, ascending=False)
import pandas as pd
import numpy as np
dates = pd.date_range('20220718', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print(df)
print(df.sort_index(axis=1,ascending=False))
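The axis argument decides which labels are sorted. A short sketch with fixed values (an assumption, used so the result can be checked): axis=1 reorders the columns, axis=0 reorders the rows, and in both cases a new DataFrame is returned while df itself stays unchanged.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20220718', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))

by_cols = df.sort_index(axis=1, ascending=False)  # column labels, descending
by_rows = df.sort_index(axis=0, ascending=False)  # row labels (dates), descending

print(list(by_cols.columns))
print(by_rows.index[0].date())
```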
12. Selecting data:
Selecting a single column yields a Series; df['A'] is equivalent to df.A:
import pandas as pd
import numpy as np
dates = pd.date_range('20220718', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print(df['A'])
print(df.A)
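The equivalence can be checked directly, as sketched below. One caveat is worth knowing: attribute access only reaches a column when its name is a valid Python identifier that does not clash with an existing DataFrame attribute. The second DataFrame, with a column named 'count', is a hypothetical example made up to show such a clash.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list('ABCD'))

# Both spellings return the same Series.
same = df['A'].equals(df.A)
print(type(df['A']).__name__, same)

# Hypothetical column named 'count': DataFrame already has a count() method.
clash = pd.DataFrame({'count': [1, 2, 3]})
print(callable(clash.count))    # attribute access finds the method, not the column
print(clash['count'].tolist())  # bracket access always finds the column
```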
Slice rows with [ ]: df['20220718':'20220721']
import pandas as pd
import numpy as np
dates = pd.date_range('20220718', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print(df[0:3])
print(df['20220718':'20220721'])
Extract one row by label:
import pandas as pd
import numpy as np
dates = pd.date_range('20220718', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print(df)
print(df.loc[dates[0]])
print(df.loc['20220718'])
Select by label on both axes; a label slice includes its end point:
import pandas as pd
import numpy as np
dates = pd.date_range('20220718', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print(df)
print(df.loc['20220718':'20220721', ['A', 'C']])
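The end-point behaviour is easy to get wrong, so here is a sketch that contrasts the two kinds of slice on fixed values (an assumption for reproducibility). A label slice keeps the row for its last label, while an integer slice such as df[0:3] stops before position 3, like ordinary Python slicing.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20220718', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))

by_label = df.loc['20220718':'20220721', ['A', 'C']]  # 18th through 21st inclusive
by_position = df[0:3]                                 # positions 0, 1, 2 only

print(by_label.shape, by_label.index[-1].date())
print(by_position.shape, by_position.index[-1].date())
```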
Get a scalar value (a single cell):
import pandas as pd
import numpy as np
dates = pd.date_range('20220718', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print(df)
print(df.loc[dates[0], 'A'])
print(df.at[dates[0], 'A'])
Select by integer position:
import pandas as pd
import numpy as np
dates = pd.date_range('20220718', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print(df)
print(df.iloc[3])
print(df.iloc[1:3,2:3])
print(df.iloc[[1,3],[2,3]])
print(df.iloc[2,3])
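The four iloc calls above return four different shapes of result. The sketch below repeats them on fixed values (an assumption, so each result can be verified): a single integer gives a row as a Series, slices and lists give a smaller DataFrame, and two integers give one scalar.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20220718', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))

row = df.iloc[3]                 # one row, returned as a Series
block = df.iloc[1:3, 2:3]        # rows 1-2, column 2; slice ends are excluded
picked = df.iloc[[1, 3], [2, 3]]  # explicit lists of row and column positions
cell = df.iloc[2, 3]             # a single value

print(row.tolist())
print(block.shape, picked.shape, cell)
```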
Fast access to a scalar:
import pandas as pd
import numpy as np
dates = pd.date_range('20220718', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print(df)
print(df.iloc[2,3])
print(df.iat[2,3])
Boolean indexing
Select rows using the values of a single column:
import pandas as pd
import numpy as np
dates = pd.date_range('20220718', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print(df)
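print(df[df.A > 0])  # keep only the rows where column A is greater than 0
Select values from the whole DataFrame where a Boolean condition holds: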
import pandas as pd
import numpy as np
dates = pd.date_range('20220718', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print(df)
print(df[df > 0])  # keep values greater than 0; all other cells become NaN
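The two forms of Boolean indexing behave differently, which is hard to see with random numbers. A sketch on a small hand-written DataFrame (the values are made up for illustration): a condition on one column drops whole rows, while a condition on the whole DataFrame keeps the shape and replaces the failing cells with NaN.

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, -2.0, 3.0],
                   'B': [-1.0, 5.0, -6.0]})

rows = df[df.A > 0]   # row filter: only rows where A > 0 remain
masked = df[df > 0]   # cell mask: same shape, failing cells become NaN

print(rows.index.tolist())
print(masked.shape, int(masked.isna().sum().sum()))
```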
Filter with isin():
import pandas as pd
import numpy as np
dates = pd.date_range('20220718', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
df['E'] = [1,2,3,4,5,6]
print(df)
print(df[df['E'].isin([1, 3])])
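isin() works in two steps, sketched below with fixed values (an assumption for reproducibility): it first builds a Boolean Series that is True where the value appears in the given list, and that Series is then used as a row filter.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20220718', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))
df['E'] = [1, 2, 3, 4, 5, 6]

mask = df['E'].isin([1, 3])  # True where E is 1 or 3
selected = df[mask]          # keep only those rows

print(mask.tolist())
print(selected['E'].tolist())
```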
13. Setting values
import pandas as pd
import numpy as np
dates = pd.date_range('20220718', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
df['E'] = [1,2,3,4,5,6]
print(df)
# set a value by label:
df.at[dates[0], 'A'] = 0.666
# set a value by position:
df.iat[0, 2] = 3
# set a whole column from a NumPy array:
df.loc[:, 'E'] = np.array([5] * len(df))
print(df)
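To confirm that each assignment lands where expected, the sketch below repeats the three steps on a DataFrame of zeros (an assumption, chosen so the changed cells stand out) and reads the values back.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20220718', periods=6)
df = pd.DataFrame(np.zeros((6, 4)), index=dates, columns=list('ABCD'))
df['E'] = [1, 2, 3, 4, 5, 6]

df.at[dates[0], 'A'] = 0.666              # by label: first date, column 'A'
df.iat[0, 2] = 3                          # by position: row 0, column 2 (i.e. 'C')
df.loc[:, 'E'] = np.array([5] * len(df))  # whole column from a NumPy array

print(df.at[dates[0], 'A'], df.loc[dates[0], 'C'])
print(df['E'].tolist())
```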