pandas Operation Notes (Part 1)

水野与小太郎

Article link: https://blog.csdn.net/qq_36336522/article/details/103233847

I. Data Structures

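1. Series: one-dimensional, homogeneous (every element has the same data type), labeled
2. DataFrame: two-dimensional, heterogeneous (each column may have its own data type), size-mutable
A DataFrame can be viewed as a container of Series.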
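
To make the "container of Series" idea concrete, here is a minimal sketch. The column names `name` and `score` are made up for illustration and do not come from the original notes.

```python
import pandas as pd

# A Series holds values of a single dtype, each paired with a label.
s = pd.Series([0.5, 1.5, 2.5], index=["a", "b", "c"])
series_dtype = s.dtype             # float64

# A DataFrame may hold a different dtype in each column.
df = pd.DataFrame(
    {"name": ["x", "y", "z"], "score": [1.0, 2.0, 3.0]},
    index=["a", "b", "c"],
)

# Selecting one column gives back a Series that shares the row labels.
col = df["score"]
print(type(col).__name__)          # Series
print(col.index.equals(df.index))  # True
```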

II. Basic Operations: Creating, plus Adding, Deleting, Modifying, and Querying

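import pandas as pd

# General form: s = pd.Series(data, index=index)
# Example 1: build a Series from a list with explicit labels
s1 = pd.Series([0., 1., 2.], index=['a', 'b', 'c'])
# a    0.0
# b    1.0
# c    2.0
# dtype: float64
# Example 2: build a Series from a dict (the keys become the labels)
s2_d = {'b': 1, 'a': 0, 'c': 2}
s2 = pd.Series(s2_d)
# b    1
# a    0
# c    2
# dtype: int64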

# Build a DataFrame from a dict of Series; rows align on the union of the indexes
d = {
        'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
        'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])
}
df = pd.DataFrame(d)
#    one  two
# a  1.0  1.0
# b  2.0  2.0
# c  3.0  3.0
# d  NaN  4.0
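
The NaN in row `d` comes from index alignment: the Series `one` has no entry for that label. A small sketch that checks this, reusing the same data as above:

```python
import pandas as pd

one = pd.Series([1., 2., 3.], index=["a", "b", "c"])
two = pd.Series([1., 2., 3., 4.], index=["a", "b", "c", "d"])
df = pd.DataFrame({"one": one, "two": two})

# The row labels are the union of both indexes.
labels = list(df.index)
print(labels)                        # ['a', 'b', 'c', 'd']

# 'one' has no value for label 'd', so that cell is filled with NaN.
n_missing = int(df["one"].isna().sum())
print(n_missing)                     # 1
```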

# Build a DataFrame from a list of dicts; keys missing from a dict become NaN
data2 = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df2 = pd.DataFrame(data2, index=["row1", "row2"])
#       a   b     c
# row1  1   2   NaN
# row2  5  10  20.0

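# Delete columns. Here df2 is assumed to have gained a column d in the meantime
# (the values shown are consistent with df2['d'] = df2['a'] * df2['b']):
#       a   b     c   d
# row1  1   2   NaN   2
# row2  5  10  20.0  50
del df2['c']   # remove column c in place
#       a   b   d
# row1  1   2   2
# row2  5  10  50
df2.pop('d')   # remove column d in place and return it as a Series
#       a   b
# row1  1   2
# row2  5  10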
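
As a side note, `del` and `pop` both modify the DataFrame in place, while `drop` returns a new object by default. The sketch below contrasts them on a small stand-in frame. `drop` is not used in the original notes and is shown only for comparison.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 5], "b": [2, 10], "d": [2, 50]},
    index=["row1", "row2"],
)

# pop removes the column in place and returns it as a Series.
popped = df.pop("d")

# drop returns a new DataFrame by default; df itself is unchanged.
without_b = df.drop(columns="b")

print(list(df.columns))         # ['a', 'b']
print(list(without_b.columns))  # ['a']
print(popped.tolist())          # [2, 50]
```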

# Add columns
df2['c'] = [3., 15.]          # append column c at the end
#       a   b     c
# row1  1   2   3.0
# row2  5  10  15.0
df2.insert(1, 'E', [1., 2.])  # insert column E at position 1, between a and b
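
The two ways of adding a column differ in where the column lands. A minimal sketch on a stand-in for `df2` shows the resulting column order, and also that `insert` refuses a name that already exists unless `allow_duplicates=True` is passed:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 5], "b": [2, 10]}, index=["row1", "row2"])

df["c"] = [3., 15.]           # assignment appends the column at the end
df.insert(1, "E", [1., 2.])   # insert places it at the given position

columns = list(df.columns)
print(columns)                # ['a', 'E', 'b', 'c']

# Inserting a column whose name already exists raises ValueError.
duplicate_error = None
try:
    df.insert(0, "E", [0., 0.])
except ValueError as exc:
    duplicate_error = str(exc)
```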

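# Query
Operation                    Syntax          Result
Select a column              df[col]         Series
Select a row by label        df.loc[label]   Series
Select a row by position     df.iloc[num]    Series
Slice rows                   df[5:10]        DataFrame
View the first num rows      df.head(num)    DataFrame
View the last num rows       df.tail(num)    DataFrame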
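
The selection forms in the table can be tried out as follows. This is a sketch on a small made-up frame, and the labels `row1` to `row3` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 5, 9], "b": [2, 10, 18]},
    index=["row1", "row2", "row3"],
)

col = df["a"]                  # one column             -> Series
row_by_label = df.loc["row2"]  # one row, by label      -> Series
row_by_pos = df.iloc[1]        # one row, by position   -> Series
rows = df[0:2]                 # positional row slice   -> DataFrame
top = df.head(2)               # first two rows         -> DataFrame
bottom = df.tail(1)            # last row               -> DataFrame

print(row_by_label.equals(row_by_pos))  # True: both are the second row
print(rows.shape)                       # (2, 2)
```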

III. File I/O

Format Type	Data Description	Reader	Writer
text	CSV	read_csv	to_csv
text	JSON	read_json	to_json
text	HTML	read_html	to_html
text	Local clipboard	read_clipboard	to_clipboard
binary	MS Excel	read_excel	to_excel
binary	OpenDocument	read_excel
binary	HDF5 Format	read_hdf	to_hdf
binary	Feather Format	read_feather	to_feather
binary	Parquet Format	read_parquet	to_parquet
binary	Msgpack (removed in pandas 1.0)	read_msgpack	to_msgpack
binary	Stata	read_stata	to_stata
binary	SAS	read_sas
binary	Python Pickle Format	read_pickle	to_pickle
SQL	SQL	read_sql	to_sql
SQL	Google Big Query	read_gbq	to_gbq

df = pd.read_csv('data.csv')  
df.to_csv('data_out.csv')
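
One detail worth knowing about the CSV pair: `to_csv` writes the row index as an extra first column unless `index=False` is passed. The sketch below does a round trip through an in-memory buffer (an assumption made here so the example needs no files on disk) instead of `data.csv`.

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 4], "b": [2, 5]})

# Write without the row index so the file contains only the data columns.
buf = io.StringIO()
df.to_csv(buf, index=False)
text = buf.getvalue()
csv_lines = text.splitlines()
print(csv_lines)            # ['a,b', '1,2', '4,5']

# Reading the text back yields an equal DataFrame.
restored = pd.read_csv(io.StringIO(text))
print(restored.equals(df))  # True
```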

IV. Summary Statistics

   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9

df.describe()  # show summary statistics for each numeric column
         a    b    c
count  3.0  3.0  3.0
mean   4.0  5.0  6.0
std    3.0  3.0  3.0
min    1.0  2.0  3.0
25%    2.5  3.5  4.5
50%    4.0  5.0  6.0
75%    5.5  6.5  7.5
max    7.0  8.0  9.0
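df.mean(): returns the mean of each column
df.corr(): returns the pairwise correlation coefficients between columns (Pearson by default)
df.count(): returns the number of non-null values in each column
df.max(): returns the maximum of each column
df.min(): returns the minimum of each column
df.median(): returns the median of each column
df.std(): returns the sample standard deviation of each column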
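
The methods listed above can be checked against the 3x3 example frame from this section. A minimal sketch, with the frame rebuilt here so the snippet runs on its own:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 4, 7], "b": [2, 5, 8], "c": [3, 6, 9]})

means = df.mean()      # a 4.0, b 5.0, c 6.0
medians = df.median()  # same as the means for this evenly spaced data
stds = df.std()        # sample standard deviation (ddof=1): 3.0 per column
counts = df.count()    # 3 non-null values per column
corr = df.corr()       # columns differ only by a constant shift, so every
                       # coefficient is 1 (up to floating-point rounding)

print(means.tolist())   # [4.0, 5.0, 6.0]
print(counts.tolist())  # [3, 3, 3]
```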