pandas usage

Sandy_Sandy_yuan

Article link: https://blog.csdn.net/qq_31258627/article/details/99547883

import pandas as pd
Data structures in pandas
Series
# Create a Series
from pandas import Series
obj=Series([4,7,5,3])
print(obj)  # a one-dimensional labeled array: index on the left, values on the right (much like a time series)
0    4
1    7
2    5
3    3
dtype: int64
# index: the row labels
obj.index
RangeIndex(start=0, stop=4, step=1)
# values: the underlying NumPy array
obj.values
array([4, 7, 5, 3], dtype=int64)
obj2=Series([4,7,-5,3],index=['3/1','3/2','3/3','3/4'])
print(obj2)
3/1    4
3/2    7
3/3   -5
3/4    3
dtype: int64
obj2.index
Index(['3/1', '3/2', '3/3', '3/4'], dtype='object')
obj2['3/3']
-5
obj2[obj2>0]
3/1    4
3/2    7
3/4    3
dtype: int64
# in: tests whether a label is present in the index
'3/3' in obj2
True
'3/6' in obj2
False
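One detail worth spelling out: `in` on a Series looks at the index labels, not at the stored values. The sketch below reuses `obj2` from above; the names `label_hit`, `value_as_label`, `value_hit` and `mask` are only illustrative.

```python
from pandas import Series

obj2 = Series([4, 7, -5, 3], index=['3/1', '3/2', '3/3', '3/4'])

label_hit = '3/2' in obj2       # checks the index labels
value_as_label = 7 in obj2      # 7 is a value, not a label
value_hit = 7 in obj2.values    # search the underlying array instead
mask = obj2.isin([7, 3])        # element-wise membership test on the values

print(label_hit, value_as_label, value_hit)
print(obj2[mask])
```

To test values rather than labels, `obj2.values` or `isin` is usually what you want.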
# Create a Series from a dict
dict1={'3/1':4,'3/2':7,'3/3':-5,'3/4':3}
# When built from a dict, the keys become the index and the values become the data
print(dict1)
{'3/1': 4, '3/2': 7, '3/3': -5, '3/4': 3}
Series(dict1)
3/1    4
3/2    7
3/3   -5
3/4    3
dtype: int64
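As a small extension of the dict example, the sketch below also passes an explicit `index`. The label '3/5' is made up for illustration and has no entry in `dict1`, so pandas fills it with NaN, which in turn makes the dtype float64.

```python
import pandas as pd

dict1 = {'3/1': 4, '3/2': 7, '3/3': -5, '3/4': 3}

# Only the labels listed in `index` are kept, in that order.
# '3/5' is a hypothetical label that is missing from dict1.
s = pd.Series(dict1, index=['3/1', '3/3', '3/5'])

print(s)
print(s.isnull())  # True only for the missing label
```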
DataFrame
from pandas import DataFrame
# Create a DataFrame
# Define two lists
position=['产品经理','数据分析师','UI','产品经理','开发']
print(position)
['产品经理', '数据分析师', 'UI', '产品经理', '开发']
company=['百度','360','360','阿里','58']
print(company)
['百度', '360', '360', '阿里', '58']
DataFrame([position,company])
0	1	2	3	4
0	产品经理	数据分析师	UI	产品经理	开发
1	百度	360	360	阿里	58
jobInfo=DataFrame([position,company]).T
jobInfo
0	1
0	产品经理	百度
1	数据分析师	360
2	UI	360
3	产品经理	阿里
4	开发	58
# columns: set the column names
jobInfo.columns=['职位名','公司名']
jobInfo
职位名	公司名
0	产品经理	百度
1	数据分析师	360
2	UI	360
3	产品经理	阿里
4	开发	58
# index: set the row labels
jobInfo.index=['a','b','c','d','e']
jobInfo
职位名	公司名
a	产品经理	百度
b	数据分析师	360
c	UI	360
d	产品经理	阿里
e	开发	58
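The same table can probably be built in one step. The following is a minimal sketch, assuming a dict of columns is acceptable in place of the list-of-rows plus transpose route used above; it is an alternative, not the approach the notebook takes.

```python
from pandas import DataFrame

position = ['产品经理', '数据分析师', 'UI', '产品经理', '开发']
company = ['百度', '360', '360', '阿里', '58']

# Each dict key becomes a column name; `index` supplies the row labels.
jobInfo = DataFrame(
    {'职位名': position, '公司名': company},
    index=['a', 'b', 'c', 'd', 'e'],
)
print(jobInfo)
```

This avoids the transpose and the two separate assignments to `columns` and `index`.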
# reset_index: reset the index (returns a new DataFrame; jobInfo itself is not modified)
jobInfo.reset_index()
index	职位名	公司名
0	a	产品经理	百度
1	b	数据分析师	360
2	c	UI	360
3	d	产品经理	阿里
4	e	开发	58
jobInfo.reset_index(drop=True)
职位名	公司名
0	产品经理	百度
1	数据分析师	360
2	UI	360
3	产品经理	阿里
4	开发	58
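Because `reset_index` returns a new object, the result has to be assigned if it should persist. A short sketch on a three-row subset of the data (the names `renumbered` and `labels_after_call` are illustrative):

```python
from pandas import DataFrame

jobInfo = DataFrame(
    {'职位名': ['产品经理', '数据分析师', 'UI'],
     '公司名': ['百度', '360', '360']},
    index=['a', 'b', 'c'],
)

renumbered = jobInfo.reset_index(drop=True)  # new object, index 0..n-1
labels_after_call = list(jobInfo.index)      # jobInfo keeps its labels
print(labels_after_call)
print(list(renumbered.index))

# To keep the change, assign the result back to the same name.
jobInfo = jobInfo.reset_index(drop=True)
print(list(jobInfo.index))
```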
# head
jobInfo.head(3)  # show the first three rows
职位名	公司名
a	产品经理	百度
b	数据分析师	360
c	UI	360
# tail
jobInfo.tail(2)  # show the last two rows
职位名	公司名
d	产品经理	阿里
e	开发	58
# Get the values of one column: first method, bracket indexing
jobInfo['职位名']
a     产品经理
b    数据分析师
c       UI
d     产品经理
e       开发
Name: 职位名, dtype: object
# Get the values of one column: second method, attribute access
jobInfo.职位名
a     产品经理
b    数据分析师
c       UI
d     产品经理
e       开发
Name: 职位名, dtype: object
# Get the values of one row: first method, by label with loc
jobInfo.loc['c']
职位名     UI
公司名    360
Name: c, dtype: object
# Get the values of one row: second method, by integer position with iloc
jobInfo.iloc[2]
职位名     UI
公司名    360
Name: c, dtype: object
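`loc` and `iloc` also accept slices, and they differ in how the end point is treated. A minimal sketch on the same data (variable names are illustrative):

```python
from pandas import DataFrame

jobInfo = DataFrame(
    {'职位名': ['产品经理', '数据分析师', 'UI', '产品经理', '开发'],
     '公司名': ['百度', '360', '360', '阿里', '58']},
    index=['a', 'b', 'c', 'd', 'e'],
)

by_label = jobInfo.loc['b':'d']        # label slice: both 'b' and 'd' are included
by_position = jobInfo.iloc[1:3]        # position slice: rows 1 and 2, end excluded
one_cell = jobInfo.loc['c', '公司名']   # row label and column name together

print(by_label)
print(by_position)
print(one_cell)
```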
Some basic functionality
# Create an array
import numpy as np
np.arange(16)
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])
data=np.arange(16).reshape(4,4)
data
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
df=DataFrame(data=data,index=['a','b','c','d'],columns=['one','two','three','four'])
df
one	two	three	four
a	0	1	2	3
b	4	5	6	7
c	8	9	10	11
d	12	13	14	15
# Drop entries along a given axis
df.drop('b')  # drop row 'b' (returns a new DataFrame; df is unchanged)
one	two	three	four
a	0	1	2	3
c	8	9	10	11
d	12	13	14	15
df.drop('one',axis=1)  # axis=1 means the operation applies to columns
two	three	four
a	1	2	3
b	5	6	7
c	9	10	11
d	13	14	15
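`drop` also takes a list of labels, and the `columns=` keyword is an alternative spelling of `axis=1`. A short sketch, which also shows that `df` is left untouched:

```python
import numpy as np
from pandas import DataFrame

df = DataFrame(np.arange(16).reshape(4, 4),
               index=['a', 'b', 'c', 'd'],
               columns=['one', 'two', 'three', 'four'])

rows_dropped = df.drop(['a', 'c'])              # a list drops several rows
cols_dropped = df.drop(columns=['one', 'two'])  # same as drop([...], axis=1)

print(rows_dropped)
print(cols_dropped)
print(df.shape)  # drop returned new objects; df still has all rows and columns
```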
# Find the rows where column 'four' equals 7
df[df['four']==7]
one	two	three	four
b	4	5	6	7
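Several conditions can be combined with `&` (and) and `|` (or). Each condition needs its own parentheses because these operators bind more tightly than comparisons. A minimal sketch on the same `df`:

```python
import numpy as np
from pandas import DataFrame

df = DataFrame(np.arange(16).reshape(4, 4),
               index=['a', 'b', 'c', 'd'],
               columns=['one', 'two', 'three', 'four'])

# Rows where four > 3 AND one < 12
middle = df[(df['four'] > 3) & (df['one'] < 12)]

# Rows where one == 0 OR four == 15
either = df[(df['one'] == 0) | (df['four'] == 15)]

print(middle)
print(either)
```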
# Unique values: unique
obj=Series([1,2,2,3,3,4,5,5,5])
print(obj)
0    1
1    2
2    2
3    3
4    3
5    4
6    5
7    5
8    5
dtype: int64
obj.unique()  # the distinct values, in order of first appearance
array([1, 2, 3, 4, 5], dtype=int64)
# Frequency counts: value_counts
obj.value_counts()  # sorted by count, highest first
5    3
3    2
2    2
4    1
1    1
dtype: int64
obj.value_counts(sort=False)  # counts are not sorted by frequency
1    1
2    2
3    2
4    1
5    3
dtype: int64
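Two variations that tend to be useful alongside `value_counts`: `normalize=True` returns shares instead of counts, and chaining `sort_index()` orders the result by value regardless of pandas version. A small sketch (names `shares` and `by_value` are illustrative):

```python
from pandas import Series

obj = Series([1, 2, 2, 3, 3, 4, 5, 5, 5])

shares = obj.value_counts(normalize=True)    # fraction of rows per value
by_value = obj.value_counts().sort_index()   # counts ordered by the value itself

print(shares)
print(by_value)
```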
Common mathematical and statistical functions
df
one	two	three	four
a	0	1	2	3
b	4	5	6	7
c	8	9	10	11
d	12	13	14	15
# describe: descriptive statistics
df.describe()
one	two	three	four
count	4.000000	4.000000	4.000000	4.000000
mean	6.000000	7.000000	8.000000	9.000000
std	5.163978	5.163978	5.163978	5.163978
min	0.000000	1.000000	2.000000	3.000000
25%	3.000000	4.000000	5.000000	6.000000
50%	6.000000	7.000000	8.000000	9.000000
75%	9.000000	10.000000	11.000000	12.000000
max	12.000000	13.000000	14.000000	15.000000
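The `std` row above is the sample standard deviation (divisor n - 1), which is the pandas default. The sketch below recomputes it by hand for column `one`; the variable names are illustrative.

```python
import numpy as np
from pandas import DataFrame

df = DataFrame(np.arange(16).reshape(4, 4),
               index=['a', 'b', 'c', 'd'],
               columns=['one', 'two', 'three', 'four'])
col = df['one']  # values 0, 4, 8, 12

sample_std = col.std()            # pandas default: ddof=1
population_std = col.std(ddof=0)  # divide by n instead of n - 1
by_hand = np.sqrt(((col - col.mean()) ** 2).sum() / (len(col) - 1))
median = col.quantile(0.5)        # the 50% row of describe()

print(sample_std, population_std, by_hand, median)
```

Note that NumPy's own `np.std` defaults to `ddof=0`, so the two libraries give different numbers unless `ddof` is set explicitly.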
# Sum of each column
df.sum()
one      24
two      28
three    32
four     36
dtype: int64
# Cumulative sum: cumsum
df.cumsum()
one	two	three	four
a	0	1	2	3
b	4	6	8	10
c	12	15	18	21
d	24	28	32	36
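Both `sum` and `cumsum` work down each column by default. Passing `axis=1` makes them run across each row instead, as in this minimal sketch:

```python
import numpy as np
from pandas import DataFrame

df = DataFrame(np.arange(16).reshape(4, 4),
               index=['a', 'b', 'c', 'd'],
               columns=['one', 'two', 'three', 'four'])

row_totals = df.sum(axis=1)      # one total per row
row_running = df.cumsum(axis=1)  # running total from left to right in each row

print(row_totals)
print(row_running)
```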
jobInfo
职位名	公司名
a	产品经理	百度
b	数据分析师	360
c	UI	360
d	产品经理	阿里
e	开发	58
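# Descriptive statistics for non-numeric data
# unique is the number of distinct values, top is the most frequent value, freq is how often top occurs
jobInfo.describe()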
职位名	公司名
count	5	5
unique	4	4
top	产品经理	360
freq	2	2
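The `unique`, `top` and `freq` rows can be cross-checked with `nunique` and `value_counts`. A short sketch for the 职位名 column (variable names are illustrative):

```python
from pandas import DataFrame

jobInfo = DataFrame(
    {'职位名': ['产品经理', '数据分析师', 'UI', '产品经理', '开发'],
     '公司名': ['百度', '360', '360', '阿里', '58']},
    index=['a', 'b', 'c', 'd', 'e'],
)

counts = jobInfo['职位名'].value_counts()
n_unique = jobInfo['职位名'].nunique()  # matches the unique row
top = counts.index[0]                  # matches the top row
freq = counts.iloc[0]                  # matches the freq row

print(n_unique, top, freq)
```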