The Pandas Module (Study Notes)

纳米一点点

Tags: python, data mining, data analysis

Article link: https://blog.csdn.net/weixin_51945232/article/details/121775412

Python Data Analysis Basics: The Pandas Module

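Installation

Open Command Prompt (choose "Run as administrator"), then run

pip install pandas

to download and install the package. Then run

pip list

to check whether pandas appears in the list of installed packages. If it is listed, the installation succeeded.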

Definition

Pandas is a data analysis package for Python.

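Pandas data structures

Dimensions

A higher-dimensional data structure is a container for lower-dimensional ones.

Data structure	Dimensions	Description
Series	1	One-dimensional array, size-immutable, made up of elements of the same data type.
DataFrame	2	Two-dimensional, size-mutable tabular structure with an ordered set of columns; each column can have a different data type (integer, string, boolean, etc.).
Panel	3	Size-mutable three-dimensional array (deprecated in pandas 0.20 and removed in pandas 1.0).

Mutability

Data structure	Data	Size
Series	Mutable	Immutable
DataFrame	Mutable	Mutable
Panel	Mutable	Mutable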
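As a small illustration of the mutability table, the sketch below changes a value inside a Series and adds a column to a DataFrame. The names `s`, `df`, `x` and `y` are made up for this example.

```python
import pandas as pd

# Values inside a Series can be changed in place.
s = pd.Series([1, 2, 3], index=["a", "b", "c"])
s["b"] = 20

# A DataFrame is size-mutable: assigning a new column grows it in place.
df = pd.DataFrame({"x": [1, 2, 3]})
df["y"] = df["x"] * 10

print(s.tolist())   # [1, 20, 3]
print(df.shape)     # (3, 2)
```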

The Series object

Definition

A pandas Series is the one-dimensional data structure in pandas, similar to a Python list or a NumPy ndarray.

Import the packages we need

import pandas as pd
import numpy as np

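Series constructor

pandas.Series(data, index, dtype, copy)
data: the data; an array-like sequence, a dict, or a scalar value (for example an int)
index: the index labels, same length as data, default np.arange(n). Labels must be hashable; unique labels are recommended, but duplicates are allowed.
dtype: the data type
copy: whether to copy the input data, default False
When printed, the index is shown on the left and the data on the right.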
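The constructor parameters above can be tried out with a short sketch like the following. The variable names (`prices`, `filled`, `default_idx`) are only illustrative.

```python
import numpy as np
import pandas as pd

# Explicit index labels and an explicit dtype.
prices = pd.Series([1, 2, 3], index=["a", "b", "c"], dtype="float64")

# A scalar is repeated for every label in the index.
filled = pd.Series(5, index=["x", "y", "z"])

# Without an index, pandas numbers the rows 0..n-1.
default_idx = pd.Series(np.array([10, 20, 30]))

print(prices.dtype)              # float64
print(filled.tolist())           # [5, 5, 5]
print(list(default_idx.index))   # [0, 1, 2]
```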

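Code

# Create an array
a = np.array([1,2,3,4,5])
s1 = pd.Series(a,index=['a','b','c','d','e'],copy=False)
# Modify a value by position (plain s1[0] on a labelled Series is deprecated in pandas 2.x)
s1.iloc[0]=100
print(s1)
'''
a    100
b      2
c      3
d      4
e      5
dtype: int32
'''
# The dtype is int32 on Windows with older NumPy versions; on other platforms it is int64
print(a)
#[100   2   3   4   5]  (with copy=False the Series can share memory with a, so a changes too)



np01 = np.array([1,2,3])
s2 = pd.Series(np01)
print(s2)
'''
0    1
1    2
2    3
dtype: int32
'''

From a dict {key: value}: the keys become the index of the Series

d1 = {'zs':18,'ls':20}
s3 = pd.Series(d1)
print(s3)
'''
zs    18
ls    20
dtype: int64
'''
print(s3['zs']) #18
# Second way: a list plus an explicit index
a = [18,20]
s4 = pd.Series(a,index=['zs','ls'])
print(s4)
'''
zs    18
ls    20
dtype: int64
'''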

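Getting data

Syntax: s1[index] gets a single value

# Both custom labels and integer positions can be used
d1 = {'zs':20,'ls':30,'ww':40,'ch':20,'zl':30}
s1 = pd.Series(d1)
print(s1['zs']) # by custom label: 20
print(s1.iloc[0]) # by integer position: 20 (plain s1[0] is deprecated on a labelled Series)


'''
Syntax:
s1[list] gets the values at the listed index labels
s1[index1:index2] gets the values from the first index to the second
Integer-position slices are left-closed, right-open (the end is excluded);
label slices are closed on both ends (the end label is included)
'''
print(s1.iloc[0:3]) # left-closed, right-open: positions 0, 1, 2
'''
zs    20
ls    30
ww    40
dtype: int64
'''
print(s1.iloc[[0,3,4]])
'''
zs    20
ch    20
zl    30
dtype: int64
'''
print(s1[['zs','ch','ww']])
'''
zs    20
ch    20
ww    40
dtype: int64
'''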
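The difference between position slices and label slices is easy to get wrong, so here is a minimal sketch that compares the two on the same data. It uses `.iloc` and `.loc` explicitly, which makes the intent unambiguous.

```python
import pandas as pd

s = pd.Series({"zs": 20, "ls": 30, "ww": 40, "ch": 20, "zl": 30})

by_position = s.iloc[0:3]      # positions 0, 1, 2; position 3 is excluded
by_label = s.loc["zs":"ch"]    # labels zs through ch; the end label is included

print(list(by_position.index))  # ['zs', 'ls', 'ww']
print(list(by_label.index))     # ['zs', 'ls', 'ww', 'ch']
```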

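Common attributes and methods

Attribute or method	Description
axes	Returns a list containing the Series index
dtype	Returns the data type of the Series
empty	Returns True if the Series is empty
ndim	Returns the number of dimensions of the underlying data; always 1 for a Series
size	Returns the number of elements in the underlying data
values	Returns the Series as an ndarray
head()	Returns the first n rows
tail()	Returns the last n rows

head() returns the first n rows (note the index values); n defaults to 5 and a custom value can be passed
tail() returns the last n rows (note the index values); n defaults to 5 and a custom value can be passed

Code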

d1 = {'zs':20,'ls':30,'ww':40,'ch':20,'zl':30}
s1 = pd.Series(d1)
print(s1.axes) #[Index(['zs', 'ls', 'ww', 'ch', 'zl'], dtype='object')]
print(s1.dtype) #int64
print(s1.empty) #False
print(s1.ndim) #1
print(s1.size) #5
print(s1.values)#[20 30 40 20 30]  
print(type(s1.values))#<class 'numpy.ndarray'>
print(s1.head(3)) # the first 3 rows
'''
zs    20
ls    30
ww    40
dtype: int64
'''
print(s1.tail(2)) # the last 2 rows
'''
ch    20
zl    30
dtype: int64
'''
#------------------------------------------
d1 = pd.Series([1,'a',0.5,['张三','李四']],index=['a','b','c','d'])
s1 = pd.Series(d1)
print(s1.axes) #[Index(['a', 'b', 'c', 'd'], dtype='object')]
print(s1.dtype) #object
print(s1.empty) #False
print(s1.ndim) #1
print(s1.size) #4
print(s1.values)#[1 'a' 0.5 list(['张三', '李四'])] 
print(type(s1.values))#<class 'numpy.ndarray'>
print(s1.head(3))
'''
a      1
b      a
c    0.5
dtype: object
'''
print(s1.tail(2))
'''
c         0.5
d    [张三, 李四]
dtype: object
'''

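The DataFrame object

Definition
A pandas DataFrame is a two-dimensional data structure with an ordered set of columns, each of which can have a different data type; it can be viewed as a dict of Series.

Constructor parameters

Parameter	Description
data	Accepts many input types, such as ndarray, Series, map, list, dict, a constant, or another DataFrame.
index	Row labels; if no index is passed, defaults to 0, 1, 2, 3, 4, ...
columns	Column labels; if no columns are passed, defaults to 0, 1, 2, 3, 4, ...
dtype	Data type of each column
copy	Whether to copy the input data; False by default in older versions (recent versions default to None and decide based on the input type)

Syntax

pandas.DataFrame(data, index, columns, dtype, copy)

Code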

np01 = np.arange(20).reshape(4,5)
df1 = pd.DataFrame(np01,index=['a','b','c','d'],columns=['name','age','sex','hobbit','address'],dtype=np.float32)
print(df1)
'''
   name   age   sex  hobbit  address
a   0.0   1.0   2.0     3.0      4.0
b   5.0   6.0   7.0     8.0      9.0
c  10.0  11.0  12.0    13.0     14.0
d  15.0  16.0  17.0    18.0     19.0
'''

# From a flat (single-level) list: each element becomes one row

x = [1,2,3,4,5]
df1 = pd.DataFrame(x)
print(df1)
'''
   0
0  1
1  2
2  3
3  4
4  5
'''

# From a nested (two-level) list: each inner list becomes one row

x = [
    ['zs',20],
    ['ls',30],
    ['ch',40]
]
df2 = pd.DataFrame(x,columns=['name','age'])
print(df2)
'''
  name  age
0   zs   20
1   ls   30
2   ch   40
'''

# From a list of dicts

1. The keys become the column labels
2. Missing values are filled with NaN

x = [
    {'a':1,'b':2},
    {'a':10,'b':20,'c':30}
]
# index: row labels; columns: column labels
df3 = pd.DataFrame(x)
print(df3)
'''
    a   b     c
0   1   2   NaN
1  10  20  30.0
'''

# From a dict

# The keys become the column labels. When every value is a scalar, a row index (index) must be passed, otherwise pandas raises a ValueError

# Plain dict of scalars
x = {'馒头':1,'包子':1.5,'豆浆':2,'倔强面':17}
df1 = pd.DataFrame(x,index=['价格'])
print(df1)
'''
   馒头   包子  豆浆  倔强面
价格   1  1.5   2   17
'''
# Dict of lists
x = {
    '食品':['馒头','大米','包子'],
    '价格':[2,1,5]
}
df2 = pd.DataFrame(x)
print(df2)
'''
  食品  价格
0  馒头   2
1  大米   1
2  包子   5
'''
# Dict of Series
s1 = pd.Series(['馒头','烧饼','面条'])
s2 = pd.Series([1,2,1.5])
x = {
    '食品':s1,
    '价格':s2
}
df3 = pd.DataFrame(x)
print(df3)
'''
   食品   价格
0  馒头  1.0
1  烧饼  2.0
2  面条  1.5
'''
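Because a DataFrame can be seen as a dict of Series, the Series are aligned on their index labels when the frame is built. The sketch below uses made-up labels (`r1` to `r3`) and English sample data to show that a label missing from one Series turns into NaN in that column.

```python
import pandas as pd

food = pd.Series(["bun", "cake", "noodles"], index=["r1", "r2", "r3"])
price = pd.Series([1.0, 2.0], index=["r1", "r2"])   # no entry for r3

# The row index is the union of both Series' labels.
df = pd.DataFrame({"food": food, "price": price})

print(list(df.index))                # ['r1', 'r2', 'r3']
print(df["price"].isna().tolist())   # [False, False, True]
```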

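Selecting columns

Syntax

df[column_label]   selects a single column
df[[column_label1, column_label2, ...]] selects multiple columns
Selecting a single column reduces the dimension: DataFrame ----> Series

Code

s1 = pd.Series(['馒头','烧饼','面条'])
s2 = pd.Series([1,2,1.5])
x = {
    '食品':s1,
    '价格':s2
}
df3 = pd.DataFrame(x)
#print(df3)


s = df3['食品']
print(s)
print(type(s)) # <class 'pandas.core.series.Series'>

s3 = df3[['食品','价格']]
print(s3)
print(type(s3)) # <class 'pandas.core.frame.DataFrame'>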

Adding columns

Syntax:

df[column] = data   the data must fit the DataFrame: a list needs one value per row, and a Series is aligned on the index
df1[column] = df1[column1] + df1[column2]   a new column can be computed from existing ones

Code

s1 = pd.Series(['馒头','烧饼','面条'])
s2 = pd.Series([1,2,1.5])
x = {
    '食品':s1,
    '价格':s2
}
df3 = pd.DataFrame(x)
a = ['好吃','多','馅多']
df3['评价'] = a
print(df3)
print("－－－－－－－")
df3['详细评价'] = df3['食品'] + df3['评价']
print(df3)
'''
   食品   价格  评价
0  馒头  1.0  好吃
1  烧饼  2.0   多
2  面条  1.5  馅多
－－－－－－－
   食品   价格  评价  详细评价
0  馒头  1.0  好吃  馒头好吃
1  烧饼  2.0   多   烧饼多
2  面条  1.5  馅多  面条馅多
'''
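How the new data is matched to the rows depends on its type. The following sketch (hypothetical column names `price`, `rating`, `double_price`) contrasts a list, which is assigned by position, with a Series, which is aligned on the index.

```python
import pandas as pd

df = pd.DataFrame({"food": ["bun", "cake", "noodles"]}, index=["r1", "r2", "r3"])

# A list is assigned by position and needs one value per row.
df["price"] = [1.0, 2.0, 1.5]

# A Series is aligned on the index; rows without a matching label get NaN.
df["rating"] = pd.Series({"r3": "good", "r1": "ok"})

# A derived column computed from an existing column.
df["double_price"] = df["price"] * 2

print(df)
```

Here `r2` has no entry in the rating Series, so its `rating` cell is NaN.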

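Deleting columns

Syntax

del df[column]  looks up the column by its label and deletes it; nothing is returned
df.pop(column)  looks up the column by its label, deletes it, and returns the deleted column

Code

s1 = pd.Series(['馒头','烧饼','面条'])
s2 = pd.Series([1,2,1.5])
x = {
    '食品':s1,
    '价格':s2
}
df3 = pd.DataFrame(x)
# Rebuild the two extra columns from the previous section so there is something to delete
df3['评价'] = ['好吃','多','馅多']
df3['详细评价'] = df3['食品'] + df3['评价']
del df3['评价']
print(df3)
'''
   食品   价格  详细评价
0  馒头  1.0  馒头好吃
1  烧饼  2.0   烧饼多
2  面条  1.5  面条馅多
'''
a = df3.pop('详细评价')
print(a) # the column that was removed
'''
0    馒头好吃
1     烧饼多
2    面条馅多
'''
print(df3) # the data after deletion
'''
  食品   价格
0  馒头  1.0
1  烧饼  2.0
2  面条  1.5
'''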

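Indexing pandas objects

Syntax

loc[index] selects one row
loc[custom index (label)] selects rows by label
iloc[default index (integer position)] selects rows by position

Extensions

loc[index, column] pins down both the row and the column, which gives a single value
Select several rows and columns (listing exactly which rows and which columns)
loc[[index1, index2, ...], [column1, column2, ...]]
Select several rows and columns (from one row to another, from one column to another); label slices include both endpoints, unlike position slices
loc[index1:index2, column1:column2]
Boolean indexing can also select several rows and columns
loc[row_boolean_index, column_boolean_index]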

Code

s1=pd.Series(['馒头','大米','包子','大盘鸡','麻辣烫','鱼粉','热干面'],index=['row1','row2','row3','row4','row5','row6','row7'])
s2=pd.Series([1,2,1.5,12,16,10,8],index=['row1','row2','row3','row4','row5','row6','row7'])
s3=pd.Series(['Y','Y','Y','Y','Y','Y','Y'],index=['row1','row2','row3','row4','row5','row6','row7'])
x={
    '食品':s1,
    '价格':s2,
    '评价':s3 
}
df3=pd.DataFrame(x,index=['row1','row2','row3','row4','row5','row6','row7'])
print(df3)
'''
      食品    价格 评价
row1   馒头   1.0  Y    
row2   大米   2.0  Y    
row3   包子   1.5  Y    
row4  大盘鸡  12.0  Y   
row5  麻辣烫  16.0  Y   
row6   鱼粉  10.0  Y    
row7  热干面   8.0  Y 
'''
# Usage
#print(df3.loc['row1'])
#print(df3.loc['row4','食品'])
#print(df3.loc[['row1','row3'],['食品','价格']])
#print(df3.loc[['row1','row3'],'食品'])
#print(df3.loc['row1',['食品','价格']])
#print(df3.loc['row1':'row4','食品':'价格'])
#print(df3.loc[['row1','row3'],'食品':'价格'])
#index_bool = [True,False,False,False,False,True,True]
#col_bool = [True,True,False]
#print(df3.loc[index_bool,col_bool])
'''
      食品    价格
row1   馒头   1.0
row6   鱼粉  10.0
row7  热干面   8.0
'''
b = df3['价格'] > 8
print(df3.loc[b]) # the rows where the price is greater than 8
print(b) # the boolean Series itself
#---------------------------
# Selecting rows by position
'''
iloc[integer position] selects rows by position
Syntax:
iloc[num_index] gets the row at that position
iloc[num_index1:num_index2] from one row to another, left-closed, right-open
iloc[[num_index1,num_index2,.....]] the listed rows
iloc[num_index,num_columns] one column of one row
iloc[num_index,[num_columns1,num_columns2,....]] one row, the listed columns
iloc[num_index,num_columns1:num_columns2] one row, from one column to another, left-closed, right-open
iloc[[num_index1,num_index2,.....],[num_columns1,num_columns2,....]] the listed rows and the listed columns
iloc[num_index1:num_index2,num_columns1:num_columns2] a range of rows and a range of columns
'''
#print(df3.iloc[0])
#print(df3.iloc[0:4])
#print(df3.iloc[[0,3]])
print(df3.iloc[6,0])#热干面
print(df3.iloc[6,0:2])
'''
食品    热干面
价格    8.0
'''
print(df3.iloc[6,[0,2]])
'''
食品    热干面
评价      Y
'''
print(df3.iloc[[0,2,4],[0,2]])
'''
       食品 评价
row1   馒头  Y
row3   包子  Y
row5  麻辣烫  Y
'''
df3[0:4] # the first four rows (positions 0 to 3); position 4, the fifth row, is excluded
df3['row1':'row4'] # row1 through row4; a label slice includes row4
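Boolean indexing with `loc` becomes more useful once several conditions are combined. A minimal sketch, using a small made-up table with English column names:

```python
import pandas as pd

df = pd.DataFrame(
    {"food": ["bun", "rice", "chicken", "noodles"],
     "price": [1.0, 2.0, 12.0, 8.0]},
    index=["row1", "row2", "row3", "row4"],
)

# Each comparison yields a boolean Series; combine them with & (and) or | (or).
# The parentheses are required because & binds tighter than the comparisons.
mask = (df["price"] > 1.5) & (df["price"] < 10)

# Keep the rows where the mask is True, restricted to one column.
mid_priced = df.loc[mask, "food"]
print(mid_priced.tolist())   # ['rice', 'noodles']
```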

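append(): appending rows

Appends rows at the end and returns a new object

df.append(other, ignore_index=False, verify_integrity=False, sort=False)
other: the data to append, a DataFrame, Series, or similar
ignore_index=False: if True, the index labels are not used and the rows are renumbered from 0; default False
verify_integrity=False: if True, raise ValueError when the result would contain duplicate index labels; default False
sort=False: if the columns of the original and the appended data are not aligned, sort the columns; sorting is not recommended
Note: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0. pd.concat([df, other]) accepts the same ignore_index, verify_integrity and sort options, so the code here uses it.

Code

s1 = pd.Series(['zs','ls','ww'],index=['row1','row2','row3'])
s2 = pd.Series([10,20,30],index=['row1','row2','row3'])
x = {
    'name':s1,
    'age':s2
}
df1 = pd.DataFrame(x)

s3 = pd.Series(['大黄','狗蛋','铁蛋','富贵'],index=['row1','row2','row3','row4'])
s4 = pd.Series([10,10,10,10],index=['row1','row2','row3','row4'])
x1 = {
    'name':s3,
    'age':s4
}
df2 = pd.DataFrame(x1)

# With verify_integrity=True this raises ValueError, because row1..row3 appear in both frames
df3 = pd.concat([df1,df2],verify_integrity=False)
print(df3)
'''
     name  age
row1   zs   10
row2   ls   20
row3   ww   30
row1   大黄   10
row2   狗蛋   10
row3   铁蛋   10
row4   富贵   10
'''
print("---------ignore_index---------")
df3 = pd.concat([df1,df2],ignore_index=True) # replaces the row labels with 0..n-1
#print(df3)
'''
  name  age
0   zs   10
1   ls   20
2   ww   30
3   大黄   10
4   狗蛋   10
5   铁蛋   10
6   富贵   10
'''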
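In pandas 2.0 and later `DataFrame.append` is no longer available, and `pd.concat` takes the same `ignore_index` and `verify_integrity` options. The sketch below, with small made-up frames, shows what each option does to the index, including the ValueError raised for duplicate labels.

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["zs", "ls"], "age": [10, 20]}, index=["row1", "row2"])
df2 = pd.DataFrame({"name": ["ww"], "age": [30]}, index=["row1"])

# Duplicate labels are kept by default.
stacked = pd.concat([df1, df2])
print(list(stacked.index))        # ['row1', 'row2', 'row1']

# verify_integrity=True refuses to create duplicate index labels.
try:
    pd.concat([df1, df2], verify_integrity=True)
    raised = False
except ValueError:
    raised = True
print(raised)                     # True

# ignore_index=True discards the labels and renumbers from 0.
renumbered = pd.concat([df1, df2], ignore_index=True)
print(list(renumbered.index))     # [0, 1, 2]
```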

Deleting rows

df1 = df.drop(index) # drops a row and returns a new object
index must be an index label (custom index), not an integer position

df3 = df2.drop('row4')
print(df3)
'''
     name  age
row1   大黄   10
row2   狗蛋   10
row3   铁蛋   10
'''
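Since `drop` returns a new object, the original DataFrame stays untouched. A short sketch with made-up data that also drops several labels at once and shows the `errors="ignore"` option for labels that do not exist:

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c", "d"], "age": [10, 10, 10, 10]},
                  index=["row1", "row2", "row3", "row4"])

trimmed = df.drop(["row2", "row4"])   # drop several rows by label
print(list(trimmed.index))            # ['row1', 'row3']
print(len(df))                        # 4, the original is unchanged

# A missing label raises KeyError unless errors="ignore" is passed.
same = df.drop("row9", errors="ignore")
print(len(same))                      # 4
```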
