python|jupyter|pandas|dataframe|4.2 DataFrame basic operations

Latest recommended article published 2024-03-20 13:46:54

牛奶与喵

Category: python

Article link: https://blog.csdn.net/qq_43691842/article/details/102324630

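4.2 DataFrame basic operations

Introduction

The DataFrame is the most commonly used pandas object and is similar to an Excel spreadsheet. Once data has been read in, it is stored in a DataFrame structure.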
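To make the "read data, get a DataFrame" step concrete without needing a database, here is a minimal sketch that reads a small CSV string. The CSV content and column names are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV data standing in for a real file or database table
csv_text = "sid,sname\n1,Ann\n2,Bob\n"

# read_csv accepts any file-like object, so a StringIO works as a stand-in
detail = pd.read_csv(io.StringIO(csv_text))

print(type(detail).__name__)  # DataFrame
print(detail.shape)           # (2, 2)
```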

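1. Viewing common DataFrame attributes

values: the elements
index: the index
columns: the column names
dtypes: the data types
size: number of elements
ndim: number of dimensions
shape: shape of the data (number of rows and columns)
T: transpose (swap rows and columns)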

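# Imports
from sqlalchemy import create_engine
import pandas as pd
# Connect to the database
engine = create_engine('mysql+pymysql://root:1234@127.0.0.1:3306/testdb?charset=utf8')
# Read the table, then inspect its index, values, column names, and data types
detail = pd.read_sql_table('student', con=engine)
print('Student table index:', detail.index)
print('Student table values:', detail.values)
print('Student table column names:', detail.columns)
print('Student table data types:', detail.dtypes)
print('Student table element count:', detail.size)
print('Student table number of dimensions:', detail.ndim)
print('Student table shape:', detail.shape)
print('Student table shape after transposing:', detail.T.shape)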
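The snippet above needs a running MySQL server. For readers who just want to try the attributes, here is a small in-memory sketch. The table contents and the `age` column are assumptions, not the author's actual `student` table.

```python
import pandas as pd

# Hypothetical stand-in for the 'student' table: 4 rows, 3 columns
detail = pd.DataFrame({
    'sid': ['1', '2', '3', '4'],
    'sname': ['Ann', 'Bob', 'Cid', 'Dee'],
    'age': [18, 19, 20, 21],
})

print(detail.index)    # the row index (a default RangeIndex here)
print(detail.columns)  # the column labels
print(detail.dtypes)   # one data type per column
print(detail.size)     # 12, i.e. 4 rows x 3 columns
print(detail.ndim)     # 2
print(detail.shape)    # (4, 3)
print(detail.T.shape)  # (3, 4), rows and columns swapped
```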

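2. Querying DataFrame data

<1> Single column (dictionary-style access, attribute access)

# Dictionary-style access to a single column
sname = detail['sname']
print('Shape of the name column in the student table:', sname.shape)

# Attribute access to a single column
sname = detail.sname
print('Shape of the name column in the student table:', sname.shape)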
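A short sketch comparing the two access styles on made-up data. It also shows one reason dictionary-style access is the safer habit: a column whose name matches an existing DataFrame attribute (the `size` column here is a contrived example) cannot be reached through attribute access.

```python
import pandas as pd

# Hypothetical data; the column names are assumptions
detail = pd.DataFrame({'sid': ['1', '2', '3'], 'sname': ['Ann', 'Bob', 'Cid']})

by_key = detail['sname']   # dictionary-style access
by_attr = detail.sname     # attribute access

print(type(by_key).__name__)    # Series
print(by_key.shape)             # (3,)
print(by_key.equals(by_attr))   # True

# A column named like a DataFrame attribute is shadowed by that attribute
detail['size'] = [10, 20, 30]
print(detail.size)              # 9: the element count, not the column
print(detail['size'].tolist())  # [10, 20, 30]
```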

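<2> Multiple rows of one column / multiple rows

# Multiple rows of one column
sname2 = detail['sname'][:2]
print('First 2 elements:', sname2)
# Multiple rows across all columns: use ":" to select every column, then slice the rows
sname3 = detail[:][1:6]
print('Rows at positions 1 through 5:', sname3)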
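The slice `[1:6]` follows Python's usual rule that the end is excluded, so it returns five rows. A minimal sketch on made-up data:

```python
import pandas as pd

# Hypothetical 7-row table
detail = pd.DataFrame({'sid': list('1234567'), 'sname': list('abcdefg')})

first_two = detail['sname'][:2]   # first two values of one column
rows_1_to_5 = detail[1:6]         # a plain slice on a DataFrame selects rows

print(first_two.tolist())                     # ['a', 'b']
print(rows_1_to_5.index.tolist())             # [1, 2, 3, 4, 5]
print(detail[:][1:6].equals(detail[1:6]))     # True: the leading [:] is optional
```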

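<3> Multiple rows (head / tail)

# head and tail return the first / last 5 rows by default.
# Pass a number in the parentheses to get that many rows instead.
print('First 5 rows:', detail.head())
print('Last 5 rows:', detail.tail())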
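A quick sketch of the default and of passing an explicit row count, using a made-up 10-row table:

```python
import pandas as pd

# Hypothetical table with sid running from 1 to 10
detail = pd.DataFrame({'sid': range(1, 11)})

print(detail.head().shape)             # (5, 1): the default is 5 rows
print(detail.head(3)['sid'].tolist())  # [1, 2, 3]
print(detail.tail(2)['sid'].tolist())  # [9, 10]
```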

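<4> Single-column, multi-column, fancy, and conditional slicing with loc and iloc

# loc, single column
sname4 = detail.loc[:, 'sname']
print('Size of the sname column selected with loc:', sname4.size)
# iloc, single column (positions are 0-based, so 3 is the fourth column)
sname5 = detail.iloc[:, 3]
print('Size of the column at position 3 selected with iloc:', sname5.size)
# loc, multiple columns
student1 = detail.loc[:, ['sid', 'sname']]
print('Size of the sid and sname columns selected with loc:', student1.size)
# iloc, multiple columns
student2 = detail.iloc[:, [1, 3]]
print('Size of the columns at positions 1 and 3 selected with iloc:', student2.size)
# loc, fancy indexing (label-based; the end of a slice is included)
print('Row 3, columns sid and sname:', detail.loc[3, ['sid', 'sname']])
print('Rows 2 and 3, columns sid and sname:', detail.loc[2:3, ['sid', 'sname']])
# iloc, fancy indexing (position-based; the end of a slice is excluded)
print('Row 3, columns 1 and 2:', detail.iloc[3, [1, 2]])
print('Rows 2, 3, 4, columns 1 and 2:', detail.iloc[2:5, [1, 2]])
# loc, conditional slicing
print('sid and sname where sid is 1:', detail.loc[detail['sid'] == '1', ['sid', 'sname']])
# iloc, conditional slicing
# iloc does not accept a boolean Series, so this line raises an error:
# print('Columns 2 and 3 where sid is 1:', detail.iloc[detail['sid'] == '1', [2, 3]])
# Passing the underlying boolean array works:
print('Columns 2 and 3 where sid is 1:', detail.iloc[(detail['sid'] == '1').values, [2, 3]])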
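The two points that trip people up most are that a `loc` slice includes its end label while an `iloc` slice excludes its end position, and that `iloc` wants a plain boolean array rather than a boolean Series. A minimal sketch on made-up data (the `age` column is an assumption):

```python
import pandas as pd

# Hypothetical table with the default integer index 0..4
detail = pd.DataFrame({
    'sid': ['1', '2', '3', '4', '5'],
    'sname': ['a', 'b', 'c', 'd', 'e'],
    'age': [18, 19, 20, 21, 22],
})

by_label = detail.loc[2:3, ['sid', 'sname']]  # labels 2 and 3, end included
by_pos = detail.iloc[2:3, [0, 1]]             # position 2 only, end excluded
print(len(by_label), len(by_pos))             # 2 1

mask = detail['sid'] == '1'                   # boolean Series
print(detail.loc[mask, 'sname'].tolist())     # ['a']

# iloc needs the raw boolean array, hence .values
print(detail.iloc[mask.values, [1, 2]].shape)  # (1, 2)
```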

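<5> ix slicing

# ix slicing (ix was deprecated in pandas 0.20 and removed in pandas 1.0, so this only runs on older versions)
# General form: dataframe.ix[row label, position, or condition, column label or position]
print('Rows 2-6, column 5:', detail.ix[2:6, 5])
# On current pandas versions, use loc (labels) or iloc (positions) instead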
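Because `ix` is no longer available in pandas 1.0 and later, here is a sketch of how the same kind of selection can be written with `loc` and `iloc`. The data is made up, and the table has only three columns, so the column position used is 2 rather than 5.

```python
import pandas as pd

# Hypothetical table with the default integer index 0..7
detail = pd.DataFrame({
    'sid': list('12345678'),
    'sname': list('abcdefgh'),
    'age': range(18, 26),
})

# Rows by label, column by position: look up the column label first
col = detail.columns[2]
via_loc = detail.loc[2:6, col]    # labels 2..6, end included

# Fully positional: the end is excluded, so 2:7 covers positions 2..6
via_iloc = detail.iloc[2:7, 2]

print(via_loc.tolist())           # [20, 21, 22, 23, 24]
print(via_loc.equals(via_iloc))   # True
```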

3. Modifying DataFrame data

# Change every sname equal to 'n' into 'nn'
detail.loc[detail['sname'] == 'n', 'sname'] = 'nn'
print('sname after the change:', detail.loc[detail['sname'] == 'nn', 'sname'])
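A self-contained sketch of the same conditional assignment on made-up data. Doing the row filter and the column selection in a single `loc` call is what makes the write land on the original DataFrame. Chained indexing such as `detail[mask]['sname'] = ...` is not reliable for assignment.

```python
import pandas as pd

# Hypothetical data
detail = pd.DataFrame({'sid': ['1', '2', '3'], 'sname': ['n', 'm', 'n']})

# Select the rows and the column in one loc call, then assign
detail.loc[detail['sname'] == 'n', 'sname'] = 'nn'

print(detail['sname'].tolist())         # ['nn', 'm', 'nn']
print((detail['sname'] == 'n').sum())   # 0: no 'n' values remain
```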

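4. Adding DataFrame data

# Add a class column
detail['class'] = '7'  # constant value for every row
# Computed value (requires numeric counts and amounts columns); this overwrites the constant above
detail['class'] = detail['counts'] * detail['amounts']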
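A small sketch of both kinds of new column on made-up data. The `total` column name is hypothetical and is used here so the computed values do not overwrite the constant column.

```python
import pandas as pd

# Hypothetical order data
detail = pd.DataFrame({'counts': [1, 2, 3], 'amounts': [10.0, 5.0, 2.0]})

detail['class'] = '7'                                   # scalar is broadcast to every row
detail['total'] = detail['counts'] * detail['amounts']  # element-wise product

print(detail['class'].tolist())  # ['7', '7', '7']
print(detail['total'].tolist())  # [10.0, 10.0, 6.0]
```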

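5. Deleting DataFrame data

Use the DataFrame.drop method.
# Simplified signature: dataframe.drop(labels, axis=0, level=None, inplace=False, errors='raise')
# Delete a column
print('Columns before deleting class:', detail.columns)
detail.drop(labels='class', axis=1, inplace=True)
print('Columns after deleting class:', detail.columns)
# Delete rows: with axis=0, labels must be row index labels,
# so look up the index of the rows whose sname is 'nn' first
detail.drop(labels=detail[detail['sname'] == 'nn'].index, axis=0, inplace=True)
# Delete several rows
print('Length before deleting:', len(detail))
detail.drop(labels=range(2, 4), axis=0, inplace=True)  # deletes the rows labelled 2 and 3
print('Length after deleting:', len(detail))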
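A runnable sketch of `drop` on made-up data. It shows that without `inplace=True` the call returns a new DataFrame and leaves the original alone, and that `range(2, 4)` refers to the row labels 2 and 3.

```python
import pandas as pd

# Hypothetical 5-row table with the default index 0..4
detail = pd.DataFrame({'sid': list('12345'), 'class': ['7'] * 5})

# Without inplace=True, drop returns a new DataFrame
without_col = detail.drop(labels='class', axis=1)
print(list(without_col.columns))  # ['sid']
print(list(detail.columns))       # ['sid', 'class']

# With inplace=True, the original is modified; labels 2 and 3 are removed
detail.drop(labels=range(2, 4), axis=0, inplace=True)
print(detail.index.tolist())      # [0, 1, 4]
print(len(detail))                # 3
```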

6. Descriptive analysis of DataFrame data

The pandas library is built on NumPy.

import numpy as np

# Mean with np.mean
print('Mean price:', np.mean(detail['amounts']))

# Mean with the pandas method
print('Mean price:', detail['amounts'].mean())

# Descriptive statistics for numeric columns with describe
print('Descriptive statistics for counts and amounts:\n',
detail[['counts', 'amounts']].describe())

# Frequency counts with value_counts
print('Top 10 dishes_name frequencies:\n',
detail['dishes_name'].value_counts()[0:10])

# Convert an object column to the category type with astype
detail['dishes_name'] = detail['dishes_name'].astype('category')

# Descriptive statistics for a category column
print('Descriptive statistics:', detail['dishes_name'].describe())
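The statistics above assume an order-detail table with `dishes_name`, `counts`, and `amounts` columns. Here is a self-contained sketch with made-up rows so the calls can be tried directly.

```python
import numpy as np
import pandas as pd

# Hypothetical order-detail data
detail = pd.DataFrame({
    'dishes_name': ['rice', 'soup', 'rice', 'tea', 'rice', 'soup'],
    'counts': [1, 2, 1, 1, 3, 1],
    'amounts': [2.0, 8.0, 2.0, 5.0, 2.0, 8.0],
})

# NumPy and pandas give the same mean
print(np.mean(detail['amounts']))   # 4.5
print(detail['amounts'].mean())     # 4.5

# value_counts sorts by frequency, most common first
freq = detail['dishes_name'].value_counts()
print(freq.index[0], freq.iloc[0])  # rice 3

# For a category column, describe reports count, unique, top, and freq
detail['dishes_name'] = detail['dishes_name'].astype('category')
summary = detail['dishes_name'].describe()
print(summary['unique'], summary['top'], summary['freq'])  # 3 rice 3
```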