Common Pandas Operations

不知道叫啥的喵

Article link: https://blog.csdn.net/weixin_43570155/article/details/118771513

Read local Excel data

import pandas as pd
# .xlsx files are read through the openpyxl package, which must be installed
df = pd.read_excel('/home/kesci/input/pandas1206855/pandas120.xlsx')

Save as a CSV file (without the index column)

df.to_csv('normal.csv',index=False,sep=',')
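A quick way to see what `index=False` changes is to write the same frame both ways and compare. This is a minimal sketch that writes to an in-memory `io.StringIO` buffer instead of `normal.csv`; the two-row DataFrame is made-up sample data, not the contents of `pandas120.xlsx`.

```python
import io

import pandas as pd

# Hypothetical sample data, not the pandas120.xlsx contents
df = pd.DataFrame({"grammer": ["Python", "C"], "score": [1, 2]})

# With index=False only the data columns are written
buf = io.StringIO()
df.to_csv(buf, index=False, sep=",")
text_no_index = buf.getvalue()

# Default: the row index is written as an extra, unnamed first column
buf = io.StringIO()
df.to_csv(buf)
text_with_index = buf.getvalue()

print(text_no_index)
print(text_with_index)

# Reading the index=False version back reproduces the original frame
round_trip = pd.read_csv(io.StringIO(text_no_index))
print(round_trip.equals(df))
```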

Create a DataFrame from a dictionary

import numpy as np
import pandas as pd
data = {"grammer": ["Python", "C", "Java", "GO", np.nan, "SQL", "PHP", "Python"],
        "score": [1, 2, np.nan, 4, 5, 6, 7, 10]}
df = pd.DataFrame(data)
df

Extract the rows whose grammer value contains the string "Python"

# Method 1: exact match (rows whose grammer equals "Python")
df[df['grammer'] == 'Python']
# Method 2: substring match; na=False treats missing values as non-matches
results = df['grammer'].str.contains("Python", na=False)
df[results]
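The two methods are not interchangeable: `==` keeps exact matches only, while `str.contains` keeps any value that contains the substring, and it returns NaN for missing entries unless `na=False` is given. A small sketch with hypothetical values (`"Python3"` is invented for the comparison):

```python
import numpy as np
import pandas as pd

# Hypothetical data: "Python3" contains "Python" but is not equal to it
df = pd.DataFrame({"grammer": ["Python", "Python3", np.nan, "C"]})

# Exact match: NaN compares as False, so no NA handling is needed
exact = df[df["grammer"] == "Python"]

# Substring match: na=False turns missing values into "no match"
partial = df[df["grammer"].str.contains("Python", na=False)]

print(exact)
print(partial)
```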

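Extract rows where the popularity column meets a condition (these examples assume score has already been renamed to popularity, which is done further down in this article)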

Rows where popularity is greater than 3

df[df['popularity'] > 3]

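Rows where popularity is greater than 3 and less than 7 (each comparison needs its own parentheses, because & binds more tightly than > and <)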

df[(df['popularity'] > 3) & (df['popularity'] < 7)]
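The same range filter can also be sketched with `Series.between`. The `inclusive="neither"` string option assumes pandas 1.3 or later, and the data here is made up for illustration.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"grammer": ["Python", "C", "Java", "GO"],
                   "popularity": [1.0, 4.0, 5.0, 7.0]})

# Explicit mask: strictly greater than 3 and strictly less than 7
mask = (df["popularity"] > 3) & (df["popularity"] < 7)

# Same condition via Series.between; inclusive="neither" excludes both ends
mask_between = df["popularity"].between(3, 7, inclusive="neither")

print(df[mask])
print(mask.equals(mask_between))
```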

Drop specified rows/columns

Drop one row

new_df = df.drop(index='row_label')
new_df = df.drop('row_label', axis='index')
new_df = df.drop('row_label', axis=0)

Drop multiple rows

new_df = df.drop(index=['row_label1', 'row_label2'])
new_df = df.drop(['row_label1', 'row_label2'], axis='index')
new_df = df.drop(['row_label1', 'row_label2'], axis=0)

Drop one column

new_df = df.drop(columns='column_name')
new_df = df.drop('column_name', axis='columns')
new_df = df.drop('column_name', axis=1)

Drop multiple columns

new_df = df.drop(columns=['column_name1', 'column_name2'])
new_df = df.drop(['column_name1', 'column_name2'], axis='columns')
new_df = df.drop(['column_name1', 'column_name2'], axis=1)
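All three spellings in each group do the same thing. A quick check on a hypothetical 3x3 frame with string row labels (the labels `x`, `y`, `z` and columns `a`, `b`, `c` are only for illustration):

```python
import pandas as pd

# Hypothetical frame with string row labels
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]},
                  index=["x", "y", "z"])

# Three equivalent ways to drop row "y"
r1 = df.drop(index="y")
r2 = df.drop("y", axis="index")
r3 = df.drop("y", axis=0)

# Three equivalent ways to drop columns "a" and "c"
c1 = df.drop(columns=["a", "c"])
c2 = df.drop(["a", "c"], axis="columns")
c3 = df.drop(["a", "c"], axis=1)

print(r1.equals(r2) and r2.equals(r3))
print(c1.equals(c2) and c2.equals(c3))
print(df.shape)  # the original is unchanged: (3, 3)
```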

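Common drop parameters

inplace: whether to modify the original DataFrame.

False: return a new DataFrame (default)
True: modify the original DataFrame in place and return None

axis: whether the labels are looked up in the index (rows) or in the columns. This differs from the axis of reductions such as sum and mean, where axis=0 means the computation runs down the rows and produces one result per column.

0 or 'index': labels refer to rows (default)
1 or 'columns': labels refer to columns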
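A sketch of both parameters on a toy frame (values are made up). Because `inplace=True` returns None, writing `df = df.drop(..., inplace=True)` would replace `df` with None; and `axis=0` in a reduction such as `sum` collapses the rows rather than naming them.

```python
import pandas as pd

# Hypothetical toy frame
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# inplace=False (default): a new DataFrame comes back, df is untouched
new_df = df.drop(columns="a")

# inplace=True: df itself is modified and the call returns None
ret = df.drop(columns="b", inplace=True)

print(new_df.columns.tolist())  # ['b']
print(ret)                      # None
print(df.columns.tolist())      # ['a']

# For reductions, axis=0 runs down the rows and yields one value per column
sums = pd.DataFrame({"a": [1, 2], "b": [3, 4]}).sum(axis=0)
print(sums.tolist())            # [3, 7]
```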

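Drop the last row

# df.index[-1] is the label of the last row; len(df)-1 only matches it while the index is 0..n-1
df.drop(df.index[-1], inplace=True)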
df
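`len(df)-1` is the label of the last row only while the index is still the default 0..n-1. After sorting or filtering that assumption breaks, as this sketch on a hypothetical frame shows; `df.index[-1]` (last label) or `df.iloc[:-1]` (positional slice) work regardless of the index.

```python
import pandas as pd

# Hypothetical frame whose index is no longer 0..n-1 after sorting
df = pd.DataFrame({"popularity": [7.0, 1.0, 4.0]}).sort_values("popularity")
print(df.index.tolist())  # [1, 2, 0]

# Pitfall: len(df) - 1 == 2 is the label of the 4.0 row, not the last row
wrong = df.drop([len(df) - 1])

# Label of the last row, whatever the index looks like
trimmed_by_label = df.drop(df.index[-1])

# Positional alternative: everything except the last row
trimmed_by_position = df.iloc[:-1]

print(trimmed_by_label.equals(trimmed_by_position))  # True
```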

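Append a row ['Perl', 6.6]

row = {'grammer': 'Perl', 'popularity': 6.6}
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)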
df
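When several rows need to be added, it is usually cheaper to collect them first and call `pd.concat` once than to concatenate row by row. A minimal sketch with made-up rows; keys missing from a row dict (here `popularity` for `Rust`) become NaN.

```python
import pandas as pd

# Hypothetical starting frame
df = pd.DataFrame({"grammer": ["Python", "C"], "popularity": [1.0, 2.0]})

# New rows collected as dicts; the second one has no popularity value
new_rows = [
    {"grammer": "Perl", "popularity": 6.6},
    {"grammer": "Rust"},
]

# One concat call instead of one call per row
df = pd.concat([df, pd.DataFrame(new_rows)], ignore_index=True)

print(df)
```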

Print all column names of df

print(df.columns)

out : Index(['grammer', 'score'], dtype='object')

Rename the second column to 'popularity'

df.rename(columns={'score':'popularity'}, inplace = True)
df

Extract the row(s) where popularity reaches its maximum

df[df['popularity'] == df['popularity'].max()]
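The boolean mask keeps every row that ties for the maximum, while `Series.idxmax` returns the label of the first such row only. A sketch on hypothetical data with a tie:

```python
import pandas as pd

# Hypothetical data: Python and Java tie for the maximum
df = pd.DataFrame({"grammer": ["Python", "C", "Java"],
                   "popularity": [10.0, 4.0, 10.0]})

# Boolean mask: keeps every row equal to the maximum
all_max = df[df["popularity"] == df["popularity"].max()]

# idxmax: label of the first row holding the maximum
first_max = df.loc[df["popularity"].idxmax()]

print(all_max)
print(first_max["grammer"])  # Python
```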

Swap the positions of two columns

# Method 1: take the column out, drop it, then insert it back at position 0
temp = df['popularity']
df.drop(labels=['popularity'], axis=1,inplace = True)
df.insert(0, 'popularity', temp)
df

# Method 2: reorder the columns by position
#cols = df.columns[[1,0]]
#df = df[cols]
#df
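Both methods rely on the frame having exactly two columns (method 2 hard-codes the positions `[1, 0]`). Selecting columns by name avoids that; `swap_columns` below is a hypothetical helper for illustration, not a pandas function, and the data is made up.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"grammer": ["Python", "C"], "popularity": [1.0, 2.0]})

# Select the columns in the desired order by name; returns a new DataFrame
swapped = df[["popularity", "grammer"]]

def swap_columns(frame, a, b):
    """Hypothetical helper: return frame with columns a and b swapped."""
    cols = list(frame.columns)
    i, j = cols.index(a), cols.index(b)
    cols[i], cols[j] = cols[j], cols[i]
    return frame[cols]

print(swap_columns(df, "grammer", "popularity").columns.tolist())
```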

View the first/last 5 rows

df.head()
df.tail()

Count how many times each programming language appears in the grammer column

df['grammer'].value_counts()
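`value_counts` drops missing values by default, so the NaN in `grammer` is not counted. A sketch using the article's `grammer` values that shows the `dropna=False` option:

```python
import numpy as np
import pandas as pd

s = pd.Series(["Python", "C", "Java", "GO", np.nan, "SQL", "PHP", "Python"],
              name="grammer")

# Default: missing values are left out of the counts
counts = s.value_counts()

# dropna=False adds an entry for NaN
counts_with_nan = s.value_counts(dropna=False)

print(counts)
print(counts_with_nan)
```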

Fill missing values with the mean of the values above and below them (linear interpolation)

df['popularity'] = df['popularity'].fillna(df['popularity'].interpolate())
df
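For a single missing value, the default linear `interpolate()` fills in exactly the mean of its two neighbours; for a run of consecutive gaps it spaces the values evenly instead. A sketch on the article's `score` values (assumed here to be the `popularity` column after renaming, with no rows added or dropped), which also reproduces the mean of 4.75 reported in the next step:

```python
import numpy as np
import pandas as pd

popularity = pd.Series([1, 2, np.nan, 4, 5, 6, 7, 10], dtype=float)

# Linear interpolation (the default method) fills the gap at position 2
filled = popularity.interpolate()

print(filled[2])      # 3.0, the mean of 2 and 4
print(filled.mean())  # 4.75

# With two consecutive gaps the fill is linear, not a plain neighbour mean
gaps = pd.Series([0.0, np.nan, np.nan, 3.0]).interpolate()
print(gaps.tolist())  # [0.0, 1.0, 2.0, 3.0]
```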

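Drop duplicates based on the grammer column (this returns a new DataFrame; df is unchanged unless the result is reassigned or inplace=True is passed)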

df.drop_duplicates(['grammer'])
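`drop_duplicates` keeps the first occurrence by default; the `keep` parameter changes which duplicate survives. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data with "Python" appearing twice
df = pd.DataFrame({"grammer": ["Python", "C", "Python"],
                   "popularity": [1.0, 2.0, 10.0]})

# keep="first" (default): the first Python row survives
first = df.drop_duplicates(["grammer"])

# keep="last": the last Python row survives instead
last = df.drop_duplicates(["grammer"], keep="last")

# keep=False: every duplicated value is removed entirely
unique_only = df.drop_duplicates(["grammer"], keep=False)

print(first.index.tolist(), last.index.tolist(), unique_only.index.tolist())
```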

Compute the mean of the popularity column

df['popularity'].mean()

out : 4.75

Sort the data by the values in the "popularity" column

df.sort_values("popularity",inplace=True)
df
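`sort_values` sorts ascending and puts NaN last by default; the `ascending` and `na_position` parameters change both. A sketch on made-up values:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value
df = pd.DataFrame({"grammer": ["Python", "C", "Java", "GO"],
                   "popularity": [1.0, 10.0, np.nan, 4.0]})

# Ascending by default; NaN goes to the end (na_position="last")
asc = df.sort_values("popularity")

# Descending order, missing values placed first
desc = df.sort_values("popularity", ascending=False, na_position="first")

print(asc["grammer"].tolist())   # ['Python', 'GO', 'C', 'Java']
print(desc["grammer"].tolist())  # ['Java', 'C', 'GO', 'Python']
```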

Compute the length of each string in the grammer column

# Fill the missing language with 'R' first, because len() raises TypeError on NaN
df['grammer'] = df['grammer'].fillna('R')
df['len_str'] = df['grammer'].map(len)
df
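The `fillna('R')` step matters because Python's `len()` raises `TypeError` on NaN (a float). The vectorized `.str.len()` accessor instead returns NaN for missing entries, so it can be used without filling first. A sketch with hypothetical values:

```python
import numpy as np
import pandas as pd

# Hypothetical values including one missing entry
grammer = pd.Series(["Python", "C", np.nan])

# .str.len() leaves missing values as NaN instead of raising
lengths = grammer.str.len()

print(lengths)
```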

Convert the grammer column to a list

df['grammer'].to_list()

out : ['Python', 'C', 'Java', 'GO', nan, 'SQL', 'PHP', 'Python']

Save the DataFrame as CSV (the index is written as the first column by default)

df.to_csv('test.csv')

Check the number of rows and columns

df.shape

out : (8, 2)
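`shape` is a `(rows, columns)` tuple; `len(df)` gives the row count alone and `df.size` the total number of cells. A sketch rebuilding the article's original dictionary:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"grammer": ["Python", "C", "Java", "GO", np.nan, "SQL", "PHP", "Python"],
                   "score": [1, 2, np.nan, 4, 5, 6, 7, 10]})

n_rows, n_cols = df.shape  # (8, 2) for the article's data
print(n_rows, n_cols)
print(len(df) == n_rows)   # len() counts rows only
print(df.size)             # 16 cells = rows * columns
```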
