Pandas Basics for Data Analysis: Reading and Writing Text Files, and Sorting

Theory:

Key points

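Reading data: pd.read_csv(filepath, usecols, index_col)

filepath: path to the file

usecols: the columns to read (all columns are read by default)

index_col: the column to use as the index; if omitted, a new integer index 0, 1, … is generated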
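
The effect of usecols and index_col can be sketched on a small in-memory CSV. The column names and values below are made up for illustration, and io.StringIO stands in for a real file path.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
csv_text = """Country,Region,Rank,Score,Extra
Denmark,Western Europe,1,7.526,x
Norway,Western Europe,4,7.498,y
"""

# Default: every column is read and a 0, 1, ... integer index is generated.
df_all = pd.read_csv(io.StringIO(csv_text))

# usecols keeps only the listed columns; index_col promotes one of them to the index.
df_sel = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["Country", "Region", "Score"],
    index_col="Country",
)

print(df_all.shape)          # (2, 5)
print(list(df_sel.columns))  # ['Region', 'Score']
print(df_sel.index.name)     # Country
```

Note that the column named in index_col has to be among the columns kept by usecols.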

df.info()

Prints a quick summary of the data: the index range, column names, non-null counts, dtypes, and memory usage

Saving data: df.to_csv(filepath, index)

filepath: path to save to

index: whether to write the index column; defaults to True
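
A minimal sketch of what the index parameter changes, using a tiny made-up DataFrame. When to_csv is called without a path it returns the CSV text instead of writing a file, which makes the difference easy to inspect.

```python
import pandas as pd

# Hypothetical two-row frame with a named index.
df = pd.DataFrame(
    {"Score": [7.5, 7.4]},
    index=pd.Index(["Denmark", "Norway"], name="Country"),
)

with_index = df.to_csv()           # index=True is the default
no_index = df.to_csv(index=False)  # the index column is dropped

print(with_index.splitlines())  # ['Country,Score', 'Denmark,7.5', 'Norway,7.4']
print(no_index.splitlines())    # ['Score', '7.5', '7.4']
```

If the index carries real information (as the country names do here), index=False discards it, so it is usually only appropriate for the default 0, 1, … index.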

 

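Sorting by index: sort_index()

ascending: defaults to True; True sorts in ascending order, False in descending order

axis: defaults to 0; 0 sorts the rows by their row index labels, 1 sorts the columns by their column labels

Pay attention to the axis direction when working with a DataFrame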
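
The two axis settings can be compared on a small frame. This is an illustrative sketch; the labels are arbitrary.

```python
import pandas as pd

# Hypothetical frame whose row and column labels are both out of order.
df = pd.DataFrame(
    {"b": [1, 2, 3], "a": [4, 5, 6]},
    index=["z", "x", "y"],
)

by_rows = df.sort_index()                      # axis=0: order rows by row label
by_rows_desc = df.sort_index(ascending=False)  # same axis, descending
by_cols = df.sort_index(axis=1)                # axis=1: order columns by column label

print(list(by_rows.index))       # ['x', 'y', 'z']
print(list(by_rows_desc.index))  # ['z', 'y', 'x']
print(list(by_cols.columns))     # ['a', 'b']
```

sort_index returns a new object in each case; df itself keeps its original order.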

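Sorting by values: sort_values(by, ascending)

1. Sorting by the values of a single column

axis: defaults to 0; 0 reorders the rows according to the values in a column, 1 reorders the columns according to the values in a row

by='label': which labels are valid for by depends on axis. With axis=0 the label is chosen from the column labels; with axis=1 it is chosen from the row index labels.

ascending: defaults to True; True sorts in ascending order, False in descending order

2. Sorting by the values of multiple columns: by=[ ], ascending=[ ]. When ascending is given as a list, it must have the same length as by, and the elements correspond one to one.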
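
The rules above can be sketched as follows. The frames and labels are invented for illustration only.

```python
import pandas as pd

# Hypothetical data: two regions, four rows.
df = pd.DataFrame(
    {
        "region": ["West", "East", "West", "East"],
        "score": [7.5, 5.2, 7.3, 5.9],
    },
    index=["A", "B", "C", "D"],
)

# Single column, descending.
by_score = df.sort_values(by="score", ascending=False)

# Two columns: region ascending, then score descending within each region.
by_both = df.sort_values(by=["region", "score"], ascending=[True, False])

# axis=1: by names a row label, and the columns are reordered by that row's values.
nums = pd.DataFrame({"x": [3, 1], "y": [1, 2], "z": [2, 3]}, index=["r1", "r2"])
cols_by_r1 = nums.sort_values(by="r1", axis=1)

print(list(by_score.index))      # ['A', 'C', 'D', 'B']
print(list(by_both.index))       # ['D', 'B', 'A', 'C']
print(list(cols_by_r1.columns))  # ['y', 'z', 'x']
```

With axis=1 all values in the chosen row need to be mutually comparable, which is why the second frame is purely numeric.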

 

Lab:

Lesson 5. Pandas Basics for Data Analysis: Reading and Writing Text Files

In [24]:

 

import pandas as pd
import numpy as np

In [25]:

 

# File path
filepath = r'C:\Users\ML Learning\Projects\第四章-数据分析预习内容\第四章-数据分析预习内容\第一节-数据分析工具pandas基础\lesson_05\lesson_05\examples\datasets\2016_happiness.csv'

Reading the file

In [26]:

 

data = pd.read_csv(filepath,usecols=['Country','Region','Happiness Rank','Happiness Score'], index_col='Country')

In [27]:

 

# Preview the data
data.head()

Out[27]:

             Region          Happiness Rank  Happiness Score
Country
Denmark      Western Europe  1               7.526
Switzerland  Western Europe  2               7.509
Iceland      Western Europe  3               7.501
Norway       Western Europe  4               7.498
Finland      Western Europe  5               7.413

Viewing basic information about the data with info()

In [28]:

 

data.info()
<class 'pandas.core.frame.DataFrame'>
Index: 157 entries, Denmark to Burundi
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Region           157 non-null    object 
 1   Happiness Rank   157 non-null    int64  
 2   Happiness Score  157 non-null    float64
dtypes: float64(1), int64(1), object(1)
memory usage: 4.9+ KB

In [29]:

 

# Use apply to add a column holding each score rounded to the nearest integer (np.around)
data['int_score'] = data['Happiness Score'].apply(np.around)

In [30]:

 

# Data after preprocessing
data.head()

Out[30]:

             Region          Happiness Rank  Happiness Score  int_score
Country
Denmark      Western Europe  1               7.526            8.0
Switzerland  Western Europe  2               7.509            8.0
Iceland      Western Europe  3               7.501            8.0
Norway       Western Europe  4               7.498            7.0
Finland      Western Europe  5               7.413            7.0
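
One detail worth knowing about np.around: values exactly halfway between two integers are rounded to the nearest even integer, not always up. The sketch below uses a made-up Series (the last two values are chosen to hit the halfway case) and compares it with a floor-based "round half up", which is an alternative shown here for contrast and is not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Two scores from the table above plus two hypothetical halfway values.
scores = pd.Series([7.526, 7.498, 2.5, 3.5])

rounded_apply = scores.apply(np.around)  # same call as in the cell above
rounded_vec = scores.round()             # vectorized equivalent, no apply needed
half_up = np.floor(scores + 0.5)         # round half up, valid for non-negative values

print(rounded_apply.tolist())  # [8.0, 7.0, 2.0, 4.0]
print(rounded_vec.tolist())    # [8.0, 7.0, 2.0, 4.0]
print(half_up.tolist())        # [8.0, 7.0, 3.0, 4.0]
```

For the happiness scores in this dataset the two approaches agree on the rows shown, since none of those values ends in exactly .5.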

Saving the results

In [32]:

 

 
# Compare the effect of the index parameter: the first file keeps the index column, the second omits it
data.to_csv(r'C:\Users\ML Learning\Projects\第四章-数据分析预习内容\第四章-数据分析预习内容\第一节-数据分析工具pandas基础\lesson_05\lesson_05\examples\datasets\2021_with_index.csv')
data.to_csv(r'C:\Users\ML Learning\Projects\第四章-数据分析预习内容\第四章-数据分析预习内容\第一节-数据分析工具pandas基础\lesson_05\lesson_05\examples\datasets\2021_no_index.csv',index=False)

Sorting by index

In [33]:

 

 
# Default axis=0: sort the rows by their index labels (here, the country names)
data.sort_index().head()

Out[33]:

             Region                           Happiness Rank  Happiness Score  int_score
Country
Afghanistan  Southern Asia                    154             3.360            3.0
Albania      Central and Eastern Europe       109             4.655            5.0
Algeria      Middle East and Northern Africa  38              6.355            6.0
Angola       Sub-Saharan Africa               141             3.866            4.0
Argentina    Latin America and Caribbean      26              6.650            7.0

In [34]:

 

# axis=1: sort the columns by their column names
data.sort_index(axis=1).head()

Out[34]:

             Happiness Rank  Happiness Score  Region          int_score
Country
Denmark      1               7.526            Western Europe  8.0
Switzerland  2               7.509            Western Europe  8.0
Iceland      3               7.501            Western Europe  8.0
Norway       4               7.498            Western Europe  7.0
Finland      5               7.413            Western Europe  7.0

Sorting by values

In [35]:

 

 
# Sort by a single key ('Country' is the name of the index, which sort_values also accepts)
data.sort_values(by='Country').head(10)

Out[35]:

             Region                           Happiness Rank  Happiness Score  int_score
Country
Afghanistan  Southern Asia                    154             3.360            3.0
Albania      Central and Eastern Europe       109             4.655            5.0
Algeria      Middle East and Northern Africa  38              6.355            6.0
Angola       Sub-Saharan Africa               141             3.866            4.0
Argentina    Latin America and Caribbean      26              6.650            7.0
Armenia      Central and Eastern Europe       121             4.360            4.0
Australia    Australia and New Zealand        9               7.313            7.0
Austria      Western Europe                   12              7.119            7.0
Azerbaijan   Central and Eastern Europe       81              5.291            5.0
Bahrain      Middle East and Northern Africa  42              6.218            6.0

In [36]:

 

 
# Sort by multiple keys: Region first, then Country within each region
data.sort_values(by=['Region','Country']).head(10)

Out[36]:

                        Region                      Happiness Rank  Happiness Score  int_score
Country
Australia               Australia and New Zealand   9               7.313            7.0
New Zealand             Australia and New Zealand   8               7.334            7.0
Albania                 Central and Eastern Europe  109             4.655            5.0
Armenia                 Central and Eastern Europe  121             4.360            4.0
Azerbaijan              Central and Eastern Europe  81              5.291            5.0
Belarus                 Central and Eastern Europe  61              5.802            6.0
Bosnia and Herzegovina  Central and Eastern Europe  87              5.163            5.0
Bulgaria                Central and Eastern Europe  129             4.217            4.0
Croatia                 Central and Eastern Europe  74              5.488            5.0
Czech Republic          Central and Eastern Europe  27              6.596            7.0

In [37]:

 

 
# Sort by multiple keys with a direction for each: Region ascending, Country descending
data.sort_values(by=['Region','Country'],ascending=[True,False]).head(10)

Out[37]:

              Region                      Happiness Rank  Happiness Score  int_score
Country
New Zealand   Australia and New Zealand   8               7.334            7.0
Australia     Australia and New Zealand   9               7.313            7.0
Uzbekistan    Central and Eastern Europe  49              5.987            6.0
Ukraine       Central and Eastern Europe  123             4.324            4.0
Turkmenistan  Central and Eastern Europe  65              5.658            6.0
Tajikistan    Central and Eastern Europe  100             4.996            5.0
Slovenia      Central and Eastern Europe  63              5.768            6.0
Slovakia      Central and Eastern Europe  45              6.078            6.0
Serbia        Central and Eastern Europe  86              5.177            5.0
Russia        Central and Eastern Europe  56              5.856            6.0

 

 

 
