Pandas Basics for Data Analysis: Reading and Writing Text Files, Sorting

DB架构

Categories: NLP, Data Science

Article link: https://blog.csdn.net/u011868279/article/details/114994942

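Theory:

Key points

Reading data: pd.read_csv(filepath, usecols, index_col)

filepath: path to the file

usecols: the columns to read (all columns are read by default)

index_col: the column to use as the row index; if omitted, pandas generates a default integer index 0, 1, …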
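The three parameters above can be tried without a file on disk. The following is a minimal sketch, assuming a small made-up CSV held in memory (the `Notes` column and the two rows are illustrative only, not taken from the dataset used later):

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk
csv_text = """Country,Region,Score,Notes
Denmark,Western Europe,7.526,a
Norway,Western Europe,7.498,b
"""

# usecols keeps only the listed columns; index_col turns one of them into the row index
df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["Country", "Region", "Score"],
    index_col="Country",
)

# Without index_col, pandas generates the default integer index 0, 1, ...
df_default = pd.read_csv(io.StringIO(csv_text))

print(df.columns.tolist())        # remaining data columns
print(df_default.index.tolist())  # default integer index
```

Note that the column named in index_col must also be listed in usecols, otherwise it is not read at all.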

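df.info():

Prints a concise summary of the DataFrame: index range, column names, non-null counts, dtypes, and memory usage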
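Because info() prints its summary rather than returning it, it can be handy to capture the text in a buffer. A small sketch, assuming a made-up two-row frame with one missing value:

```python
import io

import pandas as pd

# Illustrative frame with one missing Region value
df = pd.DataFrame({"Region": ["Western Europe", None], "Score": [7.526, 7.498]})

# info() writes to stdout by default; buf redirects the summary into a string buffer
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()

print(summary)
```

The non-null count for Region is lower than the number of entries, which is usually the quickest way to spot columns with missing data.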

Saving data: df.to_csv(filepath, index)

filepath: path to save to

index: whether to write the row index to the file; defaults to True
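The effect of index is easiest to see in the CSV text itself. This sketch assumes a tiny illustrative frame and relies on to_csv returning a string when no path is given:

```python
import pandas as pd

# Illustrative frame whose row index is named Country
df = pd.DataFrame(
    {"Score": [7.526, 7.498]},
    index=pd.Index(["Denmark", "Norway"], name="Country"),
)

# With no path argument, to_csv returns the CSV text instead of writing a file
with_index = df.to_csv()            # index=True is the default
no_index = df.to_csv(index=False)   # the Country labels are dropped

print(with_index)
print(no_index)
```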

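Sorting by index: sort_index()

ascending: defaults to True; True sorts in ascending order, False in descending order

axis: defaults to 0; 0 sorts the rows by the row index labels, 1 sorts the columns by the column labels

When working with a DataFrame, pay attention to the axis direction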
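To make the axis direction concrete, here is a minimal sketch on a made-up frame whose row and column labels are both out of order:

```python
import pandas as pd

# Illustrative frame: rows labelled z, x, y and columns labelled b, a
df = pd.DataFrame({"b": [1, 2, 3], "a": [4, 5, 6]}, index=["z", "x", "y"])

by_rows = df.sort_index()                       # axis=0: reorder rows by row labels
by_rows_desc = df.sort_index(ascending=False)   # same axis, descending
by_cols = df.sort_index(axis=1)                 # axis=1: reorder columns by column labels

print(by_rows)
print(by_cols)
```

With axis=1 the row order is left untouched; only the columns move.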

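Sorting by values: sort_values(by, ascending)

1. Sorting by the values of a single column

axis: defaults to 0; 0 reorders the rows by the values in the column(s) named in by, 1 reorders the columns by the values in the row(s) named in by

by='label': which labels by may take depends on axis. With axis=0 the label is chosen from the column labels; with axis=1 it is chosen from the row index labels.

ascending: defaults to True; True sorts in ascending order, False in descending order

2. Sorting by the values of multiple columns: by=[ ], ascending=[ ]. When ascending is given as a list, it must have the same length as by, and the entries correspond one to one.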
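The single-column, multi-column, and axis=1 cases can be sketched on small made-up frames (all labels and values below are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {"Region": ["B", "A", "A", "B"], "Score": [1.0, 3.0, 2.0, 4.0]},
    index=["w", "x", "y", "z"],
)

# Single column, descending
single = df.sort_values(by="Score", ascending=False)

# Two columns: Region ascending, then Score descending within each Region
multi = df.sort_values(by=["Region", "Score"], ascending=[True, False])

# axis=1: by names a row label, and the columns are reordered by that row's values
nums = pd.DataFrame({"c1": [3, 1], "c2": [1, 2], "c3": [2, 3]}, index=["r1", "r2"])
cols_sorted = nums.sort_values(by="r1", axis=1)

print(single)
print(multi)
print(cols_sorted)
```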

Practice:

Lesson 5: Pandas Basics for Data Analysis, Reading and Writing Text Files

In [24]:

import pandas as pd

import numpy as np

In [25]:

# File path

filepath = r'C:\Users\ML Learning\Projects\第四章-数据分析预习内容\第四章-数据分析预习内容\第一节-数据分析工具pandas基础\lesson_05\lesson_05\examples\datasets\2016_happiness.csv'

Reading the file

In [26]:

data = pd.read_csv(filepath,usecols=['Country','Region','Happiness Rank','Happiness Score'], index_col='Country')

In [27]:

# Preview the data

data.head()

Out[27]:

	Region	Happiness Rank	Happiness Score
Country
Denmark	Western Europe	1	7.526
Switzerland	Western Europe	2	7.509
Iceland	Western Europe	3	7.501
Norway	Western Europe	4	7.498
Finland	Western Europe	5	7.413

Viewing basic information with info()

In [28]:

data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 157 entries, Denmark to Burundi
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Region           157 non-null    object 
 1   Happiness Rank   157 non-null    int64  
 2   Happiness Score  157 non-null    float64
dtypes: float64(1), int64(1), object(1)
memory usage: 4.9+ KB

In [29]:

# Use apply to add a column holding the score rounded to the nearest integer (the result stays a float)

data['int_score'] = data['Happiness Score'].apply(np.around)

In [30]:

# Data after preprocessing

data.head()

Out[30]:

	Region	Happiness Rank	Happiness Score	int_score
Country
Denmark	Western Europe	1	7.526	8.0
Switzerland	Western Europe	2	7.509	8.0
Iceland	Western Europe	3	7.501	8.0
Norway	Western Europe	4	7.498	7.0
Finland	Western Europe	5	7.413	7.0

Saving the results

In [32]:

# Compare the effect of the index parameter

data.to_csv(r'C:\Users\ML Learning\Projects\第四章-数据分析预习内容\第四章-数据分析预习内容\第一节-数据分析工具pandas基础\lesson_05\lesson_05\examples\datasets\2021_with_index.csv')

data.to_csv(r'C:\Users\ML Learning\Projects\第四章-数据分析预习内容\第四章-数据分析预习内容\第一节-数据分析工具pandas基础\lesson_05\lesson_05\examples\datasets\2021_no_index.csv',index=False)
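What the two files mean in practice shows up when they are read back. The following sketch assumes a tiny illustrative frame and a temporary directory (the file names are hypothetical, not the ones used above):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame(
    {"Score": [7.526, 7.498]},
    index=pd.Index(["Denmark", "Norway"], name="Country"),
)

with tempfile.TemporaryDirectory() as tmp:
    with_index_path = os.path.join(tmp, "with_index.csv")
    no_index_path = os.path.join(tmp, "no_index.csv")

    df.to_csv(with_index_path)               # Country is written as the first column
    df.to_csv(no_index_path, index=False)    # Country is not written at all

    # The index can be restored from the first file via index_col
    restored = pd.read_csv(with_index_path, index_col="Country")
    # The second file no longer contains the country names
    lost = pd.read_csv(no_index_path)

print(restored)
print(lost)
```

When the index carries real information, as Country does here, index=False discards it; it is mainly useful when the index is just the default 0, 1, … counter.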

Sorting by index

In [33]:

# The default axis=0 sorts the rows by the row index; axis=1 is passed here, so the columns are sorted by name

data.sort_index(axis=1).head()

Out[33]:

	Happiness Rank	Happiness Score	Region	int_score
Country
Denmark	1	7.526	Western Europe	8.0
Switzerland	2	7.509	Western Europe	8.0
Iceland	3	7.501	Western Europe	8.0
Norway	4	7.498	Western Europe	7.0
Finland	5	7.413	Western Europe	7.0

In [34]:

# Sort by column name

data.sort_index(axis=1).head()

Out[34]:

	Happiness Rank	Happiness Score	Region	int_score
Country
Denmark	1	7.526	Western Europe	8.0
Switzerland	2	7.509	Western Europe	8.0
Iceland	3	7.501	Western Europe	8.0
Norway	4	7.498	Western Europe	7.0
Finland	5	7.413	Western Europe	7.0

Sorting by values

In [35]:

# Sort by a single key ('Country' is the name of the index, which by also accepts)

data.sort_values(by='Country').head(10)

Out[35]:

	Region	Happiness Rank	Happiness Score	int_score
Country
Afghanistan	Southern Asia	154	3.360	3.0
Albania	Central and Eastern Europe	109	4.655	5.0
Algeria	Middle East and Northern Africa	38	6.355	6.0
Angola	Sub-Saharan Africa	141	3.866	4.0
Argentina	Latin America and Caribbean	26	6.650	7.0
Armenia	Central and Eastern Europe	121	4.360	4.0
Australia	Australia and New Zealand	9	7.313	7.0
Austria	Western Europe	12	7.119	7.0
Azerbaijan	Central and Eastern Europe	81	5.291	5.0
Bahrain	Middle East and Northern Africa	42	6.218	6.0
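Country is the row index here, not a regular column; with axis=0, by may name index levels as well as column labels. A minimal sketch on a made-up frame, showing that sorting by the index name gives the same result as sort_index():

```python
import pandas as pd

# Illustrative frame with a named index and out-of-order labels
df = pd.DataFrame(
    {"Score": [7.5, 3.4, 4.7]},
    index=pd.Index(["Denmark", "Afghanistan", "Albania"], name="Country"),
)

by_name = df.sort_values(by="Country")   # by refers to the index name
by_index = df.sort_index()               # equivalent for a single-level index

print(by_name)
```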

In [36]:

# Sort by multiple columns

data.sort_values(by=['Region','Country']).head(10)

Out[36]:

	Region	Happiness Rank	Happiness Score	int_score
Country
Australia	Australia and New Zealand	9	7.313	7.0
New Zealand	Australia and New Zealand	8	7.334	7.0
Albania	Central and Eastern Europe	109	4.655	5.0
Armenia	Central and Eastern Europe	121	4.360	4.0
Azerbaijan	Central and Eastern Europe	81	5.291	5.0
Belarus	Central and Eastern Europe	61	5.802	6.0
Bosnia and Herzegovina	Central and Eastern Europe	87	5.163	5.0
Bulgaria	Central and Eastern Europe	129	4.217	4.0
Croatia	Central and Eastern Europe	74	5.488	5.0
Czech Republic	Central and Eastern Europe	27	6.596	7.0

In [37]:

# Sort by multiple columns: Region ascending, Country descending

data.sort_values(by=['Region','Country'],ascending=[True,False]).head(10)

Out[37]:

	Region	Happiness Rank	Happiness Score	int_score
Country
New Zealand	Australia and New Zealand	8	7.334	7.0
Australia	Australia and New Zealand	9	7.313	7.0
Uzbekistan	Central and Eastern Europe	49	5.987	6.0
Ukraine	Central and Eastern Europe	123	4.324	4.0
Turkmenistan	Central and Eastern Europe	65	5.658	6.0
Tajikistan	Central and Eastern Europe	100	4.996	5.0
Slovenia	Central and Eastern Europe	63	5.768	6.0
Slovakia	Central and Eastern Europe	45	6.078	6.0
Serbia	Central and Eastern Europe	86	5.177	5.0
Russia	Central and Eastern Europe	56	5.856	6.0
