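Theory:
Key points
Reading data: pd.read_csv(filepath, usecols, index_col)
filepath: path of the file to read
usecols: the columns to read (all columns are read by default)
index_col: the column to use as the row index; if omitted, a default integer index 0, 1, … is generated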
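A minimal sketch of `usecols` and `index_col`. It assumes an in-memory CSV built from three rows of the lab's data preview, read through `io.StringIO` in place of a real file path:

```python
import io

import pandas as pd

# Three rows taken from the lab's data preview; io.StringIO stands in for a file path.
csv_text = """Country,Region,Happiness Rank,Happiness Score
Denmark,Western Europe,1,7.526
Switzerland,Western Europe,2,7.509
Iceland,Western Europe,3,7.501
"""

# Default: every column is read and a RangeIndex 0, 1, 2 is generated.
df_all = pd.read_csv(io.StringIO(csv_text))
print(df_all.index.tolist())   # [0, 1, 2]

# usecols keeps only the listed columns; index_col turns 'Country' into the row index.
df = pd.read_csv(io.StringIO(csv_text),
                 usecols=['Country', 'Happiness Score'],
                 index_col='Country')
print(df.columns.tolist())     # ['Happiness Score']
print(df.index.tolist())       # ['Denmark', 'Switzerland', 'Iceland']
```

The column passed to `index_col` also has to appear in `usecols`, as it does in the lab's call below.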
df.info():
prints a quick summary of the data: index range, column names, non-null counts, dtypes and memory usage
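A small sketch, assuming a three-row frame copied from the lab's preview. `info()` prints to standard output by default; its `buf` parameter redirects the summary into a string so it can be inspected:

```python
import io

import pandas as pd

df = pd.DataFrame(
    {'Region': ['Western Europe', 'Western Europe', 'Western Europe'],
     'Happiness Rank': [1, 2, 3],
     'Happiness Score': [7.526, 7.509, 7.501]},
    index=pd.Index(['Denmark', 'Switzerland', 'Iceland'], name='Country'),
)

# Capture the summary that info() would normally print.
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
print(summary)

# Some of the same facts are also available as attributes.
print(df.shape)   # (3, 3)
print(df.dtypes)
```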
Saving data: df.to_csv(filepath, index)
filepath: destination path
index: whether to write the index as a column; defaults to True
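A minimal sketch of the `index` parameter, assuming a two-row frame from the lab's preview. Called without a path, `to_csv` returns the CSV text as a string, which makes the difference easy to see:

```python
import pandas as pd

df = pd.DataFrame(
    {'Happiness Score': [7.526, 7.509]},
    index=pd.Index(['Denmark', 'Switzerland'], name='Country'),
)

with_index = df.to_csv()            # index=True is the default
no_index = df.to_csv(index=False)   # the index (Country) is not written

print(with_index.splitlines())  # ['Country,Happiness Score', 'Denmark,7.526', 'Switzerland,7.509']
print(no_index.splitlines())    # ['Happiness Score', '7.526', '7.509']
```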
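Sorting by index: sort_index()
ascending: defaults to True; True sorts in ascending order, False in descending order
axis: defaults to 0; 0 sorts the rows by their row index labels, 1 sorts the columns by their column labels
When working with a DataFrame, keep track of which axis you are sorting along.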
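A short sketch of how `axis` and `ascending` change what `sort_index()` reorders, assuming a three-row frame whose values come from the lab's preview:

```python
import pandas as pd

df = pd.DataFrame(
    {'Region': ['Western Europe', 'Western Europe', 'Western Europe'],
     'Happiness Score': [7.509, 7.526, 7.501]},
    index=pd.Index(['Switzerland', 'Denmark', 'Iceland'], name='Country'),
)

# axis=0 (default): reorder the rows by their index labels.
rows_asc = df.sort_index()
rows_desc = df.sort_index(ascending=False)
print(rows_asc.index.tolist())   # ['Denmark', 'Iceland', 'Switzerland']
print(rows_desc.index.tolist())  # ['Switzerland', 'Iceland', 'Denmark']

# axis=1: reorder the columns by their labels; the row order is unchanged.
cols_sorted = df.sort_index(axis=1)
print(cols_sorted.columns.tolist())  # ['Happiness Score', 'Region']
```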
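Sorting by values: sort_values(by, ascending)
1. Sorting by the values of a single column
axis: defaults to 0; 0 reorders the rows according to the values in the column(s) named by `by`, 1 reorders the columns according to the values in the row(s) named by `by`
by='label': which labels `by` may name depends on axis. With axis=0, `by` is chosen from the column labels (index level names are accepted as well); with axis=1, it is chosen from the row index labels.
ascending: defaults to True; True sorts in ascending order, False in descending order
2. Sorting by the values of several columns: by=[...], ascending=[...]. When ascending is a list, it must have the same length as by, and the two correspond one to one (the i-th flag applies to the i-th key).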
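A sketch of single-key and multi-key sorting with `sort_values`. It assumes a four-row frame whose values are taken from the lab's outputs, with `Country` kept as an ordinary column here:

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['Denmark', 'Australia', 'Iceland', 'New Zealand'],
    'Region': ['Western Europe', 'Australia and New Zealand',
               'Western Europe', 'Australia and New Zealand'],
    'Happiness Score': [7.526, 7.313, 7.501, 7.334],
})

# 1. Single key: rows ordered by Happiness Score, highest first.
by_score = df.sort_values(by='Happiness Score', ascending=False)
print(by_score['Country'].tolist())   # ['Denmark', 'Iceland', 'New Zealand', 'Australia']

# 2. Several keys: Region ascending, then Happiness Score descending within each region.
by_region = df.sort_values(by=['Region', 'Happiness Score'], ascending=[True, False])
print(by_region['Country'].tolist())  # ['New Zealand', 'Australia', 'Denmark', 'Iceland']

# A list for ascending must match the length of by; pandas rejects a mismatch.
length_mismatch_rejected = False
try:
    df.sort_values(by=['Region', 'Happiness Score'], ascending=[True])
except ValueError as err:
    length_mismatch_rejected = True
    print('ValueError:', err)
```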
Lab:
Lesson 5: Pandas Basics for Data Analysis -- Reading and Writing Text Files
In [24]:
import pandas as pd
import numpy as np
In [25]:
# Path to the dataset
filepath = r'C:\Users\ML Learning\Projects\第四章-数据分析预习内容\第四章-数据分析预习内容\第一节-数据分析工具pandas基础\lesson_05\lesson_05\examples\datasets\2016_happiness.csv'
Read the file
In [26]:
data = pd.read_csv(filepath,usecols=['Country','Region','Happiness Rank','Happiness Score'], index_col='Country')
In [27]:
# Preview the data
data.head()
Out[27]:
Country | Region | Happiness Rank | Happiness Score
---|---|---|---
Denmark | Western Europe | 1 | 7.526
Switzerland | Western Europe | 2 | 7.509
Iceland | Western Europe | 3 | 7.501
Norway | Western Europe | 4 | 7.498
Finland | Western Europe | 5 | 7.413
Inspect the basic information with info()
In [28]:
data.info()
<class 'pandas.core.frame.DataFrame'>
Index: 157 entries, Denmark to Burundi
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   Region           157 non-null    object
 1   Happiness Rank   157 non-null    int64
 2   Happiness Score  157 non-null    float64
dtypes: float64(1), int64(1), object(1)
memory usage: 4.9+ KB
In [29]:
# Use apply to add a column holding each score rounded to the nearest integer
data['int_score'] = data['Happiness Score'].apply(np.around)
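The rounding behavior of `np.around` is worth checking, since it is not the same as taking the integer part. A small sketch, assuming a Series mixing two scores from the preview with a few hypothetical half values chosen only to expose tie-breaking:

```python
import numpy as np
import pandas as pd

# Two scores from the preview plus illustrative .5 values.
scores = pd.Series([7.526, 7.498, 0.5, 1.5, 2.5])

# np.around rounds exact halves to the nearest even number ("banker's rounding").
rounded_apply = scores.apply(np.around)
print(rounded_apply.tolist())  # [8.0, 7.0, 0.0, 2.0, 2.0]

# Series.round() gives the same result without calling a function per element.
rounded_vec = scores.round()
print(rounded_vec.equals(rounded_apply))  # True

# Keeping only the integer part is a different operation.
truncated = np.trunc(scores)
print(truncated.tolist())  # [7.0, 7.0, 0.0, 1.0, 2.0]
```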
In [30]:
# Data after preprocessing
data.head()
Out[30]:
Country | Region | Happiness Rank | Happiness Score | int_score
---|---|---|---|---
Denmark | Western Europe | 1 | 7.526 | 8.0
Switzerland | Western Europe | 2 | 7.509 | 8.0
Iceland | Western Europe | 3 | 7.501 | 8.0
Norway | Western Europe | 4 | 7.498 | 7.0
Finland | Western Europe | 5 | 7.413 | 7.0
Save the results
In [32]:
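# Compare the two settings of the index parameter
# index=True (default): the Country index is written as the first column
data.to_csv(r'C:\Users\ML Learning\Projects\第四章-数据分析预习内容\第四章-数据分析预习内容\第一节-数据分析工具pandas基础\lesson_05\lesson_05\examples\datasets\2021_with_index.csv')
# index=False: the index is not written, so this file no longer contains the country names
data.to_csv(r'C:\Users\ML Learning\Projects\第四章-数据分析预习内容\第四章-数据分析预习内容\第一节-数据分析工具pandas基础\lesson_05\lesson_05\examples\datasets\2021_no_index.csv', index=False)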
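A sketch of reading the two saved files back. It assumes a small two-row stand-in for `data` and writes to a temporary directory (via `tempfile`) instead of the lesson's dataset folder:

```python
import os
import tempfile

import pandas as pd

# Two-row stand-in for the lab's data frame.
data = pd.DataFrame(
    {'Region': ['Western Europe', 'Western Europe'],
     'Happiness Score': [7.526, 7.509]},
    index=pd.Index(['Denmark', 'Switzerland'], name='Country'),
)

with tempfile.TemporaryDirectory() as tmp:
    with_index_path = os.path.join(tmp, 'with_index.csv')
    no_index_path = os.path.join(tmp, 'no_index.csv')
    data.to_csv(with_index_path)
    data.to_csv(no_index_path, index=False)

    # index_col='Country' rebuilds the original frame from the file that kept the index.
    restored = pd.read_csv(with_index_path, index_col='Country')
    pd.testing.assert_frame_equal(restored, data)  # raises AssertionError if they differ

    # Without the index, the country names are gone and read_csv builds a new RangeIndex.
    lost = pd.read_csv(no_index_path)
    print(lost.columns.tolist())  # ['Region', 'Happiness Score']
    print(lost.index.tolist())    # [0, 1]
```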
Sort by index
In [33]:
# Default axis=0: sort the rows by their index labels (Country)
data.sort_index().head()
Out[33]:
Country | Region | Happiness Rank | Happiness Score | int_score
---|---|---|---|---
Afghanistan | Southern Asia | 154 | 3.360 | 3.0
Albania | Central and Eastern Europe | 109 | 4.655 | 5.0
Algeria | Middle East and Northern Africa | 38 | 6.355 | 6.0
Angola | Sub-Saharan Africa | 141 | 3.866 | 4.0
Argentina | Latin America and Caribbean | 26 | 6.650 | 7.0
In [34]:
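# axis=1: sort the columns by their labels (uppercase letters sort before lowercase, so int_score comes last)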
data.sort_index(axis=1).head()
Out[34]:
Country | Happiness Rank | Happiness Score | Region | int_score
---|---|---|---|---
Denmark | 1 | 7.526 | Western Europe | 8.0
Switzerland | 2 | 7.509 | Western Europe | 8.0
Iceland | 3 | 7.501 | Western Europe | 8.0
Norway | 4 | 7.498 | Western Europe | 7.0
Finland | 5 | 7.413 | Western Europe | 7.0
Sort by values
In [35]:
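# Sort by a single key. 'Country' is the index name, not a column; sort_values also accepts index level names in by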
data.sort_values(by='Country').head(10)
Out[35]:
Country | Region | Happiness Rank | Happiness Score | int_score
---|---|---|---|---
Afghanistan | Southern Asia | 154 | 3.360 | 3.0
Albania | Central and Eastern Europe | 109 | 4.655 | 5.0
Algeria | Middle East and Northern Africa | 38 | 6.355 | 6.0
Angola | Sub-Saharan Africa | 141 | 3.866 | 4.0
Argentina | Latin America and Caribbean | 26 | 6.650 | 7.0
Armenia | Central and Eastern Europe | 121 | 4.360 | 4.0
Australia | Australia and New Zealand | 9 | 7.313 | 7.0
Austria | Western Europe | 12 | 7.119 | 7.0
Azerbaijan | Central and Eastern Europe | 81 | 5.291 | 5.0
Bahrain | Middle East and Northern Africa | 42 | 6.218 | 6.0
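Because `Country` is the index name, sorting by it with `sort_values` should give the same order as `sort_index()`. A small check, assuming a three-row stand-in frame with ranks taken from the preview:

```python
import pandas as pd

data = pd.DataFrame(
    {'Happiness Rank': [3, 1, 2]},
    index=pd.Index(['Iceland', 'Denmark', 'Switzerland'], name='Country'),
)

# 'Country' names the index level, yet sort_values accepts it in `by`.
by_name = data.sort_values(by='Country')
by_index = data.sort_index()
print(by_name.index.tolist())    # ['Denmark', 'Iceland', 'Switzerland']
print(by_name.equals(by_index))  # True
```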
In [36]:
# Sort by multiple keys: first by Region, then by Country within each region
data.sort_values(by=['Region','Country']).head(10)
Out[36]:
Country | Region | Happiness Rank | Happiness Score | int_score
---|---|---|---|---
Australia | Australia and New Zealand | 9 | 7.313 | 7.0
New Zealand | Australia and New Zealand | 8 | 7.334 | 7.0
Albania | Central and Eastern Europe | 109 | 4.655 | 5.0
Armenia | Central and Eastern Europe | 121 | 4.360 | 4.0
Azerbaijan | Central and Eastern Europe | 81 | 5.291 | 5.0
Belarus | Central and Eastern Europe | 61 | 5.802 | 6.0
Bosnia and Herzegovina | Central and Eastern Europe | 87 | 5.163 | 5.0
Bulgaria | Central and Eastern Europe | 129 | 4.217 | 4.0
Croatia | Central and Eastern Europe | 74 | 5.488 | 5.0
Czech Republic | Central and Eastern Europe | 27 | 6.596 | 7.0
In [37]:
# Multiple keys with a separate order for each: Region ascending, Country descending
data.sort_values(by=['Region','Country'],ascending=[True,False]).head(10)
Out[37]:
Country | Region | Happiness Rank | Happiness Score | int_score
---|---|---|---|---
New Zealand | Australia and New Zealand | 8 | 7.334 | 7.0
Australia | Australia and New Zealand | 9 | 7.313 | 7.0
Uzbekistan | Central and Eastern Europe | 49 | 5.987 | 6.0
Ukraine | Central and Eastern Europe | 123 | 4.324 | 4.0
Turkmenistan | Central and Eastern Europe | 65 | 5.658 | 6.0
Tajikistan | Central and Eastern Europe | 100 | 4.996 | 5.0
Slovenia | Central and Eastern Europe | 63 | 5.768 | 6.0
Slovakia | Central and Eastern Europe | 45 | 6.078 | 6.0
Serbia | Central and Eastern Europe | 86 | 5.177 | 5.0
Russia | Central and Eastern Europe | 56 | 5.856 | 6.0