Learning Python (2): Pandas, Part 1

Snowy_susu

Category: Python study notes. Tags: Python, Pandas

Article link: https://blog.csdn.net/snowy_susu/article/details/83717864


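Pandas is a Python library for data analysis and processing, built on top of numpy. Its core data structure is the DataFrame. This post covers basic Pandas operations: reading a .csv file (pandas.read_csv("path")); viewing the first few rows (.head()) or the last few rows (.tail()); listing the column names, i.e. the header (.columns, which is an attribute and takes no parentheses); checking the size of the data, i.e. how many rows and columns it has (.shape); selecting one row or several rows (.loc[]); selecting columns that meet a condition; doing arithmetic between columns; getting the maximum of a column; and finally sorting the data (.sort_values("Energ_Kcal", inplace=True) sorts in ascending order, .sort_values("Energ_Kcal", inplace=True, ascending=False) sorts in descending order).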
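Since food_info.csv itself is not attached to this post, the operations listed above can be tried on a small stand-in table. This is only a sketch under stated assumptions: the column names and the first row are taken from this post's own code and output, every other row is made up for illustration, and the names `csv_text` and `sample` are hypothetical.

```python
import io
import pandas

# Hypothetical stand-in for food_info.csv: only 4 of the 36 columns, and
# every row except the first contains made-up values.
csv_text = """NDB_No,Shrt_Desc,Water_(g),Energ_Kcal
1001,BUTTER WITH SALT,15.87,717
9001,SAMPLE FOOD A,50.0,300
9002,SAMPLE FOOD B,80.5,45
9003,SAMPLE FOOD C,3.2,902
9004,SAMPLE FOOD D,61.0,120
9005,SAMPLE FOOD E,10.0,510
"""
sample = pandas.read_csv(io.StringIO(csv_text))

print(sample.shape)             # (rows, columns) as a tuple
print(sample.columns.tolist())  # .columns is an attribute, so no parentheses
print(sample.head(2))           # first 2 rows
print(sample.tail(2))           # last 2 rows

first_row = sample.loc[0]       # row whose index label is 0
label_slice = sample.loc[1:3]   # label slice: includes both label 1 and label 3
pos_slice = sample.iloc[1:3]    # position slice: excludes position 3
print(len(label_slice), len(pos_slice))
```

With the default integer index, .loc slices by label and includes the end label, while .iloc slices by position like an ordinary Python slice, so the two slices above differ by one row.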

Data file used: E:\python_code_test\food_info.csv

The full code is as follows:

# Pandas: a library for data analysis and processing
# Pandas wraps many functions and is built on top of numpy.
# Reading data: read a .csv file with pandas.read_csv('')
import pandas
food_info = pandas.read_csv('E:\\python_code_test\\food_info.csv')
print(type(food_info)) # DataFrame, the core structure of pandas
print(food_info.dtypes) # type of each column: int64, float64, object (typically strings)
# help(pandas.read_csv) # prints the function's documentation, useful for learning more about it

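# Take rows from the start or from the end
print(food_info.head()) # shows the first five rows as a table, including the header
print(food_info.tail()) # shows the last five rows [5 rows x 36 columns]
print(food_info.tail(3)) # [3 rows x 36 columns]

# Show the column names, i.e. the header
print(food_info.columns)
print(food_info.shape) # (8618, 36): 8618 rows, each with 36 attributes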

# Selecting data: .loc[] looks rows up by index label
print(food_info.loc[0]) # the row labeled 0, i.e. the first row
# NDB_No                         1001
# Shrt_Desc          BUTTER WITH SALT
# Water_(g)                     15.87
# Energ_Kcal                      717
# ……

#…………………………………………………………… Selecting data ……………………………………………………………………………………………………………

# Select rows with a slice
print(food_info.loc[3:6]) # whole rows labeled 3 through 6; with .loc both ends are included, i.e. [3,6]
# Select a whole column
get_col = food_info["Water_(g)"] # locate a column by its header name
print(get_col) # Name: Water_(g), Length: 8618, dtype: float64
# Or locate it through a variable
# col_name = "Water_(g)"
# get_col = food_info[col_name]

two_col_name = ["Energ_Kcal","Water_(g)"]
get_two_col = food_info[two_col_name]
print(get_two_col) # prints both columns [8618 rows x 2 columns]

#…………………………………………………………………… Selecting by condition ………………………………………………………………………………

# Find the columns in the .csv file whose names end with (g)
col_names = food_info.columns.tolist() # save the column names as a list
gram_columns = [] # starts empty
# Keep the column names that end with (g) by appending them to gram_columns
for c in col_names:
    if c.endswith("(g)"):
        gram_columns.append(c)
gram_df = food_info[gram_columns] # the columns whose names passed the filter
print(gram_df.head(3))
#    Water_(g)  Protein_(g)     ...       FA_Mono_(g)  FA_Poly_(g)
# 0      15.87         0.85     ...            21.021        3.043
# 1      15.87         0.85     ...            23.426        3.012
# 2       0.24         0.28     ...            28.732        3.694

# ……………………………………………………………  Arithmetic on columns ………………………………………………………………………
div_1000 = food_info["Cholestrl_(mg)"]/1000
print(div_1000) # effectively converts mg to g

# Combine two columns: when they have the same length, the operation is applied element by element
Water_energy = food_info["Water_(g)"] * food_info["Energ_Kcal"]

# Add a new column and name it Iron_(g)
print(food_info.shape) # (8618, 36)
Iron_gram = food_info["Iron_(mg)"]/1000
food_info["Iron_(g)"] = Iron_gram
print(food_info.shape) # (8618, 37)

# Get the maximum of a column
max_calories = food_info["Energ_Kcal"].max()
print(max_calories) # 902
# Normalize by dividing each value by the column maximum
normalized_calories = food_info["Energ_Kcal"] / max_calories
normalized_fat = food_info["Lipid_Tot_(g)"]/food_info["Lipid_Tot_(g)"].max()
food_info["Normalized_Fat"] = normalized_fat
print(food_info.shape) # (8618, 38)

#…………………………………………………………………… Sorting …………………………………………………………………………………………

food_info.sort_values("Energ_Kcal",inplace=True) # smallest to largest: ascending order (the default)
print(food_info["Energ_Kcal"])
food_info.sort_values("Energ_Kcal",inplace=True,ascending=False) # largest to smallest: descending order
print(food_info["Energ_Kcal"])
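One detail worth checking after sorting: sort_values moves the index labels along with their rows, so .loc[0] still returns the row that was labeled 0 before the sort, not the first row of the sorted table. The following is a minimal sketch on a made-up three-row table (the values and the names `df`, `desc` and `renumbered` are hypothetical) that shows this, with reset_index(drop=True) as one way to renumber the rows:

```python
import pandas

# Made-up values for illustration; only the column name comes from the post.
df = pandas.DataFrame({"Energ_Kcal": [717, 45, 902]})

# Without inplace=True, sort_values returns a new sorted DataFrame
# and leaves df unchanged.
desc = df.sort_values("Energ_Kcal", ascending=False)
print(desc.index.tolist())           # the labels travel with their rows
print(desc.loc[0, "Energ_Kcal"])     # still the row originally labeled 0
print(desc.iloc[0]["Energ_Kcal"])    # first row by position

# Renumber the rows 0..n-1 and discard the old labels.
renumbered = desc.reset_index(drop=True)
print(renumbered.loc[0, "Energ_Kcal"])
```

Leaving out inplace=True, as in this sketch, keeps the original table untouched, which can make it easier to compare the data before and after sorting.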
