Notes on pandas

liuwei6843

Last modified: 2024-07-27 12:21:41

Category: python  Tags: pandas

First published: 2024-06-10 20:17:43

Article link: https://blog.csdn.net/liuwei6843/article/details/139579281

Contents

iloc[]
Syntax notes
DataFrame basics

iloc[]

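df.iloc[] selects data from a DataFrame by integer row and column position (0-based).
df.iloc[0, 1]  # select the element in the first row, second column
df.iloc[1:3, 1:3]  # select the region covering the second to third rows and the second to third columns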
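
To make the two selections above concrete, here is a minimal sketch on a small made-up frame. The 3x3 data and the column names a, b, c are assumptions for illustration only and do not come from the original notes.

```python
import pandas as pd

# Hypothetical 3x3 frame used only for illustration.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    columns=["a", "b", "c"],
)

single = df.iloc[0, 1]     # scalar at row position 0, column position 1
block = df.iloc[1:3, 1:3]  # row positions 1-2 and column positions 1-2 (end excluded)

print(single)
print(block)
```

Note that the slice 1:3 stops before position 3, so the block has two rows and two columns.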

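Indexing data in scientific computing:

Suppose column cols-1 holds 100 rows by 1 column, 100 values in total:
Y1 = data.iloc[:, cols-1]  # returns a 1-D Series of shape (100,); a 1-D object has no separate row and column axes
Y2 = data.iloc[:, [cols-1]]  # returns a 2-D DataFrame, 100 rows x 1 column
Y1 = np.matrix(Y1.values)  # gives a 1x100 matrix (the data started as 100 rows x 1 column, so a 100x1 matrix would be the correct shape)
Y2 = np.matrix(Y2.values)  # gives a 100x1 matrix; converting the 2-D DataFrame with np.matrix() keeps the original 100-row, 1-column layout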

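In scientific computing, data that is not already a 2-D NumPy array or matrix has to be converted into one.
So when selecting a single column as input for such computations:
Y1 = data.iloc[:, cols-1]  ❌
Y2 = data.iloc[:, [cols-1]]  √

Y3 = Y1.to_frame()  # convert the 1-D Series into a 2-D DataFrame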
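
The shape difference described above can be checked directly. The sketch below assumes a made-up 100x3 frame whose last column stands in for the target; the column names x1, x2, y are hypothetical. It also shows to_numpy() as an alternative, since NumPy's own documentation no longer recommends np.matrix for new code.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 100 rows, 3 columns; the last column plays the role of the target.
data = pd.DataFrame(np.arange(300).reshape(100, 3), columns=["x1", "x2", "y"])
cols = data.shape[1]

Y1 = data.iloc[:, cols - 1]    # scalar column position -> 1-D Series
Y2 = data.iloc[:, [cols - 1]]  # list of positions -> 2-D DataFrame
Y3 = Y1.to_frame()             # Series promoted to a one-column DataFrame

print(Y1.shape, Y2.shape, Y3.shape)
print(np.matrix(Y1.values).shape)  # a 1-D input becomes a single row
print(np.matrix(Y2.values).shape)  # a 2-D input keeps its column layout
print(Y2.to_numpy().shape)         # plain 2-D ndarray with the same layout
```

Selecting with a list of positions is the smallest change that keeps the column two-dimensional all the way through the conversion.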

Syntax notes

data[data['Admitted'].isin([0])]  
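Checks whether each value in the Admitted column of data is 0. `isin()` returns a boolean Series, which is then used as a mask to select rows from data.
The result is a new DataFrame containing only the rows whose Admitted value is 0.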
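
A minimal sketch of the two steps (build the mask, then filter) follows. Only the column name Admitted comes from the notes; the Exam1 column and all values are made up for illustration.

```python
import pandas as pd

# Hypothetical admissions data; only the "Admitted" column name comes from the notes.
data = pd.DataFrame({
    "Exam1": [34.6, 60.2, 79.0, 45.1],
    "Admitted": [0, 1, 1, 0],
})

mask = data["Admitted"].isin([0])  # boolean Series: True where Admitted is 0
rejected = data[mask]              # new DataFrame holding only the matching rows

print(mask.tolist())
print(rejected)
```

For a single value, data[data["Admitted"] == 0] selects the same rows; isin() becomes more useful when matching against several values at once.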

pd.read_csv(path, header=None, names=["Population", "Profit"])  
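header=None means the file has no header row, so its first line is read as data; names supplies the column names to use instead.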
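
The call above can be tried without a file on disk. In this sketch an in-memory string stands in for path, and the three rows of numbers are made up for illustration.

```python
import io

import pandas as pd

# Stand-in for a headerless CSV file; the values are made up for illustration.
csv_text = "6.1101,17.592\n5.5277,9.1302\n8.5186,13.662\n"

df = pd.read_csv(io.StringIO(csv_text), header=None, names=["Population", "Profit"])

print(df.columns.tolist())
print(df.shape)
```

Without header=None, pandas would treat the first data line as column headers and the frame would have one row fewer.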

DataFrame basics

import pandas as pd
from typing import List  # import the List type from typing, the module behind Python's type hints

def createDataframe(data: List[List[int]]) -> pd.DataFrame: 
    result = pd.DataFrame(data, columns=["student_id", "age"], index=["s1", "s2", "s3", "s4"])  # convert the nested list to a pd.DataFrame; index sets the row labels
    return result

data = [[1, 15],
        [2, 11],
        [3, 11],
        [4, 20]]
result_DataFrame = createDataframe(data)
print(result_DataFrame)
Output:
    student_id  age
s1           1   15
s2           2   11
s3           3   11
s4           4   20

************************************************************************************

# Change the index attribute of result_DataFrame
result_DataFrame.index = ["st1", "st2", "st3", "st4"]
print(result_DataFrame)
Output:
     student_id  age
st1           1   15
st2           2   11
st3           3   11
st4           4   20

# shape, as (number of rows, number of columns)
print(result_DataFrame.shape)
Output:
(4, 2)

# Print the first two rows of result_DataFrame
print(result_DataFrame[:2])
Output:
     student_id  age
st1           1   15
st2           2   11

# Use a boolean mask to find the row where student_id is 2, then select that row's student_id and age columns
a = result_DataFrame.loc[result_DataFrame["student_id"] == 2, ["student_id", "age"]]
print(a)
Output:
     student_id  age
st2           2   11

# Add a column with the age 10 years from now; the anonymous function lambda x: x + 10 is applied to every value in result_DataFrame["age"]
result_DataFrame["10_age"] = result_DataFrame["age"].apply(lambda x: x + 10)
print(result_DataFrame)
Output:
     student_id  age  10_age
st1           1   15      25
st2           2   11      21
st3           3   11      21
st4           4   20      30
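
For simple arithmetic like this, the same column can also be produced without apply. The sketch below rebuilds the student frame from this section and compares the two approaches; the variable names via_apply and via_vector are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"student_id": [1, 2, 3, 4], "age": [15, 11, 11, 20]},
    index=["st1", "st2", "st3", "st4"],
)

via_apply = df["age"].apply(lambda x: x + 10)  # calls the lambda once per element
via_vector = df["age"] + 10                    # element-wise addition on the whole column

print(via_apply.tolist())
print(via_vector.tolist())
```

Both give the same values; apply is mainly worth reaching for when the per-element logic cannot be written as a column-wise expression.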

************************************************************************************

# Add a new row (df, new_index and new_values are placeholders)
df.loc[new_index, :] = new_values

# Add a new column (either form works; new_column_name is a placeholder for the column label)
df.loc[:, new_column_name] = new_values
df[new_column_name] = new_values
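
The placeholder patterns above can be filled in as follows. This is a minimal sketch: the frame, the row label s3, and the column names grade and passed are all made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"student_id": [1, 2], "age": [15, 11]}, index=["s1", "s2"])

# Add a new row: the label "s3" does not exist yet, so .loc enlarges the frame.
df.loc["s3", :] = [3, 12]

# Add new columns: both forms create the column when the label is new.
df.loc[:, "grade"] = ["A", "B", "A"]
df["passed"] = [True, True, False]

print(df)
```

One thing to watch for: adding a row this way may change the dtype of existing integer columns (for example to float), depending on the pandas version, so it can be worth checking df.dtypes afterwards.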

# Difference between df.loc and df.iloc
df.loc: indexes by row labels and column labels
df.iloc: indexes by integer row and column positions
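
A short side-by-side sketch of the two indexers, reusing the student frame from this section. The variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"student_id": [1, 2, 3, 4], "age": [15, 11, 11, 20]},
    index=["st1", "st2", "st3", "st4"],
)

by_label = df.loc["st2", "age"]  # row label "st2", column label "age"
by_position = df.iloc[1, 1]      # second row, second column by position

label_slice = df.loc["st1":"st3"]  # label slices include the end label
position_slice = df.iloc[0:2]      # position slices exclude the end position

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```

The slicing difference is the easiest one to trip over: the label slice returns three rows here, while the position slice returns two.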