1, Column names: data.columns
- Code:
import pandas as pd

if __name__ == '__main__':
    data = pd.read_csv("titanic_train.csv")
    cols = data.columns  # an Index holding every column name
    print(cols)
==================================
Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
dtype='object')
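data.columns is a pandas Index rather than a plain Python list. As a minimal sketch, the snippet below builds a small hand-made DataFrame (a hypothetical stand-in for titanic_train.csv, so it runs without the file) and shows the Index, its conversion to a list with tolist() (the full code in section 4 uses the same call), and a membership test:

```python
import pandas as pd

# Hypothetical stand-in for titanic_train.csv: three columns, three rows
df = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Age": [22.0, 38.0, None],
})

cols = df.columns          # a pandas Index object
col_list = cols.tolist()   # a plain Python list of the column names
print(type(cols).__name__, col_list)

# An Index supports len() and "in" checks like a list does
print(len(cols), "Age" in cols)
```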
2, Column types, viewing: data.dtypes
- Code:
import pandas as pd

if __name__ == '__main__':
    data = pd.read_csv("titanic_train.csv")
    res = data.dtypes  # a Series mapping column name -> dtype
    print(res)
================================
PassengerId int64
Survived int64
Pclass int64
Name object
Sex object
Age float64
SibSp int64
Parch int64
Ticket object
Fare float64
Cabin object
Embarked object
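data.dtypes is itself a Series indexed by column name, so the type of a single column can be looked up directly. A small sketch on hypothetical stand-in data (not read from the CSV):

```python
import pandas as pd

# Hypothetical stand-in rows
df = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Name": ["A", "B", "C"],
    "Age": [22.0, 38.0, None],  # None becomes NaN, so the column is float64
})

types = df.dtypes          # Series: index = column names, values = dtypes
print(types)

# Look up one column's dtype by name
age_type = types["Age"]
print(age_type)
```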
3, Column types, changing: data["PassengerId"].astype("object")
- Code:
import pandas as pd

if __name__ == '__main__':
    data = pd.read_csv("titanic_train.csv")
    print(data.dtypes)
    # astype() returns a new Series, so assign it back to the column
    data["PassengerId"] = data["PassengerId"].astype("object")
    print(data.dtypes)
=======================================================
PassengerId int64
Survived int64
Pclass int64
Name object
Sex object
Age float64
SibSp int64
Parch int64
Ticket object
Fare float64
Cabin object
Embarked object
=======================================================
PassengerId object
Survived int64
Pclass int64
Name object
Sex object
Age float64
SibSp int64
Parch int64
Ticket object
Fare float64
Cabin object
Embarked object
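astype() does not change a column in place; it returns a new Series, which is why the result has to be assigned back. The sketch below, on hypothetical stand-in data, shows that, plus the dict form of DataFrame.astype() for converting several columns in one call (the target types chosen for Survived and Fare are only for illustration):

```python
import pandas as pd

# Hypothetical stand-in rows
df = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Fare": [7.25, 71.2833, 7.925],
})

# Without assignment the result is discarded and df is unchanged
df["PassengerId"].astype("object")
print(df["PassengerId"].dtype)   # still int64

# Assign back to actually change the column
df["PassengerId"] = df["PassengerId"].astype("object")

# A dict converts several columns at once (illustrative target types)
df = df.astype({"Survived": "bool", "Fare": "float32"})
print(df.dtypes)
```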
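4, Example: column operations
- Column-name operations:
1, Locate: find every column whose name ends with "d" and select those columns.
2, Operate: multiply these columns by 2 to get new columns (numeric values are doubled; string values such as Embarked are repeated, so "S" becomes "SS").
3, Replace: drop the original columns and put the new ones in their place.
- Core code: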
new_df = data[new_cols] * 2
res_data = data.drop(new_cols,axis=1)
res_data[["new01","new02","new03"]] = new_df
- Full code:
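import pandas as pd

if __name__ == '__main__':
    data = pd.read_csv("titanic_train.csv")
    old_cols = data.columns.tolist()
    # Collect the names of all columns that end with "d"
    new_cols = []
    for e in old_cols:
        if str(e).endswith("d"):
            new_cols.append(e)
    print(old_cols)
    print(new_cols)
    # Numbers are doubled; strings (Embarked) are repeated
    new_df = data[new_cols] * 2
    # Drop the original columns, then add the results under new names
    res_data = data.drop(new_cols, axis=1)
    # Assumes exactly three matching columns (PassengerId, Survived, Embarked)
    res_data[["new01", "new02", "new03"]] = new_df
    print(res_data)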
================================================================
['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked']
['PassengerId', 'Survived', 'Embarked']
Pclass Name ... new02 new03
0 3 Braund, Mr. Owen Harris ... 0 SS
1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... ... 2 CC
......................
......................
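The same three steps can be written more compactly. This is one possible sketch, not the original author's code: a list comprehension locates the columns, rename() derives the new names, and pd.concat() puts them back. The _x2 suffix is a hypothetical naming scheme, and the four-column DataFrame is a stand-in for titanic_train.csv:

```python
import pandas as pd

# Hypothetical stand-in for a few rows of titanic_train.csv
data = pd.DataFrame({
    "PassengerId": [1, 2],
    "Survived": [0, 1],
    "Pclass": [3, 1],
    "Embarked": ["S", "C"],
})

# 1, Locate: columns whose names end with "d"
new_cols = [c for c in data.columns if str(c).endswith("d")]

# 2, Operate: numbers are doubled, strings are repeated ("S" * 2 -> "SS");
#    new names are derived from the old ones (hypothetical "_x2" suffix)
new_df = (data[new_cols] * 2).rename(columns={c: c + "_x2" for c in new_cols})

# 3, Replace: drop the originals and append the new columns side by side
res_data = pd.concat([data.drop(columns=new_cols), new_df], axis=1)
print(res_data)
```

Deriving the new names from the old ones avoids hard-coding new01, new02, new03, which only works while exactly three columns end with "d".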
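5, Column arithmetic, multiplying columns: data["PassengerId"] * data["Survived"]
- Between different columns: addition, subtraction, multiplication and division all work, element-wise, row by row.
- Code: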
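import pandas as pd

if __name__ == '__main__':
    data = pd.read_csv("titanic_train.csv")
    # .copy() avoids a SettingWithCopyWarning when a column is added below
    df_new = data[["PassengerId", "Survived"]].copy()
    # Element-wise product of the two columns, row by row
    df_tow = data["PassengerId"] * data["Survived"]
    df_new["tow"] = df_tow
    print(df_new)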
===================================
PassengerId Survived tow
0 1 0 0
1 2 1 2
2 3 1 3
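The other three operations work the same way. A minimal sketch on hypothetical stand-in data; the column names plus, minus and div are illustrative only. Selecting columns with data[[...]] and then adding a column to the result can trigger a SettingWithCopyWarning, so the selection is copied first:

```python
import pandas as pd

# Hypothetical stand-in rows
data = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
})

# Independent copy, so new columns can be added without warnings
df_new = data[["PassengerId", "Survived"]].copy()

# Each operation pairs values from the same row of both columns
df_new["tow"] = data["PassengerId"] * data["Survived"]
df_new["plus"] = data["PassengerId"] + data["Survived"]
df_new["minus"] = data["PassengerId"] - data["Survived"]
df_new["div"] = data["Survived"] / data["PassengerId"]  # "/" always yields float64
print(df_new)
```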
6, Maximum: data["Age"].max()
- Code: the age of the oldest passenger
import pandas as pd

if __name__ == '__main__':
    data = pd.read_csv("titanic_train.csv")
    res = data["Age"].max()  # missing (NaN) ages are skipped
    print(res)
=======================
80.0
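max() returns only the value (80.0 here), not the passenger it belongs to. To get the whole row, idxmax() returns the index label of the first maximum, which can then be passed to .loc. A sketch on a hypothetical three-row DataFrame with placeholder names:

```python
import pandas as pd

# Hypothetical rows with placeholder names
data = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Age": [22.0, 38.0, 80.0],
})

oldest_age = data["Age"].max()        # the value only
oldest_idx = data["Age"].idxmax()     # index label of the (first) maximum
oldest_row = data.loc[oldest_idx]     # the whole row for that passenger
print(oldest_age, oldest_row["Name"])
```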
7, Minimum: data["Age"].min()
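min() also skips missing values (NaN) by default; passing skipna=False makes any NaN propagate to the result. A minimal sketch on a hypothetical Series of ages:

```python
import numpy as np
import pandas as pd

# Hypothetical ages with one missing value
ages = pd.Series([22.0, np.nan, 0.42, 38.0])

youngest = ages.min()                # NaN is ignored
with_nan = ages.min(skipna=False)    # any NaN makes the result NaN
print(youngest, with_nan)
```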
8, Mean: data["Age"].mean()
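- Note: this mean is not (sum) / (total number of rows).
- Null values are skipped: it is (sum of the non-null values) / (number of non-null values).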
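The note can be checked on a tiny Series: mean() equals sum() divided by count() (count() counts only non-null values), not sum() divided by the total number of rows. The values are hypothetical, chosen so the results are exact:

```python
import numpy as np
import pandas as pd

# Hypothetical ages: three valid values, one missing
ages = pd.Series([20.0, 30.0, np.nan, 40.0])

mean_default = ages.mean()            # NaN rows are left out
by_hand = ages.sum() / ages.count()   # 90.0 / 3 non-null values
naive = ages.sum() / len(ages)        # 90.0 / 4 rows, NaN row included
print(mean_default, by_hand, naive)
```

mean() gives 30.0, the same as 90.0 / 3, while dividing by all four rows gives 22.5.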
9, Sum: data["Age"].sum()
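sum() skips NaN as well. One edge case: if every value is missing, sum() returns 0.0 by default; the min_count parameter makes it return NaN when there are fewer valid values than required. A sketch on hypothetical Series:

```python
import numpy as np
import pandas as pd

ages = pd.Series([22.0, np.nan, 38.0])   # hypothetical, one missing value
empty = pd.Series([np.nan, np.nan])      # hypothetical, all values missing

total = ages.sum()                  # NaN skipped: 22.0 + 38.0
all_nan_sum = empty.sum()           # nothing to add -> 0.0 by default
strict = empty.sum(min_count=1)     # needs at least 1 valid value, else NaN
print(total, all_nan_sum, strict)
```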