一:jupyter 中操作 下载 --必须要在anaconda 中先下载好才有用
二:Pandas
简介教程--基础部分
如何读取和写入表格数据? — pandas 2.2.1 文档 - pandas 中文
- 属性代表
DataFrame
/Series
的特征, - 方法是函数,方法(需要括号)则对
DataFrame
/Series
进行操作
列操作:
- 方法
DataFrame上执行此操作,方法是选择 Age列并应用 max() 在Series上操作
In [7]: df["Age"].max() In [8]: ages.max()
Out[7]: 58 Out[8]: 58
df.describe() --数据表中数基本统计信息
titanic.head()
titanic.info()
titanic.tail(10) --将返回 DataFrame 的最后 10 行。
属性
titanic["Age"].shape
DataFrame.shape 是 pandas 的 Series 和 DataFrame 的属性
行操作:选择括号 []
内使用条件函数
- 比大小选行
二者区别: 表格 布尔值
titaIn [12]: above_35 = titanic[titanic["Age"] > 35]
In [13]: above_35.head()
Out[13]:
PassengerId Survived Pclass ... Fare Cabin Embarked
1 2 1 1 ... 71.2833 C85 C
6 7 0 1 ... 51.8625 E46 S
11 12 1 1 ... 26.5500 C103 S
13 14 0 3 ... 31.2750 NaN S
15 16 1 2 ... 16.0000 NaN S
[5 rows x 12 columns]
In [14]: titanic["Age"] > 35
Out[14]:
0 False
1 True
2 False
3 False
4 False
...
886 False
887 False
888 False
889 False
890 False
Name: Age, Length: 891, dtype: bool
- 某列中 按提供的列表 isin() 条件 中的值返回行
In [16]: class_23 = titanic[titanic["Pclass"].isin([2, 3])]
In [17]: class_23.head()
Out[17]:
PassengerId Survived Pclass ... Fare Cabin Embarked
0 1 0 3 ... 7.2500 NaN S
2 3 1 3 ... 7.9250 NaN S
4 5 0 3 ... 8.0500 NaN S
5 6 0 3 ... 8.4583 NaN Q
7 8 0 3 ... 21.0750 NaN S
[5 rows x 12 columns]
等同于
In [18]: class_23 = titanic[(titanic["Pclass"] == 2) | (titanic["Pclass"] == 3)]
In [19]: class_23.head()
Out[19]:
PassengerId Survived Pclass ... Fare Cabin Embarked
0 1 0 3 ... 7.2500 NaN S
2 3 1 3 ... 7.9250 NaN S
4 5 0 3 ... 8.0500 NaN S
5 6 0 3 ... 8.4583 NaN Q
7 8 0 3 ... 21.0750 NaN S
[5 rows x 12 columns]
- 一次性行列操作:.loc .iloc
In [23]: adult_names = titanic.loc[titanic["Age"] > 35, "Name"]
In [24]: adult_names.head()
Out[24]:
1 Cumings, Mrs. John Bradley (Florence Briggs Th...
6 McCarthy, Mr. Timothy J
11 Bonnell, Miss. Elizabeth
13 Andersson, Mr. Anders Johan
15 Hewlett, Mrs. (Mary D Kingcome)
Name: Name, dtype: object
In [25]: titanic.iloc[9:25, 2:5]
Out[25]:
Pclass Name Sex
9 2 Nasser, Mrs. Nicholas (Adele Achem) female
10 3 Sandstrom, Miss. Marguerite Rut female
11 1 Bonnell, Miss. Elizabeth female
12 3 Saundercock, Mr. William Henry male
13 3 Andersson, Mr. Anders Johan male
.. ... ... ...
20 2 Fynney, Mr. Joseph J male
21 2 Beesley, Mr. Lawrence male
22 3 McGowan, Miss. Anna "Annie" female
23 1 Sloper, Mr. William Thompson male
24 3 Palsson, Miss. Torborg Danira female
[16 rows x 3 columns]
API参考 --所有公开的 pandas 对象、函数和方法