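1. Sorting data
Import the libraries and load the data. The Titanic dataset serves as the example; its column names are in Chinese (for example, 是否幸存 = survived, 年龄 = age, 票价 = fare).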
import numpy as np
import pandas as pd
df=pd.read_csv('train_chinese.csv')
df.head(3)
|  | 乘客ID | 是否幸存 | 仓位等级 | 姓名 | 性别 | 年龄 | 兄弟姐妹个数 | 父母子女个数 | 船票信息 | 票价 | 客舱 | 登船港口 | Unnamed: 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S | NaN |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C | NaN |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S | NaN |
frame=pd.DataFrame(np.arange(8).reshape((2,4)),
index=['2','1'],
columns=['d','a','b','c'])
frame
(1) Sort by a single column
Sort by column 'c' in descending order
frame.sort_values(by='c',ascending=False)
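To make the effect of the descending sort visible, here is a minimal self-contained sketch that rebuilds the same toy `frame` and prints the result. The variable name `by_c_desc` is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

# Same toy frame as above: rows labelled '2' and '1', columns d, a, b, c.
frame = pd.DataFrame(np.arange(8).reshape((2, 4)),
                     index=['2', '1'],
                     columns=['d', 'a', 'b', 'c'])

# sort_values returns a new DataFrame; `frame` itself is left unchanged.
by_c_desc = frame.sort_values(by='c', ascending=False)
print(by_c_desc)
```

Row '1' holds the larger value in column 'c' (7 versus 3), so it moves to the top.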
(2) Sort by row index or column index
frame.sort_index()
frame.sort_index(axis=1)
frame.sort_index(axis=1,ascending=False)
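The three calls above differ only in which axis they reorder and in which direction. A small sketch on the same toy `frame` shows the resulting label order; the names `rows_sorted`, `cols_sorted` and `cols_reversed` are illustrative only.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(8).reshape((2, 4)),
                     index=['2', '1'],
                     columns=['d', 'a', 'b', 'c'])

# axis=0 (the default) reorders rows by their index labels.
rows_sorted = frame.sort_index()
# axis=1 reorders columns by their names.
cols_sorted = frame.sort_index(axis=1)
# ascending=False reverses the column order.
cols_reversed = frame.sort_index(axis=1, ascending=False)

print(rows_sorted.index.tolist())
print(cols_sorted.columns.tolist())
print(cols_reversed.columns.tolist())
```

In every case the data travels with its label; only the order of rows or columns changes.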
(3) Sort by two columns at once
frame.sort_values(by=['a','c'])
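When sorting by several columns, the first column takes priority and later columns only break ties. `ascending` also accepts one flag per column. The sketch below uses a small made-up table (the column names `fare` and `age` and all values are hypothetical) to show a descending primary key with an ascending tie-breaker.

```python
import pandas as pd

# Hypothetical toy data, chosen so that the first key contains ties.
people = pd.DataFrame({'fare': [10.0, 10.0, 5.0, 5.0],
                       'age': [30, 20, 40, 25]})

# Highest fare first; within the same fare, youngest first.
ordered = people.sort_values(by=['fare', 'age'], ascending=[False, True])
print(ordered)
```

Passing a single boolean, as in the calls in this section, applies the same direction to every key.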
Sort the data from 'train_chinese.csv' by 票价 (fare) and 年龄 (age)
df.sort_values(by=['票价','年龄'],ascending=False).head(20)
|  | 乘客ID | 是否幸存 | 仓位等级 | 姓名 | 性别 | 年龄 | 兄弟姐妹个数 | 父母子女个数 | 船票信息 | 票价 | 客舱 | 登船港口 | Unnamed: 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 679 | 680 | 1 | 1 | Cardeza, Mr. Thomas Drake Martinez | male | 36.0 | 0 | 1 | PC 17755 | 512.3292 | B51 B53 B55 | C | NaN |
| 258 | 259 | 1 | 1 | Ward, Miss. Anna | female | 35.0 | 0 | 0 | PC 17755 | 512.3292 | NaN | C | NaN |
| 737 | 738 | 1 | 1 | Lesurer, Mr. Gustave J | male | 35.0 | 0 | 0 | PC 17755 | 512.3292 | B101 | C | NaN |
| 438 | 439 | 0 | 1 | Fortune, Mr. Mark | male | 64.0 | 1 | 4 | 19950 | 263.0000 | C23 C25 C27 | S | NaN |
| 341 | 342 | 1 | 1 | Fortune, Miss. Alice Elizabeth | female | 24.0 | 3 | 2 | 19950 | 263.0000 | C23 C25 C27 | S | NaN |
| 88 | 89 | 1 | 1 | Fortune, Miss. Mabel Helen | female | 23.0 | 3 | 2 | 19950 | 263.0000 | C23 C25 C27 | S | NaN |
| 27 | 28 | 0 | 1 | Fortune, Mr. Charles Alexander | male | 19.0 | 3 | 2 | 19950 | 263.0000 | C23 C25 C27 | S | NaN |
| 742 | 743 | 1 | 1 | Ryerson, Miss. Susan Parker "Suzette" | female | 21.0 | 2 | 2 | PC 17608 | 262.3750 | B57 B59 B63 B66 | C | NaN |
| 311 | 312 | 1 | 1 | Ryerson, Miss. Emily Borie | female | 18.0 | 2 | 2 | PC 17608 | 262.3750 | B57 B59 B63 B66 | C | NaN |
| 299 | 300 | 1 | 1 | Baxter, Mrs. James (Helene DeLaudeniere Chaput) | female | 50.0 | 0 | 1 | PC 17558 | 247.5208 | B58 B60 | C | NaN |
| 118 | 119 | 0 | 1 | Baxter, Mr. Quigg Edmond | male | 24.0 | 0 | 1 | PC 17558 | 247.5208 | B58 B60 | C | NaN |
| 380 | 381 | 1 | 1 | Bidois, Miss. Rosalie | female | 42.0 | 0 | 0 | PC 17757 | 227.5250 | NaN | C | NaN |
| 716 | 717 | 1 | 1 | Endres, Miss. Caroline Louise | female | 38.0 | 0 | 0 | PC 17757 | 227.5250 | C45 | C | NaN |
| 700 | 701 | 1 | 1 | Astor, Mrs. John Jacob (Madeleine Talmadge Force) | female | 18.0 | 1 | 0 | PC 17757 | 227.5250 | C62 C64 | C | NaN |
| 557 | 558 | 0 | 1 | Robbins, Mr. Victor | male | NaN | 0 | 0 | PC 17757 | 227.5250 | NaN | C | NaN |
| 527 | 528 | 0 | 1 | Farthing, Mr. John | male | NaN | 0 | 0 | PC 17483 | 221.7792 | C95 | S | NaN |
| 377 | 378 | 0 | 1 | Widener, Mr. Harry Elkins | male | 27.0 | 0 | 2 | 113503 | 211.5000 | C82 | C | NaN |
| 779 | 780 | 1 | 1 | Robert, Mrs. Edward Scott (Elisabeth Walton Mc... | female | 43.0 | 0 | 1 | 24160 | 211.3375 | B3 | S | NaN |
| 730 | 731 | 1 | 1 | Allen, Miss. Elisabeth Walton | female | 29.0 | 0 | 0 | 24160 | 211.3375 | B5 | S | NaN |
| 689 | 690 | 1 | 1 | Madill, Miss. Georgette Alexandra | female | 15.0 | 0 | 1 | 24160 | 211.3375 | B5 | S | NaN |
df.sort_values(by=['票价','年龄'],ascending=False).tail(20)
|  | 乘客ID | 是否幸存 | 仓位等级 | 姓名 | 性别 | 年龄 | 兄弟姐妹个数 | 父母子女个数 | 船票信息 | 票价 | 客舱 | 登船港口 | Unnamed: 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 818 | 819 | 0 | 3 | Holm, Mr. John Fredrik Alexander | male | 43.0 | 0 | 0 | C 7075 | 6.4500 | NaN | S | NaN |
| 843 | 844 | 0 | 3 | Lemberopolous, Mr. Peter L | male | 34.5 | 0 | 0 | 2683 | 6.4375 | NaN | C | NaN |
| 326 | 327 | 0 | 3 | Nysveen, Mr. Johan Hansen | male | 61.0 | 0 | 0 | 345364 | 6.2375 | NaN | S | NaN |
| 872 | 873 | 0 | 1 | Carlsson, Mr. Frans Olof | male | 33.0 | 0 | 0 | 695 | 5.0000 | B51 B53 B55 | S | NaN |
| 378 | 379 | 0 | 3 | Betros, Mr. Tannous | male | 20.0 | 0 | 0 | 2648 | 4.0125 | NaN | C | NaN |
| 597 | 598 | 0 | 3 | Johnson, Mr. Alfred | male | 49.0 | 0 | 0 | LINE | 0.0000 | NaN | S | NaN |
| 263 | 264 | 0 | 1 | Harrison, Mr. William | male | 40.0 | 0 | 0 | 112059 | 0.0000 | B94 | S | NaN |
| 806 | 807 | 0 | 1 | Andrews, Mr. Thomas Jr | male | 39.0 | 0 | 0 | 112050 | 0.0000 | A36 | S | NaN |
| 822 | 823 | 0 | 1 | Reuchlin, Jonkheer. John George | male | 38.0 | 0 | 0 | 19972 | 0.0000 | NaN | S | NaN |
| 179 | 180 | 0 | 3 | Leonard, Mr. Lionel | male | 36.0 | 0 | 0 | LINE | 0.0000 | NaN | S | NaN |
| 271 | 272 | 1 | 3 | Tornquist, Mr. William Henry | male | 25.0 | 0 | 0 | LINE | 0.0000 | NaN | S | NaN |
| 302 | 303 | 0 | 3 | Johnson, Mr. William Cahoone Jr | male | 19.0 | 0 | 0 | LINE | 0.0000 | NaN | S | NaN |
| 277 | 278 | 0 | 2 | Parkes, Mr. Francis "Frank" | male | NaN | 0 | 0 | 239853 | 0.0000 | NaN | S | NaN |
| 413 | 414 | 0 | 2 | Cunningham, Mr. Alfred Fleming | male | NaN | 0 | 0 | 239853 | 0.0000 | NaN | S | NaN |
| 466 | 467 | 0 | 2 | Campbell, Mr. William | male | NaN | 0 | 0 | 239853 | 0.0000 | NaN | S | NaN |
| 481 | 482 | 0 | 2 | Frost, Mr. Anthony Wood "Archie" | male | NaN | 0 | 0 | 239854 | 0.0000 | NaN | S | NaN |
| 633 | 634 | 0 | 1 | Parr, Mr. William Henry Marsh | male | NaN | 0 | 0 | 112052 | 0.0000 | NaN | S | NaN |
| 674 | 675 | 0 | 2 | Watson, Mr. Ennis Hastings | male | NaN | 0 | 0 | 239856 | 0.0000 | NaN | S | NaN |
| 732 | 733 | 0 | 2 | Knight, Mr. Robert J | male | NaN | 0 | 0 | 239855 | 0.0000 | NaN | S | NaN |
| 815 | 816 | 0 | 1 | Fry, Mr. Richard | male | NaN | 0 | 0 | 112058 | 0.0000 | B102 | S | NaN |
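The two sorted views above show that 14 of the 20 passengers with the highest fares survived, whereas only 1 of the 20 with the lowest fares did. This suggests, at least to some extent, that fare is related to the chance of survival.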
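The survivor counts quoted above can be computed instead of counted by eye. The sketch below shows the idea on a tiny made-up table, since 'train_chinese.csv' is not bundled here; the six rows are hypothetical and only the column names match the real data. On the real `df`, the same expression with `head(20)` and `tail(20)` should give the two counts.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows; only the column names come from the real dataset.
toy = pd.DataFrame({'是否幸存': [0, 1, 1, 0, 1, 0],
                    '票价': [0.0, 500.0, 0.0, 8.0, 260.0, 200.0],
                    '年龄': [40.0, 36.0, np.nan, 22.0, 24.0, np.nan]})

ranked = toy.sort_values(by=['票价', '年龄'], ascending=False)

# 是否幸存 is 0/1, so summing it counts the survivors in each slice.
top_survivors = ranked.head(3)['是否幸存'].sum()
bottom_survivors = ranked.tail(3)['是否幸存'].sum()
print(top_survivors, bottom_survivors)
```

Rows with a missing 年龄 are placed last within each group of equal fares, which matches the NaN ages at the end of the fare ties in the tables above.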
2. Adding two DataFrames
a=pd.DataFrame(np.arange(4.).reshape(2,2),
columns=['a','b'],
index=['1','2'])
b=pd.DataFrame(np.arange(9.).reshape(3,3),
columns=['a','b','c'],
index=['1','2','3'])
a
b
|  | a | b | c |
|---|---|---|---|
| 1 | 0.0 | 1.0 | 2.0 |
| 2 | 3.0 | 4.0 | 5.0 |
| 3 | 6.0 | 7.0 | 8.0 |
a+b
|  | a | b | c |
|---|---|---|---|
| 1 | 0.0 | 2.0 | NaN |
| 2 | 5.0 | 7.0 | NaN |
| 3 | NaN | NaN | NaN |
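In the result of `a+b`, every cell whose row or column label exists in only one of the two frames becomes NaN, because pandas aligns on labels before adding. If a missing cell should instead count as zero, `DataFrame.add` with `fill_value` is one option. The sketch below rebuilds the same `a` and `b`; the names `plain` and `filled` are illustrative only.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(4.).reshape(2, 2),
                 columns=['a', 'b'],
                 index=['1', '2'])
b = pd.DataFrame(np.arange(9.).reshape(3, 3),
                 columns=['a', 'b', 'c'],
                 index=['1', '2', '3'])

# Plain addition: labels missing from either side yield NaN.
plain = a + b

# fill_value=0 substitutes 0 for a cell missing from one side only;
# a cell missing from both sides would still be NaN.
filled = a.add(b, fill_value=0)
print(filled)
```

Here `b` covers every label of the result, so `filled` contains no NaN at all.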
Compute the size of the largest family on board
max(df['兄弟姐妹个数']+df['父母子女个数'])
10
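The largest sum is 10. Because both columns count a passenger's relatives on board and not the passenger, the largest family had 10 relatives travelling with one passenger.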
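The same calculation can be written with pandas' own `Series.max`, and `Series.idxmax` additionally reports which row holds the maximum. This is a sketch on hypothetical toy rows (only the column names match the real data); the name `relatives` is introduced for illustration.

```python
import pandas as pd

# Hypothetical toy rows; only the column names come from the real dataset.
toy = pd.DataFrame({'兄弟姐妹个数': [1, 0, 8, 3],
                    '父母子女个数': [0, 0, 2, 2]})

# Element-wise sum of the two columns, aligned on the row index.
relatives = toy['兄弟姐妹个数'] + toy['父母子女个数']

largest = relatives.max()         # largest number of relatives on board
row_label = relatives.idxmax()    # index label of the row holding it
family_size = largest + 1         # add the passenger to get the family size
print(largest, row_label, family_size)
```

Unlike the built-in `max`, `Series.max` skips missing values by default.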
3. Viewing basic summary statistics
Use the describe function to view basic summary statistics for 票价 (fare) and 年龄 (age)
df['票价'].describe()
count 891.000000
mean 32.204208
std 49.693429
min 0.000000
25% 7.910400
50% 14.454200
75% 31.000000
max 512.329200
Name: 票价, dtype: float64
df['年龄'].describe()
count 714.000000
mean 29.699118
std 14.526497
min 0.420000
25% 20.125000
50% 28.000000
75% 38.000000
max 80.000000
Name: 年龄, dtype: float64
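For both age and fare this gives the number of values, the maximum, the minimum, the quartiles, the mean and the standard deviation. The count covers non-missing values only, which is why 年龄 has 714 entries while 票价 has 891.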
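`describe` can also be called on several columns at once, which puts the statistics side by side. The sketch below uses hypothetical toy values (only the column names match the real data) and also shows that a missing value lowers `count` for that column only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy values; one age is missing on purpose.
toy = pd.DataFrame({'票价': [10.0, 20.0, 30.0, 40.0],
                    '年龄': [20.0, 30.0, np.nan, 40.0]})

# One column of statistics per selected DataFrame column.
summary = toy[['票价', '年龄']].describe()
print(summary)

# Individual statistics can be read with .loc[row_label, column_label].
age_count = summary.loc['count', '年龄']
fare_mean = summary.loc['mean', '票价']
```

The rows of `summary` are labelled count, mean, std, min, 25%, 50%, 75% and max, the same labels as in the single-column output above.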