Python Data Analysis: pandas

4.1 pandas and Its Data Structures

4.1.2 The Series Data Structure and How to Create It

Method 1: Create a Series from a scalar

import pandas as pd 
s1=pd.Series(62)
s1
0    62
dtype: int64
import pandas as pd 
s1=pd.Series(62,index=["x","y","z"])
s1
x    62
y    62
z    62
dtype: int64

Method 2: Create a Series from a list

import pandas as pd 
s2=pd.Series([30,10,60],index=["x","y","z"])
s2

x    30
y    10
z    60
dtype: int64

Method 3: Create a Series from a dictionary

import pandas as pd
s3=pd.Series({"匪警":110,"火警":119,"急救中心":120,"交通事故":122})
s3
匪警        110
火警        119
急救中心    120
交通事故    122
dtype: int64
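
When a dictionary is combined with an explicit index, pandas looks each index label up among the dictionary keys. The following is a minimal sketch of that behaviour; the shortened dictionary and the name phone are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical, shortened version of the emergency-number dictionary
phone = {"匪警": 110, "火警": 119, "急救中心": 120}

# Labels listed in index are looked up among the dictionary keys;
# a label with no matching key ("交通事故" here) gets NaN
s = pd.Series(phone, index=["火警", "匪警", "交通事故"])

print(s)
print(s.dtype)  # NaN forces a floating-point dtype
```

Keys that are not listed in index ("急救中心" here) are simply left out of the result.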

Method 4: Create a Series from an ndarray

import pandas as pd
import numpy as np
s4=pd.Series(np.arange(6),index=["a","b","c","d","e","f"])
s4
a    0
b    1
c    2
d    3
e    4
f    5
dtype: int32

values and index

import pandas as pd
s3=pd.Series({"匪警":110,"火警":119,"急救中心":120,"交通事故":122})
s3.index
s3.values
array([110, 119, 120, 122], dtype=int64)

Indexing and slicing

import pandas as pd 
s2=pd.Series([30,10,60],index=["x","y","z"])
s2["x"]
30
s2.iloc[0]  # positional access; plain s2[0] on a label-indexed Series is deprecated in pandas 2.x
30
s2[:2]
x    30
y    10
dtype: int64
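
The difference between label-based and position-based access is easier to see side by side. This is a small sketch that reuses the s2 Series from above; the variable names are illustrative only.

```python
import pandas as pd

s2 = pd.Series([30, 10, 60], index=["x", "y", "z"])

# Label-based access: .loc looks up index labels
by_label = s2.loc["x"]

# Position-based access: .iloc counts from 0, whatever the labels are
by_position = s2.iloc[0]

# A label slice includes its end label; a positional slice excludes its end position
label_slice = s2.loc["x":"y"]    # labels x and y
position_slice = s2.iloc[:2]     # positions 0 and 1, which are also x and y

print(by_label, by_position)
print(label_slice.equals(position_slice))
```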

4.1.3 The DataFrame Data Structure and How to Create It

Method 1: Create a DataFrame from a dictionary of one-dimensional lists

import pandas as pd
d1={"姓名":["张三","李四","王五","赵六"],"数学":[87,45,34,98],"语文":[54,76,55,90],"计算机":[34,56,77,87]}
df1=pd.DataFrame(d1)
df1
   姓名  数学  语文  计算机
0  张三    87    54      34
1  李四    45    76      56
2  王五    34    55      77
3  赵六    98    90      87
import pandas as pd
d1={"姓名":["张三","李四","王五","赵六"],"数学":[87,45,34,98],"语文":[54,76,55,90],"计算机":[34,56,77,87]}
df1=pd.DataFrame(d1,index=[101,102,103,104])
df1
     姓名  数学  语文  计算机
101  张三    87    54      34
102  李四    45    76      56
103  王五    34    55      77
104  赵六    98    90      87

Method 2: Create a DataFrame from a two-dimensional ndarray

import pandas as pd
import numpy as np
nd1=np.arange(12).reshape(3,4)
nd1
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
df2=pd.DataFrame(nd1)
df2
   0  1   2   3
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11
df2=pd.DataFrame(nd1,index=["a","b","c"])
df2
   0  1   2   3
a  0  1   2   3
b  4  5   6   7
c  8  9  10  11
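
Column labels can be supplied in the same way as row labels. The sketch below assumes the column names w, x, y, z, which are not part of the original example.

```python
import numpy as np
import pandas as pd

nd1 = np.arange(12).reshape(3, 4)

# index supplies the row labels, columns supplies the column labels
# (the column names here are hypothetical)
df2 = pd.DataFrame(nd1, index=["a", "b", "c"], columns=["w", "x", "y", "z"])

print(df2)
print(df2.loc["b", "y"])  # row label "b", column label "y"
```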

The three parts: values, index, and columns

import pandas as pd
d1={"姓名":["张三","李四","王五","赵六"],"数学":[87,45,34,98],"语文":[54,76,55,90],"计算机":[34,56,77,87]}
df1=pd.DataFrame(d1,index=[202201,202202,202203,202204])
df1
        姓名  数学  语文  计算机
202201  张三    87    54      34
202202  李四    45    76      56
202203  王五    34    55      77
202204  赵六    98    90      87
df1.columns
Index(['姓名', '数学', '语文', '计算机'], dtype='object')
df1.index
Int64Index([202201, 202202, 202203, 202204], dtype='int64')
df1.values
array([['张三', 87, 54, 34],
       ['李四', 45, 76, 56],
       ['王五', 34, 55, 77],
       ['赵六', 98, 90, 87]], dtype=object)
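
The dtype=object in the output above comes from mixing text and numbers in one array. As a rough sketch (using a shortened, hypothetical version of the table), selecting only the numeric columns before converting gives a numeric array instead:

```python
import pandas as pd

# Shortened, hypothetical version of the grade dictionary
d1 = {"姓名": ["张三", "李四"], "数学": [87, 45], "语文": [54, 76]}
df1 = pd.DataFrame(d1, index=[202201, 202202])

mixed = df1.to_numpy()                      # text and numbers together: dtype object
numeric = df1[["数学", "语文"]].to_numpy()  # numbers only: an integer dtype

print(mixed.dtype)
print(numeric.dtype)
```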

4.2 Importing and Exporting Data with pandas

Importing external data

import pandas as pd
f1=pd.read_csv("C:\\Users\\wsy\\Desktop\\a.csv")
f1
   a   b
0  1   2
1  2   4
2  3   6
3  4   8
4  5  10
5  6  12
6  7  14
7  8  16
8  9  18
import pandas as pd
f1=pd.read_csv("C:\\Users\\wsy\\Desktop\\b.csv",encoding="gbk")
f1
   青海  西宁
0     1     2
1     2     4
2     3     6
3     4     8
4     5    10
5     6    12
6     7    14
7     8    16
8     9    18
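
The encoding argument matters because read_csv decodes files as UTF-8 by default. The sketch below illustrates this under stated assumptions: it writes a tiny GBK-encoded file into a temporary directory (the file and its contents are made up for the example) and then reads it back.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file: a tiny CSV with Chinese column names, saved as GBK
csv_text = "青海,西宁\n1,2\n2,4\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "b.csv")
    with open(path, "w", encoding="gbk") as f:
        f.write(csv_text)

    # The default UTF-8 decoder cannot decode the GBK bytes of the header
    try:
        pd.read_csv(path)
        utf8_ok = True
    except UnicodeDecodeError:
        utf8_ok = False

    # Passing the matching encoding decodes the header correctly
    f1 = pd.read_csv(path, encoding="gbk")

print(utf8_ok)
print(f1.columns.tolist())
```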

Exporting data
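
A minimal sketch of exporting with DataFrame.to_csv, assuming a small made-up DataFrame (df_out) and a temporary directory rather than the Desktop paths used above:

```python
import os
import tempfile

import pandas as pd

# Hypothetical data standing in for a DataFrame that is ready to export
df_out = pd.DataFrame({"a": [1, 2, 3], "b": [2, 4, 6]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "out.csv")

    # index=False leaves the row labels out of the file;
    # utf-8-sig writes a byte-order mark, which helps Excel recognise UTF-8 text
    df_out.to_csv(path, index=False, encoding="utf-8-sig")

    # Read the file back to confirm the round trip
    df_back = pd.read_csv(path, encoding="utf-8-sig")

print(df_back.equals(df_out))
```

DataFrame.to_excel works along the same lines for .xlsx files, but it needs an Excel writer package such as openpyxl to be installed.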

4.3 Data Overview and Preprocessing

import pandas as pd
pd.set_option("display.unicode.east_asian_width",True)  # keep column headers aligned when the output contains East Asian wide characters
df=pd.read_excel("C:\\Users\\wsy\\Desktop\\cj.xlsx")
df
          学号             姓名  性别                专业       英语  数学  Python   选修  管理学
 0  2020802045             魏天    男  信息管理与信息系统  67.116667  90.8   93.00   95.0  106.00
 1  2020844001             郭夏    男            国际贸易  91.050000  83.4   86.00  100.0   99.00
 2  2020844002           王晓加    男                 NaN  54.200000  83.4   74.00    NaN   90.00
 3  2020844003           黄婷婷    女            国际贸易  87.800000  91.4   79.66   95.0   92.66
 4  2020844004           赵小瑜   NaN            国际贸易  61.150000  82.2   84.66  100.0   97.66
 5  2020844005             辛禧    男            国际贸易  65.125000  88.6   68.00   80.0   81.00
 6  2020844007             王晨    男            国际贸易  62.400000  80.0   65.00   90.0   78.00
 7  2020844008             韩天    男            国际贸易  96.250000  91.0   85.00   97.0   98.00
 8  2020844009             刘玉    女            国际贸易  89.050000  91.4   80.32  100.0   93.32
 9  2020844010           谢亚鹏    男            市场营销  70.500000  85.2   60.00   90.0   73.00
10  2020844011           娄天楠                  市场营销  58.800000  84.6   60.00    NaN   73.00
11  2020844012             唐喆                  市场营销  80.233333  87.4   64.00  100.0   77.00
12  2020844013             史昀                  市场营销  82.733333  82.2   73.32  100.0   86.32
13  2020844014           刘欣语                  市场营销  48.718333  83.8   86.00   80.0   99.00
14  2020844015             王同                  市场营销  74.200000  92.2   92.00  100.0  115.00
15  2020844017           武天一                  市场营销  73.216667  83.2   79.00   95.0   92.00
16  2020844018             张析                  市场营销  82.750000  92.0   92.00  100.0  105.00
17  2020844019           陈雨涵                  市场营销  95.200000  95.0   88.00  100.0  101.00
18  2020844020           张家齐                  市场营销  95.450000  91.0   96.00  100.0  109.00
19  2020844021           李赫桐                    会计学  88.276667  86.8   83.00   87.0   96.00
20  2020844022             关帅   NaN              会计学  90.000000  92.6   75.00  100.0   88.00
21  2020844023           刘嘉雯                    会计学  89.575000  86.0   90.00  100.0  103.00
22  2020844024           刘浩天                    会计学  85.100000  83.2   85.00  100.0   98.00
23  2020844025             刘宇                       NaN  75.200000  85.6   76.00  100.0   89.00
24  2020844026             胡童                    会计学  84.050000  86.0   91.00  100.0  119.00
25  2020844027             丁灿                    会计学  88.750000  86.2   66.00  100.0   79.00
26  2020844028           郑武田                    会计学  89.550000  87.4   91.00    NaN  104.00
27  2020844029             金耀                    会计学  79.450000  87.2   68.00  100.0   81.00
28  2020844030             庞博                    会计学  89.700000  92.0   92.00  100.0  105.00
29  2020848001           王春杨                    会计学  88.100000  89.8   84.00  100.0   97.00
30  2020848002           陈小恬                    会计学  83.750000  94.8   89.00  100.0  102.00
31  2020848002           陈小恬                    会计学  83.750000  94.8   89.00  100.0  102.00
32  2020848002           陈小恬                    会计学  83.750000  94.8   89.00  100.0  102.00
33  2020848003             张淳                    会计学  91.300000  92.2   81.32  100.0   94.32
34  2020848004           王佳琳        信息管理与信息系统  75.625000  91.0   93.00  100.0  106.00
35  2020848005             郑彤        信息管理与信息系统  88.900000  90.0   78.00  100.0   91.00
36  2020848006           张鹤同        信息管理与信息系统  89.750000  88.8   82.50  100.0   95.50
37  2020848007             苏远        信息管理与信息系统  90.250000  89.2   79.32   68.0   92.32
38  2020848008           方雨桃        信息管理与信息系统  93.100000  86.2   83.00  100.0   96.00
39  2020848010             闫宇        信息管理与信息系统  86.033333  85.4   85.00  100.0   98.00
40  2020848011           张田田        信息管理与信息系统  91.200000  89.6   96.32   77.0  109.32
41  2020848013           曹一一        信息管理与信息系统  74.426667  86.8   83.32  100.0   96.32
42  2020848014           贾晶晶                       NaN  84.450000  93.0   82.66  100.0   95.66
43  2020848015           贾淏文        信息管理与信息系统  46.675000  80.8   87.00  100.0  100.00
44  2020848016             杨帆        信息管理与信息系统  98.700000  87.6   95.00    NaN  108.00
45  2020848017           赵迎辰   NaN  信息管理与信息系统  82.250000  87.4   74.00  100.0   87.00
46  2020848018           郭晓舒        信息管理与信息系统  82.500000  83.8   73.00   90.0   86.00
47  2020848018           郭晓舒        信息管理与信息系统  82.500000  83.8   73.00   90.0   86.00
48  2020848019           张雨桐                    金融学  79.150000  92.4   83.00  100.0   96.00
49  2020848020           孟德坤                    金融学  83.450000  87.4   80.66  100.0   93.66
50  2020848021           王少祖                    金融学  82.950000  91.6   78.00   90.0   91.00
51  2020848023           黄金雨                    金融学  79.950000  89.8   86.00  100.0   99.00
52  2020848024           汤佳怡                    金融学  86.600000  83.4   88.32  100.0  101.32
53  2020848027    热孜耶·买买提                    金融学  92.700000  93.2   86.32  100.0   99.32
54  2020848028  奴热艾力·雪艾力                    金融学  15.000000  75.0   63.32  100.0   76.32
55  2020848029           林可新                    金融学  89.300000  87.4   95.00  100.0  108.00
56  2020848031             任旭                    金融学  83.425000  85.4   71.66  100.0   84.66

4.3.1 Data Overview

Viewing basic information with basic attributes

print("Index:",df.index)
Index: RangeIndex(start=0, stop=57, step=1)
print("Column names:",df.columns)
Column names: Index(['学号', '姓名', '性别', '专业', '英语', '数学', 'Python', '选修',
       '管理学'], dtype='object')
print("Data values:",df.values[:10])
Data values: [[2020802045 '魏天' '男' '信息管理与信息系统' 67.11666666666667 90.80000000000001
  93.0 95.0 106.0]
 [2020844001 '郭夏' '男' '国际贸易' 91.05 83.4 86.0 100.0 99.0]
 [2020844002 '王晓加' '男' nan 54.2 83.4 74.0 nan 90.0]
 [2020844003 '黄婷婷' '女' '国际贸易' 87.8 91.4 79.66 95.0 92.66]
 [2020844004 '赵小瑜' nan '国际贸易' 61.15 82.2 84.66 100.0 97.66]
 [2020844005 '辛禧' '男' '国际贸易' 65.125 88.6 68.0 80.0 81.0]
 [2020844007 '王晨' '男' '国际贸易' 62.4 80.0 65.0 90.0 78.0]
 [2020844008 '韩天' '男' '国际贸易' 96.25 91.0 85.0 97.0 98.0]
 [2020844009 '刘玉' '女' '国际贸易' 89.05 91.4 80.32 100.0 93.32]
 [2020844010 '谢亚鹏' '男' '市场营销' 70.5 85.2 60.0 90.0 73.0]]
print("Data types:\n",df.dtypes)
Data types:
 学号        int64
姓名       object
性别       object
专业       object
英语      float64
数学      float64
Python    float64
选修      float64
管理学    float64
dtype: object

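Viewing the size of the data with basic attributes

print("Number of elements:",df.size)
Number of elements: 513
print("Number of dimensions:",df.ndim)
Number of dimensions: 2
print("Shape:",df.shape)
Shape: (57, 9)
print("Number of rows:",df.index.size)
Number of rows: 57
print("Number of columns:",df.columns.size)
Number of columns: 9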

Viewing sample rows with common methods

df.head()
         学号    姓名  性别                专业       英语  数学  Python   选修  管理学
0  2020802045    魏天    男  信息管理与信息系统  67.116667  90.8   93.00   95.0  106.00
1  2020844001    郭夏    男            国际贸易  91.050000  83.4   86.00  100.0   99.00
2  2020844002  王晓加    男                 NaN  54.200000  83.4   74.00    NaN   90.00
3  2020844003  黄婷婷    女            国际贸易  87.800000  91.4   79.66   95.0   92.66
4  2020844004  赵小瑜   NaN            国际贸易  61.150000  82.2   84.66  100.0   97.66
df.head(2)
         学号  姓名  性别                专业       英语  数学  Python   选修  管理学
0  2020802045  魏天    男  信息管理与信息系统  67.116667  90.8    93.0   95.0   106.0
1  2020844001  郭夏    男            国际贸易  91.050000  83.4    86.0  100.0    99.0
df.tail()
          学号             姓名  性别    专业    英语  数学  Python   选修  管理学
52  2020848024           汤佳怡        金融学  86.600  83.4   88.32  100.0  101.32
53  2020848027    热孜耶·买买提        金融学  92.700  93.2   86.32  100.0   99.32
54  2020848028  奴热艾力·雪艾力        金融学  15.000  75.0   63.32  100.0   76.32
55  2020848029           林可新        金融学  89.300  87.4   95.00  100.0  108.00
56  2020848031             任旭        金融学  83.425  85.4   71.66  100.0   84.66
df.tail(3)
          学号             姓名  性别    专业    英语  数学  Python   选修  管理学
54  2020848028  奴热艾力·雪艾力        金融学  15.000  75.0   63.32  100.0   76.32
55  2020848029           林可新        金融学  89.300  87.4   95.00  100.0  108.00
56  2020848031             任旭        金融学  83.425  85.4   71.66  100.0   84.66

Checking data quality with common methods

print(df.notnull())  # True where a value is present, False where it is missing
    学号  姓名   性别   专业  英语  数学  Python   选修  管理学
0   True  True   True   True  True  True    True   True    True
1   True  True   True   True  True  True    True   True    True
2   True  True   True  False  True  True    True  False    True
3   True  True   True   True  True  True    True   True    True
4   True  True  False   True  True  True    True   True    True
5   True  True   True   True  True  True    True   True    True
6   True  True   True   True  True  True    True   True    True
7   True  True   True   True  True  True    True   True    True
8   True  True   True   True  True  True    True   True    True
9   True  True   True   True  True  True    True   True    True
10  True  True   True   True  True  True    True  False    True
11  True  True   True   True  True  True    True   True    True
12  True  True   True   True  True  True    True   True    True
13  True  True   True   True  True  True    True   True    True
14  True  True   True   True  True  True    True   True    True
15  True  True   True   True  True  True    True   True    True
16  True  True   True   True  True  True    True   True    True
17  True  True   True   True  True  True    True   True    True
18  True  True   True   True  True  True    True   True    True
19  True  True   True   True  True  True    True   True    True
20  True  True  False   True  True  True    True   True    True
21  True  True   True   True  True  True    True   True    True
22  True  True   True   True  True  True    True   True    True
23  True  True   True  False  True  True    True   True    True
24  True  True   True   True  True  True    True   True    True
25  True  True   True   True  True  True    True   True    True
26  True  True   True   True  True  True    True  False    True
27  True  True   True   True  True  True    True   True    True
28  True  True   True   True  True  True    True   True    True
29  True  True   True   True  True  True    True   True    True
30  True  True   True   True  True  True    True   True    True
31  True  True   True   True  True  True    True   True    True
32  True  True   True   True  True  True    True   True    True
33  True  True   True   True  True  True    True   True    True
34  True  True   True   True  True  True    True   True    True
35  True  True   True   True  True  True    True   True    True
36  True  True   True   True  True  True    True   True    True
37  True  True   True   True  True  True    True   True    True
38  True  True   True   True  True  True    True   True    True
39  True  True   True   True  True  True    True   True    True
40  True  True   True   True  True  True    True   True    True
41  True  True   True   True  True  True    True   True    True
42  True  True   True  False  True  True    True   True    True
43  True  True   True   True  True  True    True   True    True
44  True  True   True   True  True  True    True  False    True
45  True  True  False   True  True  True    True   True    True
46  True  True   True   True  True  True    True   True    True
47  True  True   True   True  True  True    True   True    True
48  True  True   True   True  True  True    True   True    True
49  True  True   True   True  True  True    True   True    True
50  True  True   True   True  True  True    True   True    True
51  True  True   True   True  True  True    True   True    True
52  True  True   True   True  True  True    True   True    True
53  True  True   True   True  True  True    True   True    True
54  True  True   True   True  True  True    True   True    True
55  True  True   True   True  True  True    True   True    True
56  True  True   True   True  True  True    True   True    True
print(df.isnull())  # isnull() is an alias of isna(): True where a value is missing
     学号   姓名   性别   专业   英语   数学  Python   选修  管理学
0   False  False  False  False  False  False   False  False   False
1   False  False  False  False  False  False   False  False   False
2   False  False  False   True  False  False   False   True   False
3   False  False  False  False  False  False   False  False   False
4   False  False   True  False  False  False   False  False   False
5   False  False  False  False  False  False   False  False   False
6   False  False  False  False  False  False   False  False   False
7   False  False  False  False  False  False   False  False   False
8   False  False  False  False  False  False   False  False   False
9   False  False  False  False  False  False   False  False   False
10  False  False  False  False  False  False   False   True   False
11  False  False  False  False  False  False   False  False   False
12  False  False  False  False  False  False   False  False   False
13  False  False  False  False  False  False   False  False   False
14  False  False  False  False  False  False   False  False   False
15  False  False  False  False  False  False   False  False   False
16  False  False  False  False  False  False   False  False   False
17  False  False  False  False  False  False   False  False   False
18  False  False  False  False  False  False   False  False   False
19  False  False  False  False  False  False   False  False   False
20  False  False   True  False  False  False   False  False   False
21  False  False  False  False  False  False   False  False   False
22  False  False  False  False  False  False   False  False   False
23  False  False  False   True  False  False   False  False   False
24  False  False  False  False  False  False   False  False   False
25  False  False  False  False  False  False   False  False   False
26  False  False  False  False  False  False   False   True   False
27  False  False  False  False  False  False   False  False   False
28  False  False  False  False  False  False   False  False   False
29  False  False  False  False  False  False   False  False   False
30  False  False  False  False  False  False   False  False   False
31  False  False  False  False  False  False   False  False   False
32  False  False  False  False  False  False   False  False   False
33  False  False  False  False  False  False   False  False   False
34  False  False  False  False  False  False   False  False   False
35  False  False  False  False  False  False   False  False   False
36  False  False  False  False  False  False   False  False   False
37  False  False  False  False  False  False   False  False   False
38  False  False  False  False  False  False   False  False   False
39  False  False  False  False  False  False   False  False   False
40  False  False  False  False  False  False   False  False   False
41  False  False  False  False  False  False   False  False   False
42  False  False  False   True  False  False   False  False   False
43  False  False  False  False  False  False   False  False   False
44  False  False  False  False  False  False   False   True   False
45  False  False   True  False  False  False   False  False   False
46  False  False  False  False  False  False   False  False   False
47  False  False  False  False  False  False   False  False   False
48  False  False  False  False  False  False   False  False   False
49  False  False  False  False  False  False   False  False   False
50  False  False  False  False  False  False   False  False   False
51  False  False  False  False  False  False   False  False   False
52  False  False  False  False  False  False   False  False   False
53  False  False  False  False  False  False   False  False   False
54  False  False  False  False  False  False   False  False   False
55  False  False  False  False  False  False   False  False   False
56  False  False  False  False  False  False   False  False   False
print("Missing values per column in df:\n",df.isna().sum())
Missing values per column in df:
 学号      0
姓名      0
性别      3
专业      3
英语      0
数学      0
Python    0
选修      4
管理学    0
dtype: int64
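
Counts are often read together with the share of missing values and the affected rows. The following sketch uses a small made-up table (demo), not the grade file itself:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the grade table
demo = pd.DataFrame({
    "姓名": ["A", "B", "C", "D"],
    "专业": ["会计学", np.nan, "金融学", np.nan],
    "选修": [95.0, np.nan, 100.0, 80.0],
})

missing_count = demo.isna().sum()    # number of NaN values per column
missing_ratio = demo.isna().mean()   # share of NaN values per column
rows_with_gaps = demo[demo.isna().any(axis=1)]  # rows with at least one NaN

print(missing_count)
print(missing_ratio)
print(rows_with_gaps.index.tolist())
```
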
# Check whether any row is a complete duplicate of an earlier row
df.duplicated()
0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13    False
14    False
15    False
16    False
17    False
18    False
19    False
20    False
21    False
22    False
23    False
24    False
25    False
26    False
27    False
28    False
29    False
30    False
31     True
32     True
33    False
34    False
35    False
36    False
37    False
38    False
39    False
40    False
41    False
42    False
43    False
44    False
45    False
46    False
47     True
48    False
49    False
50    False
51    False
52    False
53    False
54    False
55    False
56    False
dtype: bool
# Check for duplicates within a given column only
df.duplicated("姓名")
0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13    False
14    False
15    False
16    False
17    False
18    False
19    False
20    False
21    False
22    False
23    False
24    False
25    False
26    False
27    False
28    False
29    False
30    False
31     True
32     True
33    False
34    False
35    False
36    False
37    False
38    False
39    False
40    False
41    False
42    False
43    False
44    False
45    False
46    False
47     True
48    False
49    False
50    False
51    False
52    False
53    False
54    False
55    False
56    False
dtype: bool
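
By default duplicated() leaves the first occurrence unmarked. A short sketch on made-up data shows how the keep parameter changes that, and how to count the extra copies:

```python
import pandas as pd

# Hypothetical data with one record entered three times
demo = pd.DataFrame({
    "学号": [1, 2, 2, 2, 3],
    "姓名": ["A", "B", "B", "B", "C"],
})

first_kept = demo.duplicated()            # default keep="first": later copies are True
all_marked = demo.duplicated(keep=False)  # every member of a duplicate group is True

print(first_kept.sum())   # number of extra copies
print(demo[all_marked])   # all rows that take part in a duplicate
```
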
df.info()  # Summary of the data: number of rows and columns, column names, non-null count per column, column dtypes, memory usage
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 57 entries, 0 to 56
Data columns (total 9 columns):
学号        57 non-null int64
姓名        57 non-null object
性别        54 non-null object
专业        54 non-null object
英语        57 non-null float64
数学        57 non-null float64
Python    57 non-null float64
选修        53 non-null float64
管理学       57 non-null float64
dtypes: float64(5), int64(1), object(3)
memory usage: 4.1+ KB

4.3.2 Data Cleaning

import pandas as pd
pd.set_option("display.unicode.east_asian_width",True)  # keep column headers aligned when the output contains East Asian wide characters
df=pd.read_excel("C:\\Users\\wsy\\Desktop\\cj.xlsx")

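Handling missing values

# Drop every row that contains at least one missing value
df1=df.dropna()
print("Before dropping:",df.shape)
print("After dropping:",df1.shape)
Before dropping: (57, 9)
After dropping: (48, 9)
# Drop a row only if all of its columns are missing
df1=df.dropna(how="all")
print("Before dropping:",df.shape)
print("After dropping:",df1.shape)
Before dropping: (57, 9)
After dropping: (57, 9)
# Drop a row only if all of the listed columns are missing
df1=df.dropna(how="all",subset=["专业","选修"])
print("Before dropping:",df.shape)
print("After dropping:",df1.shape)
Before dropping: (57, 9)
After dropping: (56, 9)
# Keep only the rows whose 性别 value is not missing
df1=df[df["性别"].notnull()]
print("Before dropping:",df.shape)
print("After dropping:",df1.shape)
df1
Before dropping: (57, 9)
After dropping: (54, 9)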
          学号             姓名  性别                专业       英语  数学  Python   选修  管理学
 0  2020802045             魏天    男  信息管理与信息系统  67.116667  90.8   93.00   95.0  106.00
 1  2020844001             郭夏    男            国际贸易  91.050000  83.4   86.00  100.0   99.00
 2  2020844002           王晓加    男                 NaN  54.200000  83.4   74.00    NaN   90.00
 3  2020844003           黄婷婷    女            国际贸易  87.800000  91.4   79.66   95.0   92.66
 5  2020844005             辛禧    男            国际贸易  65.125000  88.6   68.00   80.0   81.00
 6  2020844007             王晨    男            国际贸易  62.400000  80.0   65.00   90.0   78.00
 7  2020844008             韩天    男            国际贸易  96.250000  91.0   85.00   97.0   98.00
 8  2020844009             刘玉    女            国际贸易  89.050000  91.4   80.32  100.0   93.32
 9  2020844010           谢亚鹏    男            市场营销  70.500000  85.2   60.00   90.0   73.00
10  2020844011           娄天楠                  市场营销  58.800000  84.6   60.00    NaN   73.00
11  2020844012             唐喆                  市场营销  80.233333  87.4   64.00  100.0   77.00
12  2020844013             史昀                  市场营销  82.733333  82.2   73.32  100.0   86.32
13  2020844014           刘欣语                  市场营销  48.718333  83.8   86.00   80.0   99.00
14  2020844015             王同                  市场营销  74.200000  92.2   92.00  100.0  115.00
15  2020844017           武天一                  市场营销  73.216667  83.2   79.00   95.0   92.00
16  2020844018             张析                  市场营销  82.750000  92.0   92.00  100.0  105.00
17  2020844019           陈雨涵                  市场营销  95.200000  95.0   88.00  100.0  101.00
18  2020844020           张家齐                  市场营销  95.450000  91.0   96.00  100.0  109.00
19  2020844021           李赫桐                    会计学  88.276667  86.8   83.00   87.0   96.00
21  2020844023           刘嘉雯                    会计学  89.575000  86.0   90.00  100.0  103.00
22  2020844024           刘浩天                    会计学  85.100000  83.2   85.00  100.0   98.00
23  2020844025             刘宇                       NaN  75.200000  85.6   76.00  100.0   89.00
24  2020844026             胡童                    会计学  84.050000  86.0   91.00  100.0  119.00
25  2020844027             丁灿                    会计学  88.750000  86.2   66.00  100.0   79.00
26  2020844028           郑武田                    会计学  89.550000  87.4   91.00    NaN  104.00
27  2020844029             金耀                    会计学  79.450000  87.2   68.00  100.0   81.00
28  2020844030             庞博                    会计学  89.700000  92.0   92.00  100.0  105.00
29  2020848001           王春杨                    会计学  88.100000  89.8   84.00  100.0   97.00
30  2020848002           陈小恬                    会计学  83.750000  94.8   89.00  100.0  102.00
31  2020848002           陈小恬                    会计学  83.750000  94.8   89.00  100.0  102.00
32  2020848002           陈小恬                    会计学  83.750000  94.8   89.00  100.0  102.00
33  2020848003             张淳                    会计学  91.300000  92.2   81.32  100.0   94.32
34  2020848004           王佳琳        信息管理与信息系统  75.625000  91.0   93.00  100.0  106.00
35  2020848005             郑彤        信息管理与信息系统  88.900000  90.0   78.00  100.0   91.00
36  2020848006           张鹤同        信息管理与信息系统  89.750000  88.8   82.50  100.0   95.50
37  2020848007             苏远        信息管理与信息系统  90.250000  89.2   79.32   68.0   92.32
38  2020848008           方雨桃        信息管理与信息系统  93.100000  86.2   83.00  100.0   96.00
39  2020848010             闫宇        信息管理与信息系统  86.033333  85.4   85.00  100.0   98.00
40  2020848011           张田田        信息管理与信息系统  91.200000  89.6   96.32   77.0  109.32
41  2020848013           曹一一        信息管理与信息系统  74.426667  86.8   83.32  100.0   96.32
42  2020848014           贾晶晶                       NaN  84.450000  93.0   82.66  100.0   95.66
43  2020848015           贾淏文        信息管理与信息系统  46.675000  80.8   87.00  100.0  100.00
44  2020848016             杨帆        信息管理与信息系统  98.700000  87.6   95.00    NaN  108.00
46  2020848018           郭晓舒        信息管理与信息系统  82.500000  83.8   73.00   90.0   86.00
47  2020848018           郭晓舒        信息管理与信息系统  82.500000  83.8   73.00   90.0   86.00
48  2020848019           张雨桐                    金融学  79.150000  92.4   83.00  100.0   96.00
49  2020848020           孟德坤                    金融学  83.450000  87.4   80.66  100.0   93.66
50  2020848021           王少祖                    金融学  82.950000  91.6   78.00   90.0   91.00
51  2020848023           黄金雨                    金融学  79.950000  89.8   86.00  100.0   99.00
52  2020848024           汤佳怡                    金融学  86.600000  83.4   88.32  100.0  101.32
53  2020848027    热孜耶·买买提                    金融学  92.700000  93.2   86.32  100.0   99.32
54  2020848028  奴热艾力·雪艾力                    金融学  15.000000  75.0   63.32  100.0   76.32
55  2020848029           林可新                    金融学  89.300000  87.4   95.00  100.0  108.00
56  2020848031             任旭                    金融学  83.425000  85.4   71.66  100.0   84.66
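
The dropna variants above can be compared on a table small enough to check by eye. This is a sketch on made-up data; thresh is an additional option of dropna that the examples above do not use.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row table with gaps in different places
demo = pd.DataFrame({
    "性别": ["男", np.nan, "女", np.nan],
    "专业": ["会计学", "金融学", np.nan, np.nan],
    "选修": [95.0, 100.0, np.nan, np.nan],
})

any_missing = demo.dropna()                                      # drop rows with any NaN
key_missing = demo.dropna(subset=["性别"])                       # only 性别 is checked
both_missing = demo.dropna(how="all", subset=["专业", "选修"])   # drop if both are NaN
at_least_two = demo.dropna(thresh=2)                             # keep rows with >= 2 non-NaN values

print(any_missing.index.tolist())
print(key_missing.index.tolist())
print(both_missing.index.tolist())
print(at_least_two.index.tolist())
```
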
# Fill missing values (NaN) with 0
df["选修"].fillna(0)
0      95.0
1     100.0
2       0.0
3      95.0
4     100.0
5      80.0
6      90.0
7      97.0
8     100.0
9      90.0
10      0.0
11    100.0
12    100.0
13     80.0
14    100.0
15     95.0
16    100.0
17    100.0
18    100.0
19     87.0
20    100.0
21    100.0
22    100.0
23    100.0
24    100.0
25    100.0
26      0.0
27    100.0
28    100.0
29    100.0
30    100.0
31    100.0
32    100.0
33    100.0
34    100.0
35    100.0
36    100.0
37     68.0
38    100.0
39    100.0
40     77.0
41    100.0
42    100.0
43    100.0
44      0.0
45    100.0
46     90.0
47     90.0
48    100.0
49    100.0
50     90.0
51    100.0
52    100.0
53    100.0
54    100.0
55    100.0
56    100.0
Name: 选修, dtype: float64
# Fill each missing value (NaN) with the next valid value (backward fill)
df["选修"].bfill()
0      95.0
1     100.0
2      95.0
3      95.0
4     100.0
5      80.0
6      90.0
7      97.0
8     100.0
9      90.0
10    100.0
11    100.0
12    100.0
13     80.0
14    100.0
15     95.0
16    100.0
17    100.0
18    100.0
19     87.0
20    100.0
21    100.0
22    100.0
23    100.0
24    100.0
25    100.0
26    100.0
27    100.0
28    100.0
29    100.0
30    100.0
31    100.0
32    100.0
33    100.0
34    100.0
35    100.0
36    100.0
37     68.0
38    100.0
39    100.0
40     77.0
41    100.0
42    100.0
43    100.0
44    100.0
45    100.0
46     90.0
47     90.0
48    100.0
49    100.0
50     90.0
51    100.0
52    100.0
53    100.0
54    100.0
55    100.0
56    100.0
Name: 选修, dtype: float64
import numpy as np
# Fill missing values (NaN) with the mean score of the elective course
df["选修"].fillna(np.mean(df["选修"]))
0      95.000000
1     100.000000
2      96.679245
3      95.000000
4     100.000000
5      80.000000
6      90.000000
7      97.000000
8     100.000000
9      90.000000
10     96.679245
11    100.000000
12    100.000000
13     80.000000
14    100.000000
15     95.000000
16    100.000000
17    100.000000
18    100.000000
19     87.000000
20    100.000000
21    100.000000
22    100.000000
23    100.000000
24    100.000000
25    100.000000
26     96.679245
27    100.000000
28    100.000000
29    100.000000
30    100.000000
31    100.000000
32    100.000000
33    100.000000
34    100.000000
35    100.000000
36    100.000000
37     68.000000
38    100.000000
39    100.000000
40     77.000000
41    100.000000
42    100.000000
43    100.000000
44     96.679245
45    100.000000
46     90.000000
47     90.000000
48    100.000000
49    100.000000
50     90.000000
51    100.000000
52    100.000000
53    100.000000
54    100.000000
55    100.000000
56    100.000000
Name: 选修, dtype: float64
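
The fill strategies differ only in where the replacement value comes from. A compact sketch on a made-up Series (scores) puts them next to each other:

```python
import numpy as np
import pandas as pd

# Hypothetical scores with two gaps
scores = pd.Series([95.0, np.nan, 80.0, np.nan, 100.0])

filled_zero = scores.fillna(0)               # a constant
filled_prev = scores.ffill()                 # forward fill: copy the previous valid value
filled_next = scores.bfill()                 # backward fill: copy the next valid value
filled_mean = scores.fillna(scores.mean())   # mean of the non-missing values

print(filled_prev.tolist())
print(filled_next.tolist())
print(filled_mean.tolist())
```

None of these calls modifies scores itself; each returns a new Series, so the result has to be assigned back if it should replace the original column.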

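Handling duplicates

# Drop rows that are complete duplicates of an earlier row
df1=df.drop_duplicates()
print("Before deduplication:",df.shape)
print("After deduplication:",df1.shape)
Before deduplication: (57, 9)
After deduplication: (54, 9)
# Drop rows that repeat a value in the given column (the first occurrence is kept)
df1=df.drop_duplicates(["专业"])
print("Before deduplication:",df.shape)
print("After deduplication:",df1.shape)
df1
Before deduplication: (57, 9)
After deduplication: (6, 9)
          学号    姓名  性别                专业       英语  数学  Python   选修  管理学
 0  2020802045    魏天    男  信息管理与信息系统  67.116667  90.8    93.0   95.0   106.0
 1  2020844001    郭夏    男            国际贸易  91.050000  83.4    86.0  100.0    99.0
 2  2020844002  王晓加    男                 NaN  54.200000  83.4    74.0    NaN    90.0
 9  2020844010  谢亚鹏    男            市场营销  70.500000  85.2    60.0   90.0    73.0
19  2020844021  李赫桐                    会计学  88.276667  86.8    83.0   87.0    96.0
48  2020848019  张雨桐                    金融学  79.150000  92.4    83.0  100.0    96.0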
# Drop rows that repeat a value in the given column, keeping the last occurrence (keep="last")
df1=df.drop_duplicates(["专业"],keep="last")
print("Before deduplication:",df.shape)
print("After deduplication:",df1.shape)
df1
Before deduplication: (57, 9)
After deduplication: (6, 9)
          学号    姓名  性别                专业    英语  数学  Python   选修  管理学
 8  2020844009    刘玉    女            国际贸易  89.050  91.4   80.32  100.0   93.32
18  2020844020  张家齐                  市场营销  95.450  91.0   96.00  100.0  109.00
33  2020848003    张淳                    会计学  91.300  92.2   81.32  100.0   94.32
42  2020848014  贾晶晶                       NaN  84.450  93.0   82.66  100.0   95.66
47  2020848018  郭晓舒        信息管理与信息系统  82.500  83.8   73.00   90.0   86.00
56  2020848031    任旭                    金融学  83.425  85.4   71.66  100.0   84.66
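# Drop rows that repeat the same combination of values in several columns
df1=df.drop_duplicates(["学号","姓名"])
print("Before deduplication:",df.shape)
print("After deduplication:",df1.shape)
df1
Before deduplication: (57, 9)
After deduplication: (54, 9)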
          学号             姓名  性别                专业       英语  数学  Python   选修  管理学
 0  2020802045             魏天    男  信息管理与信息系统  67.116667  90.8   93.00   95.0  106.00
 1  2020844001             郭夏    男            国际贸易  91.050000  83.4   86.00  100.0   99.00
 2  2020844002           王晓加    男                 NaN  54.200000  83.4   74.00    NaN   90.00
 3  2020844003           黄婷婷    女            国际贸易  87.800000  91.4   79.66   95.0   92.66
 4  2020844004           赵小瑜   NaN            国际贸易  61.150000  82.2   84.66  100.0   97.66
 5  2020844005             辛禧    男            国际贸易  65.125000  88.6   68.00   80.0   81.00
 6  2020844007             王晨    男            国际贸易  62.400000  80.0   65.00   90.0   78.00
 7  2020844008             韩天    男            国际贸易  96.250000  91.0   85.00   97.0   98.00
 8  2020844009             刘玉    女            国际贸易  89.050000  91.4   80.32  100.0   93.32
 9  2020844010           谢亚鹏    男            市场营销  70.500000  85.2   60.00   90.0   73.00
10  2020844011           娄天楠                  市场营销  58.800000  84.6   60.00    NaN   73.00
11  2020844012             唐喆                  市场营销  80.233333  87.4   64.00  100.0   77.00
12  2020844013             史昀                  市场营销  82.733333  82.2   73.32  100.0   86.32
13  2020844014           刘欣语                  市场营销  48.718333  83.8   86.00   80.0   99.00
14  2020844015             王同                  市场营销  74.200000  92.2   92.00  100.0  115.00
15  2020844017           武天一                  市场营销  73.216667  83.2   79.00   95.0   92.00
16  2020844018             张析                  市场营销  82.750000  92.0   92.00  100.0  105.00
17  2020844019           陈雨涵                  市场营销  95.200000  95.0   88.00  100.0  101.00
18  2020844020           张家齐                  市场营销  95.450000  91.0   96.00  100.0  109.00
19  2020844021           李赫桐                    会计学  88.276667  86.8   83.00   87.0   96.00
20  2020844022             关帅   NaN              会计学  90.000000  92.6   75.00  100.0   88.00
21  2020844023           刘嘉雯                    会计学  89.575000  86.0   90.00  100.0  103.00
22  2020844024           刘浩天                    会计学  85.100000  83.2   85.00  100.0   98.00
23  2020844025             刘宇                       NaN  75.200000  85.6   76.00  100.0   89.00
24  2020844026             胡童                    会计学  84.050000  86.0   91.00  100.0  119.00
25  2020844027             丁灿                    会计学  88.750000  86.2   66.00  100.0   79.00
26  2020844028           郑武田                    会计学  89.550000  87.4   91.00    NaN  104.00
27  2020844029             金耀                    会计学  79.450000  87.2   68.00  100.0   81.00
28  2020844030             庞博                    会计学  89.700000  92.0   92.00  100.0  105.00
29  2020848001           王春杨                    会计学  88.100000  89.8   84.00  100.0   97.00
30  2020848002           陈小恬                    会计学  83.750000  94.8   89.00  100.0  102.00
33  2020848003             张淳                    会计学  91.300000  92.2   81.32  100.0   94.32
34  2020848004           王佳琳        信息管理与信息系统  75.625000  91.0   93.00  100.0  106.00
35  2020848005             郑彤        信息管理与信息系统  88.900000  90.0   78.00  100.0   91.00
36  2020848006           张鹤同        信息管理与信息系统  89.750000  88.8   82.50  100.0   95.50
37  2020848007             苏远        信息管理与信息系统  90.250000  89.2   79.32   68.0   92.32
38  2020848008           方雨桃        信息管理与信息系统  93.100000  86.2   83.00  100.0   96.00
39  2020848010             闫宇        信息管理与信息系统  86.033333  85.4   85.00  100.0   98.00
40  2020848011           张田田        信息管理与信息系统  91.200000  89.6   96.32   77.0  109.32
41  2020848013           曹一一        信息管理与信息系统  74.426667  86.8   83.32  100.0   96.32
42  2020848014           贾晶晶                       NaN  84.450000  93.0   82.66  100.0   95.66
43  2020848015           贾淏文        信息管理与信息系统  46.675000  80.8   87.00  100.0  100.00
44  2020848016             杨帆        信息管理与信息系统  98.700000  87.6   95.00    NaN  108.00
45  2020848017           赵迎辰   NaN  信息管理与信息系统  82.250000  87.4   74.00  100.0   87.00
46  2020848018           郭晓舒        信息管理与信息系统  82.500000  83.8   73.00   90.0   86.00
48  2020848019           张雨桐                    金融学  79.150000  92.4   83.00  100.0   96.00
49  2020848020           孟德坤                    金融学  83.450000  87.4   80.66  100.0   93.66
50  2020848021           王少祖                    金融学  82.950000  91.6   78.00   90.0   91.00
51  2020848023           黄金雨                    金融学  79.950000  89.8   86.00  100.0   99.00
52  2020848024           汤佳怡                    金融学  86.600000  83.4   88.32  100.0  101.32
53  2020848027    热孜耶·买买提                    金融学  92.700000  93.2   86.32  100.0   99.32
54  2020848028  奴热艾力·雪艾力                    金融学  15.000000  75.0   63.32  100.0   76.32
55  2020848029           林可新                    金融学  89.300000  87.4   95.00  100.0  108.00
56  2020848031             任旭                    金融学  83.425000  85.4   71.66  100.0   84.66
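
The effect of subset, keep, and the row labels that survive deduplication can be checked on a tiny table. This sketch uses made-up data; ignore_index is an extra option of drop_duplicates that is not used in the examples above.

```python
import pandas as pd

# Hypothetical table in which student 2 was entered twice
demo = pd.DataFrame({
    "学号": [1, 2, 2, 3],
    "专业": ["会计学", "会计学", "会计学", "金融学"],
    "数学": [90, 85, 85, 70],
})

full_rows = demo.drop_duplicates()                                   # whole-row duplicates
first_per_major = demo.drop_duplicates(subset=["专业"])              # first row of each 专业
last_per_major = demo.drop_duplicates(subset=["专业"], keep="last")  # last row of each 专业
renumbered = demo.drop_duplicates(ignore_index=True)                 # relabel rows 0..n-1

print(full_rows.index.tolist())
print(first_per_major.index.tolist())
print(last_per_major.index.tolist())
print(renumbered.index.tolist())
```

Without ignore_index=True the surviving rows keep their original labels, which is why the outputs above skip the labels of the dropped rows.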

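4.3.3 Extracting and Merging Data

import pandas as pd
pd.set_option("display.unicode.east_asian_width",True)  # keep column headers aligned when the output contains East Asian wide characters
df=pd.read_excel("C:\\Users\\wsy\\Desktop\\cj.xlsx")

Extracting data

1. Extracting columns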
df.学号
0     2020802045
1     2020844001
2     2020844002
3     2020844003
4     2020844004
5     2020844005
6     2020844007
7     2020844008
8     2020844009
9     2020844010
10    2020844011
11    2020844012
12    2020844013
13    2020844014
14    2020844015
15    2020844017
16    2020844018
17    2020844019
18    2020844020
19    2020844021
20    2020844022
21    2020844023
22    2020844024
23    2020844025
24    2020844026
25    2020844027
26    2020844028
27    2020844029
28    2020844030
29    2020848001
30    2020848002
31    2020848002
32    2020848002
33    2020848003
34    2020848004
35    2020848005
36    2020848006
37    2020848007
38    2020848008
39    2020848010
40    2020848011
41    2020848013
42    2020848014
43    2020848015
44    2020848016
45    2020848017
46    2020848018
47    2020848018
48    2020848019
49    2020848020
50    2020848021
51    2020848023
52    2020848024
53    2020848027
54    2020848028
55    2020848029
56    2020848031
Name: 学号, dtype: int64
df["学号"]
type(df["学号"])
pandas.core.series.Series
df[["学号"]]
type(df[["学号"]])
pandas.core.frame.DataFrame
df[["学号","姓名","专业"]]
          学号             姓名                专业
 0  2020802045             魏天  信息管理与信息系统
 1  2020844001             郭夏            国际贸易
 2  2020844002           王晓加                 NaN
 3  2020844003           黄婷婷            国际贸易
 4  2020844004           赵小瑜            国际贸易
 5  2020844005             辛禧            国际贸易
 6  2020844007             王晨            国际贸易
 7  2020844008             韩天            国际贸易
 8  2020844009             刘玉            国际贸易
 9  2020844010           谢亚鹏            市场营销
10  2020844011           娄天楠            市场营销
11  2020844012             唐喆            市场营销
12  2020844013             史昀            市场营销
13  2020844014           刘欣语            市场营销
14  2020844015             王同            市场营销
15  2020844017           武天一            市场营销
16  2020844018             张析            市场营销
17  2020844019           陈雨涵            市场营销
18  2020844020           张家齐            市场营销
19  2020844021           李赫桐              会计学
20  2020844022             关帅              会计学
21  2020844023           刘嘉雯              会计学
22  2020844024           刘浩天              会计学
23  2020844025             刘宇                 NaN
24  2020844026             胡童              会计学
25  2020844027             丁灿              会计学
26  2020844028           郑武田              会计学
27  2020844029             金耀              会计学
28  2020844030             庞博              会计学
29  2020848001           王春杨              会计学
30  2020848002           陈小恬              会计学
31  2020848002           陈小恬              会计学
32  2020848002           陈小恬              会计学
33  2020848003             张淳              会计学
34  2020848004           王佳琳  信息管理与信息系统
35  2020848005             郑彤  信息管理与信息系统
36  2020848006           张鹤同  信息管理与信息系统
37  2020848007             苏远  信息管理与信息系统
38  2020848008           方雨桃  信息管理与信息系统
39  2020848010             闫宇  信息管理与信息系统
40  2020848011           张田田  信息管理与信息系统
41  2020848013           曹一一  信息管理与信息系统
42  2020848014           贾晶晶                 NaN
43  2020848015           贾淏文  信息管理与信息系统
44  2020848016             杨帆  信息管理与信息系统
45  2020848017           赵迎辰  信息管理与信息系统
46  2020848018           郭晓舒  信息管理与信息系统
47  2020848018           郭晓舒  信息管理与信息系统
48  2020848019           张雨桐              金融学
49  2020848020           孟德坤              金融学
50  2020848021           王少祖              金融学
51  2020848023           黄金雨              金融学
52  2020848024           汤佳怡              金融学
53  2020848027    热孜耶·买买提              金融学
54  2020848028  奴热艾力·雪艾力              金融学
55  2020848029           林可新              金融学
56  2020848031             任旭              金融学
df.loc[:,["学号"]]
    学号
 0  2020802045
 1  2020844001
 2  2020844002
 3  2020844003
 4  2020844004
 5  2020844005
 6  2020844007
 7  2020844008
 8  2020844009
 9  2020844010
10  2020844011
11  2020844012
12  2020844013
13  2020844014
14  2020844015
15  2020844017
16  2020844018
17  2020844019
18  2020844020
19  2020844021
20  2020844022
21  2020844023
22  2020844024
23  2020844025
24  2020844026
25  2020844027
26  2020844028
27  2020844029
28  2020844030
29  2020848001
30  2020848002
31  2020848002
32  2020848002
33  2020848003
34  2020848004
35  2020848005
36  2020848006
37  2020848007
38  2020848008
39  2020848010
40  2020848011
41  2020848013
42  2020848014
43  2020848015
44  2020848016
45  2020848017
46  2020848018
47  2020848018
48  2020848019
49  2020848020
50  2020848021
51  2020848023
52  2020848024
53  2020848027
54  2020848028
55  2020848029
56  2020848031
df.loc[:,["学号","姓名","专业"]]
    学号  姓名  专业
 0  2020802045  魏天  信息管理与信息系统
 1  2020844001  郭夏  国际贸易
 2  2020844002  王晓加  NaN
 3  2020844003  黄婷婷  国际贸易
 4  2020844004  赵小瑜  国际贸易
 5  2020844005  辛禧  国际贸易
 6  2020844007  王晨  国际贸易
 7  2020844008  韩天  国际贸易
 8  2020844009  刘玉  国际贸易
 9  2020844010  谢亚鹏  市场营销
10  2020844011  娄天楠  市场营销
11  2020844012  唐喆  市场营销
12  2020844013  史昀  市场营销
13  2020844014  刘欣语  市场营销
14  2020844015  王同  市场营销
15  2020844017  武天一  市场营销
16  2020844018  张析  市场营销
17  2020844019  陈雨涵  市场营销
18  2020844020  张家齐  市场营销
19  2020844021  李赫桐  会计学
20  2020844022  关帅  会计学
21  2020844023  刘嘉雯  会计学
22  2020844024  刘浩天  会计学
23  2020844025  刘宇  NaN
24  2020844026  胡童  会计学
25  2020844027  丁灿  会计学
26  2020844028  郑武田  会计学
27  2020844029  金耀  会计学
28  2020844030  庞博  会计学
29  2020848001  王春杨  会计学
30  2020848002  陈小恬  会计学
31  2020848002  陈小恬  会计学
32  2020848002  陈小恬  会计学
33  2020848003  张淳  会计学
34  2020848004  王佳琳  信息管理与信息系统
35  2020848005  郑彤  信息管理与信息系统
36  2020848006  张鹤同  信息管理与信息系统
37  2020848007  苏远  信息管理与信息系统
38  2020848008  方雨桃  信息管理与信息系统
39  2020848010  闫宇  信息管理与信息系统
40  2020848011  张田田  信息管理与信息系统
41  2020848013  曹一一  信息管理与信息系统
42  2020848014  贾晶晶  NaN
43  2020848015  贾淏文  信息管理与信息系统
44  2020848016  杨帆  信息管理与信息系统
45  2020848017  赵迎辰  信息管理与信息系统
46  2020848018  郭晓舒  信息管理与信息系统
47  2020848018  郭晓舒  信息管理与信息系统
48  2020848019  张雨桐  金融学
49  2020848020  孟德坤  金融学
50  2020848021  王少祖  金融学
51  2020848023  黄金雨  金融学
52  2020848024  汤佳怡  金融学
53  2020848027  热孜耶·买买提  金融学
54  2020848028  奴热艾力·雪艾力  金融学
55  2020848029  林可新  金融学
56  2020848031  任旭  金融学
df.iloc[:,[0,1,3]]
    学号  姓名  专业
 0  2020802045  魏天  信息管理与信息系统
 1  2020844001  郭夏  国际贸易
 2  2020844002  王晓加  NaN
 3  2020844003  黄婷婷  国际贸易
 4  2020844004  赵小瑜  国际贸易
 5  2020844005  辛禧  国际贸易
 6  2020844007  王晨  国际贸易
 7  2020844008  韩天  国际贸易
 8  2020844009  刘玉  国际贸易
 9  2020844010  谢亚鹏  市场营销
10  2020844011  娄天楠  市场营销
11  2020844012  唐喆  市场营销
12  2020844013  史昀  市场营销
13  2020844014  刘欣语  市场营销
14  2020844015  王同  市场营销
15  2020844017  武天一  市场营销
16  2020844018  张析  市场营销
17  2020844019  陈雨涵  市场营销
18  2020844020  张家齐  市场营销
19  2020844021  李赫桐  会计学
20  2020844022  关帅  会计学
21  2020844023  刘嘉雯  会计学
22  2020844024  刘浩天  会计学
23  2020844025  刘宇  NaN
24  2020844026  胡童  会计学
25  2020844027  丁灿  会计学
26  2020844028  郑武田  会计学
27  2020844029  金耀  会计学
28  2020844030  庞博  会计学
29  2020848001  王春杨  会计学
30  2020848002  陈小恬  会计学
31  2020848002  陈小恬  会计学
32  2020848002  陈小恬  会计学
33  2020848003  张淳  会计学
34  2020848004  王佳琳  信息管理与信息系统
35  2020848005  郑彤  信息管理与信息系统
36  2020848006  张鹤同  信息管理与信息系统
37  2020848007  苏远  信息管理与信息系统
38  2020848008  方雨桃  信息管理与信息系统
39  2020848010  闫宇  信息管理与信息系统
40  2020848011  张田田  信息管理与信息系统
41  2020848013  曹一一  信息管理与信息系统
42  2020848014  贾晶晶  NaN
43  2020848015  贾淏文  信息管理与信息系统
44  2020848016  杨帆  信息管理与信息系统
45  2020848017  赵迎辰  信息管理与信息系统
46  2020848018  郭晓舒  信息管理与信息系统
47  2020848018  郭晓舒  信息管理与信息系统
48  2020848019  张雨桐  金融学
49  2020848020  孟德坤  金融学
50  2020848021  王少祖  金融学
51  2020848023  黄金雨  金融学
52  2020848024  汤佳怡  金融学
53  2020848027  热孜耶·买买提  金融学
54  2020848028  奴热艾力·雪艾力  金融学
55  2020848029  林可新  金融学
56  2020848031  任旭  金融学
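The three column selections above return the same table. A minimal sketch of why, on a hypothetical four-column miniature of the article's frame (the 性别 values here are invented for illustration): plain brackets and `loc` select by column label, while `iloc` selects by integer position, so `[0, 1, 3]` skips the column at position 2.

```python
import pandas as pd

# Hypothetical miniature of the article's table; the 性别 values are made up.
df = pd.DataFrame({
    "学号": [2020802045, 2020844001],
    "姓名": ["魏天", "郭夏"],
    "性别": ["男", "女"],
    "专业": ["信息管理与信息系统", "国际贸易"],
})

by_brackets = df[["学号", "姓名", "专业"]]          # labels, column axis only
by_label = df.loc[:, ["学号", "姓名", "专业"]]      # labels, all rows
by_position = df.iloc[:, [0, 1, 3]]                 # integer positions

print(by_brackets.equals(by_label), by_label.equals(by_position))
```

Selecting by label keeps working if columns are reordered; selecting by position does not, which is a reason to prefer `loc` when the column names are known.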
2. Selecting rows
df.loc[1:20,]
学号姓名性别专业英语数学Python选修管理学
12020844001郭夏国际贸易91.05000083.486.00100.099.00
22020844002王晓加NaN54.20000083.474.00NaN90.00
32020844003黄婷婷国际贸易87.80000091.479.6695.092.66
42020844004赵小瑜NaN国际贸易61.15000082.284.66100.097.66
52020844005辛禧国际贸易65.12500088.668.0080.081.00
62020844007王晨国际贸易62.40000080.065.0090.078.00
72020844008韩天国际贸易96.25000091.085.0097.098.00
82020844009刘玉国际贸易89.05000091.480.32100.093.32
92020844010谢亚鹏市场营销70.50000085.260.0090.073.00
102020844011娄天楠市场营销58.80000084.660.00NaN73.00
112020844012唐喆市场营销80.23333387.464.00100.077.00
122020844013史昀市场营销82.73333382.273.32100.086.32
132020844014刘欣语市场营销48.71833383.886.0080.099.00
142020844015王同市场营销74.20000092.292.00100.0115.00
152020844017武天一市场营销73.21666783.279.0095.092.00
162020844018张析市场营销82.75000092.092.00100.0105.00
172020844019陈雨涵市场营销95.20000095.088.00100.0101.00
182020844020张家齐市场营销95.45000091.096.00100.0109.00
192020844021李赫桐会计学88.27666786.883.0087.096.00
202020844022关帅NaN会计学90.00000092.675.00100.088.00
df.iloc[1:20,]
学号姓名性别专业英语数学Python选修管理学
12020844001郭夏国际贸易91.05000083.486.00100.099.00
22020844002王晓加NaN54.20000083.474.00NaN90.00
32020844003黄婷婷国际贸易87.80000091.479.6695.092.66
42020844004赵小瑜NaN国际贸易61.15000082.284.66100.097.66
52020844005辛禧国际贸易65.12500088.668.0080.081.00
62020844007王晨国际贸易62.40000080.065.0090.078.00
72020844008韩天国际贸易96.25000091.085.0097.098.00
82020844009刘玉国际贸易89.05000091.480.32100.093.32
92020844010谢亚鹏市场营销70.50000085.260.0090.073.00
102020844011娄天楠市场营销58.80000084.660.00NaN73.00
112020844012唐喆市场营销80.23333387.464.00100.077.00
122020844013史昀市场营销82.73333382.273.32100.086.32
132020844014刘欣语市场营销48.71833383.886.0080.099.00
142020844015王同市场营销74.20000092.292.00100.0115.00
152020844017武天一市场营销73.21666783.279.0095.092.00
162020844018张析市场营销82.75000092.092.00100.0105.00
172020844019陈雨涵市场营销95.20000095.088.00100.0101.00
182020844020张家齐市场营销95.45000091.096.00100.0109.00
192020844021李赫桐会计学88.27666786.883.0087.096.00
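The two outputs above differ by one row: `df.loc[1:20,]` ends at row 20, while `df.iloc[1:20,]` ends at row 19. The sketch below, on a hypothetical 30-row frame with the default integer index, shows the rule behind this: `loc` slices by label and includes the end label, whereas `iloc` slices by position and excludes the end, like a Python list slice.

```python
import pandas as pd

# Hypothetical frame with the default integer index 0..29.
df = pd.DataFrame({"score": range(30)})

by_label = df.loc[1:20]      # labels 1 through 20, end label included
by_position = df.iloc[1:20]  # positions 1 through 19, end position excluded

print(len(by_label), len(by_position))                 # 20 19
print(by_label.index[-1], by_position.index[-1])       # 20 19
```

With a default index the labels and positions coincide, which is why the difference is easy to miss.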
df.loc[[1,2,3,6,7],]
学号姓名性别专业英语数学Python选修管理学
12020844001郭夏国际贸易91.0583.486.00100.099.00
22020844002王晓加NaN54.2083.474.00NaN90.00
32020844003黄婷婷国际贸易87.8091.479.6695.092.66
62020844007王晨国际贸易62.4080.065.0090.078.00
72020844008韩天国际贸易96.2591.085.0097.098.00
df.iloc[[1,2,3,6,7],]
学号姓名性别专业英语数学Python选修管理学
12020844001郭夏国际贸易91.0583.486.00100.099.00
22020844002王晓加NaN54.2083.474.00NaN90.00
32020844003黄婷婷国际贸易87.8091.479.6695.092.66
62020844007王晨国际贸易62.4080.065.0090.078.00
72020844008韩天国际贸易96.2591.085.0097.098.00
df.loc[df.英语>90,]
学号姓名性别专业英语数学Python选修管理学
12020844001郭夏国际贸易91.0583.486.00100.099.00
72020844008韩天国际贸易96.2591.085.0097.098.00
172020844019陈雨涵市场营销95.2095.088.00100.0101.00
182020844020张家齐市场营销95.4591.096.00100.0109.00
332020848003张淳会计学91.3092.281.32100.094.32
372020848007苏远信息管理与信息系统90.2589.279.3268.092.32
382020848008方雨桃信息管理与信息系统93.1086.283.00100.096.00
402020848011张田田信息管理与信息系统91.2089.696.3277.0109.32
442020848016杨帆信息管理与信息系统98.7087.695.00NaN108.00
532020848027热孜耶·买买提金融学92.7093.286.32100.099.32
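The condition `df.英语>90` is itself a boolean Series with one entry per row, and `loc` keeps the rows where it is `True`. A minimal sketch on three rows taken from the table above (the frame itself is a stand-in, and `df["英语"]` is used instead of attribute access, which behaves the same here):

```python
import pandas as pd

# Three rows copied from the article's output; the frame is a stand-in.
df = pd.DataFrame({
    "姓名": ["郭夏", "王晓加", "韩天"],
    "英语": [91.05, 54.20, 96.25],
})

mask = df["英语"] > 90   # boolean Series aligned with df's index
high = df.loc[mask]      # keeps rows where the mask is True

print(mask.tolist())           # [True, False, True]
print(high["姓名"].tolist())   # ['郭夏', '韩天']
```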
4. Selecting rows and columns
df[["学号","姓名","专业"]][:10]
    学号  姓名  专业
 0  2020802045  魏天  信息管理与信息系统
 1  2020844001  郭夏  国际贸易
 2  2020844002  王晓加  NaN
 3  2020844003  黄婷婷  国际贸易
 4  2020844004  赵小瑜  国际贸易
 5  2020844005  辛禧  国际贸易
 6  2020844007  王晨  国际贸易
 7  2020844008  韩天  国际贸易
 8  2020844009  刘玉  国际贸易
 9  2020844010  谢亚鹏  市场营销
df[["学号","姓名","专业"]][df.数学>90]
    学号  姓名  专业
 0  2020802045  魏天  信息管理与信息系统
 3  2020844003  黄婷婷  国际贸易
 7  2020844008  韩天  国际贸易
 8  2020844009  刘玉  国际贸易
14  2020844015  王同  市场营销
16  2020844018  张析  市场营销
17  2020844019  陈雨涵  市场营销
18  2020844020  张家齐  市场营销
20  2020844022  关帅  会计学
28  2020844030  庞博  会计学
30  2020848002  陈小恬  会计学
31  2020848002  陈小恬  会计学
32  2020848002  陈小恬  会计学
33  2020848003  张淳  会计学
34  2020848004  王佳琳  信息管理与信息系统
42  2020848014  贾晶晶  NaN
48  2020848019  张雨桐  金融学
50  2020848021  王少祖  金融学
53  2020848027  热孜耶·买买提  金融学
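The expression `df[[...]][df.数学>90]` indexes twice: first columns, then rows. The same result can be written as a single `loc` call with a row condition and a column list, which is the form usually recommended when the selection is later used for assignment. A sketch on a hypothetical four-row stand-in whose values are copied from the article's table:

```python
import pandas as pd

# Stand-in frame; values copied from rows 0, 1 and 3 of the article's table.
df = pd.DataFrame({
    "学号": [2020802045, 2020844001, 2020844003],
    "姓名": ["魏天", "郭夏", "黄婷婷"],
    "专业": ["信息管理与信息系统", "国际贸易", "国际贸易"],
    "数学": [90.8, 83.4, 91.4],
})

cols = ["学号", "姓名", "专业"]
chained = df[cols][df["数学"] > 90]      # two indexing steps
single = df.loc[df["数学"] > 90, cols]   # rows and columns in one step

print(chained.equals(single))
print(single["姓名"].tolist())
```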
df.loc[1:10,["学号","姓名","专业"]]
    学号  姓名  专业
 1  2020844001  郭夏  国际贸易
 2  2020844002  王晓加  NaN
 3  2020844003  黄婷婷  国际贸易
 4  2020844004  赵小瑜  国际贸易
 5  2020844005  辛禧  国际贸易
 6  2020844007  王晨  国际贸易
 7  2020844008  韩天  国际贸易
 8  2020844009  刘玉  国际贸易
 9  2020844010  谢亚鹏  市场营销
10  2020844011  娄天楠  市场营销
df.iloc[1:10,2:5]
性别专业英语
1国际贸易91.050
2NaN54.200
3国际贸易87.800
4NaN国际贸易61.150
5国际贸易65.125
6国际贸易62.400
7国际贸易96.250
8国际贸易89.050
9市场营销70.500

Merging data

df1=df[["学号","姓名","专业"]][:10]
df2=df[["学号","Python"]][:10]
df3=df[["数学","选修"]][:10]
df4=df.loc[20:25,["学号","姓名","专业"]]
df1
    学号  姓名  专业
 0  2020802045  魏天  信息管理与信息系统
 1  2020844001  郭夏  国际贸易
 2  2020844002  王晓加  NaN
 3  2020844003  黄婷婷  国际贸易
 4  2020844004  赵小瑜  国际贸易
 5  2020844005  辛禧  国际贸易
 6  2020844007  王晨  国际贸易
 7  2020844008  韩天  国际贸易
 8  2020844009  刘玉  国际贸易
 9  2020844010  谢亚鹏  市场营销
df2
    学号  Python
 0  2020802045  93.00
 1  2020844001  86.00
 2  2020844002  74.00
 3  2020844003  79.66
 4  2020844004  84.66
 5  2020844005  68.00
 6  2020844007  65.00
 7  2020844008  85.00
 8  2020844009  80.32
 9  2020844010  60.00
1. Merging by column
df1.join(df2)  # overlapping column names and no suffix given: raises ValueError
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-237-92ca22d0224c> in <module>()
----> 1 df1.join(df2)  # overlapping column names and no suffix given: raises ValueError


D:\anacoda\anzhuang\lib\site-packages\pandas\core\frame.py in join(self, other, on, how, lsuffix, rsuffix, sort)
   6334         # For SparseDataFrame's benefit
   6335         return self._join_compat(other, on=on, how=how, lsuffix=lsuffix,
-> 6336                                  rsuffix=rsuffix, sort=sort)
   6337 
   6338     def _join_compat(self, other, on=None, how='left', lsuffix='', rsuffix='',


D:\anacoda\anzhuang\lib\site-packages\pandas\core\frame.py in _join_compat(self, other, on, how, lsuffix, rsuffix, sort)
   6349             return merge(self, other, left_on=on, how=how,
   6350                          left_index=on is None, right_index=True,
-> 6351                          suffixes=(lsuffix, rsuffix), sort=sort)
   6352         else:
   6353             if on is not None:


D:\anacoda\anzhuang\lib\site-packages\pandas\core\reshape\merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
     60                          copy=copy, indicator=indicator,
     61                          validate=validate)
---> 62     return op.get_result()
     63 
     64 


D:\anacoda\anzhuang\lib\site-packages\pandas\core\reshape\merge.py in get_result(self)
    572 
    573         llabels, rlabels = items_overlap_with_suffix(ldata.items, lsuf,
--> 574                                                      rdata.items, rsuf)
    575 
    576         lindexers = {1: left_indexer} if left_indexer is not None else {}


D:\anacoda\anzhuang\lib\site-packages\pandas\core\internals.py in items_overlap_with_suffix(left, lsuffix, right, rsuffix)
   5242         if not lsuffix and not rsuffix:
   5243             raise ValueError('columns overlap but no suffix specified: '
-> 5244                              '{rename}'.format(rename=to_rename))
   5245 
   5246         def lrenamer(x):


ValueError: columns overlap but no suffix specified: Index(['学号', 'Python'], dtype='object')
df1.join(df3)  # joins on the index by default, so no shared column is needed
    学号  姓名  专业  数学  选修
 0  2020802045  魏天  信息管理与信息系统  90.8  95.0
 1  2020844001  郭夏  国际贸易  83.4  100.0
 2  2020844002  王晓加  NaN  83.4  NaN
 3  2020844003  黄婷婷  国际贸易  91.4  95.0
 4  2020844004  赵小瑜  国际贸易  82.2  100.0
 5  2020844005  辛禧  国际贸易  88.6  80.0
 6  2020844007  王晨  国际贸易  80.0  90.0
 7  2020844008  韩天  国际贸易  91.0  97.0
 8  2020844009  刘玉  国际贸易  91.4  100.0
 9  2020844010  谢亚鹏  市场营销  85.2  90.0
df1.join(df2,lsuffix="x")  # add a suffix to the overlapping column of the left frame
    学号x  姓名  专业  学号  Python
 0  2020802045  魏天  信息管理与信息系统  2020802045  93.00
 1  2020844001  郭夏  国际贸易  2020844001  86.00
 2  2020844002  王晓加  NaN  2020844002  74.00
 3  2020844003  黄婷婷  国际贸易  2020844003  79.66
 4  2020844004  赵小瑜  国际贸易  2020844004  84.66
 5  2020844005  辛禧  国际贸易  2020844005  68.00
 6  2020844007  王晨  国际贸易  2020844007  65.00
 7  2020844008  韩天  国际贸易  2020844008  85.00
 8  2020844009  刘玉  国际贸易  2020844009  80.32
 9  2020844010  谢亚鹏  市场营销  2020844010  60.00
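The `join` calls above can be condensed into one small experiment. This is a sketch on two hypothetical two-row frames (`left` and `right` are illustrative names, and the suffix strings are arbitrary): without a suffix the overlapping `学号` column raises `ValueError`; with `lsuffix` and `rsuffix` both copies survive under distinct names.

```python
import pandas as pd

# Hypothetical two-row stand-ins for the article's df1 and df2.
left = pd.DataFrame({"学号": [2020802045, 2020844001], "姓名": ["魏天", "郭夏"]})
right = pd.DataFrame({"学号": [2020802045, 2020844001], "Python": [93.0, 86.0]})

try:
    left.join(right)              # same column name on both sides, no suffix
    overlap_error = None
except ValueError as exc:
    overlap_error = str(exc)

# Suffixes are applied only to the overlapping column names.
joined = left.join(right, lsuffix="_left", rsuffix="_right")

print(overlap_error)
print(joined.columns.tolist())
```

Because `join` matches rows on the index, the two `学号` columns are kept side by side rather than used as a key.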
df1.merge(df3)  # merge needs a shared column (or explicit keys): raises MergeError
---------------------------------------------------------------------------

MergeError                                Traceback (most recent call last)

<ipython-input-242-036768b080a3> in <module>()
----> 1 df1.merge(df3)  # merge needs a shared column (or explicit keys): raises MergeError


D:\anacoda\anzhuang\lib\site-packages\pandas\core\frame.py in merge(self, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
   6387                      right_on=right_on, left_index=left_index,
   6388                      right_index=right_index, sort=sort, suffixes=suffixes,
-> 6389                      copy=copy, indicator=indicator, validate=validate)
   6390 
   6391     def round(self, decimals=0, *args, **kwargs):


D:\anacoda\anzhuang\lib\site-packages\pandas\core\reshape\merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
     59                          right_index=right_index, sort=sort, suffixes=suffixes,
     60                          copy=copy, indicator=indicator,
---> 61                          validate=validate)
     62     return op.get_result()
     63 


D:\anacoda\anzhuang\lib\site-packages\pandas\core\reshape\merge.py in __init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
    544             warnings.warn(msg, UserWarning)
    545 
--> 546         self._validate_specification()
    547 
    548         # note this function has side effects


D:\anacoda\anzhuang\lib\site-packages\pandas\core\reshape\merge.py in _validate_specification(self)
   1033                         'left_index={lidx}, right_index={ridx}'
   1034                         .format(lon=self.left_on, ron=self.right_on,
-> 1035                                 lidx=self.left_index, ridx=self.right_index))
   1036                 if not common_cols.is_unique:
   1037                     raise MergeError("Data columns not unique: {common!r}"


MergeError: No common columns to perform merge on. Merge options: left_on=None, right_on=None, left_index=False, right_index=False
df1.merge(df2)  # joins on the shared column (学号); the key column appears only once in the result
    学号  姓名  专业  Python
 0  2020802045  魏天  信息管理与信息系统  93.00
 1  2020844001  郭夏  国际贸易  86.00
 2  2020844002  王晓加  NaN  74.00
 3  2020844003  黄婷婷  国际贸易  79.66
 4  2020844004  赵小瑜  国际贸易  84.66
 5  2020844005  辛禧  国际贸易  68.00
 6  2020844007  王晨  国际贸易  65.00
 7  2020844008  韩天  国际贸易  85.00
 8  2020844009  刘玉  国际贸易  80.32
 9  2020844010  谢亚鹏  市场营销  60.00
ddf=pd.merge(df1,df2)
ddf
    学号  姓名  专业  Python
 0  2020802045  魏天  信息管理与信息系统  93.00
 1  2020844001  郭夏  国际贸易  86.00
 2  2020844002  王晓加  NaN  74.00
 3  2020844003  黄婷婷  国际贸易  79.66
 4  2020844004  赵小瑜  国际贸易  84.66
 5  2020844005  辛禧  国际贸易  68.00
 6  2020844007  王晨  国际贸易  65.00
 7  2020844008  韩天  国际贸易  85.00
 8  2020844009  刘玉  国际贸易  80.32
 9  2020844010  谢亚鹏  市场营销  60.00
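`merge` found its key automatically above because `学号` is the only shared column. A hedged sketch of the more explicit form, on hypothetical stand-in frames (`students` and `scores` are illustrative names): naming the key with `on=` documents the intent, and `how=` decides what happens to rows without a match.

```python
import pandas as pd

# Hypothetical stand-ins: three students, but scores for only two of them.
students = pd.DataFrame({
    "学号": [2020802045, 2020844001, 2020844002],
    "姓名": ["魏天", "郭夏", "王晓加"],
})
scores = pd.DataFrame({"学号": [2020802045, 2020844001], "Python": [93.0, 86.0]})

inner = pd.merge(students, scores, on="学号")               # default how="inner"
left = pd.merge(students, scores, on="学号", how="left")    # keep every student

print(len(inner), len(left))                  # 2 3
print(int(left["Python"].isna().sum()))       # 1
```

The inner merge drops the student without a score; the left merge keeps that row and fills `Python` with NaN.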
ddf=pd.concat([df1,df2],axis=1)  # axis=1 concatenates column-wise (side by side)
ddf
    学号  姓名  专业  学号  Python
 0  2020802045  魏天  信息管理与信息系统  2020802045  93.00
 1  2020844001  郭夏  国际贸易  2020844001  86.00
 2  2020844002  王晓加  NaN  2020844002  74.00
 3  2020844003  黄婷婷  国际贸易  2020844003  79.66
 4  2020844004  赵小瑜  国际贸易  2020844004  84.66
 5  2020844005  辛禧  国际贸易  2020844005  68.00
 6  2020844007  王晨  国际贸易  2020844007  65.00
 7  2020844008  韩天  国际贸易  2020844008  85.00
 8  2020844009  刘玉  国际贸易  2020844009  80.32
 9  2020844010  谢亚鹏  市场营销  2020844010  60.00
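With `axis=1`, `pd.concat` lines rows up by index label, not by any key column, which is why both `学号` columns are kept in the output above. The sketch below uses two hypothetical frames whose indexes only partly overlap to make the alignment visible.

```python
import pandas as pd

# Hypothetical frames with partly overlapping index labels.
a = pd.DataFrame({"姓名": ["魏天", "郭夏"]}, index=[0, 1])
b = pd.DataFrame({"Python": [86.0, 74.0]}, index=[1, 2])

side_by_side = pd.concat([a, b], axis=1)   # rows are matched on index labels

print(side_by_side)
```

Only label 1 exists in both frames, so it is the only row with both cells filled; labels 0 and 2 get NaN on the side that lacks them.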
2. Merging by row
df1.append(df4)  # same columns; DataFrame.append was removed in pandas 2.0, where pd.concat([df1, df4]) does the same job
    学号  姓名  专业
 0  2020802045  魏天  信息管理与信息系统
 1  2020844001  郭夏  国际贸易
 2  2020844002  王晓加  NaN
 3  2020844003  黄婷婷  国际贸易
 4  2020844004  赵小瑜  国际贸易
 5  2020844005  辛禧  国际贸易
 6  2020844007  王晨  国际贸易
 7  2020844008  韩天  国际贸易
 8  2020844009  刘玉  国际贸易
 9  2020844010  谢亚鹏  市场营销
20  2020844022  关帅  会计学
21  2020844023  刘嘉雯  会计学
22  2020844024  刘浩天  会计学
23  2020844025  刘宇  NaN
24  2020844026  胡童  会计学
25  2020844027  丁灿  会计学
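The output above comes from an older pandas; `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0. A minimal sketch of the `pd.concat` replacement, on hypothetical one-row frames (`top` and `more` are illustrative names), also shows the `ignore_index` option for renumbering the rows:

```python
import pandas as pd

# Hypothetical one-row stand-ins; `more` keeps its original label 20.
top = pd.DataFrame({"学号": [2020802045], "姓名": ["魏天"]})
more = pd.DataFrame({"学号": [2020844022], "姓名": ["关帅"]}, index=[20])

stacked = pd.concat([top, more])                         # labels are preserved
renumbered = pd.concat([top, more], ignore_index=True)   # labels become 0..n-1

print(stacked.index.tolist())      # [0, 20]
print(renumbered.index.tolist())   # [0, 1]
```

Preserved labels explain the jump from 9 to 20 in the table above.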
df1.append(df3)  # different columns: the result takes the union of the columns
D:\anacoda\anzhuang\lib\site-packages\pandas\core\frame.py:6211: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

  sort=sort)
    专业  姓名  学号  数学  选修
 0  信息管理与信息系统  魏天  2.020802e+09  NaN  NaN
 1  国际贸易  郭夏  2.020844e+09  NaN  NaN
 2  NaN  王晓加  2.020844e+09  NaN  NaN
 3  国际贸易  黄婷婷  2.020844e+09  NaN  NaN
 4  国际贸易  赵小瑜  2.020844e+09  NaN  NaN
 5  国际贸易  辛禧  2.020844e+09  NaN  NaN
 6  国际贸易  王晨  2.020844e+09  NaN  NaN
 7  国际贸易  韩天  2.020844e+09  NaN  NaN
 8  国际贸易  刘玉  2.020844e+09  NaN  NaN
 9  市场营销  谢亚鹏  2.020844e+09  NaN  NaN
 0  NaN  NaN  NaN  90.8  95.0
 1  NaN  NaN  NaN  83.4  100.0
 2  NaN  NaN  NaN  83.4  NaN
 3  NaN  NaN  NaN  91.4  95.0
 4  NaN  NaN  NaN  82.2  100.0
 5  NaN  NaN  NaN  88.6  80.0
 6  NaN  NaN  NaN  80.0  90.0
 7  NaN  NaN  NaN  91.0  97.0
 8  NaN  NaN  NaN  91.4  100.0
 9  NaN  NaN  NaN  85.2  90.0
pd.concat([df1,df2,df3],axis=0,join="outer")  # row-wise concatenation; join="outer" keeps the union of columns, join="inner" only the shared ones
D:\anacoda\anzhuang\lib\site-packages\ipykernel_launcher.py:1: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

  """Entry point for launching an IPython kernel.
    Python  专业  姓名  学号  数学  选修
 0  NaN  信息管理与信息系统  魏天  2.020802e+09  NaN  NaN
 1  NaN  国际贸易  郭夏  2.020844e+09  NaN  NaN
 2  NaN  NaN  王晓加  2.020844e+09  NaN  NaN
 3  NaN  国际贸易  黄婷婷  2.020844e+09  NaN  NaN
 4  NaN  国际贸易  赵小瑜  2.020844e+09  NaN  NaN
 5  NaN  国际贸易  辛禧  2.020844e+09  NaN  NaN
 6  NaN  国际贸易  王晨  2.020844e+09  NaN  NaN
 7  NaN  国际贸易  韩天  2.020844e+09  NaN  NaN
 8  NaN  国际贸易  刘玉  2.020844e+09  NaN  NaN
 9  NaN  市场营销  谢亚鹏  2.020844e+09  NaN  NaN
 0  93.00  NaN  NaN  2.020802e+09  NaN  NaN
 1  86.00  NaN  NaN  2.020844e+09  NaN  NaN
 2  74.00  NaN  NaN  2.020844e+09  NaN  NaN
 3  79.66  NaN  NaN  2.020844e+09  NaN  NaN
 4  84.66  NaN  NaN  2.020844e+09  NaN  NaN
 5  68.00  NaN  NaN  2.020844e+09  NaN  NaN
 6  65.00  NaN  NaN  2.020844e+09  NaN  NaN
 7  85.00  NaN  NaN  2.020844e+09  NaN  NaN
 8  80.32  NaN  NaN  2.020844e+09  NaN  NaN
 9  60.00  NaN  NaN  2.020844e+09  NaN  NaN
 0  NaN  NaN  NaN  NaN  90.8  95.0
 1  NaN  NaN  NaN  NaN  83.4  100.0
 2  NaN  NaN  NaN  NaN  83.4  NaN
 3  NaN  NaN  NaN  NaN  91.4  95.0
 4  NaN  NaN  NaN  NaN  82.2  100.0
 5  NaN  NaN  NaN  NaN  88.6  80.0
 6  NaN  NaN  NaN  NaN  80.0  90.0
 7  NaN  NaN  NaN  NaN  91.0  97.0
 8  NaN  NaN  NaN  NaN  91.4  100.0
 9  NaN  NaN  NaN  NaN  85.2  90.0
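Two things in the output above are worth isolating: the effect of `join`, and why `学号` is printed as `2.020802e+09`. The sketch below uses hypothetical one-row frames (`a`, `b`, `c` are illustrative names). It assumes nothing beyond `pd.concat` itself: `join="outer"` keeps every column and fills the gaps with NaN, `join="inner"` keeps only columns present in all inputs, and an integer column that receives a NaN is converted to float.

```python
import pandas as pd

# Hypothetical one-row frames; c has no 学号 column at all.
a = pd.DataFrame({"学号": [2020802045], "姓名": ["魏天"]})
b = pd.DataFrame({"学号": [2020802045], "Python": [93.0]})
c = pd.DataFrame({"数学": [90.8]})

outer = pd.concat([a, b], axis=0, join="outer", sort=False)  # union of columns
inner = pd.concat([a, b], axis=0, join="inner")              # shared columns only
with_gap = pd.concat([a, c], axis=0, sort=False)             # 学号 missing in c

print(sorted(outer.columns))
print(inner.columns.tolist())
print(a["学号"].dtype, with_gap["学号"].dtype)
```

The last line shows the dtype change that makes the student IDs appear in scientific notation once a frame without `学号` is stacked underneath.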

4.3.4 Adding, deleting, and modifying data

import pandas as pd
pd.set_option("display.unicode.east_asian_width",True)  # align column headers when the output contains East Asian characters
df=pd.read_excel("C:\\Users\\wsy\\Desktop\\cj.xlsx")
df
学号姓名性别专业英语数学Python选修管理学
02020802045魏天信息管理与信息系统67.11666790.893.0095.0106.00
12020844001郭夏国际贸易91.05000083.486.00100.099.00
22020844002王晓加NaN54.20000083.474.00NaN90.00
32020844003黄婷婷国际贸易87.80000091.479.6695.092.66
42020844004赵小瑜NaN国际贸易61.15000082.284.66100.097.66
52020844005辛禧国际贸易65.12500088.668.0080.081.00
62020844007王晨国际贸易62.40000080.065.0090.078.00
72020844008韩天国际贸易96.25000091.085.0097.098.00
82020844009刘玉国际贸易89.05000091.480.32100.093.32
92020844010谢亚鹏市场营销70.50000085.260.0090.073.00
102020844011娄天楠市场营销58.80000084.660.00NaN73.00
112020844012唐喆市场营销80.23333387.464.00100.077.00
122020844013史昀市场营销82.73333382.273.32100.086.32
132020844014刘欣语市场营销48.71833383.886.0080.099.00
142020844015王同市场营销74.20000092.292.00100.0115.00
152020844017武天一市场营销73.21666783.279.0095.092.00
162020844018张析市场营销82.75000092.092.00100.0105.00
172020844019陈雨涵市场营销95.20000095.088.00100.0101.00
182020844020张家齐市场营销95.45000091.096.00100.0109.00
192020844021李赫桐会计学88.27666786.883.0087.096.00
202020844022关帅NaN会计学90.00000092.675.00100.088.00
212020844023刘嘉雯会计学89.57500086.090.00100.0103.00
222020844024刘浩天会计学85.10000083.285.00100.098.00
232020844025刘宇NaN75.20000085.676.00100.089.00
242020844026胡童会计学84.05000086.091.00100.0119.00
252020844027丁灿会计学88.75000086.266.00100.079.00
262020844028郑武田会计学89.55000087.491.00NaN104.00
272020844029金耀会计学79.45000087.268.00100.081.00
282020844030庞博会计学89.70000092.092.00100.0105.00
292020848001王春杨会计学88.10000089.884.00100.097.00
302020848002陈小恬会计学83.75000094.889.00100.0102.00
312020848002陈小恬会计学83.75000094.889.00100.0102.00
322020848002陈小恬会计学83.75000094.889.00100.0102.00
332020848003张淳会计学91.30000092.281.32100.094.32
342020848004王佳琳信息管理与信息系统75.62500091.093.00100.0106.00
352020848005郑彤信息管理与信息系统88.90000090.078.00100.091.00
362020848006张鹤同信息管理与信息系统89.75000088.882.50100.095.50
372020848007苏远信息管理与信息系统90.25000089.279.3268.092.32
382020848008方雨桃信息管理与信息系统93.10000086.283.00100.096.00
392020848010闫宇信息管理与信息系统86.03333385.485.00100.098.00
402020848011张田田信息管理与信息系统91.20000089.696.3277.0109.32
412020848013曹一一信息管理与信息系统74.42666786.883.32100.096.32
422020848014贾晶晶NaN84.45000093.082.66100.095.66
432020848015贾淏文信息管理与信息系统46.67500080.887.00100.0100.00
442020848016杨帆信息管理与信息系统98.70000087.695.00NaN108.00
452020848017赵迎辰NaN信息管理与信息系统82.25000087.474.00100.087.00
462020848018郭晓舒信息管理与信息系统82.50000083.873.0090.086.00
472020848018郭晓舒信息管理与信息系统82.50000083.873.0090.086.00
482020848019张雨桐金融学79.15000092.483.00100.096.00
492020848020孟德坤金融学83.45000087.480.66100.093.66
502020848021王少祖金融学82.95000091.678.0090.091.00
512020848023黄金雨金融学79.95000089.886.00100.099.00
522020848024汤佳怡金融学86.60000083.488.32100.0101.32
532020848027热孜耶·买买提金融学92.70000093.286.32100.099.32
542020848028奴热艾力·雪艾力金融学15.00000075.063.32100.076.32
552020848029林可新金融学89.30000087.495.00100.0108.00
562020848031任旭金融学83.42500085.471.66100.084.66
Adding data
# append a column at the end
df["团员否"]=True
df
学号姓名性别专业英语数学Python选修管理学团员否
02020802045魏天信息管理与信息系统67.11666790.893.0095.0106.00True
12020844001郭夏国际贸易91.05000083.486.00100.099.00True
22020844002王晓加NaN54.20000083.474.00NaN90.00True
32020844003黄婷婷国际贸易87.80000091.479.6695.092.66True
42020844004赵小瑜NaN国际贸易61.15000082.284.66100.097.66True
52020844005辛禧国际贸易65.12500088.668.0080.081.00True
62020844007王晨国际贸易62.40000080.065.0090.078.00True
72020844008韩天国际贸易96.25000091.085.0097.098.00True
82020844009刘玉国际贸易89.05000091.480.32100.093.32True
92020844010谢亚鹏市场营销70.50000085.260.0090.073.00True
102020844011娄天楠市场营销58.80000084.660.00NaN73.00True
112020844012唐喆市场营销80.23333387.464.00100.077.00True
122020844013史昀市场营销82.73333382.273.32100.086.32True
132020844014刘欣语市场营销48.71833383.886.0080.099.00True
142020844015王同市场营销74.20000092.292.00100.0115.00True
152020844017武天一市场营销73.21666783.279.0095.092.00True
162020844018张析市场营销82.75000092.092.00100.0105.00True
172020844019陈雨涵市场营销95.20000095.088.00100.0101.00True
182020844020张家齐市场营销95.45000091.096.00100.0109.00True
192020844021李赫桐会计学88.27666786.883.0087.096.00True
202020844022关帅NaN会计学90.00000092.675.00100.088.00True
212020844023刘嘉雯会计学89.57500086.090.00100.0103.00True
222020844024刘浩天会计学85.10000083.285.00100.098.00True
232020844025刘宇NaN75.20000085.676.00100.089.00True
242020844026胡童会计学84.05000086.091.00100.0119.00True
252020844027丁灿会计学88.75000086.266.00100.079.00True
262020844028郑武田会计学89.55000087.491.00NaN104.00True
272020844029金耀会计学79.45000087.268.00100.081.00True
282020844030庞博会计学89.70000092.092.00100.0105.00True
292020848001王春杨会计学88.10000089.884.00100.097.00True
302020848002陈小恬会计学83.75000094.889.00100.0102.00True
312020848002陈小恬会计学83.75000094.889.00100.0102.00True
322020848002陈小恬会计学83.75000094.889.00100.0102.00True
332020848003张淳会计学91.30000092.281.32100.094.32True
342020848004王佳琳信息管理与信息系统75.62500091.093.00100.0106.00True
352020848005郑彤信息管理与信息系统88.90000090.078.00100.091.00True
362020848006张鹤同信息管理与信息系统89.75000088.882.50100.095.50True
372020848007苏远信息管理与信息系统90.25000089.279.3268.092.32True
382020848008方雨桃信息管理与信息系统93.10000086.283.00100.096.00True
392020848010闫宇信息管理与信息系统86.03333385.485.00100.098.00True
402020848011张田田信息管理与信息系统91.20000089.696.3277.0109.32True
412020848013曹一一信息管理与信息系统74.42666786.883.32100.096.32True
422020848014贾晶晶NaN84.45000093.082.66100.095.66True
432020848015贾淏文信息管理与信息系统46.67500080.887.00100.0100.00True
442020848016杨帆信息管理与信息系统98.70000087.695.00NaN108.00True
452020848017赵迎辰NaN信息管理与信息系统82.25000087.474.00100.087.00True
462020848018郭晓舒信息管理与信息系统82.50000083.873.0090.086.00True
472020848018郭晓舒信息管理与信息系统82.50000083.873.0090.086.00True
482020848019张雨桐金融学79.15000092.483.00100.096.00True
492020848020孟德坤金融学83.45000087.480.66100.093.66True
502020848021王少祖金融学82.95000091.678.0090.091.00True
512020848023黄金雨金融学79.95000089.886.00100.099.00True
522020848024汤佳怡金融学86.60000083.488.32100.0101.32True
532020848027热孜耶·买买提金融学92.70000093.286.32100.099.32True
542020848028奴热艾力·雪艾力金融学15.00000075.063.32100.076.32True
552020848029林可新金融学89.30000087.495.00100.0108.00True
562020848031任旭金融学83.42500085.471.66100.084.66True
# insert a column at a given position
df.insert(2,"年龄",18)
df
学号姓名年龄性别专业英语数学Python选修管理学团员否
02020802045魏天18信息管理与信息系统67.11666790.893.0095.0106.00True
12020844001郭夏18国际贸易91.05000083.486.00100.099.00True
22020844002王晓加18NaN54.20000083.474.00NaN90.00True
32020844003黄婷婷18国际贸易87.80000091.479.6695.092.66True
42020844004赵小瑜18NaN国际贸易61.15000082.284.66100.097.66True
52020844005辛禧18国际贸易65.12500088.668.0080.081.00True
62020844007王晨18国际贸易62.40000080.065.0090.078.00True
72020844008韩天18国际贸易96.25000091.085.0097.098.00True
82020844009刘玉18国际贸易89.05000091.480.32100.093.32True
92020844010谢亚鹏18市场营销70.50000085.260.0090.073.00True
102020844011娄天楠18市场营销58.80000084.660.00NaN73.00True
112020844012唐喆18市场营销80.23333387.464.00100.077.00True
122020844013史昀18市场营销82.73333382.273.32100.086.32True
132020844014刘欣语18市场营销48.71833383.886.0080.099.00True
142020844015王同18市场营销74.20000092.292.00100.0115.00True
152020844017武天一18市场营销73.21666783.279.0095.092.00True
162020844018张析18市场营销82.75000092.092.00100.0105.00True
172020844019陈雨涵18市场营销95.20000095.088.00100.0101.00True
182020844020张家齐18市场营销95.45000091.096.00100.0109.00True
192020844021李赫桐18会计学88.27666786.883.0087.096.00True
202020844022关帅18NaN会计学90.00000092.675.00100.088.00True
212020844023刘嘉雯18会计学89.57500086.090.00100.0103.00True
222020844024刘浩天18会计学85.10000083.285.00100.098.00True
232020844025刘宇18NaN75.20000085.676.00100.089.00True
242020844026胡童18会计学84.05000086.091.00100.0119.00True
252020844027丁灿18会计学88.75000086.266.00100.079.00True
262020844028郑武田18会计学89.55000087.491.00NaN104.00True
272020844029金耀18会计学79.45000087.268.00100.081.00True
282020844030庞博18会计学89.70000092.092.00100.0105.00True
292020848001王春杨18会计学88.10000089.884.00100.097.00True
302020848002陈小恬18会计学83.75000094.889.00100.0102.00True
312020848002陈小恬18会计学83.75000094.889.00100.0102.00True
322020848002陈小恬18会计学83.75000094.889.00100.0102.00True
332020848003张淳18会计学91.30000092.281.32100.094.32True
342020848004王佳琳18信息管理与信息系统75.62500091.093.00100.0106.00True
352020848005郑彤18信息管理与信息系统88.90000090.078.00100.091.00True
362020848006张鹤同18信息管理与信息系统89.75000088.882.50100.095.50True
372020848007苏远18信息管理与信息系统90.25000089.279.3268.092.32True
382020848008方雨桃18信息管理与信息系统93.10000086.283.00100.096.00True
392020848010闫宇18信息管理与信息系统86.03333385.485.00100.098.00True
402020848011张田田18信息管理与信息系统91.20000089.696.3277.0109.32True
412020848013曹一一18信息管理与信息系统74.42666786.883.32100.096.32True
422020848014贾晶晶18NaN84.45000093.082.66100.095.66True
432020848015贾淏文18信息管理与信息系统46.67500080.887.00100.0100.00True
442020848016杨帆18信息管理与信息系统98.70000087.695.00NaN108.00True
452020848017赵迎辰18NaN信息管理与信息系统82.25000087.474.00100.087.00True
462020848018郭晓舒18信息管理与信息系统82.50000083.873.0090.086.00True
472020848018郭晓舒18信息管理与信息系统82.50000083.873.0090.086.00True
482020848019张雨桐18金融学79.15000092.483.00100.096.00True
492020848020孟德坤18金融学83.45000087.480.66100.093.66True
502020848021王少祖18金融学82.95000091.678.0090.091.00True
512020848023黄金雨18金融学79.95000089.886.00100.099.00True
522020848024汤佳怡18金融学86.60000083.488.32100.0101.32True
532020848027热孜耶·买买提18金融学92.70000093.286.32100.099.32True
542020848028奴热艾力·雪艾力18金融学15.00000075.063.32100.076.32True
552020848029林可新18金融学89.30000087.495.00100.0108.00True
562020848031任旭18金融学83.42500085.471.66100.084.66True
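The two ways of adding a column shown above differ only in where the column lands. A minimal sketch on a hypothetical two-column frame: assignment with a new name appends at the end, while `insert(loc, column, value)` places the column at position `loc` and modifies the frame in place.

```python
import pandas as pd

# Hypothetical two-row stand-in for the article's df.
df = pd.DataFrame({"学号": [2020802045, 2020844001], "姓名": ["魏天", "郭夏"]})

df["团员否"] = True         # scalar is broadcast; new column goes last
df.insert(1, "年龄", 18)    # new column at position 1; returns None

print(df.columns.tolist())  # ['学号', '年龄', '姓名', '团员否']
```

Since `insert` changes `df` directly, re-running the same cell tries to insert `年龄` a second time and raises `ValueError` unless `allow_duplicates=True` is passed.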
# add one row
df.loc[57]=["20200848045","王芳",10,"女","金融学",55,66,77,90,67,True]
df
学号姓名年龄性别专业英语数学Python选修管理学团员否
02020802045魏天18信息管理与信息系统67.11666790.893.0095.0106.00True
12020844001郭夏18国际贸易91.05000083.486.00100.099.00True
22020844002王晓加18NaN54.20000083.474.00NaN90.00True
32020844003黄婷婷18国际贸易87.80000091.479.6695.092.66True
42020844004赵小瑜18NaN国际贸易61.15000082.284.66100.097.66True
52020844005辛禧18国际贸易65.12500088.668.0080.081.00True
62020844007王晨18国际贸易62.40000080.065.0090.078.00True
72020844008韩天18国际贸易96.25000091.085.0097.098.00True
82020844009刘玉18国际贸易89.05000091.480.32100.093.32True
92020844010谢亚鹏18市场营销70.50000085.260.0090.073.00True
102020844011娄天楠18市场营销58.80000084.660.00NaN73.00True
112020844012唐喆18市场营销80.23333387.464.00100.077.00True
122020844013史昀18市场营销82.73333382.273.32100.086.32True
132020844014刘欣语18市场营销48.71833383.886.0080.099.00True
142020844015王同18市场营销74.20000092.292.00100.0115.00True
152020844017武天一18市场营销73.21666783.279.0095.092.00True
162020844018张析18市场营销82.75000092.092.00100.0105.00True
172020844019陈雨涵18市场营销95.20000095.088.00100.0101.00True
182020844020张家齐18市场营销95.45000091.096.00100.0109.00True
192020844021李赫桐18会计学88.27666786.883.0087.096.00True
202020844022关帅18NaN会计学90.00000092.675.00100.088.00True
212020844023刘嘉雯18会计学89.57500086.090.00100.0103.00True
222020844024刘浩天18会计学85.10000083.285.00100.098.00True
232020844025刘宇18NaN75.20000085.676.00100.089.00True
242020844026胡童18会计学84.05000086.091.00100.0119.00True
252020844027丁灿18会计学88.75000086.266.00100.079.00True
262020844028郑武田18会计学89.55000087.491.00NaN104.00True
272020844029金耀18会计学79.45000087.268.00100.081.00True
282020844030庞博18会计学89.70000092.092.00100.0105.00True
292020848001王春杨18会计学88.10000089.884.00100.097.00True
302020848002陈小恬18会计学83.75000094.889.00100.0102.00True
312020848002陈小恬18会计学83.75000094.889.00100.0102.00True
322020848002陈小恬18会计学83.75000094.889.00100.0102.00True
332020848003张淳18会计学91.30000092.281.32100.094.32True
342020848004王佳琳18信息管理与信息系统75.62500091.093.00100.0106.00True
352020848005郑彤18信息管理与信息系统88.90000090.078.00100.091.00True
362020848006张鹤同18信息管理与信息系统89.75000088.882.50100.095.50True
372020848007苏远18信息管理与信息系统90.25000089.279.3268.092.32True
382020848008方雨桃18信息管理与信息系统93.10000086.283.00100.096.00True
392020848010闫宇18信息管理与信息系统86.03333385.485.00100.098.00True
402020848011张田田18信息管理与信息系统91.20000089.696.3277.0109.32True
412020848013曹一一18信息管理与信息系统74.42666786.883.32100.096.32True
422020848014贾晶晶18NaN84.45000093.082.66100.095.66True
432020848015贾淏文18信息管理与信息系统46.67500080.887.00100.0100.00True
442020848016杨帆18信息管理与信息系统98.70000087.695.00NaN108.00True
452020848017赵迎辰18NaN信息管理与信息系统82.25000087.474.00100.087.00True
462020848018郭晓舒18信息管理与信息系统82.50000083.873.0090.086.00True
472020848018郭晓舒18信息管理与信息系统82.50000083.873.0090.086.00True
482020848019张雨桐18金融学79.15000092.483.00100.096.00True
492020848020孟德坤18金融学83.45000087.480.66100.093.66True
502020848021王少祖18金融学82.95000091.678.0090.091.00True
512020848023黄金雨18金融学79.95000089.886.00100.099.00True
522020848024汤佳怡18金融学86.60000083.488.32100.0101.32True
532020848027热孜耶·买买提18金融学92.70000093.286.32100.099.32True
542020848028奴热艾力·雪艾力18金融学15.00000075.063.32100.076.32True
552020848029林可新18金融学89.30000087.495.00100.0108.00True
562020848031任旭18金融学83.42500085.471.66100.084.66True
5720200848045王芳10金融学55.00000066.077.0090.067.00True
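Assigning to a label that does not exist yet, as in `df.loc[57] = [...]` above, enlarges the frame by one row. The sketch below repeats this on a hypothetical three-column frame and uses `len(df)` as the new label, which equals the next free integer when the index is the default 0..n-1. The student ID in the sketch is an invented integer; the article passes its ID as a string, which leaves `学号` holding a mix of integers and text.

```python
import pandas as pd

# Hypothetical stand-in; the new row's ID (2020848045) is made up.
df = pd.DataFrame({
    "学号": [2020802045, 2020844001],
    "姓名": ["魏天", "郭夏"],
    "英语": [67.1, 91.05],
})

new_label = len(df)                              # 2, the next free label
df.loc[new_label] = [2020848045, "王芳", 55.0]   # one value per column, in order

print(df.shape)
print(df.loc[new_label, "姓名"])
```

The list must supply one value for every column, in column order.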
# add several rows
df1=df[["学号","姓名","专业"]][:10]
df.append(df1.iloc[:10,])
D:\anacoda\anzhuang\lib\site-packages\pandas\core\frame.py:6211: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

  sort=sort)
Python专业团员否姓名学号年龄性别数学管理学英语选修
093.00信息管理与信息系统True魏天202080204518.090.8106.0067.11666795.0
186.00国际贸易True郭夏202084400118.083.499.0091.050000100.0
274.00NaNTrue王晓加202084400218.083.490.0054.200000NaN
379.66国际贸易True黄婷婷202084400318.091.492.6687.80000095.0
484.66国际贸易True赵小瑜202084400418.0NaN82.297.6661.150000100.0
568.00国际贸易True辛禧202084400518.088.681.0065.12500080.0
665.00国际贸易True王晨202084400718.080.078.0062.40000090.0
785.00国际贸易True韩天202084400818.091.098.0096.25000097.0
880.32国际贸易True刘玉202084400918.091.493.3289.050000100.0
960.00市场营销True谢亚鹏202084401018.085.273.0070.50000090.0
1060.00市场营销True娄天楠202084401118.084.673.0058.800000NaN
1164.00市场营销True唐喆202084401218.087.477.0080.233333100.0
1273.32市场营销True史昀202084401318.082.286.3282.733333100.0
1386.00市场营销True刘欣语202084401418.083.899.0048.71833380.0
1492.00市场营销True王同202084401518.092.2115.0074.200000100.0
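| 15 | 79.00 | 市场营销 | True | 武天一 | 2020844017 | 18.0 |  | 83.2 | 92.00 | 73.216667 | 95.0 |
| 16 | 92.00 | 市场营销 | True | 张析 | 2020844018 | 18.0 |  | 92.0 | 105.00 | 82.750000 | 100.0 |
| 17 | 88.00 | 市场营销 | True | 陈雨涵 | 2020844019 | 18.0 |  | 95.0 | 101.00 | 95.200000 | 100.0 |
| 18 | 96.00 | 市场营销 | True | 张家齐 | 2020844020 | 18.0 |  | 91.0 | 109.00 | 95.450000 | 100.0 |
| 19 | 83.00 | 会计学 | True | 李赫桐 | 2020844021 | 18.0 |  | 86.8 | 96.00 | 88.276667 | 87.0 |
| 20 | 75.00 | 会计学 | True | 关帅 | 2020844022 | 18.0 | NaN | 92.6 | 88.00 | 90.000000 | 100.0 |
| 21 | 90.00 | 会计学 | True | 刘嘉雯 | 2020844023 | 18.0 |  | 86.0 | 103.00 | 89.575000 | 100.0 |
| 22 | 85.00 | 会计学 | True | 刘浩天 | 2020844024 | 18.0 |  | 83.2 | 98.00 | 85.100000 | 100.0 |
| 23 | 76.00 | NaN | True | 刘宇 | 2020844025 | 18.0 |  | 85.6 | 89.00 | 75.200000 | 100.0 |
| 24 | 91.00 | 会计学 | True | 胡童 | 2020844026 | 18.0 |  | 86.0 | 119.00 | 84.050000 | 100.0 |
| 25 | 66.00 | 会计学 | True | 丁灿 | 2020844027 | 18.0 |  | 86.2 | 79.00 | 88.750000 | 100.0 |
| 26 | 91.00 | 会计学 | True | 郑武田 | 2020844028 | 18.0 |  | 87.4 | 104.00 | 89.550000 | NaN |
| 27 | 68.00 | 会计学 | True | 金耀 | 2020844029 | 18.0 |  | 87.2 | 81.00 | 79.450000 | 100.0 |
| 28 | 92.00 | 会计学 | True | 庞博 | 2020844030 | 18.0 |  | 92.0 | 105.00 | 89.700000 | 100.0 |
| 29 | 84.00 | 会计学 | True | 王春杨 | 2020848001 | 18.0 |  | 89.8 | 97.00 | 88.100000 | 100.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 38 | 83.00 | 信息管理与信息系统 | True | 方雨桃 | 2020848008 | 18.0 |  | 86.2 | 96.00 | 93.100000 | 100.0 |
| 39 | 85.00 | 信息管理与信息系统 | True | 闫宇 | 2020848010 | 18.0 |  | 85.4 | 98.00 | 86.033333 | 100.0 |
| 40 | 96.32 | 信息管理与信息系统 | True | 张田田 | 2020848011 | 18.0 |  | 89.6 | 109.32 | 91.200000 | 77.0 |
| 41 | 83.32 | 信息管理与信息系统 | True | 曹一一 | 2020848013 | 18.0 |  | 86.8 | 96.32 | 74.426667 | 100.0 |
| 42 | 82.66 | NaN | True | 贾晶晶 | 2020848014 | 18.0 |  | 93.0 | 95.66 | 84.450000 | 100.0 |
| 43 | 87.00 | 信息管理与信息系统 | True | 贾淏文 | 2020848015 | 18.0 |  | 80.8 | 100.00 | 46.675000 | 100.0 |
| 44 | 95.00 | 信息管理与信息系统 | True | 杨帆 | 2020848016 | 18.0 |  | 87.6 | 108.00 | 98.700000 | NaN |
| 45 | 74.00 | 信息管理与信息系统 | True | 赵迎辰 | 2020848017 | 18.0 | NaN | 87.4 | 87.00 | 82.250000 | 100.0 |
| 46 | 73.00 | 信息管理与信息系统 | True | 郭晓舒 | 2020848018 | 18.0 |  | 83.8 | 86.00 | 82.500000 | 90.0 |
| 47 | 73.00 | 信息管理与信息系统 | True | 郭晓舒 | 2020848018 | 18.0 |  | 83.8 | 86.00 | 82.500000 | 90.0 |
| 48 | 83.00 | 金融学 | True | 张雨桐 | 2020848019 | 18.0 |  | 92.4 | 96.00 | 79.150000 | 100.0 |
| 49 | 80.66 | 金融学 | True | 孟德坤 | 2020848020 | 18.0 |  | 87.4 | 93.66 | 83.450000 | 100.0 |
| 50 | 78.00 | 金融学 | True | 王少祖 | 2020848021 | 18.0 |  | 91.6 | 91.00 | 82.950000 | 90.0 |
| 51 | 86.00 | 金融学 | True | 黄金雨 | 2020848023 | 18.0 |  | 89.8 | 99.00 | 79.950000 | 100.0 |
| 52 | 88.32 | 金融学 | True | 汤佳怡 | 2020848024 | 18.0 |  | 83.4 | 101.32 | 86.600000 | 100.0 |
| 53 | 86.32 | 金融学 | True | 热孜耶·买买提 | 2020848027 | 18.0 |  | 93.2 | 99.32 | 92.700000 | 100.0 |
| 54 | 63.32 | 金融学 | True | 奴热艾力·雪艾力 | 2020848028 | 18.0 |  | 75.0 | 76.32 | 15.000000 | 100.0 |
| 55 | 95.00 | 金融学 | True | 林可新 | 2020848029 | 18.0 |  | 87.4 | 108.00 | 89.300000 | 100.0 |
| 56 | 71.66 | 金融学 | True | 任旭 | 2020848031 | 18.0 |  | 85.4 | 84.66 | 83.425000 | 100.0 |
5777.00金融学True王芳2020084804510.066.067.0055.00000090.0
| 0 | NaN | 信息管理与信息系统 | NaN | 魏天 | 2020802045 | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | NaN | 国际贸易 | NaN | 郭夏 | 2020844001 | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | NaN | NaN | NaN | 王晓加 | 2020844002 | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | NaN | 国际贸易 | NaN | 黄婷婷 | 2020844003 | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | NaN | 国际贸易 | NaN | 赵小瑜 | 2020844004 | NaN | NaN | NaN | NaN | NaN | NaN |
| 5 | NaN | 国际贸易 | NaN | 辛禧 | 2020844005 | NaN | NaN | NaN | NaN | NaN | NaN |
| 6 | NaN | 国际贸易 | NaN | 王晨 | 2020844007 | NaN | NaN | NaN | NaN | NaN | NaN |
| 7 | NaN | 国际贸易 | NaN | 韩天 | 2020844008 | NaN | NaN | NaN | NaN | NaN | NaN |
| 8 | NaN | 国际贸易 | NaN | 刘玉 | 2020844009 | NaN | NaN | NaN | NaN | NaN | NaN |
| 9 | NaN | 市场营销 | NaN | 谢亚鹏 | 2020844010 | NaN | NaN | NaN | NaN | NaN | NaN |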

68 rows × 11 columns
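
The repeated index labels (0 to 57, then 0 to 9 again) and the NaN-filled cells in the output above are what stacking two frames with different column sets typically produces. The sketch below assumes `pd.concat` was the stacking call; the frames `full` and `partial` are hypothetical miniatures that borrow a few values from the output, not the original data.

```python
import pandas as pd

# Hypothetical miniature frames; names and sizes are illustrative only.
full = pd.DataFrame({
    "学号": [2020848029, 2020848031],
    "姓名": ["林可新", "任旭"],
    "专业": ["金融学", "金融学"],
    "数学": [87.4, 85.4],
})
partial = pd.DataFrame({
    "学号": [2020802045, 2020844001],
    "姓名": ["魏天", "郭夏"],
    "专业": ["信息管理与信息系统", "国际贸易"],
})

# Default behaviour: each frame keeps its own index labels, so 0 and 1 repeat.
stacked = pd.concat([full, partial])

# ignore_index=True discards the old labels and renumbers from 0.
renumbered = pd.concat([full, partial], ignore_index=True)

# Columns missing from one frame ("数学" here) are filled with NaN.
print(stacked.index.tolist())
print(renumbered.index.tolist())
print(renumbered["数学"].isna().tolist())
```

Renumbering matters for later label-based access: with repeated labels, `stacked.loc[0]` returns two rows rather than one.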

Data modification
# Modify a column: assigning a scalar sets every row of that column to the same value
df["年龄"]=25
df
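|  | 学号 | 姓名 | 年龄 | 性别 | 专业 | 英语 | 数学 | Python | 选修 | 管理学 | 团员否 |
| 0 | 2020802045 | 魏天 | 25 |  | 信息管理与信息系统 | 67.116667 | 90.8 | 93.00 | 95.0 | 106.00 | True |
| 1 | 2020844001 | 郭夏 | 25 |  | 国际贸易 | 91.050000 | 83.4 | 86.00 | 100.0 | 99.00 | True |
| 2 | 2020844002 | 王晓加 | 25 |  | NaN | 54.200000 | 83.4 | 74.00 | NaN | 90.00 | True |
| 3 | 2020844003 | 黄婷婷 | 25 |  | 国际贸易 | 87.800000 | 91.4 | 79.66 | 95.0 | 92.66 | True |
| 4 | 2020844004 | 赵小瑜 | 25 | NaN | 国际贸易 | 61.150000 | 82.2 | 84.66 | 100.0 | 97.66 | True |
| 5 | 2020844005 | 辛禧 | 25 |  | 国际贸易 | 65.125000 | 88.6 | 68.00 | 80.0 | 81.00 | True |
| 6 | 2020844007 | 王晨 | 25 |  | 国际贸易 | 62.400000 | 80.0 | 65.00 | 90.0 | 78.00 | True |
| 7 | 2020844008 | 韩天 | 25 |  | 国际贸易 | 96.250000 | 91.0 | 85.00 | 97.0 | 98.00 | True |
| 8 | 2020844009 | 刘玉 | 25 |  | 国际贸易 | 89.050000 | 91.4 | 80.32 | 100.0 | 93.32 | True |
| 9 | 2020844010 | 谢亚鹏 | 25 |  | 市场营销 | 70.500000 | 85.2 | 60.00 | 90.0 | 73.00 | True |
| 10 | 2020844011 | 娄天楠 | 25 |  | 市场营销 | 58.800000 | 84.6 | 60.00 | NaN | 73.00 | True |
| 11 | 2020844012 | 唐喆 | 25 |  | 市场营销 | 80.233333 | 87.4 | 64.00 | 100.0 | 77.00 | True |
| 12 | 2020844013 | 史昀 | 25 |  | 市场营销 | 82.733333 | 82.2 | 73.32 | 100.0 | 86.32 | True |
| 13 | 2020844014 | 刘欣语 | 25 |  | 市场营销 | 48.718333 | 83.8 | 86.00 | 80.0 | 99.00 | True |
| 14 | 2020844015 | 王同 | 25 |  | 市场营销 | 74.200000 | 92.2 | 92.00 | 100.0 | 115.00 | True |
| 15 | 2020844017 | 武天一 | 25 |  | 市场营销 | 73.216667 | 83.2 | 79.00 | 95.0 | 92.00 | True |
| 16 | 2020844018 | 张析 | 25 |  | 市场营销 | 82.750000 | 92.0 | 92.00 | 100.0 | 105.00 | True |
| 17 | 2020844019 | 陈雨涵 | 25 |  | 市场营销 | 95.200000 | 95.0 | 88.00 | 100.0 | 101.00 | True |
| 18 | 2020844020 | 张家齐 | 25 |  | 市场营销 | 95.450000 | 91.0 | 96.00 | 100.0 | 109.00 | True |
| 19 | 2020844021 | 李赫桐 | 25 |  | 会计学 | 88.276667 | 86.8 | 83.00 | 87.0 | 96.00 | True |
| 20 | 2020844022 | 关帅 | 25 | NaN | 会计学 | 90.000000 | 92.6 | 75.00 | 100.0 | 88.00 | True |
| 21 | 2020844023 | 刘嘉雯 | 25 |  | 会计学 | 89.575000 | 86.0 | 90.00 | 100.0 | 103.00 | True |
| 22 | 2020844024 | 刘浩天 | 25 |  | 会计学 | 85.100000 | 83.2 | 85.00 | 100.0 | 98.00 | True |
| 23 | 2020844025 | 刘宇 | 25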
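
Scalar assignment, as in `df["年龄"]=25`, is only one way to change a column. The sketch below runs a few other common forms on a small hypothetical frame, `df_demo`, which borrows three rows from the output above and is not the full dataset.

```python
import pandas as pd

# Hypothetical three-row frame for illustration only.
df_demo = pd.DataFrame({
    "姓名": ["魏天", "郭夏", "王晓加"],
    "年龄": [18, 18, 18],
    "数学": [90.8, 83.4, 83.4],
})

# 1. A scalar is broadcast to every row of the column.
df_demo["年龄"] = 25

# 2. A list replaces the column; its length must equal the number of rows.
df_demo["年龄"] = [19, 20, 21]

# 3. A column can be recomputed from its current values.
df_demo["年龄"] = df_demo["年龄"] + 1

# 4. .loc[row_label, column_label] changes a single cell.
df_demo.loc[0, "数学"] = 95.0

# 5. A boolean mask inside .loc changes only the matching rows.
df_demo.loc[df_demo["数学"] < 90, "数学"] = 90.0

print(df_demo)
```

Forms 4 and 5 go through `.loc` rather than chained indexing such as `df_demo["数学"][0] = 95.0`, which may modify a temporary copy instead of the frame itself.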