excel split函数_Python处理Excel的常用操作(二)

继续Python处理Excel的常用操作(一)接着分享:

8、多表联合

在excel中,有两个sheet,我们需要将这两个表进行合并处理,简单来说是直接将其中一个复制粘贴到另一个表中,但一般会用到一些函数语句,才能达到最终目的,在Python中,可以这样处理。

首先,先看一下需求,现在有两个数据集,如下图所示,需要将Score表单添加到student表单中,同时需要按照ID列一一对应。

02cd0b7418d69fd64683d900965b0874.png

3dc5a84d5ba29fc291888b81475d81c4.png

因此,需要先读取两个数据集,利用merge()函数进行相关操作,具体可见pandas文档,本次案例简单来说,便是按照student表的ID列左边连接score表,连接后发现有空值,以0填充,最后将文本类型改为整形。

students = pd.read_excel(r'C:UsersAdministratorDesktoppythonprojectexcel16Student_Score.xlsx', sheet_name='Students')
scores = pd.read_excel(r'C:UsersAdministratorDesktoppythonprojectexcel16Student_Score.xlsx', sheet_name='Scores')
table = students.merge(scores, how='left', on='ID').fillna(0)
table.Score = table.Score.astype(int)
print(students,scores,table)
# 最终输出结果如下:
     ID         Name  Score
0    1  Student_001     81
1    3  Student_003     83
2    5  Student_005     85
3    7  Student_007     87
4    9  Student_009     89
5   11  Student_011     91
6   13  Student_013     93
7   15  Student_015     95
8   17  Student_017     97
9   19  Student_019     99
10  21  Student_021      0
11  23  Student_023      0
12  25  Student_025      0
13  27  Student_027      0
14  29  Student_029      0
15  31  Student_031      0
16  33  Student_033      0
17  35  Student_035      0
18  37  Student_037      0
19  39  Student_039      0

第二种方法:本文上一种是按照系统自带的index进行操作,现在我将ID列作为DateFrame的index,因此需要变化一下merge()函数的参数。

students = pd.read_excel(r'C:UsersAdministratorDesktoppythonprojectexcel16Student_Score.xlsx', sheet_name='Students', index_col='ID')
scores = pd.read_excel(r'C:UsersAdministratorDesktoppythonprojectexcel16Student_Score.xlsx', sheet_name='Scores', index_col='ID')
table = students.merge(scores, how='left', left_on=students.index, right_on=scores.index).fillna(0)
table.Score = table.Score.astype(int)
print(table)

 key_0         Name  Score
0       1  Student_001     81
1       3  Student_003     83
2       5  Student_005     85
3       7  Student_007     87
4       9  Student_009     89
5      11  Student_011     91
6      13  Student_013     93
7      15  Student_015     95
8      17  Student_017     97
9      19  Student_019     99
10     21  Student_021      0
11     23  Student_023      0
12     25  Student_025      0
13     27  Student_027      0
14     29  Student_029      0
15     31  Student_031      0
16     33  Student_033      0
17     35  Student_035      0
18     37  Student_037      0
19     39  Student_039      0

第三种方法:利用join()函数,具体可见文档,至于一些表连接的函数对比分析可见下面这篇文章。

yeayee:Pandas Dataframe操作concat、join、merge​zhuanlan.zhihu.com
5af86ff9bf4075a7bb4bc60cb7713af8.png
students = pd.read_excel(r'C:UsersAdministratorDesktoppythonprojectexcel16Student_Score.xlsx', sheet_name='Students', index_col='ID')
scores = pd.read_excel(r'C:UsersAdministratorDesktoppythonprojectexcel16Student_Score.xlsx', sheet_name='Scores', index_col='ID')
table = students.join(scores, how='left').fillna(0)
table.Score = table.Score.astype(int)
print(table)

9、数据校验

在excel中,有时会设置单元格数值的范围,比如说设定分数列只能填0-100,填其他数字便报错

55a6be1544dde05547604fdf77e24e87.png

在python中,可以先定义一个函数,然后将此函数通过apply()函数进行数据验证

def score_valication(row):
    if not 0 <= row.Score <= 100:
        print(f'#{row.ID}tstudent {row.Name} has an invalid score {row.Score}')
students = pd.read_excel(r'C:UsersAdministratorDesktoppythonprojectexcel17Students.xlsx')
# print(students)
students.apply(score_valication, axis=1)

#1	student Student_001 has an invalid score -40
#2	student Student_002 has an invalid score -30
#3	student Student_003 has an invalid score -20
#4	student Student_004 has an invalid score -10
#16	student Student_016 has an invalid score 110
#17	student Student_017 has an invalid score 120
#18	student Student_018 has an invalid score 130
#19	student Student_019 has an invalid score 140
#20	student Student_020 has an invalid score 150

10、数据分列

在excel中有很多数据分列的方法,今天介绍一个简单方法:ctrl+E

66946004d43b9603899bbc47c3a56f27.png

eb7a1e910854fcbab41941c1326626c9.png

在python中也可以进行相关操作,利用split()函数。

employees = pd.read_excel(r'C:UsersAdministratorDesktoppythonprojectexcel18Employees.xlsx', index_col='ID')
df = employees['Full Name'].str.split(expand=True)
employees['First Name'] = df[0]
employees['Last Name'] = df[1]
print(employees)
          Full Name First Name    Last Name
ID                                          
1         Syed Abbas       Syed        Abbas
2     Catherine Abel  Catherine         Abel
3    Kim Abercrombie        Kim  Abercrombie
4    Kim Abercrombie        Kim  Abercrombie
5    Kim Abercrombie        Kim  Abercrombie
6     Hazem Abolrous      Hazem     Abolrous
7       Sam Abolrous        Sam     Abolrous
8   Humberto Acevedo   Humberto      Acevedo
9     Gustavo Achong    Gustavo       Achong
10    Pilar Ackerman      Pilar     Ackerman
11    Pilar Ackerman      Pilar     Ackerman
12       Aaron Adams      Aaron        Adams
13        Adam Adams       Adam        Adams
14        Alex Adams       Alex        Adams
15   Alexandra Adams  Alexandra        Adams
16     Allison Adams    Allison        Adams
17      Amanda Adams     Amanda        Adams
18       Amber Adams      Amber        Adams
19      Andrea Adams     Andrea        Adams
20       Angel Adams      Angel        Adams

11、统计求和求平均

在excel中,计算平均数和总和是很常见的,大都是利用average()和sum()函数

0f076415e4e0670ed195bc660b3447ce.png

在python中,sum()代表求和,mean()代表平均数

students = pd.read_excel(r'C:UsersAdministratorDesktoppythonprojectexcel19Students.xlsx', index_col='ID')
students['Total'] = students[['Test_1', 'Test_2', 'Test_3']].sum(axis=1)
students['Average'] = students[['Test_1', 'Test_2', 'Test_3']].mean(axis=1)

col_mean = students[['Test_1', 'Test_2', 'Test_3', 'Total', 'Average']].mean()
col_mean['Name'] = 'Summary'
students = students.append(col_mean, ignore_index=True)
print(students)

           Name  Test_1  Test_2  Test_3  Total    Average
0   Student_001   62.00   86.00    83.0  231.0  77.000000
1   Student_002   77.00   97.00    78.0  252.0  84.000000
2   Student_003   57.00   96.00    46.0  199.0  66.333333
3   Student_004   57.00   87.00    80.0  224.0  74.666667
4   Student_005   95.00   59.00    87.0  241.0  80.333333
5   Student_006   56.00   97.00    61.0  214.0  71.333333
6   Student_007   64.00   91.00    67.0  222.0  74.000000
7   Student_008   96.00   70.00    48.0  214.0  71.333333
8   Student_009   77.00   73.00    48.0  198.0  66.000000
9   Student_010   90.00   94.00    67.0  251.0  83.666667
10  Student_011   62.00   55.00    63.0  180.0  60.000000
11  Student_012   83.00   76.00    81.0  240.0  80.000000
12  Student_013   68.00   60.00    90.0  218.0  72.666667
13  Student_014   82.00   68.00    98.0  248.0  82.666667
14  Student_015   61.00   67.00    91.0  219.0  73.000000
15  Student_016   59.00   63.00    46.0  168.0  56.000000
16  Student_017   62.00   83.00    93.0  238.0  79.333333
17  Student_018   90.00   75.00    80.0  245.0  81.666667
18  Student_019  100.00   95.00    55.0  250.0  83.333333
19  Student_020   61.00   87.00   100.0  248.0  82.666667
20      Summary   72.95   78.95    73.1  225.0  75.000000

12、消除重复值

在excel中,消除重复值可以利用条件格式

b3442a232f929523634a1de925f462bf.png

在python中,可以利用drop_duplicates()函数

students = pd.read_excel(r'C:UsersAdministratorDesktoppythonprojectexcel020Students_Duplicates.xlsx')
students.drop_duplicates(subset='Name', inplace=True, keep='last')
print(students)

     ID         Name  Test_1  Test_2  Test_3
5    6  Student_006      56      97      61
6    7  Student_007      64      91      67
7    8  Student_008      96      70      48
8    9  Student_009      77      73      48
9   10  Student_010      90      94      67
10  11  Student_011      62      55      63
11  12  Student_012      83      76      81
12  13  Student_013      68      60      90
13  14  Student_014      82      68      98
14  15  Student_015      61      67      91
15  16  Student_016      59      63      46
16  17  Student_017      62      83      93
17  18  Student_018      90      75      80
18  19  Student_019     100      95      55
19  20  Student_020      61      87     100
20  21  Student_001      62      86      83
21  22  Student_002      77      97      78
22  23  Student_003      57      96      46
23  24  Student_004      57      87      80
24  25  Student_005      95      59      87

13、行列转换

在excel中,进行行列转换,可以有一些操作,但是我还是不太会,就直接上python代码:

984f0e230695399c013ad4707bc16f7d.png
pd.options.display.max_columns = 999
videos = pd.read_excel(r'C:UsersAdministratorDesktoppythonprojectexcel021Videos.xlsx', index_col='Month')
# table = videos.transpose()
table = videos.T
print(table)
Month         Jan   Feb   Mar    Apr    May    Jun    Jul    Aug    Sep  
Active User  1000  1500  2250   3375   5063   7595  11393  17090  25635   
Page View    3500  5250  7875  11813  17721  26583  39876  59815  89723   

Month           Oct     Nov     Dec  
Active User   38453   57680   86520  
Page View    134586  201880  302820 
  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值