需求:
将工号列(Job Number)设置为 DataFrame 的索引。
将性别列(Gender)的所有 "male" 转换成 "M","female" 转换成 "F"。
新增一列 "Service Year",表示每个员工的入职年数(入职年数计算至 2023 年)。
选取所有年龄大于等于 35 岁的女性员工的姓名、工号、所在部门、性别和年龄这几列。
将所有的女性员工的入职时间根据正则表达式提取出来,格式为 "月/日/年"。
现有DataFrame:
import pandas as pd
data = {"Name": ["Alice", "Bob", "Charlie", "David", "Emma", "Frank", "Grace"],
"Job Number": ["E1001", "E1002", "E1003", "E1004", "E1005", "E1006", "E1007"],
"Department": ["A", "A", "B", "B", "C", "C", "C"],
"Gender": ["female", "male", "male", "male", "female", "male", "female"],
"Age": [26, 30, 35, 40, 28, 45, 33],
"Join Date": ["3/1/2020", "5/15/2018", "2/17/2015", "6/2/2010", "12/12/2019", "3/20/2013", "9/1/2017"]}
df = pd.DataFrame(data)
数据结果:
Name Job Number Department Gender Age Join Date
0 Alice E1001 A female 26 2020-03-01
1 Bob E1002 A male 30 2018-05-15
2 Charlie E1003 B male 35 2015-02-17
3 David E1004 B male 40 2010-06-02
4 Emma E1005 C female 28 2019-12-12
5 Frank E1006 C male 45 2013-03-20
6 Grace E1007 C female 33 2017-09-01
1. 将工号列(Job Number)设置为 DataFrame 的索引。
df = df.set_index('Job Number')
2. 将性别列(Gender)的所有 "male" 转换成 "M","female" 转换成 "F"。
df['Gender'] = df['Gender'].replace('female','F')
df['Gender'] = df['Gender'].replace('male','M')
3. 新增一列 "Service Year",表示每个员工的入职年数(入职年数计算至 2023 年)
df['Join Date'] = pd.to_datetime(df['Join Date'])
df['Service Year'] = 2023 - df['Join Date'].dt.year
4. 选取所有年龄大于等于 35 岁的女性员工的姓名、工号、所在部门、性别和年龄这几列
dataf = df.loc[(df['Age']>=35) & (df['Gender'] == 'M')]
dataf.iloc[:,:4]
5. 将所有的女性员工的入职时间提取出来,格式为 "月/日/年"。
df.loc[df['Gender'] == 'F', 'Join Date'] = df[df['Gender'] == 'F']['Join Date'].dt.strftime('%m/%d/%Y')