A special case of splitting a column in pandas

The class_name column contains both the course name and the cohort number.
I want to split it into two columns (name, cohort number).

From:

 

| class_name |

| introduction to programming 1th |
| introduction to programming 2th |
| introduction to programming 3th |
| introduction to programming 4th |
| algorithms and data structure 1th |
| algorithms and data structure 2th |
| object-oriented programming |
| database systems |



(I know it should read 1st, 2nd, 3rd, but the strings are in my own language, where the same character is repeated after every number.)

To:
 

| class_name | class_cohort |

| introduction to programming | 1 |
| introduction to programming | 2 |
| introduction to programming | 3 |
| introduction to programming | 4 |
| algorithms and data structure | 1 |
| algorithms and data structure | 2 |
| object-oriented programming | 1 |
| database systems | 1 |



Here is the code I have been working on:
 

import pandas as pd

course_count = 100
df = pd.read_csv("course.csv", nrows=course_count)

cols_interest=['class_name', 'class_department', 'class_type', 'student_target', 'student_enrolled']

df = df[cols_interest]
df.insert(1, 'class_cohort', 0)

# this is how I extract the numbers
df['class_name'].str.extract(r'(\d)').head()

# but I cannot figure out a way to copy those values into column 'class_cohort' which I filled with 0's.

# once I figure that out, I plan to discard the last digits
df['class_name'] = df['class_name'].map(lambda x: str(x)[:-1])



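I briefly looked into a solution that puts a comma before 1th, 2th, 3th and then splits the column using the comma as the delimiter, but I could not figure out a way to replace every number suffix, i.e. \s1th -> ,1th.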
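The comma idea described above can be sketched with a regex back-reference. This is only an illustration under assumptions: the sample frame is made up, the suffix is assumed to be digits followed by `th` at the very end of the string, and the `th` is dropped during the replacement instead of being kept.

```python
import pandas as pd

# Hypothetical sample data standing in for course.csv
df = pd.DataFrame({"class_name": [
    "introduction to programming 1th",
    "algorithms and data structure 2th",
    "database systems",
]})

# Turn the whitespace before a trailing "<digits>th" into a comma,
# keeping only the digits via the \1 back-reference.
with_comma = df["class_name"].str.replace(r"\s+(\d+)th$", r",\1", regex=True)

# Split once on the comma; rows without a suffix get a missing value
# in the second column.
parts = with_comma.str.split(",", n=1, expand=True)
print(parts)
```

This assumes course names never contain a comma themselves; if they might, a different delimiter or a direct `str.extract` would be safer.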

 

 

Best answer

You can use indexing by positions:
 

df['class_cohort'] = df['class_name'].str[-3:-2]
df['class_name'] = df['class_name'].str[:-4]
print(df)
   class_name class_cohort
0       cs101            1
1       cs101            2
2       cs101            3
3       cs101            4
4  algorithms            1
5  algorithms            2


Or use str.extract:

# expand=False returns a Series, so it can be assigned to one column directly
df['class_cohort'] = df['class_name'].str.extract(r'(\d)', expand=False)
df['class_name'] = df['class_name'].str[:-4]
print(df)
                      class_name class_cohort
0    introduction to programming            1
1    introduction to programming            2
2    introduction to programming            3
3    introduction to programming            4
4  algorithms and data structure            1
5  algorithms and data structure            2
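Both snippets above assume every row ends in a single digit plus `th`, whereas the question's table also has rows with no suffix (for example `database systems`) that should land in cohort 1. A minimal sketch of one way to cover those rows, assuming the suffix is always digits followed by `th` at the end of the string; the sample frame and the group names `name` and `cohort` are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data mirroring the question's table
df = pd.DataFrame({"class_name": [
    "introduction to programming 1th",
    "introduction to programming 2th",
    "algorithms and data structure 1th",
    "object-oriented programming",
    "database systems",
]})

# Lazy name group, then an optional " <digits>th" suffix anchored at the end.
pattern = r"^(?P<name>.*?)(?:\s+(?P<cohort>\d+)th)?$"
parts = df["class_name"].str.extract(pattern)

df["class_name"] = parts["name"]
# Rows without a suffix have a missing cohort; default them to 1,
# as in the desired output of the question.
df["class_cohort"] = parts["cohort"].fillna("1").astype(int)
print(df)
```

Because the suffix is matched as a whole, this also handles cohort numbers with more than one digit, which the fixed-width slices `str[-3:-2]` and `str[:-4]` would not.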