I needed to search multiple cells for a specific value and when it is found it should be returned in new column.
I got an answer here; Python: find string in multiple columns and return it in new column yet this line below return the first value found
df['b'] = (df[cols].where(df[cols].stack().str.contains('b')
.unstack(fill_value=False)).ffill(1).iloc[:,-1])
where cols
df = df[['col1', 'col2', 'col3', 'col4']]
I tried the other answers and they all gave me error ValueError: cannot reindex from a duplicate axis
Do any one have an idea how can I get all the matching values in one cell.
the data set
ID col0 col1 col2 col3 col4 col5
1 jack a/h t/m w/n y/h 56
2 sam z/n b/w null b/n 93
3 john b/i y/d b/d null 33
The code I'm using now :
df['b'] = (df[cols].where(df[cols].stack().str.contains('b')
.unstack(fill_value=False)).ffill(1).iloc[:,-1])
And here the output I am getting now
ID col0 col1 col2 col3 col4 col5 b
1 jack a/h t/m w/n y/h 56 -
2 sam z/n b/w null b/n 93 b/w
3 john b/i y/d b/d null 33 b/i
And actually I want the output to look like the data-frame below
ID col0 col1 col2 col3 col4 col5 b
1 jack a/h t/m w/n y/h 56 null
2 sam z/n b/w null b/n 93 b/w - b/n
3 john b/i y/d b/d null 33 b/i - b/d
解决方案
Use DataFrame.filter to filter dataframe containing columns col1-col4 and use DataFrame.stack, then using Series.str.contains filter the stacked dataframe finally use Series.groupby on level=0 and aggregate using join:
s = df.filter(regex=r'col[1-4]').stack()
s = s[s.str.contains('b')].groupby(level=0).agg(' - '.join)
df['b'] = s
Result:
# print(df)
ID col0 col1 col2 col3 col4 col5 b
0 1 jack a/h t/m w/n y/h 56 NaN
1 2 sam z/n b/w NaN b/n 93 b/w - b/n
2 3 john b/i y/d b/d NaN 33 b/i - b/d