I have a Pandas dataframe that contains a large number of variables. This can be simplified as:
tempDF = pd.DataFrame({ 'var1': [12,12,12,12,45,45,45,51,51,51],
'var2': ['a','a','b','b','b','b','b','c','c','d'],
'var3': ['e','f','f','f','f','g','g','g','g','g'],
'var4': [1,2,3,3,4,5,6,6,6,7]})
If I wanted to select a subset of the dataframe (e.g. var2='b' and var4=3), I would use:
tempDF.loc[(tempDF['var2']=='b') & (tempDF['var4']==3),:]
However, is it possible to select a subset of the dataframe if the matching criteria are stored within a dict, such as:
tempDict = {'var2': 'b','var4': 3}
It's important that the variable names are not predefined and the number of variables included in the dict is changeable.
I've been puzzling over this for a while and so any suggestions would be greatly appreciated.
解决方案
You could create mask for each condition using list comprehension and then join them by converting to dataframe and using all:
In [23]: pd.DataFrame([tempDF[key] == val for key, val in tempDict.items()]).T.all(axis=1)
Out[23]:
0 False
1 False
2 True
3 True
4 False
5 False
6 False
7 False
8 False
9 False
dtype: bool
Then you could slice your dataframe with that mask:
mask = pd.DataFrame([tempDF[key] == val for key, val in tempDict.items()]).T.all(axis=1)
In [25]: tempDF[mask]
Out[25]:
var1 var2 var3 var4
2 12 b f 3
3 12 b f 3