Reposted from: https://blog.csdn.net/sinat_29957455/article/details/79007668
1. Checking a Series for unique values
- from pandas import Series
- s = Series([3,3,1,2,4,3,4,6,5,6])
- # Check whether every value in the Series is unique; False means there are duplicates
- print(s.is_unique)
- #False
- # Return the distinct values of the Series in order of first appearance (not sorted); the result is a NumPy array
- print(s.unique())
- #[3 1 2 4 6 5]
- print(type(s.unique()))
- #<class 'numpy.ndarray'>
- # Count how many times each value occurs; by default the result is sorted by count in descending order
- print(s.value_counts())
- '''
- 3 3
- 6 2
- 4 2
- 5 1
- 2 1
- 1 1
- '''
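- # List the counts ordered by value: sort_index() sorts the result by its index (the values)
- # Note: sort=False only turns off sorting by count; it does not guarantee ordering by value
- print(s.value_counts().sort_index())
- '''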
- 1 1
- 2 1
- 3 3
- 4 2
- 5 1
- 6 2
- '''
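The three calls above can be combined into one short script. This is a minimal sketch that assumes pandas is imported as `pd`; the variable names `has_duplicates`, `distinct`, and `counts_by_value` are illustrative and not part of the original post.

```python
import pandas as pd

s = pd.Series([3, 3, 1, 2, 4, 3, 4, 6, 5, 6])

# is_unique is False here because several values repeat.
has_duplicates = not s.is_unique

# unique() keeps first-appearance order and returns a NumPy array;
# tolist() converts it to a plain Python list.
distinct = s.unique().tolist()

# value_counts() sorts by count (descending) by default;
# sort_index() reorders the result by the values themselves.
counts_by_value = s.value_counts().sort_index()

print(has_duplicates)
print(distinct)
print(counts_by_value)
```

Chaining `sort_index()` makes the ordering explicit instead of relying on whatever order `sort=False` happens to produce.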
2. Membership tests
a. Membership in a Series
- s = Series([5,5,6,1,1])
- print(s)
- '''
- 0 5
- 1 5
- 2 6
- 3 1
- 4 1
- '''
- # Vectorized membership test: returns a boolean Series
- print(s.isin([5]))
- '''
- 0 True
- 1 True
- 2 False
- 3 False
- 4 False
- '''
- print(type(s.isin([5])))
- #<class 'pandas.core.series.Series'>
- # Use the membership mask to select a subset of the Series
- print(s[s.isin([5])])
- '''
- 0 5
- 1 5
- '''
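`isin` accepts any number of candidate values, and the resulting mask can be inverted with `~`. The following is a small sketch under the assumption that pandas is imported as `pd`; testing against `[5, 6]` and the names `mask`, `selected`, and `rest` are illustrative choices, not taken from the original post.

```python
import pandas as pd

s = pd.Series([5, 5, 6, 1, 1])

# Boolean mask: True where the element is one of the listed values.
mask = s.isin([5, 6])

# Keep the matching elements; ~mask inverts the mask to keep the others.
selected = s[mask]
rest = s[~mask]

print(selected.tolist())  # [5, 5, 6]
print(rest.tolist())      # [1, 1]
```

The original index labels are preserved in both subsets, so `rest` still carries the labels 3 and 4.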
b. Membership in a DataFrame
- from pandas import DataFrame
- a = [[3,2,6],[2,1,4],[6,2,5]]
- data = DataFrame(a,index=["a","b","c"],columns=["one","two","three"])
- print(data)
- '''
-    one  two  three
- a    3    2      6
- b    2    1      4
- c    6    2      5
- '''
- # Returns a boolean DataFrame
- print(data.isin([1]))
- '''
-      one    two  three
- a  False  False  False
- b  False   True  False
- c  False  False  False
- '''
- # Keep the cells whose value is 1; every other cell becomes NaN
- print(data[data.isin([1])])
- '''
-    one  two  three
- a  NaN  NaN    NaN
- b  NaN  1.0    NaN
- c  NaN  NaN    NaN
- '''
- # Fill the NaN cells with 0
- print(data[data.isin([1])].fillna(0))
- '''
-    one  two  three
- a  0.0  0.0    0.0
- b  0.0  1.0    0.0
- c  0.0  0.0    0.0
- '''