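The Binarizer class and the binarize function binarize features against a given threshold: values less than or equal to the threshold are set to 0, and values greater than the threshold are set to 1. In both cases, threshold defaults to 0.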
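As a quick check of the rule just stated, the sketch below compares binarize with a hand-written boolean comparison. The toy array X is an assumption made for illustration and does not come from the original session.

```python
import numpy as np
from sklearn.preprocessing import binarize

# Illustrative toy data (an assumption, not from the original post).
X = np.array([[-1.0, 0.0, 0.5],
              [2.0, 0.0, -3.0]])

# Default threshold is 0.0: values <= 0 map to 0, values > 0 map to 1.
by_sklearn = binarize(X)

# The same rule written by hand with a strict "greater than" comparison.
by_hand = (X > 0.0).astype(X.dtype)

print(by_sklearn)
print(np.array_equal(by_sklearn, by_hand))
```

Note that a value exactly equal to the threshold maps to 0, because the comparison is strict.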
① The binarize function: sklearn.preprocessing.binarize(X, threshold=0.0, copy=True)
a. For dense (non-sparse) arrays, threshold can be any float.
In [1]: from sklearn import preprocessing
...: from sklearn import datasets
...: import numpy as np
...: # load_boston requires scikit-learn < 1.2 (it was removed in 1.2)
...: data = datasets.load_boston()
...: # values <= the mean become 0, all others become 1
...: new_target = preprocessing.binarize(data.target[:, np.newaxis], threshold=data.target.mean()).astype(int)
...: print(type(preprocessing.binarize(data.target[:, np.newaxis], threshold=data.target.mean())))
...: new_target[:5]
...:
<class 'numpy.ndarray'>
Out[1]:
array([[1],
       [0],
       [1],
       [1],
       [1]])
In [2]: preprocessing.binarize(data.target[:, np.newaxis], threshold=-1).astype(int)[:5]
Out[2]:
array([[1],
       [1],
       [1],
       [1],
       [1]])
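Because load_boston is no longer available in recent scikit-learn releases, here is a minimal sketch of the same two calls on a small made-up target vector. The values in target are hypothetical and are not the Boston data; only the calling pattern mirrors the session above.

```python
import numpy as np
from sklearn.preprocessing import binarize

# Hypothetical stand-in for a regression target (not the Boston data).
target = np.array([12.5, 30.0, 8.0, 22.0, 41.5])

# binarize expects a 2-D array, so add a column axis first.
column = target[:, np.newaxis]

# Threshold at the mean (22.8 here): entries <= mean become 0, the rest 1.
above_mean = binarize(column, threshold=target.mean()).astype(int)

# A negative threshold is allowed for dense input; since every value
# here is greater than -1, every entry becomes 1.
all_ones = binarize(column, threshold=-1).astype(int)

print(above_mean.ravel())
print(all_ones.ravel())
```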
b. For sparse matrices, threshold must be a float greater than or equal to 0.
In [3]: from scipy.sparse import coo
...: from sklearn import preprocessing
...: spar = coo.coo_matrix(np.random.binomial(1,0.25,100))
...: preprocessing.binarize(spar,threshold=-1)
...:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-ff778f656a6b> in <module>()
2 from sklearn import preprocessing
3 spar = coo.coo_matrix(np.random.binomial(1,0.25,100))
----> 4 preprocessing.binarize(spar,threshold=-1)
d:\softwore\python\lib\site-packages\sklearn\preprocessing\data.py in binarize(X, threshold, copy)
   1470     if sparse.issparse(X):
   1471         if threshold < 0:
-> 1472             raise ValueError('Cannot binarize a sparse matrix with threshold '
   1473                              '< 0')
   1474         cond = X.data > threshold
ValueError: Cannot binarize a sparse matrix with threshold < 0
In [4]: preprocessing.binarize(spar,threshold=0)
Out[4]:
<1x100 sparse matrix of type '<class 'numpy.int32'>'
with 24 stored elements in Compressed Sparse Row format>
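The sparse behaviour can be reproduced deterministically with a small hand-built matrix, as in the sketch below. It uses the public scipy.sparse.csr_matrix constructor rather than the scipy.sparse.coo submodule from the session above, and the matrix values are assumptions chosen for illustration. A plausible reason for the restriction is that, with a negative threshold, every implicit zero would exceed the threshold and have to become 1, so the result could no longer stay sparse.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import binarize

# Small sparse matrix (illustrative values); zeros are stored implicitly.
dense = np.array([[0.0, 2.0, 0.0, 0.5, 0.0]])
spar = sparse.csr_matrix(dense)

# A non-negative threshold only needs to inspect the stored entries.
result = binarize(spar, threshold=1.0)
print(result.toarray())

# A negative threshold is rejected for sparse input.
try:
    binarize(spar, threshold=-1)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```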
② The Binarizer class: sklearn.preprocessing.Binarizer(threshold=0.0, copy=True)
a. For dense (non-sparse) arrays, threshold can be any float.
In [5]: from sklearn import preprocessing
...: from sklearn import datasets
...: import numpy as np
...: data = datasets.load_boston()
...: bz = preprocessing.Binarizer(data.target.mean())
...: new_target = bz.fit_transform(data.target[:,np.newaxis]).astype(int)
...: print(bz)
...: new_target[:5]
...:
Binarizer(copy=True, threshold=22.532806324110677)
Out[5]:
array([[1],
       [0],
       [1],
       [1],
       [1]])
In [6]: preprocessing.Binarizer(-1).fit_transform(data.target[:, np.newaxis]).astype(int)[:5]
Out[6]:
array([[1],
       [1],
       [1],
       [1],
       [1]])
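The class and the function should give the same result for the same threshold, which the sketch below checks on a small made-up column (the values are hypothetical). It passes threshold by keyword, since, to my understanding, recent scikit-learn releases no longer accept it positionally as in the session above.

```python
import numpy as np
from sklearn.preprocessing import Binarizer, binarize

# Hypothetical single-feature column, used only for illustration.
X = np.array([[12.5], [30.0], [8.0], [22.0], [41.5]])

# Pass threshold by keyword rather than positionally.
bz = Binarizer(threshold=X.mean())

# Binarizer learns nothing from the data; fit_transform simply applies
# the threshold, so it should match the plain function call.
from_class = bz.fit_transform(X).astype(int)
from_function = binarize(X, threshold=X.mean()).astype(int)

print(from_class.ravel())
print(np.array_equal(from_class, from_function))
```

The class form is mainly convenient when the step has to sit inside a scikit-learn pipeline; for a one-off conversion the function is enough.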
b. For sparse matrices, threshold must likewise be a float greater than or equal to 0.
In [7]: from scipy.sparse import coo
...: spar = coo.coo_matrix(np.random.binomial(1,0.25,100))
...: preprocessing.Binarizer(threshold= -1).fit_transform(spar)
...:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-7-fc5a78d3b8c5> in <module>()
1 from scipy.sparse import coo
2 spar = coo.coo_matrix(np.random.binomial(1,0.25,100))
----> 3 preprocessing.Binarizer(threshold= -1).fit_transform(spar)
d:\softwore\python\lib\site-packages\sklearn\base.py in fit_transform(self, X, y, **fit_params)
    492         if y is None:
    493             # fit method of arity 1 (unsupervised transformation)
--> 494             return self.fit(X, **fit_params).transform(X)
    495         else:
    496             # fit method of arity 2 (supervised transformation)
d:\softwore\python\lib\site-packages\sklearn\preprocessing\data.py in transform(self, X, y, copy)
   1549         """
   1550         copy = copy if copy is not None else self.copy
-> 1551         return binarize(X, threshold=self.threshold, copy=copy)
   1552
   1553
d:\softwore\python\lib\site-packages\sklearn\preprocessing\data.py in binarize(X, threshold, copy)
   1470     if sparse.issparse(X):
   1471         if threshold < 0:
-> 1472             raise ValueError('Cannot binarize a sparse matrix with threshold '
   1473                              '< 0')
   1474         cond = X.data > threshold
ValueError: Cannot binarize a sparse matrix with threshold < 0
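If a negative threshold is genuinely needed for data that happens to be stored sparsely, one possible workaround (an assumption on my part, not something the original session shows) is to convert the matrix to a dense array first. This is only sensible when the dense array fits in memory. The matrix values below are made up for illustration.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import Binarizer

# Small illustrative sparse matrix (values are an assumption).
spar = sparse.csr_matrix(np.array([[0.0, 1.0, 0.0, 1.0]]))
bz = Binarizer(threshold=-1)

# Sparse input with a negative threshold raises ValueError.
try:
    bz.fit_transform(spar)
    raised = False
except ValueError:
    raised = True

# Densifying first sidesteps the restriction; every entry, including
# the former implicit zeros, is greater than -1 and becomes 1.
dense_result = bz.fit_transform(spar.toarray()).astype(int)

print(raised)
print(dense_result)
```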