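Since what I've been reading lately is all over the place, I'm keeping this separate document to record machine learning projects I come across.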
NNI's feature engineering:
https://nni.readthedocs.io/en/latest/FeatureEngineering/Overview.html
TreeBasedClassifier refers to ExtraTrees.
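As a rough illustration of what tree-based feature selection with ExtraTrees looks like, here is a sketch using scikit-learn rather than NNI (NNI's own selector API is not reproduced here). The ensemble ranks features by impurity-based importance and `SelectFromModel` keeps those above the mean importance. The toy dataset and every parameter value are assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

# Assumed toy data: 20 features; with shuffle=False the first 5 are informative, the rest are noise
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_redundant=0, shuffle=False, random_state=0)

# Fit an ExtraTrees ensemble inside SelectFromModel; for tree models the default threshold is the mean importance
selector = SelectFromModel(ExtraTreesClassifier(n_estimators=200, random_state=0))
selector.fit(X, y)

importances = selector.estimator_.feature_importances_  # impurity-based importances, sum to 1
selected_idx = np.flatnonzero(selector.get_support())    # indices of the kept features
X_selected = selector.transform(X)                       # data restricted to the kept features

print(selected_idx, X_selected.shape)
```

Using the mean as threshold is just scikit-learn's default; a fixed count (`max_features=`) or a stricter threshold string such as `"1.5*mean"` are alternatives.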
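SISSO (Sure Independence Screening and Sparsifying Operator): I don't really know what it is yet. Something combining physics, materials science and machine learning, written in Fortran?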
https://arxiv.org/pdf/1710.03319.pdf
https://github.com/rouyang2017/SISSO
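As the linked paper describes it, SISSO builds a very large space of candidate descriptors by applying mathematical operators to primary features, screens that space with Sure Independence Screening (ranking candidates by correlation with the target, or with the residual of the previous model), and then picks the best few-term linear model with an L0-constrained exhaustive search (the sparsifying operator). The sketch below is a toy Python re-creation of that idea for intuition only, not the Fortran code in the repository; the tiny operator set, the screening size `k`, `max_dim`, and all function names are assumptions.

```python
import itertools
import numpy as np


def build_feature_space(X, names):
    """Expand primary features with a tiny, assumed operator set (x, x^2, x*y, x/y)."""
    feats, labels = [], []
    for i, name in enumerate(names):
        feats += [X[:, i], X[:, i] ** 2]
        labels += [name, f"{name}^2"]
    for i, j in itertools.combinations(range(len(names)), 2):
        feats += [X[:, i] * X[:, j], X[:, i] / X[:, j]]
        labels += [f"{names[i]}*{names[j]}", f"{names[i]}/{names[j]}"]
    return np.column_stack(feats), labels


def sis(F, target, k):
    """Sure Independence Screening: indices of the k features with the largest |Pearson r| to target."""
    Fc = F - F.mean(axis=0)
    tc = target - target.mean()
    denom = np.linalg.norm(Fc, axis=0) * np.linalg.norm(tc) + 1e-12  # guard against a zero residual
    corr = np.abs(Fc.T @ tc) / denom
    return np.argsort(corr)[::-1][:k]


def l0_search(F, y, candidates, dim):
    """L0-constrained search: fit every dim-subset by least squares, keep the lowest RMSE."""
    best_rmse, best_subset, best_coef = np.inf, None, None
    for subset in itertools.combinations(candidates, dim):
        A = np.column_stack([F[:, list(subset)], np.ones(len(y))])  # last column = intercept
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        rmse = np.sqrt(np.mean((A @ coef - y) ** 2))
        if rmse < best_rmse:
            best_rmse, best_subset, best_coef = rmse, subset, coef
    return best_rmse, best_subset, best_coef


def sisso(X, y, names, max_dim=2, k=5):
    F, labels = build_feature_space(X, names)
    pool, residual = set(), y.astype(float)
    for dim in range(1, max_dim + 1):
        # screen against the current residual, then search the union of all screened subspaces
        pool |= set(sis(F, residual, k).tolist())
        rmse, subset, coef = l0_search(F, y, sorted(pool), dim)
        A = np.column_stack([F[:, list(subset)], np.ones(len(y))])
        residual = y - A @ coef
    return [labels[i] for i in subset], coef, rmse


# Usage on synthetic data with a hidden law y = 3*a*b + 2 (assumed for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 2.0, size=(100, 3))  # positive values so the x/y operator is safe
y = 3.0 * X[:, 0] * X[:, 1] + 2.0
descriptor, coef, rmse = sisso(X, y, ["a", "b", "c"], max_dim=1)
print(descriptor, coef, rmse)  # should recover ['a*b'] with coefficients close to [3, 2]
```

The real method applies operators recursively to get millions of candidates, which is exactly why the cheap SIS step is needed before the combinatorial L0 search.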
borutaPy can estimate the n_estimators parameter automatically from the tree depth (max_depth):
import warnings
import numpy as np

# Method excerpted from the BorutaPy class; self.estimator is the wrapped tree-based model
def _get_tree_num(self, n_feat):
    depth = None
    try:
        depth = self.estimator.get_params()['max_depth']
    except KeyError:
        warnings.warn(
            "The estimator does not have a max_depth property, as a result "
            "the number of trees to use cannot be estimated automatically."
        )
    # also covers max_depth=None (fully grown trees): fall back to a depth of 10
    if depth is None:
        depth = 10
    # how many times a feature should be considered on average
    f_repr = 100
    # n_feat * 2 because the training matrix is extended with n shadow features
    multi = ((n_feat * 2) / (np.sqrt(n_feat * 2) * depth))
    n_estimators = int(multi * f_repr)
    return n_estimators
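The comment in the code explains the shape of the formula: Boruta appends one shuffled "shadow" copy of each feature, so the forest sees 2 * n_feat columns, and the expression simplifies to int(100 * sqrt(2 * n_feat) / depth). Below is a standalone sketch of both pieces; `estimate_n_trees` and `add_shadow_features` are hypothetical helper names, not part of borutaPy's API.

```python
import numpy as np


def estimate_n_trees(n_feat, depth=None, f_repr=100):
    """Standalone copy of the heuristic above (hypothetical helper, not borutaPy API)."""
    if depth is None:
        depth = 10  # same fallback as the excerpt when max_depth is unknown or None
    n_total = n_feat * 2  # real features + one shadow copy of each
    multi = n_total / (np.sqrt(n_total) * depth)  # equals sqrt(2 * n_feat) / depth
    return int(multi * f_repr)


def add_shadow_features(X, rng):
    """Append an independently shuffled copy of every column (breaks any link to the target)."""
    X_shadow = np.empty_like(X)
    for j in range(X.shape[1]):
        X_shadow[:, j] = rng.permutation(X[:, j])
    return np.hstack([X, X_shadow])


print(estimate_n_trees(50, depth=5))  # 200
print(estimate_n_trees(50))           # 100
```

So with 50 features, shallow trees (max_depth=5) get 200 trees while the depth-10 fallback gets 100: deeper trees already touch more features each, so fewer are needed for every feature to be considered about f_repr times.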
Implementing KL divergence:
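import numpy as np

def KL(a, b):
    # np.float was removed in NumPy 1.24; the builtin float is the same dtype
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # KL divergence is defined for probability distributions, so normalize each input to sum to 1
    a = a / a.sum()
    b = b / b.sum()
    # terms with a_i == 0 contribute 0 (convention 0 * log 0 = 0); masking avoids log(0) warnings
    mask = a != 0
    return np.sum(a[mask] * np.log(a[mask] / b[mask]))

values1 = [1.346112, 1.337432, 1.246655]
values2 = [1.033836, 1.082015, 1.117323]
print(KL(values1, values2))  # Python 3 print function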
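For reference, the quantity computed above is the discrete KL divergence of P from Q; since the code uses np.log, the result is in nats.

```latex
D_{\mathrm{KL}}(P \parallel Q) = \sum_{i} p_i \ln \frac{p_i}{q_i},
\qquad \text{with the convention } 0 \cdot \ln \frac{0}{q_i} = 0
```

It is non-negative, equals 0 only when P = Q, is not symmetric (D(P‖Q) and D(Q‖P) generally differ, so argument order matters), and is infinite if some q_i = 0 where p_i > 0.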