Fitting Linear Models with Custom Loss Functions and Regularization in Python
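Suppose we have 100 samples, each with a 10-dimensional feature vector (9 predictors plus 1 intercept column). To make the results of the experiment easier to see, we add multiplicative noise to the observed responses: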
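```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Generate predictors: 100 samples x 9 variables, uniform on [0, 1)
X_raw = np.random.random(100*9)
X_raw = np.reshape(X_raw, (100, 9))
# Standardize the predictors
scaler = StandardScaler().fit(X_raw)
X = scaler.transform(X_raw)
# Add an intercept column to the model.
# np.abs makes every entry non-negative, which (with the positive betas
# below) keeps Y_true strictly positive; MAPE divides by y_true.
X = np.abs(np.concatenate((np.ones((X.shape[0],1)), X), axis=1))
# Define my "true" beta coefficients
beta = np.array([2,6,7,3,5,7,1,2,2,8])
# Y = Xb
Y_true = np.matmul(X,beta)
# Observed data with multiplicative log-normal noise
Y = Y_true*np.exp(np.random.normal(loc=0.0, scale=0.2, size=100))
```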
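MAPE divides by the observed value, so it is worth confirming that the simulated responses are strictly positive. The sketch below regenerates the data in the same way and runs that check. The fixed seed is an added assumption for reproducibility, and the standardization is written out by hand; it is equivalent to StandardScaler's defaults, which use the population standard deviation. The intercept column contributes exactly beta[0] = 2 and every other term is non-negative, so Y_true is at least 2. The factor exp(noise) is always positive.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption, added for reproducibility

# Same construction as above; manual standardization mirrors StandardScaler()
X_raw = rng.random((100, 9))
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
X = np.abs(np.concatenate((np.ones((100, 1)), X_std), axis=1))

beta = np.array([2, 6, 7, 3, 5, 7, 1, 2, 2, 8])
Y_true = X @ beta
Y = Y_true * np.exp(rng.normal(loc=0.0, scale=0.2, size=100))

# Intercept term contributes exactly beta[0] = 2; all other terms are >= 0
print(Y_true.min() >= 2)   # True
# exp(...) > 0, so every observation stays strictly positive
print(np.all(Y > 0))       # True
```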
There are two options for the loss function:
Mean Absolute Percentage Error (MAPE)
Weighted MAPE
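For reference, the two options can be written as follows. The formulas match what mean_absolute_percentage_error computes, with the result in percent, y_i as the observed value, ŷ_i as the prediction and w_i as the sample weights. A small worked example is included as a sanity check.

```latex
\mathrm{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|,
\qquad
\mathrm{WMAPE} = \frac{100}{\sum_{i=1}^{n} w_i} \sum_{i=1}^{n} w_i \left| \frac{y_i - \hat{y}_i}{y_i} \right|

% Example: y = (100, 200), \hat{y} = (110, 190), w = (1, 3)
\mathrm{MAPE} = \tfrac{100}{2}\,(0.10 + 0.05) = 7.5,
\qquad
\mathrm{WMAPE} = \tfrac{100}{4}\,(1 \times 0.10 + 3 \times 0.05) = 6.25
```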
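The loss function is implemented as follows. Passing sample_weights gives the weighted MAPE; omitting it gives the plain MAPE: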
```python
def mean_absolute_percentage_error(y_true, y_pred, sample_weights=None):
    y_true = np.array(y_true)
    y_pred = np.array(y_pred)
    assert len(y_true) == len(y_pred)

    # MAPE is undefined where y_true == 0, so drop those points
    if np.any(y_true == 0):
        print("Found zeroes in y_true. MAPE undefined. Removing from set...")
        idx = np.where(y_true == 0)
        y_true = np.delete(y_true, idx)
        y_pred = np.delete(y_pred, idx)
        if sample_weights is not None:
            sample_weights = np.array(sample_weights)
            sample_weights = np.delete(sample_weights, idx)

    if sample_weights is None:
        # Plain MAPE, in percent
        return np.mean(np.abs((y_true - y_pred) / y_true)) * 100
    else:
        # Weighted MAPE, in percent
        sample_weights = np.array(sample_weights)
        assert len(sample_weights) == len(y_true)
        return 100 / np.sum(sample_weights) * np.dot(
            sample_weights, np.abs((y_true - y_pred) / y_true)
        )
```
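The conventional approach: ordinary least squares (OLS) chooses β to minimize the sum of squared errors. This has the closed-form solution β̂ = (XᵀX)⁻¹XᵀY.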
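Conventionally, a linear model like this is fit by ordinary least squares (OLS), which minimizes the sum of squared errors. The following minimal sketch does this for comparison. It assumes data generated in the style of the setup above, and the seed and the helper sse are illustrative additions. np.linalg.lstsq solves the least-squares problem without explicitly inverting XᵀX, which is numerically safer than the textbook formula. The last lines check that it agrees with the normal-equations solution.

```python
import numpy as np

rng = np.random.default_rng(1)  # seed is an assumption, for reproducibility

# Simulate data as in the setup (standardization written out by hand)
X_raw = rng.random((100, 9))
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
X = np.abs(np.concatenate((np.ones((100, 1)), X_std), axis=1))
beta = np.array([2, 6, 7, 3, 5, 7, 1, 2, 2, 8])
Y = (X @ beta) * np.exp(rng.normal(loc=0.0, scale=0.2, size=100))

# OLS: minimize ||Y - X b||^2; lstsq avoids forming (X^T X)^-1 explicitly
beta_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)

def sse(b):
    """Sum of squared errors for coefficient vector b (illustrative helper)."""
    return np.sum((Y - X @ b) ** 2)

print(beta_ols)
# Same answer as the closed form (X^T X)^-1 X^T Y, via a linear solve
print(np.allclose(beta_ols, np.linalg.solve(X.T @ X, X.T @ Y)))  # True
# OLS minimizes SSE, so it cannot do worse than the true coefficients
print(sse(beta_ols) <= sse(beta))  # True
```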
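The approach used here: choose β to minimize the selected loss (e.g. MAPE) between the observed Y and the predictions Xβ directly. This is done numerically with scipy.optimize.minimize rather than with a closed-form formula.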
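Written as optimization problems, the two estimators differ only in the loss being minimized. Here x_iᵀ is the i-th row of X, and the MAPE is the same quantity as defined for the loss options. Unlike the squared-error objective, the MAPE objective has no closed-form minimizer in general, which is why a numerical optimizer is used.

```latex
\hat{\beta}_{\mathrm{OLS}} = \arg\min_{\beta} \sum_{i=1}^{n} \left(y_i - x_i^{\top}\beta\right)^2 = (X^{\top}X)^{-1}X^{\top}Y,
\qquad
\hat{\beta}_{\mathrm{MAPE}} = \arg\min_{\beta} \frac{100}{n} \sum_{i=1}^{n} \left| \frac{y_i - x_i^{\top}\beta}{y_i} \right|
```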
The corresponding code:
```python
from scipy.optimize import minimize

loss_function = mean_absolute_percentage_error

def objective_function(beta, X, Y):
    # The loss signature is (y_true, y_pred): Y is observed, X @ beta is predicted.
    # The order matters because MAPE divides by y_true.
    error = loss_function(Y, np.matmul(X, beta))
    return error

# You must provide a starting point at which to initialize
# the parameter search space
beta_init = np.array([1]*X.shape[1])
result = minimize(objective_function, beta_init, args=(X, Y),
                  method='BFGS', options={'maxiter': 500})

# The optimal values for the input parameters are stored
# in result.x
beta_hat = result.x
print(beta_hat)
```
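To fit with the weighted variant instead, the weights can be passed to the objective through minimize's args. The following is a minimal, self-contained sketch. weighted_mape is a hypothetical simplified helper: it skips the zero-handling in mean_absolute_percentage_error. The weights themselves are purely illustrative. The sketch also prints result.success and result.message. These are worth inspecting because MAPE is not differentiable everywhere, and without a supplied jac BFGS falls back on finite-difference gradients, so it may stop with a precision-loss warning. The returned point should still be no worse than the starting point.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)  # seed is an assumption, for reproducibility

# Simulated data in the same style as the setup (manual standardization)
X_raw = rng.random((100, 9))
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
X = np.abs(np.concatenate((np.ones((100, 1)), X_std), axis=1))
beta = np.array([2, 6, 7, 3, 5, 7, 1, 2, 2, 8])
Y = (X @ beta) * np.exp(rng.normal(loc=0.0, scale=0.2, size=100))

def weighted_mape(y_true, y_pred, w):
    """Hypothetical simplified weighted MAPE in percent (assumes no zeros in y_true)."""
    return 100.0 / np.sum(w) * np.dot(w, np.abs((y_true - y_pred) / y_true))

def objective(b, X, Y, w):
    # Observed Y first, predictions X @ b second
    return weighted_mape(Y, X @ b, w)

# Illustrative weights: the second half of the sample counts twice as much
w = np.where(np.arange(100) < 50, 1.0, 2.0)

beta_init = np.ones(X.shape[1])
result = minimize(objective, beta_init, args=(X, Y, w),
                  method='BFGS', options={'maxiter': 500})

beta_hat = result.x
print(result.success, result.message)
print(beta_hat)
```

Swapping in a different loss only requires changing the function called inside objective. This is the main appeal of this approach over a closed-form fit.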