Deep Learning: A Complete Guide to Loss Functions and Activation Functions

闲人编程

Last modified 2025-05-23 23:16:27

First published 2025-05-23 07:30:00

Article link: https://blog.csdn.net/qq_42568323/article/details/148129942

Common Loss Functions and Activation Functions in Deep Learning, Explained

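Introduction

In deep learning, the loss function and the activation function are two of the most central components of model training. The loss function measures the gap between the model's predictions and the true values and gives the optimizer a direction to follow. The activation function introduces nonlinearity so that the network can learn complex patterns. This article walks through the loss and activation functions most commonly used in deep learning, covering the math, key properties, typical use cases, and Python implementations, and then compares several combinations experimentally.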

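I. Loss Functions in Detail

1.1 Role and Categories of Loss Functions

A loss function measures the difference between the model's predicted output and the true value. Averaged over $N$ training samples, where $\ell$ is the per-sample loss and $f(x_i; \theta)$ is the prediction for input $x_i$ under parameters $\theta$, it can be written as:
$\mathcal{L}(\theta) = \frac{1}{N}\sum_{i=1}^N \ell(y_i, f(x_i; \theta))$

By task type, loss functions fall into three broad groups; this article covers regression losses (Section 1.2) and classification losses (Section 1.3) in detail.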

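1.2 Loss Functions for Regression

1.2.1 Mean Squared Error (MSE)

Formula:
$\text{MSE} = \frac{1}{N}\sum_{i=1}^N (y_i - \hat{y}_i)^2$

Properties:

Sensitive to outliers, because each error is squared
Differentiable and smooth everywhere
Range: [0, +∞)

Python implementation: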

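import numpy as np

def mean_squared_error(y_true, y_pred):
    """
    Compute the mean squared error (MSE).
    Args:
        y_true: array of true values, shape (n_samples,)
        y_pred: array of predicted values, shape (n_samples,)
    Returns:
        the MSE value
    """
    return np.mean(np.square(y_true - y_pred))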

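1.2.2 Mean Absolute Error (MAE)

Formula:
$\text{MAE} = \frac{1}{N}\sum_{i=1}^N |y_i - \hat{y}_i|$

Properties:

Robust to outliers, because errors contribute linearly
Not differentiable where the error is exactly 0
Range: [0, +∞)

Python implementation: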

def mean_absolute_error(y_true, y_pred):
    """
    Compute the mean absolute error (MAE).
    Args:
        y_true: array of true values, shape (n_samples,)
        y_pred: array of predicted values, shape (n_samples,)
    Returns:
        the MAE value
    """
    return np.mean(np.abs(y_true - y_pred))
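
To make the contrast between "sensitive to outliers" and "robust to outliers" concrete, here is a minimal sketch with made-up numbers. The data and the helper names `mse_list` and `mae_list` are illustrative assumptions, not part of the article's code, and plain Python lists are used so the snippet runs on its own.

```python
def mse_list(y_true, y_pred):
    """Mean squared error over two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae_list(y_true, y_pred):
    """Mean absolute error over two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [1.0, 2.0, 3.0, 4.0]
clean = [1.5, 2.5, 3.5, 4.5]      # every prediction is off by 0.5
outlier = [1.5, 2.5, 3.5, 14.0]   # the last prediction is off by 10

print(mse_list(y_true, clean), mae_list(y_true, clean))      # 0.25 0.5
print(mse_list(y_true, outlier), mae_list(y_true, outlier))  # 25.1875 2.875
```

In this example the single bad prediction multiplies MSE by roughly 100 but MAE by less than 6, which is the behavior the two property lists describe.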

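1.3 Loss Functions for Classification

1.3.1 Cross-Entropy Loss

Binary form, where $y_i \in \{0, 1\}$ and $\hat{y}_i$ is the predicted probability of the positive class:
$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^N [y_i \log(\hat{y}_i) + (1-y_i)\log(1-\hat{y}_i)]$

Multi-class form, where $y_{i,c}$ is the one-hot label and $\hat{y}_{i,c}$ is the predicted probability of class $c$:
$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^N \sum_{c=1}^C y_{i,c} \log(\hat{y}_{i,c})$

Python implementation: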

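def cross_entropy_loss(y_true, y_pred, epsilon=1e-12):
    """
    Compute the cross-entropy loss.
    Args:
        y_true: true labels; 0/1 values with shape (n_samples,) or (n_samples, 1)
                for binary problems, or one-hot with shape (n_samples, n_classes)
        y_pred: predicted probabilities, laid out like y_true
        epsilon: small constant that keeps log() away from 0
    Returns:
        the mean cross-entropy loss
    """
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions into [epsilon, 1 - epsilon] so log() stays finite
    y_pred = np.clip(np.asarray(y_pred, dtype=float), epsilon, 1. - epsilon)

    # Binary case: labels are 1-D or a single column
    if y_true.ndim == 1 or y_true.shape[1] == 1:
        # Flatten both so (n,) labels and (n, 1) predictions do not broadcast to (n, n)
        y_true, y_pred = y_true.ravel(), y_pred.ravel()
        loss = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    else:  # multi-class with one-hot labels
        loss = -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

    return loss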
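
Clipping probabilities is one way to avoid log(0). When the raw pre-sigmoid score (the logit) is available, another common option is to compute the binary cross-entropy directly from it. The sketch below assumes a single sample with logit `z` and label `y`; the function names are illustrative, and only the standard library is used.

```python
import math

def bce_from_logit(y, z):
    """Binary cross-entropy for one sample, computed from the logit z.

    Algebraically equal to -[y*log(s) + (1-y)*log(1-s)] with s = sigmoid(z),
    rearranged so that exp() only ever receives a non-positive argument.
    """
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

def bce_naive(y, z):
    """The textbook form: apply the sigmoid first, then take logs."""
    s = 1.0 / (1.0 + math.exp(-z))
    return -(y * math.log(s) + (1 - y) * math.log(1 - s))

# For moderate logits the two forms agree
print(bce_from_logit(1, 2.0), bce_naive(1, 2.0))

# For an extreme logit the naive form fails, the rearranged form does not
try:
    bce_naive(0, 800.0)
except ValueError as err:
    print("naive form failed:", err)
print(bce_from_logit(0, 800.0))  # 800.0
```

The rearrangement follows from $\log(1-\sigma(z)) = -z - \log(1+e^{-z})$; deep learning frameworks typically offer a "from logits" option built on the same idea.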

1.3.2 Hinge Loss

Formula, where $y_i \in \{-1, +1\}$ and $\hat{y}_i$ is the raw (unbounded) prediction score:
$\mathcal{L} = \frac{1}{N}\sum_{i=1}^N \max(0, 1 - y_i \cdot \hat{y}_i)$

Python implementation:

def hinge_loss(y_true, y_pred):
    """
    Compute the hinge loss.
    Args:
        y_true: true labels (+1 or -1), shape (n_samples,)
        y_pred: predicted scores, shape (n_samples,)
    Returns:
        the hinge loss value
    """
    return np.mean(np.maximum(0, 1 - y_true * y_pred))
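
Because the formula assumes labels in {-1, +1}, 0/1 labels have to be remapped before use. The sketch below shows that mapping and how the margin behaves for a single sample; `hinge_s` and `to_signed` are hypothetical helper names introduced only for this illustration.

```python
def to_signed(label01):
    """Map a 0/1 label to the -1/+1 convention that hinge loss expects."""
    return 2 * label01 - 1

def hinge_s(y, score):
    """Hinge loss for one sample with label y in {-1, +1}."""
    return max(0.0, 1.0 - y * score)

# Correct side of the boundary and outside the margin: no loss
print(hinge_s(1, 2.0))    # 0.0
# Correct side but inside the margin: still penalized
print(hinge_s(1, 0.5))    # 0.5
# Wrong side: loss grows linearly with the score
print(hinge_s(1, -1.0))   # 2.0
# A 0/1 label must be converted first
print(hinge_s(to_signed(0), -2.0))  # 0.0
```

Unlike cross-entropy, the loss is exactly zero once a sample is classified correctly with a margin of at least 1.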

1.4 Loss Function Comparison Experiment

import numpy as np
import matplotlib.pyplot as plt

# Sweep one value from -3 to 3 while the other side is held fixed
y_true = np.linspace(-3, 3, 100)
y_pred = np.zeros_like(y_true)

# MSE and MAE: the prediction is fixed at 0, so the x-axis is the error y_true - y_pred
mse = [mean_squared_error(np.array([t]), np.array([p])) for t, p in zip(y_true, y_pred)]
mae = [mean_absolute_error(np.array([t]), np.array([p])) for t, p in zip(y_true, y_pred)]
# Hinge: the true label is fixed at +1 and the swept value is used as the prediction score
hinge = [hinge_loss(np.array([1]), np.array([t])) for t in y_true]

# Plot the curves
plt.figure(figsize=(10, 6))
plt.plot(y_true, mse, label='MSE')
plt.plot(y_true, mae, label='MAE')
plt.plot(y_true, hinge, label='Hinge (y_true=1)')
plt.xlabel('Error (MSE, MAE) / prediction score (Hinge)')
plt.ylabel('Loss')
plt.title('Comparison of Loss Functions')
plt.legend()
plt.grid(True)
plt.show()

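II. Activation Functions in Detail

2.1 Role and Properties of Activation Functions

An activation function serves three main purposes:

It introduces a nonlinear transformation
It determines whether, and how strongly, a neuron fires
It shapes how gradients propagate during backpropagation

Desirable properties of an activation function:

Nonlinearity
Differentiability (at least almost everywhere)
Monotonicity (common but not essential; Swish and GELU in Section 4.2 are non-monotonic)
A suitable output range

2.2 Analysis of Common Activation Functions

2.2.1 Sigmoid

Formula:
$\sigma(x) = \frac{1}{1 + e^{-x}}$

Properties:

Output range: (0, 1)
Prone to vanishing gradients: the derivative is at most 0.25 and approaches 0 as |x| grows
Output is not zero-centered

Python implementation: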

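def sigmoid(x):
    """
    Sigmoid activation function.
    Args:
        x: input array
    Returns:
        the element-wise sigmoid of x
    """
    # np.exp(-x) may emit an overflow warning for very negative x;
    # the result still saturates to 0 as expected.
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    """
    Derivative of the sigmoid function.
    """
    s = sigmoid(x)
    return s * (1 - s)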
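
The vanishing-gradient property can be seen with a few numbers. The sketch below is a simplification that looks only at the activation-derivative factor in the chain rule and ignores the weights; the `_s` suffix marks scalar helpers written for this illustration with the standard library only.

```python
import math

def sigmoid_s(x):
    """Scalar sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative_s(x):
    """Scalar sigmoid derivative, s * (1 - s)."""
    s = sigmoid_s(x)
    return s * (1.0 - s)

print(sigmoid_derivative_s(0.0))  # 0.25, the largest value the derivative can take
print(sigmoid_derivative_s(5.0))  # about 0.0066, already close to zero

# Backpropagation multiplies one such factor per sigmoid layer, so even in the
# best case the activation part of the gradient shrinks geometrically with depth.
for depth in (1, 5, 10):
    print(depth, 0.25 ** depth)
```

With ten stacked sigmoid layers the best-case factor is already below 1e-6, which is why deep sigmoid networks tend to train slowly in their early layers.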

2.2.2 Tanh

Formula:
$\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$

Properties:

Output range: (-1, 1)
Zero-centered
Stronger gradients than sigmoid (maximum derivative of 1 versus 0.25)

Python implementation:

def tanh(x):
    """
    Tanh activation function.
    """
    return np.tanh(x)

def tanh_derivative(x):
    """
    Derivative of the tanh function.
    """
    return 1 - np.tanh(x)**2
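
The gradient comparison with sigmoid follows from a short derivation. Dividing the numerator and denominator of the tanh formula by $e^x$ shows that tanh is a rescaled, shifted sigmoid:

$\tanh(x) = \frac{1 - e^{-2x}}{1 + e^{-2x}} = \frac{2}{1 + e^{-2x}} - 1 = 2\sigma(2x) - 1$

Differentiating both sides gives

$\tanh'(x) = 1 - \tanh^2(x) = 4\,\sigma(2x)\,(1 - \sigma(2x))$

so at $x = 0$ the derivative is $\tanh'(0) = 1$, compared with $\sigma'(0) = \frac{1}{4}$. Both functions still saturate for large $|x|$, so tanh reduces the vanishing-gradient problem rather than removing it.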

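2.2.3 ReLU

Formula:
$\text{ReLU}(x) = \max(0, x)$

Properties:

Cheap to compute
Mitigates vanishing gradients, since the derivative is 1 for all positive inputs
Suffers from the "dying ReLU" problem: a neuron whose input stays negative outputs 0 and receives no gradient

Python implementation: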

def relu(x):
    """
    ReLU activation function.
    """
    return np.maximum(0, x)

def relu_derivative(x):
    """
    Derivative of ReLU (taken as 0 at x = 0).
    """
    return (x > 0).astype(float)

2.2.4 LeakyReLU

Formula:
$\text{LeakyReLU}(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha x & \text{otherwise} \end{cases}$

Python implementation:

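def leaky_relu(x, alpha=0.01):
    """
    LeakyReLU activation function.
    Args:
        x: input array
        alpha: slope on the negative half-axis
    """
    return np.where(x > 0, x, alpha * x)

def leaky_relu_derivative(x, alpha=0.01):
    """
    Derivative of LeakyReLU (uses alpha at x = 0, matching leaky_relu above).
    """
    # np.where returns floats even for integer input, so alpha is never truncated
    return np.where(x > 0, 1.0, alpha)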
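
The difference between ReLU and LeakyReLU matters most for a neuron whose pre-activation is negative for every input. The sketch below assumes a hypothetical one-input neuron with weight `w` and a strongly negative bias `b` (made-up values) and compares the gradient each activation passes back to `w`.

```python
def relu_grad_s(z):
    """Scalar ReLU derivative (0 for z <= 0)."""
    return 1.0 if z > 0 else 0.0

def leaky_relu_grad_s(z, alpha=0.01):
    """Scalar LeakyReLU derivative (alpha for z <= 0)."""
    return 1.0 if z > 0 else alpha

w, b = 0.5, -10.0                 # hypothetical parameters after a bad update
inputs = [0.0, 1.0, 2.0, 3.0]
pre_activations = [w * x + b for x in inputs]   # all negative

# Chain rule: d activation(w*x + b) / dw = activation'(w*x + b) * x
relu_grads = [relu_grad_s(z) * x for z, x in zip(pre_activations, inputs)]
leaky_grads = [leaky_relu_grad_s(z) * x for z, x in zip(pre_activations, inputs)]

print(relu_grads)    # [0.0, 0.0, 0.0, 0.0]: no signal, the neuron cannot recover
print(leaky_grads)   # small but nonzero wherever x > 0
```

With ReLU every gradient is zero, so gradient descent never moves `w` or `b` again; the small negative slope of LeakyReLU keeps a weak learning signal alive.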

2.3 Activation Function Comparison Experiment

# Generate the input data
x = np.linspace(-5, 5, 100)

# Evaluate each activation function
y_sigmoid = sigmoid(x)
y_tanh = tanh(x)
y_relu = relu(x)
y_leaky = leaky_relu(x)

# Plot the curves
plt.figure(figsize=(12, 6))
plt.plot(x, y_sigmoid, label='Sigmoid')
plt.plot(x, y_tanh, label='Tanh')
plt.plot(x, y_relu, label='ReLU')
plt.plot(x, y_leaky, label='LeakyReLU (α=0.01)')
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('Comparison of Activation Functions')
plt.legend()
plt.grid(True)
plt.show()

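III. Combining Loss Functions and Activation Functions

3.1 Common Combinations

Task type	Recommended loss	Recommended output activation	Notes
Binary classification	Binary cross-entropy	Sigmoid	Sigmoid on the output layer
Multi-class classification	Categorical cross-entropy	Softmax	Softmax on the output layer
Regression	MSE/MAE	None (linear)	The output layer usually has no activation
Multi-label classification	Binary cross-entropy	Sigmoid	Each output node is an independent binary decision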
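
The table recommends Softmax with categorical cross-entropy, but Softmax is not implemented elsewhere in this article. The sketch below is one minimal way to write it with the standard library (the function names and example logits are assumptions made for illustration). It also shows why the pairing is convenient: the gradient of the loss with respect to the logits reduces to `probs - one_hot`.

```python
import math

def softmax(logits):
    """Softmax over a list of logits; subtracting the max avoids overflow."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(one_hot, probs):
    """Cross-entropy for one sample with a one-hot label."""
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs) if y > 0)

logits = [2.0, 1.0, 0.1]
label = [1, 0, 0]

probs = softmax(logits)
loss = categorical_cross_entropy(label, probs)
# Gradient of the loss with respect to each logit
grad = [p - y for p, y in zip(probs, label)]

print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
print(round(loss, 3))                # 0.417
print(softmax([1000.0, 1000.0]))     # [0.5, 0.5], no overflow
```

The same simplification holds for Sigmoid with binary cross-entropy, which is one reason these pairs are the usual defaults.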

3.2 Experiment: Comparing Combinations

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import pandas as pd

# Create a binary classification dataset (labels are 0/1)
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

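# Loss / output-activation combinations to compare.
# Caveat (based on how Keras appears to resolve 'accuracy' for a 1-unit output):
# predictions are thresholded at 0.5, which does not match tanh's natural
# decision boundary at 0, so read the accuracy of the hinge row with care.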
combinations = [
    {'loss': 'binary_crossentropy', 'output_activation': 'sigmoid'},
    {'loss': 'hinge', 'output_activation': 'tanh'},
    {'loss': 'mse', 'output_activation': 'sigmoid'}
]

results = []

for combo in combinations:
    model = Sequential([
        Dense(64, activation='relu', input_shape=(20,)),
        Dense(32, activation='relu'),
        Dense(1, activation=combo['output_activation'])
    ])
    
    model.compile(optimizer='adam',
                 loss=combo['loss'],
                 metrics=['accuracy'])
    
    history = model.fit(X_train, y_train,
                       epochs=50,
                       batch_size=32,
                       validation_split=0.2,
                       verbose=0)
    
    test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
    
    results.append({
        'Loss Function': combo['loss'],
        'Activation': combo['output_activation'],
        'Test Accuracy': test_acc,
        'Test Loss': test_loss
    })

# Show the results
df_results = pd.DataFrame(results)
print(df_results[['Loss Function', 'Activation', 'Test Accuracy', 'Test Loss']])

IV. Advanced Topics and Recent Developments

4.1 Implementing a Custom Loss Function

import tensorflow as tf

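def focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0):
    """
    Focal loss (binary form, applied independently to each output unit).
    Args:
        y_true: true labels in {0, 1}
        y_pred: predicted probabilities, e.g. sigmoid outputs
        alpha: class-balancing weight for the positive class
        gamma: focusing parameter that down-weights easy examples
    Returns:
        the focal loss of each sample
    """
    y_true = tf.cast(y_true, y_pred.dtype)
    # Clip so that log() never receives 0
    y_pred = tf.clip_by_value(y_pred, 1e-7, 1 - 1e-7)

    # p_t is the probability assigned to the true class. Handling both classes
    # matters for a single sigmoid output: using only -y_true * log(y_pred)
    # would give every negative sample zero loss.
    p_t = y_true * y_pred + (1 - y_true) * (1 - y_pred)
    alpha_t = y_true * alpha + (1 - y_true) * (1 - alpha)

    # Focal weight times cross-entropy
    loss = -alpha_t * tf.pow(1 - p_t, gamma) * tf.math.log(p_t)

    # Sum over the output units of each sample
    return tf.reduce_sum(loss, axis=-1)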

# Use it in a Keras model (here, the last model built in Section 3.2)
model.compile(optimizer='adam',
             loss=focal_loss,
             metrics=['accuracy'])
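
To see what the `gamma` parameter does, it helps to look at one sample at a time. The sketch below is a scalar illustration rather than a Keras loss; `p_t` stands for the probability the model assigns to the true class, and the function names are assumptions introduced here.

```python
import math

def ce_term(p_t):
    """Plain cross-entropy for one sample."""
    return -math.log(p_t)

def focal_term(p_t, alpha_t=0.25, gamma=2.0):
    """Focal loss for one sample: cross-entropy scaled by alpha_t * (1 - p_t)**gamma."""
    return alpha_t * (1.0 - p_t) ** gamma * ce_term(p_t)

easy, hard = 0.9, 0.1   # a confidently correct sample and a badly wrong one

# The modulating factor alone: about 0.01 for the easy sample, about 0.81 for the hard one
print((1.0 - easy) ** 2.0, (1.0 - hard) ** 2.0)

# How much more the hard sample weighs than the easy one under each loss
print(ce_term(hard) / ce_term(easy))
print(focal_term(hard) / focal_term(easy))
```

With `gamma=2` the hard sample outweighs the easy one by a much larger factor than under plain cross-entropy, and setting `gamma=0` with `alpha_t=1` recovers cross-entropy exactly.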

4.2 Recent Developments in Activation Functions

4.2.1 Swish

Formula:
$\text{Swish}(x) = x \cdot \sigma(\beta x)$

Python implementation:

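def swish(x, beta=1.0):
    """
    Swish activation function.
    Args:
        x: input
        beta: scaling parameter (fixed here; it can be made learnable, see the Swish layer in Part V)
    """
    return x * sigmoid(beta * x)

def swish_derivative(x, beta=1.0):
    """
    Derivative of the Swish function.
    """
    sig = sigmoid(beta * x)
    return sig + beta * x * sig * (1 - sig)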
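
A quick way to gain confidence in a hand-derived derivative such as the one above is to compare it with a finite-difference estimate. The sketch below does this with scalar helpers written only for this check (the `_s` names and the step size `h` are assumptions), and it also shows that Swish is not monotonic.

```python
import math

def sigmoid_s(x):
    return 1.0 / (1.0 + math.exp(-x))

def swish_s(x, beta=1.0):
    return x * sigmoid_s(beta * x)

def swish_derivative_s(x, beta=1.0):
    """Analytic derivative: sig + beta * x * sig * (1 - sig)."""
    sig = sigmoid_s(beta * x)
    return sig + beta * x * sig * (1.0 - sig)

def numeric_derivative(f, x, h=1e-6):
    """Central finite difference."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in (-3.0, -1.0, 0.0, 0.5, 2.0):
    analytic = swish_derivative_s(x)
    numeric = numeric_derivative(swish_s, x)
    print(x, analytic, numeric)

# Non-monotonic: the output dips below zero and then rises back toward zero
print(swish_s(-1.0), swish_s(-5.0))  # about -0.269 and -0.033
```

The analytic and numeric columns should agree to several decimal places; a mismatch would point to an error in the derivation.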

4.2.2 GELU

Formula:
$\text{GELU}(x) = x \Phi(x)$
where $\Phi(x)$ is the cumulative distribution function of the standard normal distribution.

Python implementation:

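import tensorflow as tf

def gelu(x):
    """
    GELU activation function (exact form, using Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))).
    """
    # Convert first so NumPy float64 input does not clash with float32 constants
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    return 0.5 * x * (1.0 + tf.math.erf(x / tf.sqrt(2.0)))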
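
Besides the exact erf form, GELU is often computed with a tanh-based approximation. The sketch below compares the two on a small grid using only the standard library; the function names are illustrative, and the constant 0.044715 is the one commonly quoted for this approximation.

```python
import math

def gelu_exact(x):
    """GELU via the Gaussian CDF: x * 0.5 * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    """Tanh approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))

grid = [i / 10.0 for i in range(-50, 51)]
max_gap = max(abs(gelu_exact(x) - gelu_tanh(x)) for x in grid)

print(gelu_exact(1.0))  # about 0.8413, i.e. 1 * Phi(1)
print(gelu_tanh(1.0))
print(max_gap)          # small on this grid
```

Whether a given framework uses the exact or the approximate form by default varies, so results can differ slightly between implementations.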

V. Complete Code

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.layers import Layer

class ActivationFunctions:
    """A collection of common activation functions."""
    
    @staticmethod
    def sigmoid(x):
        return 1 / (1 + np.exp(-x))
    
    @staticmethod
    def tanh(x):
        return np.tanh(x)
    
    @staticmethod
    def relu(x):
        return np.maximum(0, x)
    
    @staticmethod
    def leaky_relu(x, alpha=0.01):
        return np.where(x > 0, x, alpha * x)
    
    @staticmethod
    def swish(x, beta=1.0):
        return x * ActivationFunctions.sigmoid(beta * x)
    
    @staticmethod
    def plot_activations(x_range=(-5, 5), n_points=100):
        """Plot the curve of each activation function."""
        x = np.linspace(x_range[0], x_range[1], n_points)
        
        plt.figure(figsize=(12, 6))
        plt.plot(x, ActivationFunctions.sigmoid(x), label='Sigmoid')
        plt.plot(x, ActivationFunctions.tanh(x), label='Tanh')
        plt.plot(x, ActivationFunctions.relu(x), label='ReLU')
        plt.plot(x, ActivationFunctions.leaky_relu(x), label='LeakyReLU (α=0.01)')
        plt.plot(x, ActivationFunctions.swish(x), label='Swish (β=1.0)')
        
        plt.title('Activation Functions Comparison')
        plt.xlabel('Input')
        plt.ylabel('Output')
        plt.legend()
        plt.grid(True)
        plt.show()

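class CustomLossFunctions:
    """A collection of custom loss functions."""
    
    @staticmethod
    def focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0):
        """Binary focal loss; covers both classes so it works with a sigmoid output."""
        y_true = tf.cast(y_true, y_pred.dtype)
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1 - 1e-7)
        # p_t: probability assigned to the true class; alpha_t: its class weight
        p_t = y_true * y_pred + (1 - y_true) * (1 - y_pred)
        alpha_t = y_true * alpha + (1 - y_true) * (1 - alpha)
        loss = -alpha_t * tf.pow(1 - p_t, gamma) * tf.math.log(p_t)
        return tf.reduce_sum(loss, axis=-1)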
    
    @staticmethod
    def contrastive_loss(y_true, y_pred, margin=1.0):
        """Contrastive loss."""
        square_pred = tf.square(y_pred)
        margin_square = tf.square(tf.maximum(margin - y_pred, 0))
        return tf.reduce_mean(
            y_true * square_pred + (1 - y_true) * margin_square
        )
    
    @staticmethod
    def plot_losses(y_true=1, pred_range=(-1, 2), n_points=100):
        """Plot the curves of several loss functions for a fixed true value."""
        pred = np.linspace(pred_range[0], pred_range[1], n_points)
        
        # Evaluate each loss over the prediction range
        mse = (pred - y_true)**2
        mae = np.abs(pred - y_true)
        hinge = np.maximum(0, 1 - y_true * pred)
        
        plt.figure(figsize=(10, 6))
        plt.plot(pred, mse, label='MSE')
        plt.plot(pred, mae, label='MAE')
        plt.plot(pred, hinge, label='Hinge (y_true=1)')
        
        plt.title('Loss Functions Comparison (y_true=1)')
        plt.xlabel('Prediction')
        plt.ylabel('Loss')
        plt.legend()
        plt.grid(True)
        plt.show()

class Swish(Layer):
    """Swish activation layer with an optionally learnable beta."""
    
    def __init__(self, trainable_beta=True, **kwargs):
        super(Swish, self).__init__(**kwargs)
        self.trainable_beta = trainable_beta
        if self.trainable_beta:
            self.beta = self.add_weight(
                name='beta',
                shape=(1,),
                initializer='ones',
                trainable=True)
        else:
            self.beta = 1.0
    
    def call(self, inputs):
        if self.trainable_beta:
            return inputs * tf.sigmoid(self.beta * inputs)
        else:
            return inputs * tf.sigmoid(inputs)
    
    def get_config(self):
        config = super(Swish, self).get_config()
        config.update({'trainable_beta': self.trainable_beta})
        return config

# Usage example
if __name__ == "__main__":
    # Plot the activation functions
    ActivationFunctions.plot_activations()
    
    # Plot the loss functions
    CustomLossFunctions.plot_losses()
    
    # Build a model that uses the Swish layer
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, input_shape=(20,)),
        Swish(trainable_beta=True),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    
    model.compile(optimizer='adam',
                 loss=CustomLossFunctions.focal_loss,
                 metrics=['accuracy'])
    print("Model with Swish activation and Focal Loss compiled successfully.")