[Machine Learning Crash Course] Regularization for Sparsity: Study Notes

Latest recommended article published on 2024-07-17 07:45:00

Yulong.Wang

Category: Study Notes

Article link: https://blog.csdn.net/qq_20039347/article/details/79619836

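This article describes how to use L1 regularization to increase model sparsity and thereby reduce model size. It focuses on applying L1 regularization to logistic regression and on finding a regularization strength that keeps the model at no more than 600 parameters while holding the validation log loss below 0.35. Experiments with different L1 regularization strengths explore the effect on model performance.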

Sparsity and L1 Regularization

Learning objectives:

Calculate the size of a model
Apply L1 regularization to increase sparsity and so reduce the size of the model

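One way to reduce complexity is to use a regularization function that encourages weights to be exactly zero. For linear models (such as linear regression), a zero weight is equivalent to not using the corresponding feature at all. Besides helping to avoid overfitting, the resulting model is also more efficient.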
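To make the "zero weight means unused feature" point concrete, here is a minimal NumPy sketch. The data, weights, and variable names are made up for illustration and do not come from the course notebook. It checks that a linear model with one zero weight predicts exactly what the model with that feature dropped would predict, and it counts the nonzero weights as a simple measure of model size.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))      # 5 examples, 3 features (made-up data)
w = np.array([0.8, 0.0, -1.2])   # hypothetical weights; the second is exactly zero
b = 0.5                          # hypothetical bias

# Predictions using all three features.
full_pred = X @ w + b

# Predictions after dropping every feature whose weight is zero.
keep = w != 0
reduced_pred = X[:, keep] @ w[keep] + b

same = np.allclose(full_pred, reduced_pred)  # the two models agree
model_size = int(np.count_nonzero(w))        # number of weights that must be stored
print(same, model_size)
```

Only the nonzero weights need to be stored and evaluated, which is why a sparser model is smaller and cheaper to serve.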

L1 regularization is a good way to increase sparsity.
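One way to see why L1 produces exact zeros is to compare the shrinkage step associated with each penalty. The sketch below is an illustration under stated assumptions, not the update the course's optimizer performs internally: `l1_prox` is the standard soft-thresholding operator for an L1 penalty, `l2_shrink` is the corresponding step for a squared L2 penalty, and the weights and strength value are made up.

```python
import numpy as np

def l1_prox(w, threshold):
    """Soft-thresholding: move each weight toward zero by `threshold`,
    setting it to zero if its magnitude is at most `threshold`."""
    return np.sign(w) * np.maximum(np.abs(w) - threshold, 0.0)

def l2_shrink(w, strength):
    """Proximal step for (strength / 2) * ||w||^2: rescales every weight,
    so nonzero weights get smaller but never become exactly zero."""
    return w / (1.0 + strength)

w = np.array([0.9, -0.05, 0.02, -1.5, 0.08])  # hypothetical weights

w_l1 = l1_prox(w, 0.1)    # small weights are clipped to exactly zero
w_l2 = l2_shrink(w, 0.1)  # all weights shrink proportionally

nonzero_l1 = int(np.count_nonzero(w_l1))
nonzero_l2 = int(np.count_nonzero(w_l2))
print(w_l1, nonzero_l1)
print(w_l2, nonzero_l2)
```

In this toy case the L1 step leaves only the two large weights nonzero, while the L2 step keeps all five, which matches the intuition that L1 trades a little accuracy for a smaller model.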

Imports, same as before

import math

from IPython import display
from matplotlib import cm
from matplotlib import gridspec
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
from sklearn import metrics
import tensorflow as tf
from tensorflow.python.data import Dataset

tf.logging.set_verbosity(tf.logging.ERROR)
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.1f}'.format

california_housing_dataframe = pd.read_csv("https://storage.googleapis.com/mledu-datasets/california_housing_train.csv", sep=",")

california_housing_dataframe = california_housing_dataframe.reindex(
    np.random.permutation(california_housing_dataframe.index))

Example features, same as before

def preprocess_features(california_housing_dataframe):
  """Prepares input features from California housing data set.

  Args:
    california_housing_dataframe: A Pandas DataFrame expected to contain data
      from the California housing data set.
  Returns:
    A DataFrame that contains the features to be used for the model, including
    synthetic features.
  """
  selected_features = california_housing_dataframe[
    ["latitude",
     "longitude",
     "housing_median_age",
     "total_rooms",
     "total_bedrooms",
     "population",
     "households",
     "median_income"]]
  processed_features = selected_features.copy()
  # Create a synthetic feature.
  processed_features["rooms_per_person"] = (
    california_housing_dataframe["total_rooms"] /
    california_housing_dataframe["population"])
  return processed_features

def preprocess_targets(california_housing_dataframe):
  """Prepares target features (i.e., labels) from California housing data set.

  Args:
    california_housing_dataframe: A Pandas DataFrame expected to contain data
      from the California housing data set.
  Returns:
    A DataFrame that contains the target feature.
  """
  output_targets = pd.DataFrame()
  # Create a boolean categorical feature representing whether the
  # medianHouseValue is above a set threshold.
  output_targets["median_house_value_is_high"] = (
    california_housing_dataframe["median_house_value"] > 26