Deep Learning: Whitening

Introduction

We have used PCA to reduce the dimension of the data. There is a closely related preprocessing step called whitening (or, in some of the literature, sphering) which is needed for some algorithms. If we are training on images, the raw input is redundant, since adjacent pixel values are highly correlated. The goal of whitening is to make the input less redundant; more formally, our desiderata are that our learning algorithm sees a training input where (i) the features are less correlated with each other, and (ii) the features all have the same variance.

2D example

We will first describe whitening using our previous 2D example. We will then describe how this can be combined with smoothing, and finally how to combine this with PCA.

How can we make our input features uncorrelated with each other? We had already done this when computing \textstyle x_{\rm rot}^{(i)} = U^Tx^{(i)}. Repeating our previous figure, our plot for \textstyle x_{\rm rot} was:

PCA-rotated.png

The covariance matrix of this data is given by:

\begin{align}\begin{bmatrix}7.29 & 0  \\0 & 0.69\end{bmatrix}.\end{align}

(Note: Technically, many of the statements in this section about the "covariance" will be true only if the data has zero mean. In the rest of this section, we will take this assumption as implicit in our statements. However, even if the data's mean isn't exactly zero, the intuitions we're presenting here still hold true, and so this isn't something that you should worry about.)

It is no accident that the diagonal values are \textstyle \lambda_1 and \textstyle \lambda_2. Further, the off-diagonal entries are zero; thus, \textstyle x_{{\rm rot},1} and \textstyle x_{{\rm rot},2} are uncorrelated, satisfying one of our desiderata for whitened data (that the features be less correlated).
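As a concrete illustration of this rotation step, here is a minimal NumPy sketch. It is only a sketch under simplifying assumptions: the data matrix X is synthetic (so its eigenvalues will not match the 7.29 and 0.69 above), holds one example per column, and has zero mean; all variable names are illustrative rather than taken from these notes.

```python
import numpy as np

# Toy zero-mean 2D data, one example per column (n = 2 features, m = 500 examples).
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.5], [1.5, 1.0]], size=500).T
X -= X.mean(axis=1, keepdims=True)            # enforce zero mean exactly

Sigma = X @ X.T / X.shape[1]                  # empirical covariance matrix
lam, U = np.linalg.eigh(Sigma)                # eigenvalues (ascending) and eigenvectors
order = np.argsort(lam)[::-1]                 # sort eigenvalues in decreasing order
lam, U = lam[order], U[:, order]

X_rot = U.T @ X                               # x_rot = U^T x for every example
print(np.round(X_rot @ X_rot.T / X.shape[1], 3))  # ~ diag(lambda_1, lambda_2)
```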

To make each of our input features have unit variance, we can simply rescale each feature \textstyle x_{{\rm rot},i} by \textstyle 1/\sqrt{\lambda_i}. Concretely, we define our whitened data \textstyle x_{{\rm PCAwhite}} \in \Re^n as follows:

\begin{align}x_{{\rm PCAwhite},i} = \frac{x_{{\rm rot},i}}{\sqrt{\lambda_i}}.\end{align}

Plotting \textstyle x_{{\rm PCAwhite}}, we get:

PCA-whitened.png

This data now has covariance equal to the identity matrix \textstyle I. We say that \textstyle x_{{\rm PCAwhite}} is our PCA whitened version of the data: the different components of \textstyle x_{{\rm PCAwhite}} are uncorrelated and have unit variance.
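Continuing the NumPy sketch above, PCA whitening is then a one-line rescaling of each rotated coordinate; the printed covariance should be close to the identity matrix:

```python
# Rescale each rotated feature by 1/sqrt(lambda_i) so it has unit variance.
X_PCAwhite = X_rot / np.sqrt(lam)[:, np.newaxis]

print(np.round(X_PCAwhite @ X_PCAwhite.T / X.shape[1], 3))  # ~ identity matrix
```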

Whitening combined with dimensionality reduction. If you want data that is whitened and lower dimensional than the original input, you can optionally keep only the top \textstyle k components of \textstyle x_{{\rm PCAwhite}}. When we combine PCA whitening with regularization (described later), the last few components of \textstyle x_{{\rm PCAwhite}} will be nearly zero anyway, and thus can safely be dropped.
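In the sketch above, combining whitening with dimensionality reduction amounts to keeping only the top \textstyle k rows of the PCA-whitened matrix (the value of k below is purely illustrative):

```python
k = 1                                 # number of components to retain (illustrative)
X_PCAwhite_k = X_PCAwhite[:k, :]      # k x m matrix: whitened and reduced data
```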

ZCA Whitening

Finally, it turns out that this way of getting the data to have covariance identity \textstyle I isn't unique. Concretely, if \textstyle R is any orthogonal matrix, so that it satisfies \textstyle RR^T = R^TR = I (less formally, if \textstyle R is a rotation/reflection matrix), then \textstyle R \,x_{\rm PCAwhite} will also have identity covariance. In ZCA whitening, we choose \textstyle R = U. We define

\begin{align}x_{\rm ZCAwhite} = U x_{\rm PCAwhite}\end{align}
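As a quick check that any orthogonal \textstyle R (and in particular \textstyle R = U) preserves the identity covariance, write the covariance of \textstyle R\,x_{\rm PCAwhite} as an average over the \textstyle m training examples (using the zero-mean assumption from earlier):

\begin{align}\frac{1}{m}\sum_{i=1}^{m} \left(R\,x_{\rm PCAwhite}^{(i)}\right)\left(R\,x_{\rm PCAwhite}^{(i)}\right)^T = R \left(\frac{1}{m}\sum_{i=1}^{m} x_{\rm PCAwhite}^{(i)}\, {x_{\rm PCAwhite}^{(i)}}^T\right) R^T = R I R^T = I.\end{align}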

Plotting \textstyle x_{\rm ZCAwhite}, we get:

ZCA-whitened.png

It can be shown that out of all possible choices for \textstyle R, this choice of rotation causes \textstyle x_{\rm ZCAwhite} to be as close as possible to the original input data \textstyle x.

When using ZCA whitening (unlike PCA whitening), we usually keep all \textstyle n dimensions of the data, and do not try to reduce its dimension.
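Continuing the NumPy sketch, ZCA whitening is one extra rotation of the PCA-whitened data back into the original coordinate system, keeping all dimensions:

```python
# ZCA whitening: choose R = U and rotate the PCA-whitened data back.
X_ZCAwhite = U @ X_PCAwhite

print(np.round(X_ZCAwhite @ X_ZCAwhite.T / X.shape[1], 3))  # still ~ identity matrix
```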

Regularization

When implementing PCA whitening or ZCA whitening in practice, sometimes some of the eigenvalues \textstyle \lambda_i will be numerically close to 0, and thus the scaling step where we divide by \textstyle \sqrt{\lambda_i} would involve dividing by a value close to zero; this may cause the data to blow up (take on large values) or otherwise be numerically unstable. In practice, we therefore implement this scaling step using a small amount of regularization, and add a small constant \textstyle \epsilon to the eigenvalues before taking their square root and inverse:

\begin{align}x_{{\rm PCAwhite},i} = \frac{x_{{\rm rot},i}}{\sqrt{\lambda_i + \epsilon}}.\end{align}

When \textstyle x takes values around \textstyle [-1,1], a value of \textstyle \epsilon \approx 10^{-5} might be typical.
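In the NumPy sketch above, the regularized version simply adds \textstyle \epsilon to the eigenvalues before taking the square root; the value below is the ballpark figure mentioned above and may need adjusting for data with a different range:

```python
eps = 1e-5                                          # small regularization constant
X_PCAwhite_reg = X_rot / np.sqrt(lam + eps)[:, np.newaxis]
X_ZCAwhite_reg = U @ X_PCAwhite_reg                 # regularized ZCA whitening
```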

For the case of images, adding \textstyle \epsilon here also has the effect of slightly smoothing (or low-pass filtering) the input image. This also has a desirable effect of removing aliasing artifacts caused by the way pixels are laid out in an image, and can improve the features learned (details are beyond the scope of these notes).

ZCA whitening is a form of pre-processing of the data that maps it from \textstyle x to \textstyle x_{\rm ZCAwhite}. It turns out that this is also a rough model of how the biological eye (the retina) processes images. Specifically, as your eye perceives images, most adjacent "pixels" in your eye will perceive very similar values, since adjacent parts of an image tend to be highly correlated in intensity. It is thus wasteful for your eye to have to transmit every pixel separately (via your optic nerve) to your brain. Instead, your retina performs a decorrelation operation (this is done via retinal neurons that compute a function called "on center, off surround/off center, on surround") which is similar to that performed by ZCA. This results in a less redundant representation of the input image, which is then transmitted to your brain.






Pre-whitening in signal processing

Pre-whitening is a technique used in signal processing to remove the spectral correlation of a signal, making it easier to analyze or model. Here is an example of how to pre-whiten a signal using Python and the NumPy library.

First, let's import the necessary libraries:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import lfilter, butter
```

Next, let's generate a simple signal consisting of two sinusoids with different frequencies and amplitudes:

```python
fs = 1000                  # Sampling rate in Hz
t = np.arange(0, 1, 1/fs)  # Time vector from 0 to 1 second
n = len(t)                 # Number of samples
f1 = 50                    # First sinusoid frequency in Hz
f2 = 200                   # Second sinusoid frequency in Hz
A1 = 1                     # First sinusoid amplitude
A2 = 0.5                   # Second sinusoid amplitude
x = A1*np.sin(2*np.pi*f1*t) + A2*np.sin(2*np.pi*f2*t)  # Signal
```

We can plot the signal to visualize it:

```python
plt.plot(t, x)
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.show()
```

![Signal plot](https://i.imgur.com/lNPF9fn.png)

Now we can pre-whiten the signal using a first-order Butterworth high-pass filter with a cutoff frequency of 10 Hz. This attenuates the low-frequency components of the signal:

```python
f_cutoff = 10  # Cutoff frequency in Hz
b, a = butter(1, f_cutoff/(fs/2), btype='highpass')  # High-pass filter coefficients
x_filt = lfilter(b, a, x)  # Apply filter to signal
```

We can plot the filtered signal to visualize it:

```python
plt.plot(t, x_filt)
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.show()
```

![Filtered signal plot](https://i.imgur.com/vhn6UFW.png)

The pre-whitened signal has a flatter spectral density, meaning its power is more uniformly distributed across frequencies, which makes it easier to analyze or model without being biased by its spectral correlation.
