Mathematics Basics - Linear Algebra (Matrix Part 3)

Eigen Theory

Define Eigen Vectors and Eigen Values

We have seen previously how to transform a single vector to a new vector space by matrix multiplication. In fact, we can apply a transformation to every vector in a vector space. Scaling, shearing, rotating and the other types of linear transformation we have learned can all be applied to multiple vectors at the same time. In order to differentiate one transformation from another, we have to find some characteristics of each transformation. One kind of characteristic is what we will discover below through Eigen vectors and Eigen values.

Let's start with a transformation in 2-D space. Such a transformation can be visualized as follows. We draw a square centered at the origin with its sides intersecting the axes at the basis vectors. Then we perform a transformation by distorting the shape of this square according to the change in the basis vectors. The resultant shape shows how vectors in the original vector space change. For example, we can scale every vector in the original square vertically by a factor of 2 as shown below.

Scaling Transformation

Vectors $v_1$, $v_2$ and $v_3$ have been transformed into $v_1'$, $v_2'$ and $v_3'$ respectively. It can be seen that $v_1$ and $v_2$ are special because they still lie on the same lines as before after the scaling transformation. These special vectors are characteristics of this transformation; they are called Eigen vectors. On the other hand, $v_3$ was originally at 45° to the axes but now points in a different direction, so it is not an Eigen vector of this scaling transformation. In addition, the length of $v_1$ is doubled after the transformation, so $v_1$ has an Eigen value of 2. The length of $v_2$ is unchanged, hence the Eigen value for $v_2$ is 1.

We can look at one more example and find the Eigen vectors of a shearing transformation below.

Shearing Transformation

For shearing in the horizontal direction, only $v_2$ remains in its original direction. Both $v_1$ and $v_3$ have changed direction after the transformation. Therefore $v_2$ is the Eigen vector of this transformation, with an Eigen value of 1.

Lastly, we can see what happens if we apply a rotation transformation.

Rotation Transformation

This time no vector remains in its original direction, thus there is no Eigen vector in this transformation.

The concept of Eigen vectors and Eigen values is the same for transformations in three or more dimensions. In general, we define a vector as an Eigen vector if, after a linear transformation, it still lies along exactly the same or the opposite direction as before. The stretch or shrink in length of an Eigen vector is defined as its Eigen value. Therefore, the combination of Eigen vectors and their associated Eigen values can help us characterize a linear transformation in any number of dimensions.

Compute Eigen Vectors and Eigen Values Mathematically

It is relatively easy to visually deduce the Eigen vectors when only a simple transformation is performed. However, it soon becomes challenging when we have composite transformations or transformations in more than two dimensions. Therefore, we need an analytical method for finding the Eigen vectors of a more complicated transformation.

It turns out all Eigen vectors of a transformation must satisfy the following equation.

$$Ax = \lambda x \tag{1}$$

where $x$ is an Eigen vector, $A$ is an $n \times n$ transformation matrix and $\lambda$ is a scalar (the Eigen value).

This equation tells us that when an Eigen vector is multiplied by a transformation matrix, the result is just a scalar multiple of that Eigen vector.

We can rearrange equation (1) to get

$$(A-\lambda I)x=0 \tag{2}$$

The identity matrix $I$ is introduced here so that we can properly subtract the scalar $\lambda$ from the matrix $A$.

Equation (2) is always satisfied by $x=0$, but this is a trivial solution: the point at the origin has no magnitude or direction, so it is not affected by any transformation. For a non-zero Eigen vector $x$ to exist, the matrix $A-\lambda I$ must be singular, that is, its determinant must be 0: $\det(A-\lambda I)=0$.

It is generally hard to calculate a matrix determinant. However, we learned previously that there is a shortcut for a 2 by 2 matrix. For a matrix $A=\begin{pmatrix}a&b\\c&d\end{pmatrix}$, we can get,

$$
\begin{aligned}
\det(A-\lambda I)&=0 \\
\det\left(\begin{pmatrix}a&b\\c&d\end{pmatrix}-\begin{pmatrix}\lambda&0\\0&\lambda\end{pmatrix}\right)&=0 \\
\det\begin{pmatrix}a-\lambda & b\\c & d-\lambda\end{pmatrix}&=0 \\
(a-\lambda)(d-\lambda)-bc&=0 \\
\lambda^2-(a+d)\lambda+ad-bc&=0
\end{aligned}
\tag{3}
$$

The solutions for $\lambda$ in equation (3) are the Eigen values. We can then use each Eigen value to find its corresponding Eigen vectors.

Take our vertical scaling transformation as an example. The transformation matrix is $A=\begin{pmatrix}1&0\\0&2\end{pmatrix}$. We solve $\det(A-\lambda I) = 0$.

$$
\begin{aligned}
\det\left(\begin{pmatrix}1&0\\0&2\end{pmatrix}-\lambda I\right)&=0 \\
\lambda^2-(1+2)\lambda+1*2-0*0&=0\\
\lambda^2-3\lambda+2&=0
\end{aligned}
$$

The solution is $\lambda=1$ or $\lambda=2$.
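As a quick sanity check, we can let a computer find the roots of this characteristic polynomial. A minimal NumPy sketch (the coefficient list and variable names are just for illustration) recovers the same two Eigen values.

```python
import numpy as np

# Roots of the characteristic polynomial lambda^2 - 3*lambda + 2 = 0
coefficients = [1, -3, 2]
print(np.roots(coefficients))   # [2. 1.] -- the two Eigen values
```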

When $\lambda = 1$, substituting $A$ and $\lambda$ into equation (2) we get

$$
\begin{aligned}
\left(\begin{pmatrix}1&0\\0&2\end{pmatrix}-1*\begin{pmatrix}1&0\\0&1\end{pmatrix}\right)*\begin{pmatrix}x_1\\x_2\end{pmatrix}&=0 \\
\begin{pmatrix}0&0\\0&1\end{pmatrix}*\begin{pmatrix}x_1\\x_2\end{pmatrix}&=0 \\
\begin{pmatrix}0\\x_2\end{pmatrix}&=0
\end{aligned}
$$

This shows that any vector with $x_2 = 0$, i.e. along the horizontal axis, is an Eigen vector. The Eigen value is 1. This aligns with vector $v_2$ in our previous illustration.

When $\lambda = 2$, substituting $A$ and $\lambda$ into equation (2) we get

$$
\begin{aligned}
\left(\begin{pmatrix}1&0\\0&2\end{pmatrix}-2*\begin{pmatrix}1&0\\0&1\end{pmatrix}\right)*\begin{pmatrix}x_1\\x_2\end{pmatrix}&=0 \\
\begin{pmatrix}-1&0\\0&0\end{pmatrix}*\begin{pmatrix}x_1\\x_2\end{pmatrix}&=0 \\
\begin{pmatrix}-x_1\\0\end{pmatrix}&=0
\end{aligned}
$$

This shows that any vector with $x_1=0$, i.e. along the vertical axis, is an Eigen vector. The Eigen value is 2. This aligns with vector $v_1$ in our previous illustration.
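Instead of solving equation (2) by hand, we could also ask NumPy for the Eigen values and Eigen vectors directly. A minimal sketch for our vertical scaling matrix might look like this (variable names are illustrative only).

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 2.0]])        # vertical scaling by a factor of 2

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)    # [1. 2.]
print(eigenvectors)   # columns are the Eigen vectors:
                      # [[1. 0.]   -> along the horizontal axis (Eigen value 1)
                      #  [0. 1.]]  -> along the vertical axis   (Eigen value 2)
```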

Let's do another example with the rotation transformation. The transformation matrix is $A=\begin{pmatrix}0&-1\\1&0\end{pmatrix}$. Again, we solve $\det(A-\lambda I)=0$.

$$
\begin{aligned}
\det\left(\begin{pmatrix}0&-1\\1&0\end{pmatrix}-\lambda I\right)&=0 \\
\lambda^2-(0+0)\lambda+0*0-(-1)*1&=0\\
\lambda^2+1&=0
\end{aligned}
$$

There is no real number solution to this equation. Therefore, no real Eigen value or Eigen vector can be found for the rotation transformation. This result agrees with what we observed intuitively in the previous illustration.
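We can confirm this numerically as well. Asking NumPy for the Eigen values of the rotation matrix returns purely imaginary numbers, which is another way of saying there are no real Eigen vectors.

```python
import numpy as np

R = np.array([[0.0, -1.0],
              [1.0,  0.0]])       # 90-degree rotation

eigenvalues, _ = np.linalg.eig(R)
print(eigenvalues)    # [0.+1.j 0.-1.j] -- no real solutions, as expected
```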

Of course, this method of finding Eigen values and Eigen vectors gets much more difficult in higher dimensional spaces, as it involves finding the roots of a polynomial equation of order $n$, the dimension of matrix $A$. We then turn to computers, which employ iterative methods to solve this problem. Nonetheless, understanding the underlying concepts of this analytical method is still valuable.
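One such iterative method is the power iteration: we repeatedly multiply a starting vector by the matrix and re-normalize it, so the vector gradually settles onto the Eigen vector with the largest Eigen value. The sketch below is only a bare-bones illustration of the idea (real libraries use more robust algorithms), applied to our scaling matrix.

```python
import numpy as np

def power_iteration(A, num_iterations=1000):
    """Approximate the dominant Eigen value and Eigen vector of A."""
    x = np.random.rand(A.shape[0])
    for _ in range(num_iterations):
        x = A @ x                      # apply the transformation
        x = x / np.linalg.norm(x)      # re-normalize so x does not blow up
    eigenvalue = x @ A @ x             # Rayleigh quotient (x has unit length)
    return eigenvalue, x

A = np.array([[1.0, 0.0],
              [0.0, 2.0]])
value, vector = power_iteration(A)
print(value, vector)                   # ~2.0 and a vector along the vertical axis
```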

Use Eigen Vectors to Perform Repeated Multiplication

We shall look at one important application of Eigen vectors: repeated multiplication of a matrix by itself. This happens, for example, when we apply the same transformation matrix to a vector many times. In order to apply a transformation matrix $T$ to a vector $v$, we perform one multiplication, $T*v$. If we apply the same transformation again, we get $T*T*v$ or $T^2*v$. Consequently, if we apply the same transformation $n$ times we get $T^n*v$.

It is in general difficult to perform matrix multiplication repeatedly because rows and columns of the matrix have to be multiplied and summed over and over. There is one exception though. If matrix $T$ is a diagonal matrix (all the entries in the matrix are zero except for those along the leading diagonal), repeated multiplication of matrix $T$ can be simplified.

Let's take an example of $T=\begin{pmatrix}2&0\\0&3\end{pmatrix}$ and try to multiply it by itself repeatedly.

$$
\begin{aligned}
T^2&=\begin{pmatrix}2&0\\0&3\end{pmatrix}*\begin{pmatrix}2&0\\0&3\end{pmatrix}=\begin{pmatrix}4&0\\0&9\end{pmatrix}=\begin{pmatrix}2^2&0\\0&3^2\end{pmatrix}\\
T^3&=\begin{pmatrix}4&0\\0&9\end{pmatrix}*\begin{pmatrix}2&0\\0&3\end{pmatrix}=\begin{pmatrix}8&0\\0&27\end{pmatrix}=\begin{pmatrix}2^3&0\\0&3^3\end{pmatrix}\\
&\;\;\vdots\\
T^n&=\begin{pmatrix}2^n&0\\0&3^n\end{pmatrix}
\end{aligned}
$$

Therefore, we can derive a simplified formula to raise a diagonal matrix $T$ to any power.

$$
\begin{pmatrix}
t_{11}&0&\cdots&0\\
0&t_{22}&\cdots&0\\
\vdots&\vdots&\ddots&\vdots\\
0&0&\cdots&t_{ii}
\end{pmatrix}^{n}
=\begin{pmatrix}
(t_{11})^n&0&\cdots&0\\
0&(t_{22})^n&\cdots&0\\
\vdots&\vdots&\ddots&\vdots\\
0&0&\cdots&(t_{ii})^n
\end{pmatrix}
$$

Note this is true regardless of the dimension of $T$.
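In NumPy terms, raising a diagonal matrix to a power reduces to raising its diagonal entries element-wise, which we can check against ordinary repeated multiplication. A small sketch (the matrix and exponent are chosen just for illustration):

```python
import numpy as np

T = np.array([[2.0, 0.0],
              [0.0, 3.0]])
n = 5

shortcut = np.diag(np.diag(T) ** n)     # raise each diagonal entry to the n-th power
direct = np.linalg.matrix_power(T, n)   # ordinary repeated multiplication

print(shortcut)                         # [[ 32.   0.]
                                        #  [  0. 243.]]
print(np.allclose(shortcut, direct))    # True
```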

The task then becomes how we can convert a matrix from its original form into a diagonal form. Let's take the example of a 2 by 2 matrix $T=\begin{pmatrix}1&1\\0&2\end{pmatrix}$. Solving equation (3) gives the Eigen values $\lambda_1=1$ and $\lambda_2=2$. A diagonal matrix $D$ can be constructed from these 2 Eigen values, $D=\begin{pmatrix}\lambda_1&0\\0&\lambda_2\end{pmatrix}$. With the Eigen values, we can also find 2 Eigen vectors, $\begin{pmatrix}1\\0\end{pmatrix}$ and $\begin{pmatrix}1\\1\end{pmatrix}$. These become the column vectors of an Eigen matrix $C=\begin{pmatrix}1&1\\0&1\end{pmatrix}$. It turns out that the product $T*C$ is the same as $C*D$ due to the special properties of Eigen vectors.

$$
\begin{aligned}
T*C&=\begin{pmatrix}1&1\\0&2\end{pmatrix}*\begin{pmatrix}1&1\\0&1\end{pmatrix}\\
&=\begin{pmatrix}1&2\\0&2\end{pmatrix}\\
&=\begin{pmatrix}1&1\\0&1\end{pmatrix}*\begin{pmatrix}1&0\\0&2\end{pmatrix}\\
&=C*D
\end{aligned}
$$

Since $T*C=C*D$, we can get an expression for $T$ alone.

$$
\begin{aligned}
T*C&=C*D\\
T&=C*D*C^{-1}
\end{aligned}
\tag{4}
$$

In equation (4), we have $C$ as the Eigen matrix, $D$ as a diagonal matrix and $C^{-1}$ as the inverse of $C$. So, to raise matrix $T$ to a power,

$$
\begin{aligned}
T^2&=C*D*C^{-1}*C*D*C^{-1}\\
&=C*D*I*D*C^{-1}\\
&=C*D^2*C^{-1}
\end{aligned}
\tag{5}
$$

$$
\begin{aligned}
T^3&=C*D^2*C^{-1}*C*D*C^{-1}\\
&=C*D^2*I*D*C^{-1}\\
&=C*D^3*C^{-1}
\end{aligned}
\tag{6}
$$

$$
\vdots\\
T^n=C*D^n*C^{-1}
$$

Let's verify this result by computing $T^2$ and $T^3$ directly.

$$
\begin{aligned}
T^2&=\begin{pmatrix}1&1\\0&2\end{pmatrix}*\begin{pmatrix}1&1\\0&2\end{pmatrix}=\begin{pmatrix}1&3\\0&4\end{pmatrix}\\
T^3&=\begin{pmatrix}1&3\\0&4\end{pmatrix}*\begin{pmatrix}1&1\\0&2\end{pmatrix}=\begin{pmatrix}1&7\\0&8\end{pmatrix}
\end{aligned}
$$

Substituting $C=\begin{pmatrix}1&1\\0&1\end{pmatrix}$, $D=\begin{pmatrix}1&0\\0&2\end{pmatrix}$ and $C^{-1}=\begin{pmatrix}1&-1\\0&1\end{pmatrix}$ into equations (5) and (6), we get the same result.

$$
\begin{aligned}
T^2&=C*D^2*C^{-1}=\begin{pmatrix}1&1\\0&1\end{pmatrix}*\begin{pmatrix}1&0\\0&2\end{pmatrix}^2*\begin{pmatrix}1&-1\\0&1\end{pmatrix}=\begin{pmatrix}1&4\\0&4\end{pmatrix}*\begin{pmatrix}1&-1\\0&1\end{pmatrix}=\begin{pmatrix}1&3\\0&4\end{pmatrix}\\
T^3&=C*D^3*C^{-1}=\begin{pmatrix}1&1\\0&1\end{pmatrix}*\begin{pmatrix}1&0\\0&2\end{pmatrix}^3*\begin{pmatrix}1&-1\\0&1\end{pmatrix}=\begin{pmatrix}1&8\\0&8\end{pmatrix}*\begin{pmatrix}1&-1\\0&1\end{pmatrix}=\begin{pmatrix}1&7\\0&8\end{pmatrix}
\end{aligned}
$$

This confirms our formula $T^n=C*D^n*C^{-1}$ works as expected.
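The same check can be done numerically. The sketch below compares $C*D^n*C^{-1}$ against repeated multiplication of $T$ for $n=3$, using the values from the example above.

```python
import numpy as np

T = np.array([[1.0, 1.0],
              [0.0, 2.0]])
C = np.array([[1.0, 1.0],
              [0.0, 1.0]])                 # Eigen vectors as columns
D = np.array([[1.0, 0.0],
              [0.0, 2.0]])                 # Eigen values on the diagonal
n = 3

via_diagonalization = C @ np.diag(np.diag(D) ** n) @ np.linalg.inv(C)
via_repeated_multiplication = np.linalg.matrix_power(T, n)

print(via_diagonalization)                 # [[1. 7.]
                                           #  [0. 8.]]
print(np.allclose(via_diagonalization, via_repeated_multiplication))   # True
```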

Therefore, if we need to apply a transformation matrix $T$ to vector $v$ repeatedly, we can write

$$T^n*v=C*D^n*C^{-1}*v$$

In a step-by-step manner, here is what we need to do to calculate $T^nv$ (a code sketch follows the list).

  1. Calculate the Eigen vectors and Eigen values of matrix $T$.
  2. Construct an Eigen matrix $C$ from the Eigen vectors, which defines a new vector space.
  3. Construct a diagonal matrix $D$ from the Eigen values.
  4. Find the inverse of matrix $C$.
  5. Transform vector $v$ from its original vector space into the new vector space, $v_C=C^{-1}*v$.
  6. Multiply the transformed vector by the diagonal matrix $D$ repeatedly in the new vector space, $[T^nv]_C=D^n*v_C$.
  7. Transform the resultant vector back to the original vector space, $T^nv=C*[T^nv]_C$.
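A minimal NumPy sketch of these steps might look like the following. It assumes $T$ is diagonalizable with real Eigen values; the function name and example values are only for illustration.

```python
import numpy as np

def repeated_transform(T, v, n):
    """Apply transformation matrix T to vector v n times via diagonalization."""
    eigenvalues, C = np.linalg.eig(T)            # steps 1-2: Eigen values and Eigen matrix C
    C_inv = np.linalg.inv(C)                     # step 4: inverse of C
    v_C = C_inv @ v                              # step 5: change v into the Eigen basis
    result_C = np.diag(eigenvalues ** n) @ v_C   # steps 3 and 6: scale by D^n in the new basis
    return C @ result_C                          # step 7: change back to the original basis

T = np.array([[1.0, 1.0],
              [0.0, 2.0]])
v = np.array([1.0, 1.0])
print(repeated_transform(T, v, 3))               # [8. 8.]
print(np.linalg.matrix_power(T, 3) @ v)          # [8. 8.] -- same result by brute force
```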

Of course, there is little efficiency gained if we only apply a transformation 2 or 3 times. The extra steps of computing the Eigen vectors and the matrix inverse are not simple either. This way of diagonalizing a matrix before raising it to a power becomes much handier when we need to multiply a matrix by itself a large number of times. One good example is the famous PageRank algorithm used to rank Google's search results.

PageRank: An Application of Repeated Matrix Multiplication

In order to illustrate the idea of the PageRank algorithm, we can make a mini "network" as follows. Each circle A, B, C and D represents a webpage, and the arrows between them show the links from one page to another. Our goal is to rank the relevance of these webpages for people who receive them in their search results.

Mini "Network"

The central idea underlying the PageRank algorithm is that the importance of a webpage is related to its links to and from other webpages. We can represent each webpage as a vector in which each row is either 0 or 1 depending on whether there is a link to the corresponding page. For example, webpage A links to webpages B, C and D but not to itself. This can be represented as the vector $\begin{pmatrix}0\\1\\1\\1\end{pmatrix}$. We then normalize this vector so that the sum of all rows equals 1. We call it the link vector for webpage A and denote it as $l_A$.

$$l_A=\begin{pmatrix}0\\\frac{1}{3}\\\frac{1}{3}\\\frac{1}{3}\end{pmatrix}$$

Similarly, we can obtain the link vectors for webpages B, C and D.

$$l_B=\begin{pmatrix}\frac{1}{2}\\0\\0\\\frac{1}{2}\end{pmatrix},\quad l_C=\begin{pmatrix}0\\0\\0\\1\end{pmatrix},\quad l_D=\begin{pmatrix}0\\\frac{1}{2}\\\frac{1}{2}\\0\end{pmatrix}$$

A link matrix $L$ is formed from the link vectors of all webpages.

$$L=\Big(\,l_A\;\;l_B\;\;l_C\;\;l_D\,\Big)=\begin{pmatrix}0&\frac{1}{2}&0&0\\\frac{1}{3}&0&0&\frac{1}{2}\\\frac{1}{3}&0&0&\frac{1}{2}\\\frac{1}{3}&\frac{1}{2}&1&0\end{pmatrix}$$

This matrix $L$ summarizes the probability of going from one page to another. For example, the probability of going to webpage A is $\frac{1}{2}$ if we are at webpage B and 0 elsewhere. And the probability of going to webpage B is $\frac{1}{3}$ when we are at page A and $\frac{1}{2}$ when we are at page D.
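For completeness, here is one way we might build this link matrix in NumPy from the link structure of the mini network (the dictionary of outgoing links is read off the diagram above; the names are illustrative).

```python
import numpy as np

pages = ["A", "B", "C", "D"]
outgoing_links = {"A": ["B", "C", "D"],   # A links to B, C and D
                  "B": ["A", "D"],
                  "C": ["D"],
                  "D": ["B", "C"]}

# Column j holds the normalized link vector of page j
L = np.zeros((len(pages), len(pages)))
for j, source in enumerate(pages):
    for target in outgoing_links[source]:
        L[pages.index(target), j] = 1.0 / len(outgoing_links[source])

print(np.round(L, 2))
# [[0.   0.5  0.   0.  ]
#  [0.33 0.   0.   0.5 ]
#  [0.33 0.   0.   0.5 ]
#  [0.33 0.5  1.   0.  ]]
```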

We define the rank of a webpage as the long-run probability of ending up on that page after some random clicks through page links. This is a self-referential problem, as the rank of each page depends on all the other pages. To calculate the rank of webpage A, we need to know the rank of all the other pages, whether those pages link to webpage A, and how many outgoing links those pages have. We can express the rank $r_A$ of webpage A as

$$r_A=\sum_{j=1}^{n}L_{Aj}r_j$$

where $n$ is the total number of webpages, $L_{Aj}$ is the probability of going to webpage A from webpage $j$, and $r_j$ is the rank of webpage $j$.

Moreover, the rank vector $r$ for all webpages can be expressed as

$$r=Lr$$

The rank vector $r$ is updated by matrix $L$ every time we perform this multiplication. Therefore, given the page rank at timestamp $t$, the page rank at the next timestamp $t+1$ would be

$$r^{t+1}=Lr^t$$

This is an iterative process: we can do the multiplication repeatedly. If after many rounds of multiplication $r$ stops changing, we have reached the constant relationship $r=Lr$. Thinking back to our definition of Eigen vectors, when $r=Lr$ holds true for all further multiplication updates, $r$ is an Eigen vector of matrix $L$ with an Eigen value of 1. This is in fact another way to find the Eigen vectors of a matrix.

There is one more thing to consider before we kick off this iterative process. We need to assign an initial value for $r$ at time 0. In this example, we simply assume that all webpages are ranked equally at the beginning. Since there are 4 pages in total,

$$r^0=\begin{pmatrix}\frac{1}{4}\\\frac{1}{4}\\\frac{1}{4}\\\frac{1}{4}\end{pmatrix}$$

We can ask our computer to do the work of updating the values of $r$ iteratively. Soon we can see that, with our mini network, vector $r$ converges to

$$r=\begin{pmatrix}0.12\\0.24\\0.24\\0.4\end{pmatrix}$$

The result shows that, by randomly clicking around these 4 webpages, there is a 12% probability that a user will end up on webpage A. The probabilities for pages B, C and D are 24%, 24% and 40% respectively. Therefore, our ranking of these 4 webpages is D > B = C > A.
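A short NumPy sketch of this iteration (the iteration count and convergence check are arbitrary choices) reproduces the converged rank vector.

```python
import numpy as np

L = np.array([[0,   1/2, 0, 0  ],
              [1/3, 0,   0, 1/2],
              [1/3, 0,   0, 1/2],
              [1/3, 1/2, 1, 0  ]])

r = np.full(4, 1/4)                 # start with all pages ranked equally
for _ in range(100):                # keep applying r <- L r
    r_next = L @ r
    if np.allclose(r_next, r):      # stop once r is no longer changing
        break
    r = r_next

print(np.round(r, 2))               # [0.12 0.24 0.24 0.4 ]
```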

It also means the Eigen vector of matrix $L$ with an Eigen value of 1 is $\begin{pmatrix}0.12\\0.24\\0.24\\0.4\end{pmatrix}$. Let's verify this result.

$$
\begin{aligned}
Lr&=\begin{pmatrix}0&\frac{1}{2}&0&0\\\frac{1}{3}&0&0&\frac{1}{2}\\\frac{1}{3}&0&0&\frac{1}{2}\\\frac{1}{3}&\frac{1}{2}&1&0\end{pmatrix}*\begin{pmatrix}0.12\\0.24\\0.24\\0.4\end{pmatrix}\\
&=\begin{pmatrix}\frac{1}{2}*0.24\\\frac{1}{3}*0.12+\frac{1}{2}*0.4\\\frac{1}{3}*0.12+\frac{1}{2}*0.4\\\frac{1}{3}*0.12+\frac{1}{2}*0.24+1*0.24\end{pmatrix}\\
&=\begin{pmatrix}0.12\\0.24\\0.24\\0.4\end{pmatrix}\\
&=r
\end{aligned}
$$
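Equivalently, we can ask NumPy for the Eigen vectors of $L$ directly and pick the one whose Eigen value is 1; rescaled so its entries sum to 1, it gives the same rank vector.

```python
import numpy as np

L = np.array([[0,   1/2, 0, 0  ],
              [1/3, 0,   0, 1/2],
              [1/3, 0,   0, 1/2],
              [1/3, 1/2, 1, 0  ]])

eigenvalues, eigenvectors = np.linalg.eig(L)
index = np.argmin(np.abs(eigenvalues - 1))   # locate the Eigen value closest to 1
r = np.real(eigenvectors[:, index])
r = r / r.sum()                              # rescale so the probabilities sum to 1
print(np.round(r, 2))                        # [0.12 0.24 0.24 0.4 ]
```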

This, however, only presents the simplest form of the PageRank algorithm. We did not cover damping factors or any other stability or optimization measures. Nonetheless, hopefully this demonstrates some of the usefulness of Eigen vectors.

There are also other topics in Eigen theory that we did not cover. For example, how do we perform repeated multiplication when a matrix is not diagonalizable? Our solutions for Eigen vectors have consisted of only real numbers so far; it is in fact possible to have Eigen vectors with complex numbers. These, however, are not the topics we shall focus on here in linear algebra. Grasping the core concepts of Eigen vectors and Eigen values will set you up well for the machine learning topics we will discuss in the future.

That is all. We can conclude our journey through vectors and matrices. Hopefully this gives you a solid understanding of linear algebra and builds up some intuition about concepts like change of basis, matrix transformations, Eigen vectors, etc. These core concepts of linear algebra will come in handy as you progress through your study of machine learning.


(Inspired by Mathematics for Machine Learning lecture series from Imperial College London)
