Mathematical Foundations

dianlong4020

Tags: development tools, data structures and algorithms, artificial intelligence

Original link: http://www.cnblogs.com/mazhimazhi/p/7816610.html

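Linear algebra:

An n-dimensional column vector $x$ multiplied by its own transpose, $xx^T$, is a symmetric $n \times n$ matrix, since $(xx^T)^T = xx^T$.

Transposing a matrix leaves the entries on its main diagonal unchanged.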
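The two facts above are easy to check by hand on a small case. Here is a minimal sketch in plain Python; the helper names `outer` and `transpose` and the sample values are made up for illustration.

```python
def outer(x):
    """Outer product x x^T of a vector with itself, as a list of rows."""
    return [[xi * xj for xj in x] for xi in x]


def transpose(M):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]


x = [1, 2, 3]
S = outer(x)                 # [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
is_symmetric = (S == transpose(S))

# A non-symmetric matrix: transposing moves the off-diagonal entries
# but leaves the main diagonal where it was.
N = [[1, 2], [3, 4]]
diag_before = [N[i][i] for i in range(2)]
diag_after = [transpose(N)[i][i] for i in range(2)]
```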

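The rule for computing a matrix product is shown in the figure below:

[Figure: the matrix multiplication rule]

For why the rule is defined this way, see: Linear Algebra: What matrices actually are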

Most high school students in the United States learn about matrices and matrix multiplication, but they often are not taught why matrix multiplication works the way it does. Adding matrices is easy: you just add the corresponding entries. However, matrix multiplication does not work this way, and for someone who doesn’t understand the theory behind matrices, this way of multiplying matrices may seem extremely contrived and strange. To truly understand matrices, we view them as representations of part of a bigger picture. Matrices represent functions between spaces, called vector spaces, and not just any functions either, but linear functions. This is in fact why linear algebra focuses on matrices. The two fundamental facts about matrices are that every matrix represents some linear function, and every linear function between finite-dimensional spaces is represented by a matrix. Therefore, once bases are chosen, there is a one-to-one correspondence between matrices and linear functions. We’ll show that multiplying matrices corresponds to composing the functions that they represent. Along the way, we’ll examine what matrices are good for and why linear algebra sprang up in the first place.

Most likely, if you’ve taken algebra in high school, you’ve seen something like the following:

$\begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix}.$

Your high school algebra teacher probably told you this thing was a “matrix.” You then learned how to do things with matrices. For example, you can add two matrices, and the operation is fairly intuitive:

$\begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix} + \begin{pmatrix} 1 & 2 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 3 & 3 \\ 5 & 3 \end{pmatrix}.$

You can also subtract matrices, which works similarly. You can multiply a matrix by a number:

$2 \times \begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix} = \begin{pmatrix} 4 & 2 \\ 8 & 6 \end{pmatrix}.$

Then, when you were taught how to multiply matrices, everything seemed wrong:

$\begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix}\begin{pmatrix} 1 & 2 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 3 & 4 \\ 7 & 8 \end{pmatrix}.$

That is, to find the entry in the $i$-th row, $j$-th column of the product, you look at the $i$-th row of the first matrix and the $j$-th column of the second matrix, multiply together their corresponding numbers, and then add up the results to get the entry in that position. In the above example, the 1st row, 2nd column entry is a $4$ because the 1st row of the first matrix is $(2, 1)$, the 2nd column of the second matrix is $(2, 0)$, and we have $4 = 2 \times 2 + 1 \times 0$. Moreover, this implies that matrix multiplication isn’t even commutative! If we switch the order of multiplication above, we get

$\begin{pmatrix} 1 & 2 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix} = \begin{pmatrix} 10 & 7 \\ 2 & 1 \end{pmatrix}.$
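The row-by-column rule and the failure of commutativity can be reproduced in a few lines. This is a minimal sketch in plain Python, with matrices stored as lists of rows; the function name `matmul` is chosen here for illustration.

```python
def matmul(A, B):
    """Multiply matrices given as lists of rows, using the row-by-column rule."""
    n, m, p = len(A), len(B), len(B[0])
    if any(len(row) != m for row in A):
        raise ValueError("the columns of A must match the rows of B")
    # Entry (i, j) is the sum over k of A[i][k] * B[k][j].
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


A = [[2, 1], [4, 3]]
B = [[1, 2], [1, 0]]

AB = matmul(A, B)   # [[3, 4], [7, 8]]
BA = matmul(B, A)   # [[10, 7], [2, 1]]
```

The two products match the ones computed by hand above, and they differ from each other.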

How come matrix multiplication doesn’t work like addition and subtraction? And if multiplication works this way, how the heck does division work? The goal of this post is to answer these questions.

To understand why matrix multiplication works this way, it’s necessary to understand what matrices actually are. But before we get to that, let’s briefly take a look at why we care about matrices in the first place. The most basic application of matrices is solving systems of linear equations. A linear equation is one in which all the variables appear by themselves with no powers; they don’t get multiplied with each other or themselves, and no funny functions either. An example of a system of linear equations is

$\begin{aligned} 2x + y &= 3 \\ 4x + 3y &= 7 \end{aligned}$

The solution to this system is $x = 1, y = 1$. Such equations seem simple, but they easily arise in life. For example, let’s say I have two friends Alice and Bob who went shopping for candy. Alice bought 2 chocolate bars and 1 bag of Skittles and spent \$3, whereas Bob bought 4 chocolate bars and 3 bags of Skittles and spent \$7. If we want to figure out how much chocolate bars and Skittles cost, we can let $x$ be the price of a chocolate bar and $y$ be the price of a bag of Skittles, and the variables would satisfy the above system of linear equations. Therefore we can deduce that a chocolate bar costs \$1 and so does a bag of Skittles. This system was particularly easy to solve because one can guess and check the solution, but in general, with $n$ variables and $n$ equations instead of 2, it’s much harder. That’s where matrices come in! Note that, by matrix multiplication, the above system of linear equations can be re-written as

$\begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 3 \\ 7 \end{pmatrix}.$

If only we could find a matrix $A$, which is the inverse of the matrix $\begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix}$, so that if we multiplied both sides of the equation (on the left) by $A$ we’d get

$\begin{pmatrix} x \\ y \end{pmatrix} = A \begin{pmatrix} 3 \\ 7 \end{pmatrix}.$
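For this particular system the inverse can be written down directly. The sketch below assumes the standard closed-form inverse of a 2-by-2 matrix, which the article has not introduced: swap the diagonal entries, negate the off-diagonal ones, and divide by $ad - bc$. The names `inverse_2x2` and `apply` are hypothetical, and the variable `inverse` plays the role of $A$ in the text.

```python
from fractions import Fraction


def inverse_2x2(M):
    """Inverse of a 2-by-2 integer matrix via the closed-form formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix has no inverse")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


def apply(M, v):
    """Multiply the matrix M (a list of rows) by the column vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


inverse = inverse_2x2([[2, 1], [4, 3]])   # [[3/2, -1/2], [-2, 1]]
x, y = apply(inverse, [3, 7])             # the candy prices: x == 1, y == 1
```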

The applications of matrices reach far beyond this simple problem, but for now we’ll use this as our motivation. Let’s get back to understanding what matrices are. To understand matrices, we have to know what vectors are. A vector space is a set with a specific structure, and a vector is simply an element of the vector space. For now, for technical simplicity, we’ll stick with vector spaces over the real numbers, also known as real vector spaces. A real vector space is basically what you think of when you think of space. The number line is a 1-dimensional real vector space, the x-y plane is a 2-dimensional real vector space, 3-dimensional space is a 3-dimensional real vector space, and so on. If you learned about vectors in school, then you are probably familiar with thinking about them as arrows which you can add together, multiply by a real number, and so on, but multiplying vectors together works differently. Does this sound familiar? It should. That’s how matrices work, and it’s no coincidence.

The most important fact about vector spaces is that they always have a basis. A basis of a vector space is a set of linearly independent vectors such that any vector in the space can be written as a linear combination of those basis vectors. If $v_1, v_2, v_3$ are your basis vectors, then $av_1 + bv_2 + cv_3$ is a linear combination whenever $a,b,c$ are real numbers. A concrete example is the following: a basis for the x-y plane is the vectors $(1,0), (0,1)$. Any vector is of the form $(a,b)$, which can be written as

$\begin{pmatrix} a \\ b \end{pmatrix} = a \begin{pmatrix} 1 \\ 0 \end{pmatrix} + b \begin{pmatrix} 0 \\ 1 \end{pmatrix}$

so we indeed have a basis! This is not the only possible basis. In fact, the vectors in our basis don’t even have to be perpendicular! For example, the vectors $(1,0), (1,1)$ form a basis since we can write

$\begin{pmatrix} a \\ b \end{pmatrix} = (a-b) \begin{pmatrix} 1 \\ 0 \end{pmatrix} + b \begin{pmatrix} 1 \\ 1 \end{pmatrix}.$
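The decomposition with respect to the non-perpendicular basis $(1,0), (1,1)$ can be checked numerically. This is a small sketch; the helper names and the sample vector $(5, 2)$ are made up for illustration.

```python
def coords_in_skewed_basis(a, b):
    """Coefficients of (a, b) with respect to the basis (1, 0), (1, 1)."""
    return (a - b, b)


def combine(coeffs, basis):
    """Form the linear combination sum(c * v) of the given basis vectors."""
    dim = len(basis[0])
    return tuple(sum(c * v[i] for c, v in zip(coeffs, basis))
                 for i in range(dim))


basis = [(1, 0), (1, 1)]
c = coords_in_skewed_basis(5, 2)   # (3, 2)
v = combine(c, basis)              # 3*(1, 0) + 2*(1, 1) == (5, 2)
```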

Now, a linear transformation is simply a function between two vector spaces that happens to be linear. Being linear is an extremely nice property. A function $f$ is linear if the following two properties hold:

$\begin{aligned} f(x+y) &= f(x) + f(y) \\ f(ax) &= af(x) \end{aligned}$

For example, the function $f(x) = x^2$ defined on the real line is not linear, since $f(x+y) = (x+y)^2 = x^2 + y^2 + 2xy$ whereas $f(x) + f(y) = x^2 + y^2$.

Now, we connect together all the ideas we’ve talked about so far: matrices, basis, and linear transformations. The connection is that matrices are representations of linear transformations, and you can figure out how to write the matrix down by seeing how it acts on a basis. To understand the first statement, we need to see why the second is true. The idea is that any vector is a linear combination of basis vectors, so you only need to know how the linear transformation affects each basis vector. This is because, since the function is linear, if we have an arbitrary vector $v$ which can be written as a linear combination $v = av_1 + bv_2 + cv_3$, then

$f(v) = f(av_1 + bv_2 + cv_3) = af(v_1) + bf(v_2) + cf(v_3).$

Notice that the value of $f(v)$ is completely determined by the values $f(v_1), f(v_2), f(v_3)$, and so that’s all the information we need to completely define the linear transformation. Where does the matrix come in? Well, once we choose a basis for both the domain and the target of the linear transformation, the columns of the matrix will represent the images of the basis vectors under the function.

For example, suppose we have a linear transformation $f$ which maps $\mathbb{R}^3$ to $\mathbb{R}^2$, meaning it takes in 3-dimensional vectors and spits out 2-dimensional vectors. Right now $f$ is just some abstract function which we have no way of writing down on paper. Let’s pick a basis for both our domain (3-space) and our target (2-space, or the plane). A nice choice would be $v_1 = (1,0,0), v_2 = (0,1,0), v_3 = (0,0,1)$ for the former and $w_1 = (1,0), w_2 = (0,1)$ for the latter. All we need to know is how $f$ affects $v_1, v_2, v_3$, and the basis for the target is for writing down the values $f(v_1), f(v_2), f(v_3)$ concretely. The matrix $M$ for our function will be a 2-by-3 matrix, where the 3 columns are indexed by $v_1, v_2, v_3$ and the 2 rows are indexed by $w_1, w_2$. All we need to write down $M$ are the values $f(v_1), f(v_2), f(v_3)$. For concreteness, let’s say

$\begin{aligned} f(v_1) &= 2w_1 + 4w_2 \\ f(v_2) &= w_1 - w_2 \\ f(v_3) &= w_2. \end{aligned}$

Then the corresponding matrix will be

$\begin{pmatrix} 2 & 1 & 0 \\ 4 & -1 & 1 \end{pmatrix}.$

The reason why this works is that matrix multiplication was designed so that if you multiply a matrix by the vector with all zeroes except a 1 in the $i$-th entry, then the result is just the $i$-th column of the matrix. You can check this for yourself. So we know that the matrix $M$ works correctly when applied to (multiplied by) basis vectors. But matrices also satisfy the same properties as linear transformations, namely $M(x + y) = Mx + My$ and $M(ax) = aMx$, where $x,y$ are vectors and $a$ is a real number. Therefore $M$ works for all vectors, so it’s the correct representation of $f$. Note that if we had chosen different vectors for the basis vectors, the matrix would look different. Therefore, matrices are not natural in the sense that they depend on what bases we choose.
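The construction of $M$ from the images of the basis vectors can be sketched in plain Python. The helper names `matrix_from_images` and `apply`, and the test vector $3v_1 + 2v_2 + v_3$, are illustrative choices and not part of the article.

```python
def matrix_from_images(images):
    """Build a matrix (list of rows) whose columns are the given images."""
    return [list(row) for row in zip(*images)]


def apply(M, v):
    """Multiply the matrix M (a list of rows) by the column vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Coordinates of f(v_1), f(v_2), f(v_3) in the basis w_1, w_2.
images = [(2, 4), (1, -1), (0, 1)]
M = matrix_from_images(images)      # [[2, 1, 0], [4, -1, 1]]

# Multiplying by a standard basis vector picks out the matching column.
second_column = apply(M, [0, 1, 0])  # [1, -1], the coordinates of f(v_2)

# By linearity, f(3*v_1 + 2*v_2 + v_3) = 3*f(v_1) + 2*f(v_2) + f(v_3).
fv = apply(M, [3, 2, 1])             # [8, 11]
```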

Now, finally, to answer the question posed at the beginning: why does matrix multiplication work the way it does? Let’s take a look at the two matrices we had in the beginning: $A = \begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix}$ and $B = \begin{pmatrix} 1 & 2 \\ 1 & 0 \end{pmatrix}$. We know that these correspond to linear functions on the plane; let’s call them $f$ and $g$, respectively. Multiplying matrices corresponds to composing their functions. Therefore, doing $ABx$ is the same as doing $f(g(x))$ for any vector $x$. To determine what the matrix $AB$ should look like, we can see how it affects the basis vectors $w_1 = (1,0), w_2 = (0,1)$. We have

$\begin{aligned} f(g(w_1)) &= f(w_1 + w_2) = f(w_1) + f(w_2) \\ &= (2w_1 + 4w_2) + (w_1 + 3w_2) = 3w_1 + 7w_2 \end{aligned}$

so the first column of $AB$ should be $(3,7)$, and

$f(g(w_2)) = f(2w_1) = 2f(w_1) = 2(2w_1 + 4w_2) = 4w_1 + 8w_2$

so the second column of $AB$ should be $(4,8)$. Indeed, this agrees with the answer we got in the beginning by matrix multiplication! Although this is not at all a rigorous proof, since it’s just an example, it captures the idea of the reason matrix multiplication is the way it is.
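The same check can be run mechanically: compose the two functions, record what the composition does to $w_1$ and $w_2$, and compare with the row-by-column product. This is a sketch in plain Python; the helper names `apply` and `matmul` are chosen here for illustration.

```python
def apply(M, v):
    """Multiply the matrix M (a list of rows) by the column vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(A, B):
    """Row-by-column product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


A = [[2, 1], [4, 3]]
B = [[1, 2], [1, 0]]


def f(v):
    return apply(A, v)


def g(v):
    return apply(B, v)


# Images of the basis vectors w_1, w_2 under the composition f(g(.)).
columns = [f(g(w)) for w in ([1, 0], [0, 1])]    # [[3, 7], [4, 8]]

# Placing those images side by side as columns gives the matrix of f after g.
composed = [list(row) for row in zip(*columns)]  # [[3, 4], [7, 8]]
AB = matmul(A, B)
```

Here `composed` and `AB` come out identical, which is the claim of the paragraph above for this one example.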

Now that we understand how and why matrix multiplication works the way it does, how does matrix division work? You are probably familiar with functional inverses. The inverse of a function $f$ is a function $g$ such that $f(g(x)) = x = g(f(x))$ for all $x$. Since multiplication of matrices corresponds to composition of functions, it only makes sense that the multiplicative inverse of a matrix is the compositional inverse of the corresponding function. That’s why not all matrices have multiplicative inverses. Some functions don’t have compositional inverses! For example, the linear function $f$ mapping $\mathbb{R}^2$ to $\mathbb{R}$ defined by $f(x,y) = x+y$ has no inverse, since many vectors get mapped to the same value (what would $f^{-1}(0)$ be? $(0,0)$? $(1,-1)$?). This corresponds to the fact that the 1×2 matrix $\begin{pmatrix} 1 & 1 \end{pmatrix}$ has no multiplicative inverse. So dividing by a matrix $B$ is just multiplication by $B^{-1}$, if it exists. There are algorithms for computing inverses of matrices, but we’ll save that for another post.

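(1) Have you noticed that multiplying a matrix by a certain kind of matrix (an elementary matrix) is really just an elimination step from solving equations?

(2)

Have you noticed how the operations performed on a matrix when solving equations correspond to solving the equations by elimination?

Have you noticed how the definition and properties of the determinant correspond to solving equations by elimination?

Have you noticed how computing an inverse matrix corresponds to solving equations by elimination? And what does a singular matrix have to do with elimination?

Have you noticed that the very natural method of solving equations by elimination is the thread that ties together matrices, determinants, and inverse matrices? This utterly ordinary method lies at the core of so many basic concepts of linear algebra! None of it comes out of thin air.

The setup of linear algebra really does not spring straight out of a crack in a rock like the Monkey King (Sun Wukong), the way those awful domestic textbooks make it seem!

(3) As mentioned earlier, there are three ways of “understanding matrix transformations”. Do you understand them?

(4) Why are the row rank and the column rank equal? This involves the four fundamental subspaces (column space, null space, row space, left null space); it is something I came to appreciate only recently.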
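The correspondence asked about in item (2) can be made concrete with one elimination routine that yields the determinant, the inverse, and a singularity check at once. This is a minimal sketch and not a production solver: the function name `gauss_jordan` is hypothetical, and exact arithmetic with `fractions.Fraction` is an illustrative choice to avoid rounding.

```python
from fractions import Fraction


def gauss_jordan(A):
    """Row-reduce [A | I]; return (determinant, inverse), inverse is None if singular."""
    n = len(A)
    # Augment A with the identity matrix, using exact rational arithmetic.
    rows = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
            for i, row in enumerate(A)]
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            return Fraction(0), None      # no pivot: the matrix is singular
        if pivot != col:
            rows[col], rows[pivot] = rows[pivot], rows[col]
            det = -det                    # a row swap flips the sign
        p = rows[col][col]
        det *= p                          # determinant = signed product of pivots
        rows[col] = [x / p for x in rows[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return det, [row[n:] for row in rows]


det, inv = gauss_jordan([[2, 1], [4, 3]])      # det == 2
solution = [sum(m * b for m, b in zip(row, [3, 7])) for row in inv]  # [1, 1]

det_s, inv_s = gauss_jordan([[1, 2], [2, 4]])  # det_s == 0, inv_s is None
```

The singular case shows up as a column in which elimination finds no pivot, which is the same moment the determinant becomes zero and the inverse fails to exist.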

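Probability and statistics:

(1) The idea of maximum likelihood

(2) Bayesian models

(3) Latent-variable mixture models and the EM idea

(4) Basic standard distributions such as the Gaussian distribution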
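Items (1) and (4) meet in a standard textbook example: for i.i.d. samples from a one-dimensional Gaussian, the maximum-likelihood estimates are the sample mean and the variance computed with a divisor of $n$. The sketch below assumes that setting; the function names and the data values are made up for illustration.

```python
import math


def gaussian_mle(samples):
    """Maximum-likelihood estimates (mean, variance) for a 1-D Gaussian."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n   # divides by n, not n - 1
    return mu, var


def log_likelihood(samples, mu, var):
    """Log-likelihood of i.i.d. samples under a Gaussian with the given mean and variance."""
    return sum(-0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
               for x in samples)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu_hat, var_hat = gaussian_mle(data)   # 5.0 and 4.0
best = log_likelihood(data, mu_hat, var_hat)
```

Any other choice of mean or variance gives a lower value than `best` on this data, which is what "maximum likelihood" means.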

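Calculus:

(1) Extremum problems and (constrained) optimization problems

(2) Partial derivatives and gradients

(3) Convex optimization and constrained optimization problems; these are the basis for understanding SVMs and the regularization of models such as linear regression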
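As a small illustration of items (1) and (2), the sketch below minimizes a convex quadratic by gradient descent and checks the analytic gradient against central differences. The objective $f(x, y) = (x-1)^2 + 2(y+2)^2$, the step size, and the iteration count are arbitrary choices made for this example.

```python
def f(x, y):
    """A convex function with its unique minimum at (1, -2)."""
    return (x - 1) ** 2 + 2 * (y + 2) ** 2


def grad_f(x, y):
    """Analytic gradient: the vector of partial derivatives of f."""
    return (2 * (x - 1), 4 * (y + 2))


def numerical_grad(func, x, y, h=1e-6):
    """Approximate the partial derivatives by central differences."""
    dx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    dy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return (dx, dy)


def gradient_descent(grad, start, step=0.1, iters=200):
    """Repeatedly move against the gradient with a fixed step size."""
    x, y = start
    for _ in range(iters):
        gx, gy = grad(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y


x_min, y_min = gradient_descent(grad_f, (0.0, 0.0))   # close to (1, -2)
```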

Open courses and related references:

(1) MIT open course: Linear Algebra http://open.163.com/special/opencourse/daishu.html

(2) Stanford University open course: Machine Learning http://open.163.com/special/opencourse/machinelearning.html

(3) Zhihu https://www.zhihu.com/question/36324957

(4) Convex optimization http://www.bilibili.com/video/av8907218/

Reposted from: https://www.cnblogs.com/mazhimazhi/p/7816610.html
