Mathematical Foundations

 

Linear Algebra:

Multiplying an n-dimensional column vector x by its own transpose, x xᵀ, gives a symmetric n×n matrix.

When a matrix is transposed, the entries on the main diagonal stay unchanged.
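Both facts above are easy to check numerically; here is a minimal sketch using numpy (the vector and matrix values are arbitrary):

```python
import numpy as np

# An n-dimensional column vector x (values are arbitrary).
x = np.array([[1.0], [2.0], [3.0]])          # shape (3, 1)

# x @ x.T is an n-by-n matrix that equals its own transpose, i.e. it is symmetric.
outer = x @ x.T
print(np.allclose(outer, outer.T))           # True

# Transposing any matrix leaves the main-diagonal entries unchanged.
A = np.array([[2.0, 1.0], [4.0, 3.0]])
print(np.diag(A), np.diag(A.T))              # [2. 3.] [2. 3.]
```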

The computational rule for matrix multiplication is the row-by-column rule illustrated below.

For why the rule is defined this way, see: Linear Algebra: What matrices actually are

Most high school students in the United States learn about matrices and matrix multiplication, but they often are not taught why matrix multiplication works the way it does. Adding matrices is easy: you just add the corresponding entries. However, matrix multiplication does not work this way, and for someone who doesn’t understand the theory behind matrices, this way of multiplying matrices may seem extremely contrived and strange. To truly understand matrices, we view them as representations of part of a bigger picture. Matrices represent functions between spaces, called vector spaces, and not just any functions either, but linear functions. This is in fact why linear algebra focuses on matrices. The two fundamental facts about matrices are that every matrix represents some linear function, and every linear function is represented by a matrix. Therefore, there is a one-to-one correspondence between matrices and linear functions. We’ll show that multiplying matrices corresponds to composing the functions that they represent. Along the way, we’ll examine what matrices are good for and why linear algebra sprang up in the first place.

 

Most likely, if you’ve taken algebra in high school, you’ve seen something like the following:

\begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix}.

Your high school algebra teacher probably told you this thing was a “matrix.”  You then learned how to do things with matrices. For example, you can add two matrices, and the operation is fairly intuitive:

\begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix} + \begin{pmatrix} 1 & 2 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 3 & 3 \\ 5 & 3 \end{pmatrix}.

You can also subtract matrices, which works similarly. You can multiply a matrix by a number:

2 \times \begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix} = \begin{pmatrix} 4 & 2 \\ 8 & 6 \end{pmatrix}.

Then, when you were taught how to multiply matrices, everything seemed wrong:

\begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix}\begin{pmatrix} 1 & 2 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 3 & 4 \\ 7 & 8 \end{pmatrix}.

That is, to find the entry in the i-th row, j-th column of the product, you look at the i-th row of the first matrix, the j-th column of the second matrix, you multiply together their corresponding numbers, and then you add up the results to get the entry in that position. In the above example, the 1st row, 2nd column entry is a 4 because the 1st row of the first matrix is (2, 1), the 2nd column of the second matrix is (2, 0), and we have 4 = 2 \times 2 + 1 \times 0. Moreover, this implies that matrix multiplication isn’t even commutative! If we switch the order of multiplication above, we get

\begin{pmatrix} 1 & 2 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix} = \begin{pmatrix} 10 & 7 \\ 2 & 1 \end{pmatrix}.
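The rule is easy to state in code as well. Below is a small sketch in plain Python (no libraries) that computes the product entry by entry exactly as described above and reproduces both products:

```python
def matmul(A, B):
    """Multiply matrices given as lists of rows, using the row-column rule."""
    n, k, m = len(A), len(B), len(B[0])
    assert len(A[0]) == k, "inner dimensions must match"
    C = [[0] * m for _ in range(n)]
    for i in range(n):          # i-th row of A
        for j in range(m):      # j-th column of B
            C[i][j] = sum(A[i][t] * B[t][j] for t in range(k))
    return C

print(matmul([[2, 1], [4, 3]], [[1, 2], [1, 0]]))  # [[3, 4], [7, 8]]
print(matmul([[1, 2], [1, 0]], [[2, 1], [4, 3]]))  # [[10, 7], [2, 1]] -- not commutative
```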

How come matrix multiplication doesn’t work like addition and subtraction? And if multiplication works this way, how the heck does division work? The goal of this post is to answer these questions.

To understand why matrix multiplication works this way, it’s necessary to understand what matrices actually are. But before we get to that, let’s briefly take a look at why we care about matrices in the first place. The most basic application of matrices is solving systems of linear equations. A linear equation is one in which all the variables appear by themselves with no powers; they don’t get multiplied with each other or themselves, and no funny functions either. An example of a system of linear equations is

2x + y = 3 \\ 4x + 3y = 7

The solution to this system is x = 1, y = 1. Such equations seem simple, but they easily arise in life. For example, let’s say I have two friends Alice and Bob who went shopping for candy. Alice bought 2 chocolate bars and 1 bag of skittles and spent $3, whereas Bob bought 4 chocolate bars and 3 bags of skittles and spent $7. If we want to figure out how much chocolate bars and skittles cost, we can let x be the price of a chocolate bar and y be the price of a bag of skittles and the variables would satisfy the above system of linear equations. Therefore we can deduce that a chocolate bar costs $1 and so does a bag of skittles. This system was particularly easy to solve because one can guess and check the solution, but in general, with n variables and equations instead of 2, it’s much harder. That’s where matrices come in! Note that, by matrix multiplication, the above system of linear equations can be re-written as

\begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 3 \\ 7 \end{pmatrix}.

If only we could find a matrix A, which is the inverse of the matrix \begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix}, so that if we multiplied both sides of the equation (on the left) by A we’d get

\begin{pmatrix} x \\ y \end{pmatrix} = A \begin{pmatrix} 3 \\ 7 \end{pmatrix}.
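As a quick numerical check (a sketch using numpy; in practice numpy.linalg.solve is preferred over forming the inverse explicitly):

```python
import numpy as np

M = np.array([[2.0, 1.0], [4.0, 3.0]])   # coefficient matrix
b = np.array([3.0, 7.0])                 # right-hand side

A = np.linalg.inv(M)                     # the inverse matrix discussed above
print(A @ b)                             # [1. 1.]  ->  x = 1, y = 1

# The same answer without forming the inverse explicitly:
print(np.linalg.solve(M, b))             # [1. 1.]
```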

The applications of matrices reach far beyond this simple problem, but for now we’ll use this as our motivation. Let’s get back to understanding what matrices are. To understand matrices, we have to know what vectors are. A vector space is a set with a specific structure, and a vector is simply an element of the vector space. For now, for technical simplicity, we’ll stick with vector spaces over the real numbers, also known as real vector spaces. A real vector space is basically what you think of when you think of space. The number line is a 1-dimensional real vector space, the x-y plane is a 2-dimensional real vector space, 3-dimensional space is a 3-dimensional real vector space, and so on. If you learned about vectors in school, then you are probably familiar with thinking about them as arrows which you can add together, multiply by a real number, and so on, but multiplying vectors together works differently. Does this sound familiar? It should. That’s how matrices work, and it’s no coincidence.

The most important fact about vector spaces is that they always have a basis. A basis of a vector space is a set of vectors such that any vector in the space can be written as a linear combination of those basis vectors. If v_1, v_2, v_3 are your basis vectors, then av_1 + bv_2 + cv_3 is a linear combination if a,b,c are real numbers. A concrete example is the following: a basis for the x-y plane is the vectors (1,0), (0,1). Any vector is of the form (a,b) which can be written as

\begin{pmatrix} a \\ b \end{pmatrix} = a \begin{pmatrix} 1 \\ 0 \end{pmatrix} + b \begin{pmatrix} 0 \\ 1 \end{pmatrix}

so we indeed have a basis! This is not the only possible basis. In fact, the vectors in our basis don’t even have to be perpendicular! For example, the vectors (1,0), (1,1) form a basis since we can write

\begin{pmatrix} a \\ b \end{pmatrix} = (a-b) \begin{pmatrix} 1 \\ 0 \end{pmatrix} + b \begin{pmatrix} 1 \\ 1 \end{pmatrix}.

Now, a linear transformation is simply a function between two vector spaces that happens to be linear. Being linear is an extremely nice property. A function f is linear if the following two properties hold:

f(x+y) = f(x) + f(y) \\ f(ax) = af(x)

For example, the function f(x) = x^2 defined on the real line is not linear, since f(x+y) = (x+y)^2 = x^2 + y^2 + 2xy whereas f(x) + f(y) = x^2 + y^2. Now, we connect together all the ideas we’ve talked about so far: matrices, basis, and linear transformations. The connection is that matrices are representations of linear transformations, and you can figure out how to write the matrix down by seeing how it acts on a basis. To understand the first statement, we need to see why the second is true. The idea is that any vector is a linear combination of basis vectors, so you only need to know how the linear transformation affects each basis vector. This is because, since the function is linear, if we have an arbitrary vector v which can be written as a linear combination v = av_1 + bv_2 + cv_3, then

f(v) = f(av_1 + bv_2 + cv_3) = af(v_1) + bf(v_2) + cf(v_3).

Notice that the value of f(v) is completely determined by the values f(v_1), f(v_2), f(v_3), and so that’s all the information we need to completely define the linear transformation. Where does the matrix come in? Well, once we choose a basis for both the domain and the target of the linear transformation, the columns of the matrix will represent the images of the basis vectors under the function. For example, suppose we have a linear transformation f which maps \mathbb{R}^3 to \mathbb{R}^2, meaning it takes in 3-dimensional vectors and spits out 2-dimensional vectors. Right now f is just some abstract function that we have no way of writing down on paper. Let’s pick a basis for both our domain (3-space) and our target (2-space, or the plane). A nice choice would be v_1 = (1,0,0), v_2 = (0,1,0), v_3 = (0,0,1) for the former and w_1 = (1,0), w_2 = (0,1) for the latter. All we need to know is how f affects v_1, v_2, v_3, and the basis for the target lets us write down the values f(v_1), f(v_2), f(v_3) concretely. The matrix M for our function will be a 2-by-3 matrix, where the 3 columns are indexed by v_1, v_2, v_3 and the 2 rows are indexed by w_1, w_2. All we need to write down M are the values f(v_1), f(v_2), f(v_3). For concreteness, let’s say

f(v_1) = 2w_1 + 4w_2 \\ f(v_2) = w_1 - w_2 \\ f(v_3) = w_2.

Then the corresponding matrix will be

\begin{pmatrix} 2 & 1 & 0 \\ 4 & -1 & 1 \end{pmatrix}.

The reason why this works is that matrix multiplication was designed so that if you multiply a matrix by the vector with all zeroes except a 1 in the i-th entry, then the result is just the i-th column of the matrix. You can check this for yourself. So we know that the matrix M works correctly when applied to (multiplied to) basis vectors. But also matrices satisfy the same properties as linear transformations, namely M(x + y) = Mx + My and M(ax) = aMx, where x,y are vectors and a is a real number. Therefore M works for all vectors, so it’s the correct representation of f. Note that if we had chosen different vectors for the basis vectors, the matrix would look different. Therefore, matrices are not natural in the sense that they depend on what bases we choose.
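A small sketch with numpy confirms both claims for the matrix M constructed above: multiplying by each standard basis vector picks out the corresponding column, and M satisfies the two linearity properties.

```python
import numpy as np

M = np.array([[2.0, 1.0, 0.0],
              [4.0, -1.0, 1.0]])

# Standard basis vectors of R^3.
v1, v2, v3 = np.eye(3)

# Multiplying M by the i-th basis vector picks out the i-th column,
# i.e. the image f(v_i) written in the basis w_1, w_2.
print(M @ v1)   # [2.  4.]  = 2*w1 + 4*w2
print(M @ v2)   # [ 1. -1.] =   w1 -   w2
print(M @ v3)   # [0.  1.]  =          w2

# M also satisfies the two linearity properties, so it agrees with f everywhere.
x, y, a = np.array([1.0, 2.0, 3.0]), np.array([0.0, 1.0, -1.0]), 5.0
print(np.allclose(M @ (x + y), M @ x + M @ y))  # True
print(np.allclose(M @ (a * x), a * (M @ x)))    # True
```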

Now, finally to answer the question posed at the beginning. Why does matrix multiplication work the way it does? Let’s take a look at the two matrices we had in the beginning: A = \begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix} and B = \begin{pmatrix} 1 & 2 \\ 1 & 0 \end{pmatrix}. We know that these correspond to linear functions on the plane; let’s call them f and g, respectively. Multiplying matrices corresponds to composing their functions. Therefore, doing ABx is the same as doing f(g(x)) for any vector x. To determine what the matrix AB should look like, we can see how it affects the basis vectors w_1 = (1,0), w_2 = (0,1). We have

f(g(w_1)) = f(w_1 + w_2) = f(w_1) + f(w_2) \\ = (2w_1 + 4w_2) + (w_1 + 3w_2) = 3w_1 + 7w_2

so the first column of AB should be (3,7), and

f(g(w_2)) = f(2w_1) = 2f(w_1) = 2(2w_1 + 4w_2) = 4w_1 + 8w_2

so the second column of AB should be (4,8). Indeed, this agrees with the answer we got in the beginning by matrix multiplication! Although this is not at all a rigorous proof, since it’s just an example, it captures the idea behind why matrix multiplication is defined the way it is.
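The same computation can be replayed numerically (a sketch with numpy): applying B and then A to each basis vector reproduces the columns of the product A @ B.

```python
import numpy as np

A = np.array([[2.0, 1.0], [4.0, 3.0]])   # matrix of f
B = np.array([[1.0, 2.0], [1.0, 0.0]])   # matrix of g

w1, w2 = np.eye(2)

# f(g(w_i)), computed by applying the two matrices one after the other...
print(A @ (B @ w1))   # [3. 7.]
print(A @ (B @ w2))   # [4. 8.]

# ...matches the columns of the single product matrix A @ B.
print(A @ B)          # [[3. 4.]
                      #  [7. 8.]]
```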

Now that we understand how and why matrix multiplication works the way it does, how does matrix division work? You are probably familiar with functional inverses. The inverse of a function f is a function g such that f(g(x)) = x = g(f(x)) for all x. Since multiplication of matrices corresponds to composition of functions, it only makes sense that the multiplicative inverse of a matrix is the compositional inverse of the corresponding function. That’s why not all matrices have multiplicative inverses: some functions don’t have compositional inverses! For example, the linear function f mapping \mathbb{R}^2 to \mathbb{R} defined by f(x,y) = x+y has no inverse, since many vectors get mapped to the same value (what would f^{-1}(0) be? (0,0)? (1,-1)?). This corresponds to the fact that the 1×2 matrix \begin{pmatrix} 1 & 1 \end{pmatrix} has no multiplicative inverse. So dividing by a matrix B is just multiplication by B^{-1}, if it exists. There are algorithms for computing inverses of matrices, but we’ll save that for another post.
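A brief numerical illustration (a sketch with numpy; since numpy.linalg.inv only accepts square matrices, a square singular matrix stands in for the 1×2 example above):

```python
import numpy as np

B = np.array([[1.0, 2.0], [1.0, 0.0]])
B_inv = np.linalg.inv(B)
print(np.allclose(B @ B_inv, np.eye(2)))   # True: B has a multiplicative inverse

# A singular (non-invertible) square matrix: both columns point the same way,
# so the corresponding function collapses the plane onto a line and cannot be undone.
S = np.array([[1.0, 1.0], [2.0, 2.0]])
print(np.linalg.det(S))                    # 0.0
try:
    np.linalg.inv(S)
except np.linalg.LinAlgError as e:
    print("no inverse:", e)
```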


 

 

(1) Have you noticed that multiplying by a certain class of matrices is really just the elimination step used when solving a system of equations? (See the sketch after this list of questions.)

(2)

Have you noticed the correspondence between the row operations performed on a matrix and solving equations by elimination?

Have you noticed the correspondence between the definition and properties of the determinant and elimination?

Have you noticed the correspondence between computing an inverse matrix and elimination? And how are singular matrices related to elimination?

Have you noticed that the perfectly natural process of solving equations by elimination is the thread that ties together matrices, determinants, and inverse matrices? This ordinary procedure sits at the core of so many basic concepts in linear algebra! None of this appears out of nowhere;

the definitions of linear algebra really did not spring out of a rock like the Monkey King, the way many poor domestic textbooks make them seem!


(3) As mentioned earlier, there are three ways to "understand a matrix as a transformation"; do you understand them?

(4) Why are the row rank and the column rank the same? This involves the four fundamental subspaces (column space, null space, row space, left null space); it is something I only came to appreciate recently.
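As a small illustration of (1) and (2), here is a sketch with numpy; the elementary matrix E below is chosen by hand for the 2-by-2 system used earlier in this post:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [4.0, 3.0]])

# Eliminating x from the second equation means "row2 := row2 - 2*row1".
# That single elimination step IS multiplication on the left by an elementary matrix.
E = np.array([[1.0, 0.0],
              [-2.0, 1.0]])
U = E @ A
print(U)                      # [[2. 1.]
                              #  [0. 1.]]  -> upper triangular, ready for back substitution

# The same E acts on the right-hand side, so elimination solves the system,
# and chaining such E's is exactly how an inverse matrix is built.
b = np.array([3.0, 7.0])
print(E @ b)                  # [3. 1.]  -> y = 1, then 2x + 1 = 3 gives x = 1

# (4) Row rank equals column rank: numpy reports the same rank for A and its transpose.
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T))   # 2 2
```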

 

 

 

Probability and Statistics:

(1) The idea of maximum likelihood (see the sketch after this list)

(2) Bayesian models

(3) Mixture models with latent variables, and the idea behind EM

(4) The basic standard distributions, such as the Gaussian distribution
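A tiny illustration of (1) and (4) together (a sketch with numpy; the data are simulated): for a Gaussian, maximizing the likelihood yields the sample mean and the (biased) sample variance as the parameter estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=10_000)   # simulated samples

# Maximum-likelihood estimates for a Gaussian: setting the derivative of the
# log-likelihood to zero yields the sample mean and the sample variance (divided by n).
mu_hat = data.mean()
sigma2_hat = ((data - mu_hat) ** 2).mean()
print(mu_hat, np.sqrt(sigma2_hat))   # close to the true values 2.0 and 1.5
```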

 

Calculus:

(1) Extremum problems and (constrained) optimization problems (see the sketch after this list)

(2) Partial derivatives and the gradient

(3) Convex optimization and constrained optimization; these are the foundation for understanding SVMs and the regularization of models such as linear regression
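A minimal sketch of (1) and (2) (numpy, with a made-up convex quadratic): finding the minimizer by repeatedly stepping along the negative gradient.

```python
import numpy as np

# A convex quadratic f(w) = 0.5 * w^T Q w - b^T w, minimized at w* = Q^{-1} b.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad(w):
    # Partial derivatives collected into the gradient vector: Q w - b.
    return Q @ w - b

w = np.zeros(2)
lr = 0.1
for _ in range(500):
    w -= lr * grad(w)         # step downhill along the negative gradient

print(w)                      # approximately [0.2 0.4]
print(np.linalg.solve(Q, b))  # closed-form minimizer, for comparison
```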

 

 

 

Open Courses and Related References:

 

(1) MIT OpenCourseWare: Linear Algebra   http://open.163.com/special/opencourse/daishu.html

(2) Stanford University Open Course: Machine Learning   http://open.163.com/special/opencourse/machinelearning.html

(3) Zhihu: https://www.zhihu.com/question/36324957

(4) Convex Optimization: http://www.bilibili.com/video/av8907218/

 

 

 

 

 

Reposted from: https://www.cnblogs.com/mazhimazhi/p/7816610.html
