Stochastic Processes in Computer Vision: A Tutorial on Probability and Stochastic Processes -- by 何毓琦

Part 1

Probability is often characterized as "a precise way to deal with our ignorance or uncertainty". Everyone has an intuitive understanding of the question "what are the chances of (something happening)?". A stochastic process then deals with probabilities over time (or over some independent and indexed variable such as distance). There exist a number of excellent or classic textbooks on probability and stochastic processes. It is one of my favorite oral examination questions, which I always tell students beforehand to prepare, and in my opinion it is among the most useful tools of an applied mathematician and/or engineer.

Yet in my experience it is also one of the most confusing subjects for many students to learn. Why?

In this series of blog articles (of which this is the first) I shall try to explain the subject in my own way, drawing on my experience in learning it. These articles are NOT meant to replace the many excellent textbooks on the subject. Their main purpose, I hope, is to make the subject matter more approachable and less imposing. I write not in the rigorous style required for a scholastic textbook but in the spirit of a teacher engaged in a face-to-face session with a student. It will be highly informal, but it will make the big picture come across more easily. Hopefully, it will even make it possible to read, and gain insight into, textbooks and articles written in measure-theoretic language. My approach will be strictly from a user's point of view, requiring nothing beyond freshman calculus and the ability to visualize n-dimensional space as a natural generalization of our familiar 3-D space. So here goes . . .

Let us start by making one simplifying assumption which, for people interested in practical applications, is not at all important or restrictive. This is the

Finiteness Assumption (FA) – We assume there is no INFINITELY large number, i.e., no infinity, but there can be very large numbers, e.g., 10^100 (a number estimated to be larger than the total number of atoms in the universe). If one deals only with real computation on digital computers, this assumption is automatically satisfied. By making this assumption we assume away all the measure-theoretic terminology that populates the theoretical probability literature and confuses the uninitiated.

With the FA in place we now define what a random variable is.

Random Variable (r.v.) – A random variable is a variable that may take on any one of a finite number of values when sampled (i.e., looked at). We characterize a r.v. by specifying its histogram. A histogram spells out which sampled values in a range of values the r.v. may take on what percentage of the time. Fig. 1 is a typical histogram. It is actually the histogram of a random variable which is the readership (or hits) of my blog articles for the past four years.


Fig. 1 Histogram of readership of my blog articles (2009-2013): x-axis is # of hits, y-axis is # of articles in this hit range

Note each bar of the histogram is expressed as a percentage so that the total sum of the bars adds up to one or 100%, i.e., with probability one (for sure) the r.v. takes on a value somewhere in the total range. While the range of values this r.v. may take on is finite by virtue of assumption FA, to completely specify a r.v. can still take a great deal of data. (In fact, it took me about 3 hours to collect the data and make this graph, which is why I did not compile the data for all 5+ years of my blog life.) This is inconvenient in computation. To simplify the description (specification) we develop two common rough characterizations.
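As a concrete illustration, here is a minimal sketch in Python of turning raw sampled values into a histogram whose bars sum to one; the hit counts are made-up numbers standing in for the actual blog data behind Fig. 1:

```python
import numpy as np

# Hypothetical per-article hit counts (made-up data, not the real blog numbers).
hits = np.array([1200, 3400, 4100, 800, 5600, 4300, 2900, 7200, 3900, 4100])

# Bin the sampled values and normalize so the bars sum to 1 (100%).
counts, bin_edges = np.histogram(hits, bins=5)
probabilities = counts / counts.sum()

for lo, hi, p in zip(bin_edges[:-1], bin_edges[1:], probabilities):
    print(f"hits in [{lo:6.0f}, {hi:6.0f}): {p:.0%} of articles")

print("total:", probabilities.sum())  # 1.0 -- probability one, for sure
```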

The Mean of a r.v. – Intuitively, if you imagine a cardboard cutout of the shape of the histogram, then the value along the x-axis at which a knife edge placed perpendicular to the x-axis will balance this cardboard shape is the mean of this r.v. Mathematically, it is simply the average of the number of hits per article; Science Net in fact computes this value for all bloggers and displays the top-100 bloggers. My own current average happens to be 4130 per article, which ranks 26th on the list.

Variance of a r.v. – This is a measure of the spread of the histogram. A small variance roughly means the histogram is mostly concentrated in a small range of numbers around its mean, and vice versa for a large variance. It is a measure of the variability of the values of the r.v. In stock market terminology, the β of a stock is simply the variance of the daily value of the stock and a measure of its volatility. Mathematically, the variance is called the second central moment of the histogram.

Now we can develop further rough characterizations of the histogram by defining what are called its higher central moments, such as the skewness of the histogram, which is the third central moment. But in practice such higher moments are rarely needed, nor are data on these moments often available.
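To make these rough characterizations concrete, here is a short sketch (reusing the made-up hit counts from before) that computes the mean, the variance (second central moment), and the skewness (third central moment, normalized here to be dimensionless) directly from the samples:

```python
import numpy as np

hits = np.array([1200, 3400, 4100, 800, 5600, 4300, 2900, 7200, 3900, 4100])

mean = hits.mean()                       # balance point of the histogram
variance = ((hits - mean) ** 2).mean()   # second central moment: spread
third = ((hits - mean) ** 3).mean()      # third central moment
skewness = third / variance ** 1.5       # dimensionless skewness

print(f"mean = {mean:.1f}, variance = {variance:.1f}, skewness = {skewness:.2f}")
```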

So much for a single r.v. But we often have to deal with more than one random variable. Let us consider two r.v.s, x and y. Now the histogram of the pair of random variables (x, y) becomes a 3D object. Graphically it looks like a multi-peak terrain map (think of Guilin in the Guangxi province of south China, or the skyscrapers of Manhattan island in NY). But here a new concept intrudes. It is called the "joint probability", or the "correlation/covariance (in the case of an approximate specification)", between the r.v.s x and y. It captures the relationship, if any, between the r.v.s. We are all familiar with the notion that smart parents tend to produce smart children. If we represent the intelligence of the parents as r.v. x and that of the child as r.v. y, then mathematically we say y is positively correlated with x. If we look down on the 3D histogram of x and y, we shall see the peaks scattered along a northeast-to-southwest direction, as illustrated in Fig. 2.


Fig. 2 Bird's-eye view of a 3D histogram with correlation

In other words, knowing the value of y will give a different idea about the probable value of x. More generally we say x and y are NOT independent but correlated. Mathematically we denote the joint probability p(x,y) (i.e., the histogram) as a general 3D function. We also define the conditional probability of x given the value of y as

p(x|y) = p(x,y)/p(y) or p(y|x) = p(x,y)/p(x)

where p(y) and p(x), called the marginal probability of y and x respectively, are simply the resultant 2D histograms when we collapse the 3D histogram onto the y or x axis respectively. Graphically, the conditional probability p(x|y) is simply the 2D histogram one sees if we take a cross-sectional view of the 3D histogram at the particular value of y. Mathematically we need to divide p(x,y) by p(y) to normalize the values so that p(x|y) will still have area equal to one (100%), satisfying the definition of a histogram.
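Here is a small sketch of these definitions using made-up positively correlated samples (think of the parent/child intelligence example): the joint histogram is a 2D table, the marginals are obtained by collapsing it onto one axis, and a conditional is a re-normalized cross-sectional slice:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up positively correlated pairs (x = parent, y = child), for illustration.
x = rng.normal(100, 15, size=100_000)
y = 0.6 * x + rng.normal(40, 10, size=100_000)

# Joint histogram p(x, y): a discretized terrain map seen from above.
joint, x_edges, y_edges = np.histogram2d(x, y, bins=20)
p_xy = joint / joint.sum()

# Marginals: collapse the joint histogram onto one axis.
p_x = p_xy.sum(axis=1)   # p(x)
p_y = p_xy.sum(axis=0)   # p(y)

# Conditional p(x|y): a cross-sectional slice at one value (bin) of y,
# divided by p(y) so the slice again sums to one.
j = 10                              # index of the chosen y-bin
p_x_given_y = p_xy[:, j] / p_y[j]

print(p_x.sum(), p_y.sum(), p_x_given_y.sum())  # each is 1.0
```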

Now it is possible that the bird's-eye view of the 3D histogram is a rectangle (vs. the view of Fig. 2), in other words p(x|y) = p(x) no matter which value of y we choose. In this case, by the definition of p(x|y), we have p(x,y) = p(y)p(x). We say the r.v.s x and y are independent. Intuitively this captures the notion that knowing y does not tell us anything new about the probable values of x, and vice versa about y when knowing x. Computationally, this simplifies a function of 2 variables into a product of single-variable functions, a great computational simplification when n random variables are involved.
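The factorization p(x,y) = p(x)p(y) can be checked numerically; a hedged sketch with truly independent made-up samples, where the outer product of the marginals reproduces the joint table up to sampling noise:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two independent r.v.s this time (illustrative samples).
x = rng.normal(0, 1, size=200_000)
y = rng.normal(0, 1, size=200_000)

joint, _, _ = np.histogram2d(x, y, bins=15)
p_xy = joint / joint.sum()
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

# For independent r.v.s the joint table factors into an outer product.
factored = np.outer(p_x, p_y)
print("max |p(x,y) - p(x)p(y)| =", np.abs(p_xy - factored).max())  # small
```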

To roughly characterize two general r.v.s we have a mean vector [x̄, ȳ] and a 2x2 covariance matrix, with the variances of x and y as its diagonal elements and the symmetric covariance in the off-diagonal positions:

[ σx²   σxy ]
[ σyx   σy² ]
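A minimal sketch of this rough two-variable characterization (again with made-up correlated samples); numpy's cov computes exactly this 2x2 matrix, variances on the diagonal and the symmetric covariance off it:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(100, 15, size=50_000)
y = 0.6 * x + rng.normal(40, 10, size=50_000)

mean_vector = np.array([x.mean(), y.mean()])
cov_matrix = np.cov(x, y)   # [[var(x), cov(x,y)], [cov(y,x), var(y)]]

print("mean vector:", mean_vector)
print("covariance matrix:\n", cov_matrix)
```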

To summarize, we have so far introduced the following concepts:

1. Random variables characterized by histograms
2. Rough characterization of histograms by mean and variance
3. Joint probability (3D histogram) of two r.v.s
4. Independence and conditional probability
5. Covariance matrix

Now suppose we have n r.v.s [x1, x2, . . . , xn] instead of two. Everything I said about the two r.v.s applies; we merely have to change 2D and 3D to n and n+1 dimensions. The mean of n r.v.s becomes an n-vector and the covariance matrix is an n×n matrix. In your mind's eye you can visualize everything in n dimensions the same way as in Figs. 1 and 2. The joint probability (histogram) p(x1, x2, . . . , xn) is an n-variable function. And if the n variables are independent from each other, we write p(x1, x2, . . . , xn) = p(x1)p(x2) . . . p(xn). No new concepts are involved.
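As a small sketch of the n-variable case: for independent r.v.s the n-dimensional joint probability is just a product of one-dimensional marginals, which can be evaluated without ever building the n-dimensional table (the marginals below are illustrative assumptions):

```python
import numpy as np

# Illustrative one-dimensional marginals p(x_i) for n independent r.v.s,
# each over k discrete values; every row sums to 1.
n, k = 10, 5
rng = np.random.default_rng(3)
marginals = rng.dirichlet(np.ones(k), size=n)

# Joint probability of one particular outcome (v_1, ..., v_n):
outcome = rng.integers(0, k, size=n)
p_joint = np.prod([marginals[i, outcome[i]] for i in range(n)])
print("p(x1, ..., xn) =", p_joint)
```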

Concept-wise, believe it or not, these in my opinion are all you need to know about probability and stochastic processes to function in the engineering world, even if your interest is academic and theoretical. In my 46 years of active research and engineering consulting in stochastic control and optimization, I never had to go beyond the knowledge described above. The following articles will simply illustrate and explain how to apply these ideas to more practical uses.

Computationally, because of exponential growth, dealing with an arbitrary n-variable function is impossible (see http://blog.sciencenet.cn/blog-1565-26889.html). Data-wise, it also involves an astronomically large amount of data. To simplify notation, at least theoretically, we make a continuous approximation of these discrete data and introduce continuous variables and functions. To emphasize: for our purpose, this is only a convenient approximation and simplification. No new ideas are involved. This will be the content of the next article. Beyond introducing continuous variables, we also need to develop various special cases of joint probability structures to simplify description and calculations; subsequent articles will address these issues. Once again, let me emphasize that from my viewpoint these simplifications and special cases are needed for computational feasibility and practicality. Nothing conceptually new is involved.
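A back-of-the-envelope sketch of the exponential growth mentioned above: a full joint table over n variables with k values each needs k^n entries, while an independent factorization needs only n·k:

```python
# Storage for a full joint histogram vs. an independent factorization.
k = 10    # discrete values per variable
for n in (2, 5, 10, 20, 50):
    full = k ** n       # entries in the full n-dimensional table
    factored = n * k    # entries if p = p(x1) * ... * p(xn)
    print(f"n = {n:2d}: full table {full:.3e} entries, factored {factored}")
```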
