Mathematical Statistics (1)

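This post discusses the importance of modeling data, moving from the hypergeometric distribution and sampling inspection to one-sample and two-sample models. It covers how to extract useful information from data, how to assess the generalizability of experimental results, and how to judge the adequacy of a model under uncertainty. It also discusses how hierarchies of models can guide the search for more suitable descriptions.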

“Models of course, are never true but fortunately it is only necessary that they are useful” – George Box

  • We can conceptualize the data structure and our goals more precisely; we begin with simple examples.
  • We can derive methods of extracting useful information from data and give methods that assess the generalizability of experimental results.
  • We can assess the effectiveness of the methods we propose.
  • We can decide whether the models we propose approximate the mechanism generating the data adequately for our purposes.
  • We can be guided to alternative or more general descriptions that might fit better. Hierarchies of models are discussed throughout.

Examples:

a. A population of N elements, $N\theta$ of which are defective. To get information about $\theta$, a sample of n is drawn without replacement and inspected.

b. We want to study how a physical or economic feature, for example, height or income, is distributed in a large population.

c. An experimenter makes n independent determinations of the value of a physical constant $\mu$. His or her measurements are subject to random fluctuations (error), and the data can be thought of as $\mu$ plus some random errors.

d. We want to compare the efficacy of two ways of doing something under similar conditions, such as brewing coffee, reducing pollution, treating a disease, producing energy, learning a maze, and so on. When comparing two drugs, for instance, random variability would come primarily from differing responses among patients to the same drug, but also from error in measurements and variation in the purity of the drugs.

Sampling Inspection:

The sample space consists of the numbers 0,1,…,n corresponding to the number of defective items found.

$$P[X=k] = \frac{\binom{N\theta}{k}\binom{N-N\theta}{n-k}}{\binom{N}{n}}$$

if $\max(n-N(1-\theta),0)\le k \le \min(N\theta, n)$.

Thus X has a hypergeometric $\mathcal{H}(N\theta,N,n)$ distribution.

Because $N\theta$ is unknown, we cannot specify the probability distribution completely; we can only give a family $\{\mathcal{H}(N\theta,N,n)\}$ of probability distributions, any one of which could have generated the data actually observed.
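The hypergeometric probabilities above can be evaluated directly. The following is a minimal sketch using only the Python standard library; the function name `hypergeom_pmf`, the variable `D` (standing for the defective count $N\theta$), and the numbers N = 20, n = 4 are illustrative assumptions, not values from the text.

```python
from math import comb

def hypergeom_pmf(k, N, D, n):
    """P[X = k]: k defectives in a sample of n drawn without replacement
    from N items, D of which are defective (D plays the role of N*theta)."""
    lo = max(n - (N - D), 0)   # fewest defectives the sample can contain
    hi = min(D, n)             # most defectives the sample can contain
    if k < lo or k > hi:
        return 0.0
    return comb(D, k) * comb(N - D, n - k) / comb(N, n)

# Illustrative (assumed) values: N = 20 items, 5 defective, sample of n = 4.
N, D, n = 20, 5, 4
pmf = [hypergeom_pmf(k, N, D, n) for k in range(n + 1)]

# Since the defective count is unknown, the model is the whole family of
# distributions, one for each possible count D = 0, 1, ..., N.
family = {d: [hypergeom_pmf(k, N, d, n) for k in range(n + 1)]
          for d in range(N + 1)}
```

Each entry of `family` is a complete probability distribution on 0,1,…,n; inference about $\theta$ amounts to deciding which members of this family are plausible given the observed number of defectives.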

Sample from a Population, One-Sample Models:

If the measurements are scalar, we observe $x_1,\cdots,x_n$, which are modeled as realizations of $X_1,\cdots,X_n$, independent, identically distributed (i.i.d.) random variables with common unknown distribution function F.

$$X\sim F$$

The model is fully described by the set $\mathcal{F}$ of distributions that we specify.

For the measurement model of example c, we write

$$X_i = \mu + \epsilon_i, \quad 1\le i\le n$$

We postulate:

  • The value of the error committed on one determination does not affect the value of the error at other times. That is, $\epsilon_1,\cdots,\epsilon_n$ are independent.
  • The distribution of the error at one determination is the same as that at another. Thus $\epsilon_1,\cdots,\epsilon_n$ are identically distributed.
  • The distribution of $\epsilon$ is independent of $\mu$.
  • The common distribution of the errors is $\mathcal{N}(0,\sigma^2)$.
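To make the postulates concrete, here is a small simulation sketch of the measurement model $X_i = \mu + \epsilon_i$ with i.i.d. $\mathcal{N}(0,\sigma^2)$ errors. The helper name `simulate_measurements`, the seed, and the numerical values of $\mu$ and $\sigma$ are assumptions chosen only for illustration.

```python
import random
import statistics

def simulate_measurements(mu, sigma, n, seed=0):
    """Draw X_i = mu + eps_i, with eps_1..eps_n i.i.d. N(0, sigma^2).
    The error distribution does not depend on mu."""
    rng = random.Random(seed)
    return [mu + rng.gauss(0.0, sigma) for _ in range(n)]

# Assumed values: a "physical constant" mu and a measurement error scale sigma.
mu_true, sigma_true = 9.81, 0.05
x = simulate_measurements(mu_true, sigma_true, n=1000)

# Under this model the sample mean and sample standard deviation
# are natural summaries of the location mu and the error scale sigma.
mu_hat = statistics.mean(x)
sigma_hat = statistics.stdev(x)
```

In practice only `x` is observed; `mu_true` and `sigma_true` are known here solely because the data are simulated.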

Two-Sample Models:

Let drug A be a standard or placebo; we refer to the x’s as control observations. The y’s denote the responses of subjects given a new drug or treatment (B) that is being evaluated by comparing its effect with that of the placebo; we call these treatment observations.

Natural initial assumptions here are:

(1) The x’s and y’s are realizations of $X_1,\cdots,X_m$, a sample from F, and $Y_1,\cdots,Y_n$, a sample from G. The model is specified by the set of possible (F,G) pairs.

(2) Suppose that if treatment A had been administered to a subject, response x would have been obtained. Then if treatment B had been administered to the same subject instead of A, response $y = x + \Delta$ would be obtained, where $\Delta$ does not depend on x.

(3) The control responses are normally distributed.
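Taken together, assumptions (1) to (3) say that the controls come from a normal F and that G is F shifted by the constant $\Delta$. The sketch below simulates such data and estimates $\Delta$ by the difference of sample means; the function name `simulate_two_sample`, the seed, and all parameter values are illustrative assumptions, and the difference of means is just one natural estimate, not a method prescribed by the text.

```python
import random
import statistics

def simulate_two_sample(m, n, mu, sigma, delta, seed=0):
    """Controls X_1..X_m ~ N(mu, sigma^2); treatment responses
    Y_j = (a fresh draw from the same normal) + delta, so G is F shifted by delta."""
    rng = random.Random(seed)
    x = [rng.gauss(mu, sigma) for _ in range(m)]
    y = [rng.gauss(mu, sigma) + delta for _ in range(n)]
    return x, y

# Assumed values, for illustration only.
delta_true = 2.0
x_ctrl, y_trt = simulate_two_sample(m=2000, n=2000, mu=10.0, sigma=1.0,
                                    delta=delta_true)

# Under the constant-shift model, the difference of sample means
# is a natural estimate of the treatment effect delta.
delta_hat = statistics.mean(y_trt) - statistics.mean(x_ctrl)
```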
