Serial Number Estimation
Content
Parameter description
- If $\theta$ appears in a formula, $n$ is the sample size and $\hat{\theta}$ is the estimator of the actual maximum ID.
- If $M$ appears in a formula, $k$ is the sample size and $N$ is the actual maximum ID. $M$ itself is the random variable for the maximum ID in a random sample.
Method 1: Probability of each sample
The estimator of the maximum value can also be derived by assuming that every ID is equally likely to be sampled, where $\theta$ is the actual maximum ID on a given day:
$$P(x) = \frac{1}{\theta}$$
Method 2: Probability of maximum sample
According to Assumption 1, we treat the observed maximum ID as a random variable $M$ and denote the maximum ID encountered on one specific day by $m$ (i.e. $x_{n:n}$). Let $N$ be the actual maximum ID and $k$ the number of ill samples. The probability mass function (PMF) of the maximum ID can then be expressed as follows:
$$P(M = m) = \frac{C_{k-1}^{m-1}}{C_k^N}$$
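The PMF above can be evaluated numerically. The following is a minimal sketch, assuming that $C_{k-1}^{m-1}$ denotes the binomial coefficient "$m-1$ choose $k-1$" and that the $k$ IDs are drawn without replacement from $1, \dots, N$ (which is what the formula implies). The function name `max_id_pmf` and the values of `N` and `k` are hypothetical and chosen only for illustration.

```python
from math import comb


def max_id_pmf(m: int, k: int, N: int) -> float:
    """P(M = m) for the maximum of k distinct IDs drawn from 1..N (k >= 1)."""
    # The maximum of k distinct IDs can only lie between k and N.
    if not (k <= m <= N):
        return 0.0
    # Choose the remaining k-1 IDs below m, out of all C(N, k) samples.
    return comb(m - 1, k - 1) / comb(N, k)


# Hypothetical values, for illustration only.
N, k = 20, 4
pmf = [max_id_pmf(m, k, N) for m in range(1, N + 1)]

# The probabilities should add up to 1 (up to floating-point error).
print(sum(pmf))
# Probability that the sample maximum equals the true maximum.
print(max_id_pmf(N, k, N))
```

Summing the PMF over all $m$ is a quick sanity check that the formula is a valid distribution.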
Point Estimate
Estimators intuited from discrete uniform distribution
Estimator 1: 2*Mean-1
Consider the continuous version of this problem, i.e. $UNIF(0,\theta)$:
$$\text{For } UNIF(0,\theta):\quad E(X)=\frac{\theta}{2},\quad Var(X)=\frac{\theta^2}{12}$$
We consider the following estimator:
$$
\begin{aligned}
\widehat{\theta}_1 &= \frac{2}{n}\sum_{i=1}^{n}X_i-1 && \text{for the discrete distribution}\\
\widehat{\theta}_1 &= \frac{2}{n}\sum_{i=1}^{n}X_i && \text{for the continuous distribution}\\
E(\widehat{\theta}_1) &= E\left(\frac{2}{n}\sum_{i=1}^{n}X_i\right)=2E(\overline{X})=\theta\\
Var(\widehat{\theta}_1) &= \frac{4}{n^2}\sum_{i=1}^{n}Var(X_i)=\frac{4}{n^2}\sum_{i=1}^{n}\frac{\theta^2}{12}=\frac{\theta^2}{3n}\\
\therefore\ \widehat{\theta}_1 &\ \text{is an unbiased estimator with } Var=\frac{\theta^2}{3n}
\end{aligned}
$$
The expectation and variance are computed for the continuous form. In the discrete case on $\{1,\dots,\theta\}$, $E(X)=\frac{\theta+1}{2}$, so subtracting 1 keeps the estimator unbiased.
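As a small illustration of Estimator 1, the sketch below computes $2\overline{X}-1$ (discrete) or $2\overline{X}$ (continuous) from a list of observed IDs. The function name `estimate_max_from_mean` and the sample values are hypothetical and not part of the original write-up.

```python
from statistics import mean


def estimate_max_from_mean(ids, discrete=True):
    """Estimator 1: twice the sample mean, minus 1 in the discrete case."""
    estimate = 2 * mean(ids)
    return estimate - 1 if discrete else estimate


# Hypothetical sample: if every ID from 1 to 5 is observed, the estimate is 5.
print(estimate_max_from_mean([1, 2, 3, 4, 5]))

# The estimate can fall below the largest observed ID (here 9).
print(estimate_max_from_mean([1, 2, 9]))
```

The second call shows a practical limitation: because the estimator depends only on the mean, it may return a value smaller than an ID that has already been observed.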
Estimator 2: Max + Avg GAP
Consider another improvement on the MLE, i.e. using an averaging approach to estimate the gap between the sample maximum and the upper limit:
$$
\begin{aligned}
\widehat{\theta}_2 &= X_{n:n}+\frac{1}{n-1}\sum_{i>j}(X_i-X_j-1) && \dots\text{for the discrete case}\\
\widehat{\theta}_2 &= X_{n:n}+\frac{1}{n-1}\sum_{i>j}(X_i-X_j) && \dots\text{for the continuous case}
\end{aligned}
$$
Calculate the expected value and variance to determine if this estimator is biased or not.
$$
\begin{aligned}
E(\widehat{\theta}_2) &= E(X_{n:n}) + \frac{1}{n-1}\sum_{i>j}E(X_i-X_j) = \frac{n\theta}{n+1}\\
Var(\widehat{\theta}_2) &= \frac{n\theta^2}{(n+1)(n-1)(n+2)}
\end{aligned}
$$
Therefore, $\widehat{\theta}_2$ is a biased estimator, since $E(\widehat{\theta}_2)=\frac{n\theta}{n+1}\neq\theta$.
Estimator 3: Min+max estimator
We know that the maximum sample ID is the observation closest to the upper limit, and we can add more information to it. Intuitively, we first consider the minimum sample ID plus the maximum sample ID:
$$
\begin{aligned}
\widehat{\theta}_3 &= x_{1:n}+x_{n:n}\\
F_{X_{n:n}}(x) &= [F_X(x)]^n=\frac{x^n}{\theta^n},\quad f_{X_{n:n}}(x)=\frac{n x^{n-1}}{\theta^n}\\
E[X_{n:n}] &= \int_0^\theta x\,\frac{n x^{n-1}}{\theta^n}\,dx=\frac{n}{n+1}\theta\\
E[X_{n:n}^2] &= \int_0^\theta x^2\,\frac{n x^{n-1}}{\theta^n}\,dx=\frac{n}{n+2}\theta^2\\
F_{X_{1:n}}(x) &= 1-[1-F_X(x)]^n=1-\left(\frac{\theta-x}{\theta}\right)^n,\quad f_{X_{1:n}}(x)=\frac{n(\theta-x)^{n-1}}{\theta^n}\\
E[X_{1:n}] &= \int_0^\theta x\,\frac{n(\theta-x)^{n-1}}{\theta^n}\,dx=\frac{1}{n+1}\theta\\
E[X_{1:n}^2] &= \int_0^\theta x^2\,\frac{n(\theta-x)^{n-1}}{\theta^n}\,dx=\frac{2}{(n+1)(n+2)}\theta^2\\
E(\widehat{\theta}_3) &= E(X_{1:n})+E(X_{n:n})=\theta\\
Var(\widehat{\theta}_3) &= Var(X_{1:n})+Var(X_{n:n})+2Cov(X_{1:n},X_{n:n})\\
&= \frac{2}{(n+1)(n+2)}\theta^2-\left(\frac{1}{n+1}\theta\right)^2+\frac{n}{n+2}\theta^2-\left(\frac{n}{n+1}\theta\right)^2+2Cov(X_{1:n},X_{n:n})
\end{aligned}
$$
Since the joint distribution of the order statistics $U_{i:n}<U_{j:n}$ of the standard uniform distribution is
$$f_{U_{i:n},U_{j:n}}(u,v)=n!\,\frac{u^{i-1}}{(i-1)!}\,\frac{(v-u)^{j-i-1}}{(j-i-1)!}\,\frac{(1-v)^{n-j}}{(n-j)!},\quad 0<u<v<1,$$
their covariance is
$$Cov(U_{i:n},U_{j:n})=\frac{i(n-j+1)}{(n+1)^2(n+2)},\quad i<j,$$
so that $Cov(X_{1:n},X_{n:n})=\frac{\theta^2}{(n+1)^2(n+2)}$ after scaling by $\theta^2$. Hence
$$Var(\widehat{\theta}_3)=\frac{2n\theta^2}{(n+1)^2(n+2)}+\frac{2\theta^2}{(n+1)^2(n+2)}=\frac{2\theta^2}{(n+1)(n+2)}$$
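The mean and variance of the min+max estimator can be checked by simulation. The sketch below is a Monte Carlo check under the continuous $UNIF(0,\theta)$ model, compared against the closed form $\frac{2\theta^2}{(n+1)(n+2)}$, which is the standard variance of the sum of the two extreme order statistics. The function name `simulate_min_plus_max`, the seed, and the values of `theta`, `n` and `trials` are assumptions chosen only for illustration.

```python
import random


def simulate_min_plus_max(theta, n, trials, seed=0):
    """Return the empirical mean and variance of min + max over many samples."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    estimates = []
    for _ in range(trials):
        sample = [rng.uniform(0, theta) for _ in range(n)]
        estimates.append(min(sample) + max(sample))
    mean_est = sum(estimates) / trials
    # Unbiased sample variance of the simulated estimates.
    var_est = sum((e - mean_est) ** 2 for e in estimates) / (trials - 1)
    return mean_est, var_est


# Hypothetical settings, for illustration only.
theta, n = 100.0, 5
mean_est, var_est = simulate_min_plus_max(theta, n, trials=100_000)
closed_form_var = 2 * theta**2 / ((n + 1) * (n + 2))

# The empirical mean should be close to theta, and the empirical
# variance close to the closed-form value.
print(mean_est, var_est, closed_form_var)
```

With enough trials the empirical mean settles near $\theta$, consistent with the estimator being unbiased in the continuous case.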