1. Bayes’ Formula
$P(A|B)=\frac{P(B|A)P(A)}{P(B)}=\frac{P(A\cap B)}{P(B)}$
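As a quick numerical check of the formula (the disease-screening numbers below are made up for illustration):

```python
# Bayes' formula: P(A|B) = P(B|A) * P(A) / P(B)
# Hypothetical example: A = "has disease", B = "test is positive".
p_a = 0.01              # prior P(A)
p_b_given_a = 0.95      # P(B|A), test sensitivity
p_b_given_not_a = 0.05  # P(B|not A), false-positive rate

# Total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Posterior P(A|B): surprisingly small because the prior is small
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 4))
```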
2. Basic Statistics
2.1 Expected Value
$E(X)=P(x_1)x_1+P(x_2)x_2+\ldots+P(x_n)x_n$
2.2 Variance
$\sigma^2=E[(X-\mu)^2]$
$\sigma^2(aX)=E[(aX-a\mu)^2]=a^2\sigma^2(X)$
2.3 Covariance
$Cov(X,Y)=E[(X-E[X])(Y-E[Y])]=E[XY]-E[X]E[Y]$
The covariance of $X$ and itself is the variance of $X$:
$Cov(X,X)=E[(X-E[X])(X-E[X])]=\sigma_X^2$
$Cov(X,Y)=\sigma_{XY}$
If $a$, $b$, and $c$ are constants, then
$Cov(a+bX,cY)=Cov(a,cY)+Cov(bX,cY)=bc\,Cov(X,Y)$
The relationship between covariance and variance:
$\sigma_{X \pm Y}^2=\sigma_X^2+\sigma_Y^2\pm 2Cov(X,Y)$
$\sigma_{aX \pm bY}^2=a^2\sigma_X^2+b^2\sigma_Y^2\pm 2ab\,Cov(X,Y)$
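The covariance identities above are easy to verify numerically; a minimal sketch with NumPy (simulated data, population-style estimators):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)  # correlated with x

def cov(u, v):
    # population covariance: E[(U - E[U])(V - E[V])]
    return np.mean((u - u.mean()) * (v - v.mean()))

a, b, c = 2.0, 3.0, -1.5
# Cov(a + bX, cY) = b * c * Cov(X, Y): constants shift out, scales factor out
lhs = cov(a + b * x, c * y)
rhs = b * c * cov(x, y)
print(np.isclose(lhs, rhs))  # True

# Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
print(np.isclose(np.var(x + y), np.var(x) + np.var(y) + 2 * cov(x, y)))  # True
```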
2.4 Correlation
$\rho=\frac{Cov(X,Y)}{\sigma_X\sigma_Y}$
Correlation has no units and ranges from $-1$ to $+1$.
Variance of correlated variables:
$\sigma_{X \pm Y}^2=\sigma_X^2+\sigma_Y^2\pm 2Cov(X,Y)=\sigma_X^2+\sigma_Y^2\pm 2\rho\sigma_X\sigma_Y$
2.5 Sums of Random Variables
If $X$ and $Y$ are any random variables:
$E[X+Y]=E[X]+E[Y]$
If $X$ and $Y$ are independent:
$Var(X+Y)=Var(X)+Var(Y)$
If $X$ and $Y$ are not independent:
$Var[X+Y]=Var(X)+Var(Y)+2Cov(X,Y)$
2.6 Skewness & Kurtosis
$Skewness=\frac{E(X-\mu_x)^3}{\sigma_x^3}$
Positive Skewness : Mode < Median < Mean
Negative Skewness : Mode > Median > Mean
$Kurtosis=\frac{E[(X-\mu_X)^4]}{(E[(X-\mu_X)^2])^2}$
Excess kurtosis = sample kurtosis - 3
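A sketch of computing sample skewness and kurtosis directly from the definitions above (simulated normal data, so both should land near 0 and 3):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200_000)

mu, sigma = x.mean(), x.std()
z = (x - mu) / sigma
skewness = np.mean(z**3)  # E[(X - mu)^3] / sigma^3
kurtosis = np.mean(z**4)  # E[(X - mu)^4] / (E[(X - mu)^2])^2
excess_kurtosis = kurtosis - 3

# For a normal sample, skewness is close to 0 and kurtosis close to 3
print(round(skewness, 2), round(kurtosis, 2))
```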
3. Common Probability Distribution
4. Central Limit Theorem
5. Measure of Central Tendency
6. Measurement of Dispersion
7. Sampling & Estimation
7.1 Sampling Mean
Assume the random variables $X_i$ are i.i.d., with $E[X_i]=\mu$ and $V[X_i]=\sigma^2$.
When the population mean $\mu$ is not observable, it is estimated using the sample mean estimator, $\hat{\mu}$ (written $\overline{X}$ in mathematical expressions). In this case, $\hat{\mu}$ is an estimator of the unknown population parameter $\mu$.
The mean estimator is unbiased because the expected value of mean estimator is the same as the population mean.
$E[\hat{\mu}]=\frac{1}{n}\sum_{i=1}^n E[x_i]=\mu$
The variance of the mean estimator decreases as the number of observations increases, and so larger samples are better to estimate population mean.
$V[\hat{\mu}]=\frac{1}{n^2}\sum_{i=1}^n V[x_i]=\frac{\sigma^2}{n}$
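A small simulation illustrating $V[\hat{\mu}]=\sigma^2/n$ (the population parameters below are chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2 = 4.0          # population variance (arbitrary choice)
n, trials = 50, 20_000

# Draw many samples of size n and compute each sample mean
samples = rng.normal(loc=1.0, scale=np.sqrt(sigma2), size=(trials, n))
means = samples.mean(axis=1)

# The variance of the sample mean should be close to sigma^2 / n = 0.08
print(round(means.var(), 3), sigma2 / n)
```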
7.2 Sample Variance
Similarly to $\hat{\mu}$, the population variance is estimated using the sample variance estimator, denoted by $\hat{\sigma}^2$.
$\hat{\sigma}^2=\frac{1}{n}\sum_{i=1}^n(x_i-\hat{\mu})^2$
Unlike the sample mean estimator, the sample variance estimator is biased.
$E[\hat{\sigma}^2]=\sigma^2-\frac{\sigma^2}{n}=\frac{n-1}{n}\sigma^2$
The sample variance estimator $S^2$ is unbiased, as $E[S^2]=\sigma^2$:
$S^2=\frac{1}{n-1}\sum_{i=1}^n(x_i-\hat{\mu})^2=\frac{n}{n-1}\hat{\sigma}^2$
The expression $(n-1)$ is known as the degrees of freedom.
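NumPy's `ddof` argument switches between the two estimators, which makes the bias easy to see by simulation (a sketch with a deliberately small $n$ so the bias factor $(n-1)/n$ is visible):

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials = 5, 200_000

samples = rng.normal(size=(trials, n))  # true variance = 1

# Biased estimator: divide by n (numpy's default, ddof=0)
biased = samples.var(axis=1, ddof=0).mean()
# Unbiased estimator S^2: divide by n - 1 (ddof=1)
unbiased = samples.var(axis=1, ddof=1).mean()

print(round(biased, 3))    # close to (n-1)/n = 0.8
print(round(unbiased, 3))  # close to 1.0
```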
7.3 Standard Error of Sample Mean
Standard error of the sample mean is the standard deviation of distribution of the sample mean.
Known population variance: $SE(\hat{\mu})=\frac{\sigma}{\sqrt{n}}$
Unknown population variance: $SE(\hat{\mu})=\frac{S}{\sqrt{n}}$
7.4 The Central Limit Theorem (CLT)
When selecting simple random samples of size $n$ from a population with mean $\mu$ and finite variance $\sigma^2$, the sampling distribution of the sample mean approaches a normal distribution with mean $\mu$ and variance $\sigma^2/n$ as the sample size becomes large ($n \geq 30$).
$\hat{\mu}_n=\overline{X}\sim N(\mu,\frac{\sigma^2}{n})$
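A minimal CLT sketch: sample means of a clearly non-normal (exponential) population still end up approximately $N(\mu, \sigma^2/n)$:

```python
import numpy as np

rng = np.random.default_rng(4)
n, trials = 50, 100_000

# Exponential population with mean 1 and variance 1 (clearly non-normal)
samples = rng.exponential(scale=1.0, size=(trials, n))
means = samples.mean(axis=1)

# CLT: sample mean is approximately N(mu, sigma^2/n) = N(1, 1/50)
print(round(means.mean(), 2))  # close to 1.0
print(round(means.var(), 3))   # close to 1/50 = 0.02
```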
8. Hypothesis Testing
8.1 Null and Alternative Hypotheses
The null hypothesis ($H_0$), which specifies a parameter value that is assumed to be true.
The alternative hypothesis ($H_1$), which defines the range of values where the null should be rejected.
The test statistic, which has a known distribution when the null is true. In most cases, the test statistic follows a standard normal distribution.
The size of the test, which captures the willingness to make a mistake and falsely reject a null hypothesis that is true.
The test size ($\alpha$) is chosen to reflect the willingness to mistakenly reject a true null hypothesis; it is set by the tester. The most common test size is $5\%$. Smaller test sizes (e.g., $1\%$ or even $0.1\%$) are used when it is especially important to avoid incorrectly rejecting a true null.
The critical value, which is a value that is compared to the test statistic to determine whether to reject the null hypothesis.
The decision rule, which combines the test statistic and critical value to determine whether to reject the null hypothesis.
The test power, which measures the probability that a false null is rejected.
8.2 Test of Mean/Means
$T=\frac{\hat{\mu}-\mu_0}{S/\sqrt{n}} \sim t_{n-1}$
- $n-1$ refers to the degrees of freedom
- when $n$ is small (i.e., less than 30), the Student's $t$ has been documented to provide a better approximation than the normal.
$T=\frac{\hat{\mu}-\mu_0}{\sigma/\sqrt{n}}\sim N(0,1)$
- Consider a test of the null hypothesis about a mean: $H_0:\mu=\mu_0$
- When the true value of the mean ($\mu$) is equal to the value assumed by the null ($\mu_0$), the asymptotic distribution leads to the test statistic
- The test statistic $T$ (also known as the t-statistic) is asymptotically standard normally distributed according to the CLT.
$T=\frac{\hat{\mu}_Z}{\hat{\sigma}_Z/\sqrt{n}}=\frac{\hat{\mu}_X-\hat{\mu}_Y}{\sqrt{\frac{\hat{\sigma}_X^2+\hat{\sigma}_Y^2-2\hat{\sigma}_{XY}}{n}}}$
- Testing whether the means of two series are equal, $H_0: \mu_X=\mu_Y$
- If the null hypothesis is true, then $E[Z_i]=E[X_i]-E[Y_i]=\mu_X-\mu_Y=0$
- When $X_i$ and $Y_i$ are both i.i.d. and mutually independent, the test statistic for testing that the means are equal is:
$T=\frac{\hat{\mu}_X-\hat{\mu}_Y}{\sqrt{\frac{\hat{\sigma}_X^2}{n_X}+\frac{\hat{\sigma}_Y^2}{n_Y}}}$
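A sketch of the independent two-sample test statistic on simulated data (sample sizes and means below are made up; the two populations genuinely differ, so the null should be rejected):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(loc=0.0, scale=1.0, size=400)
y = rng.normal(loc=0.5, scale=1.2, size=300)

# Test statistic for H0: mu_X = mu_Y with independent i.i.d. samples
t_stat = (x.mean() - y.mean()) / np.sqrt(
    x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y)
)

# Two-sided 5% test against the standard normal critical value 1.96
reject = abs(t_stat) > 1.96
print(reject)  # True: the population means genuinely differ
```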
8.3 Difference Between One- and Two-Tailed/Sided Tests
One-tailed test: tests whether a value is greater than or less than a given number.
$H_0:\mu\geq0 \qquad H_1:\mu<0$
$H_0:\mu\leq0 \qquad H_1:\mu>0$
When testing against a one-sided alternative:
$\alpha = 10\% \qquad Critical\;Value=+1.28\;or\;-1.28$
$\alpha = 5\% \qquad Critical\;Value=+1.645\;or\;-1.645$
$\alpha = 1\% \qquad Critical\;Value=+2.326\;or\;-2.326$
Two-tailed test: tests whether a value is equal to a given number.
$H_0:\mu=0 \qquad H_1:\mu\neq0$
When testing against a two-sided alternative:
$\alpha = 10\% \qquad Critical\;Value=\pm1.645$
$\alpha = 5\% \qquad Critical\;Value=\pm1.96$
$\alpha = 1\% \qquad Critical\;Value=\pm2.576$
8.4 Type I & Type II Errors
| Decision | $H_0$ is true | $H_0$ is false |
|---|---|---|
| Fail to reject $H_0$ | Correct ($1-\alpha$) | Type II error ($\beta$) |
| Reject $H_0$ | Type I error ($\alpha$): significance level | Correct ($1-\beta$): power of test |
A Type I error occurs when the null is true, but the null is rejected.
The probability of a Type I error is denoted by the Greek letter $\alpha$, which is also referred to as the test size/significance level.
A Type II error occurs when the alternative is true, but the null is not rejected.
The probability of a Type II error is denoted by the Greek letter $\beta$.
In practice, $\beta$ should be small so that the power of the test, defined as $1-\beta$, is high.
8.5 Confidence Interval Approach
A confidence interval is a range of parameter values that complements the rejection region.
Two-sided test
$CI=[Mean\;Estimate \pm (critical\;value)\times standard\;error]$
One-sided test
If the rejection region is on the left:
$CI=[Mean\;Estimate-(critical\;value)\times standard\;error,\;+\infty)$
If the rejection region is on the right:
$CI=(-\infty,\;Mean\;Estimate+(critical\;value)\times standard\;error]$
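A sketch of the two-sided confidence-interval construction, using the $5\%$ critical value $1.96$ (the data are simulated):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(loc=10.0, scale=2.0, size=100)

mean = x.mean()
# Standard error of the sample mean (population variance unknown, so use S)
se = x.std(ddof=1) / np.sqrt(len(x))

# Two-sided 95% confidence interval: mean estimate +/- 1.96 * SE
lo, hi = mean - 1.96 * se, mean + 1.96 * se
print(round(lo, 2), round(hi, 2))
```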
9. Regression
9.1 Simple Linear Regression
$Y_i=b_0+b_1X_i+\varepsilon_i$
$Y_i$: dependent or explained variable
$X_i$: independent or explanatory variable
$b_0$: intercept coefficient
The intercept term $b_0$ can be interpreted to mean that when the independent variable is zero, the dependent variable equals $b_0$.
$\hat{b}_0=\overline{Y}-\hat{b}_1\overline{X}$
An estimated slope coefficient of $b_1$ indicates that the dependent variable will change by $b_1$ units for every one-unit change in the independent variable.
$\hat{b}_1=\frac{Cov(X,Y)}{\sigma^2(X)}=\hat{\rho}_{xy}\frac{\sigma_y}{\sigma_x}$
$\varepsilon_i$: error term/shock
The error term is the portion of the dependent variable that cannot be explained by the independent variable.
The error term is assumed to have mean 0, so that
$E[Y]=E[b_0+b_1X+\varepsilon]=b_0+b_1E[X]$
Ordinary Least Squares (OLS)
OLS estimation is the process of estimating the population parameter $b_i$ using the corresponding sample estimate $\hat{b}_i$, which minimizes the sum of squared residuals.
The OLS sample coefficients are those that:
$minimize\sum\varepsilon_i^2=\sum[Y_i-(\hat{b}_0+\hat{b}_1X_i)]^2$
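The closed-form OLS solutions $\hat{b}_1=Cov(X,Y)/\sigma^2(X)$ and $\hat{b}_0=\overline{Y}-\hat{b}_1\overline{X}$ can be sketched directly (simulated data with known true coefficients):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=500)
y = 2.0 + 3.0 * x + rng.normal(size=500)  # true b0 = 2, b1 = 3

# OLS slope: b1_hat = Cov(X, Y) / Var(X)
b1_hat = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
# OLS intercept: b0_hat = mean(Y) - b1_hat * mean(X)
b0_hat = y.mean() - b1_hat * x.mean()

print(round(b0_hat, 2), round(b1_hat, 2))  # close to 2 and 3
```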
9.2 Multiple Linear Regression
The multiple linear regression model
$Y_i=b_0+b_1X_{1i}+b_2X_{2i}+\cdots+b_kX_{ki}+\varepsilon_i$
The predicted value of the dependent variable
$\hat{Y}=\hat{b}_0+\hat{b}_1X_1+\hat{b}_2X_2+\cdots+\hat{b}_kX_k$
9.3 Total Sum of Squares
Total Sum of Squares
$TSS=\sum(Y_i-\overline{Y})^2$
Explained Sum of Squares
$ESS=\sum(\hat{Y}_i-\overline{Y})^2$
Residual Sum of Squares
$RSS=\sum(Y_i-\hat{Y}_i)^2$
$\sum(Y_i-\overline{Y})^2=\sum(\hat{Y}_i-\overline{Y})^2+\sum(Y_i-\hat{Y}_i)^2$
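The decomposition TSS = ESS + RSS holds exactly for an OLS fit with an intercept; a numerical sketch:

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)

# Fit OLS (closed-form slope and intercept) so the decomposition holds
b1 = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

tss = np.sum((y - y.mean()) ** 2)
ess = np.sum((y_hat - y.mean()) ** 2)
rss = np.sum((y - y_hat) ** 2)

print(np.isclose(tss, ess + rss))                # True
print(np.isclose(ess / tss, 1 - rss / tss))      # True: R^2 both ways
```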
9.4 Measures of Fitness
$R^2=\frac{ESS}{TSS}=1-\frac{RSS}{TSS}$
The coefficient of determination: A more intuitive measure of the “goodness of fit” of the regression. It is interpreted as a percentage of variation in the dependent variable explained by the independent variable. Its limits are 0 ≤ R 2 ≤ 1 0\leq R^2 \leq 1 0≤R2≤1.
$r^2=R^2\to r=\pm\sqrt{R^2}$
Note that in a simple two-variable regression, the square root of $R^2$ is the correlation coefficient ($r$) between $X_i$ and $Y_i$.
Adjusted $R^2 = 1-\frac{RSS/(n-k-1)}{TSS/(n-1)}$
Adjusted $R^2 = 1-\frac{n-1}{n-k-1}(1-R^2)$
Adding a new variable to the model never decreases the $R^2$.
The adjusted $R^2$ is a modified version of $R^2$ that does not necessarily increase when a new independent variable is added.
Adjusted $R^2 \leq R^2$.
Adjusted $R^2$ may be less than zero.
9.5 ANOVA Table
| | df | SS | MSS |
|---|---|---|---|
| Explained | $k$ | ESS | ESS$/k$ |
| Residual | $n-k-1$ | RSS | RSS$/(n-k-1)$ |
| Total | $n-1$ | TSS | |
9.6 Joint Hypothesis Testing
$F=\frac{ESS/k}{RSS/(n-k-1)}$
An F-test is used to test whether at least one slope coefficient is significantly different from zero.
$H_0:b_1=b_2=b_3=\cdots=b_k=0$
$H_1: at\;least\;one\;b_j\neq0\;(j=1\;to\;k)$
The F-test assesses the effectiveness of the model as a whole in explaining the dependent variable.
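A sketch of computing the F-statistic from ESS and RSS for a simulated two-regressor model (the $5\%$ critical value $\approx 3.09$ for $F(2, 97)$ is an assumption taken from standard tables):

```python
import numpy as np

rng = np.random.default_rng(9)
n, k = 100, 2
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.8 * x1 + 0.5 * x2 + rng.normal(size=n)

# Fit the multiple regression by least squares
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta

ess = np.sum((y_hat - y.mean()) ** 2)
rss = np.sum((y - y_hat) ** 2)

# F = (ESS / k) / (RSS / (n - k - 1))
f_stat = (ess / k) / (rss / (n - k - 1))
print(f_stat > 3.09)  # True: the slopes are genuinely nonzero
```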
10. Stationary Time Series
11. Non-Stationary Time Series
13. Measuring Returns, Volatility and Correlation
13.1 Simple Return
The simple return on an asset bought at time $t-1$ and sold at time $t$ can be expressed as the percentage change in the market variable between the end of day $t-1$ and the end of day $t$.
$R_t=\frac{P_t-P_{t-1}}{P_{t-1}}$
The return of an asset over multiple periods is the product of the simple returns in each period:
$1+R_T=\prod_{t=1}^T(1+R_t)$
13.2 Continuously Compounded Returns
Continuously compounded returns are also known as log returns. They are computed as the difference of the natural logarithms of the price.
$r_t=\ln P_t-\ln P_{t-1}$
The relationship between simple and log returns:
$1+R_t=e^{r_t}$
The main advantage of log returns is that the total return over multiple periods is simply the sum of the single-period log returns. However, the accuracy of the log return approximation is poor when the simple return is large.
$r_T=\sum_{t=1}^T r_t$
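A short sketch contrasting simple and log returns on a hypothetical price series:

```python
import numpy as np

prices = np.array([100.0, 102.0, 99.0, 101.0, 105.0])  # hypothetical prices

# Simple returns: R_t = (P_t - P_{t-1}) / P_{t-1}
simple = prices[1:] / prices[:-1] - 1
# Log returns: r_t = ln(P_t) - ln(P_{t-1})
log_ret = np.diff(np.log(prices))

# Multi-period: product of (1 + R_t) vs. sum of log returns
total_simple = np.prod(1 + simple) - 1
total_log = np.sum(log_ret)

print(np.isclose(1 + total_simple, np.exp(total_log)))  # True
print(round(total_simple, 4))  # 0.05: overall move from 100 to 105
```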
13.3 Measuring Volatility
The volatility of a financial asset is usually measured by the standard deviation of its returns.
Volatility scales with the square root of the holding period. When volatility is measured daily, it is common to convert daily volatility to annualized volatility by scaling by $\sqrt{252}$.
$\sigma_{annual}=\sqrt{252}\,\sigma_{daily}$
The variance (also called the variance rate) of returns is estimated using the standard estimator:
$\hat{\sigma}^2=\frac{1}{T}\sum_{t=1}^T(r_t-\hat{\mu})^2$
13.4 Two methods to test normality of a distribution
The Jarque-Bera Test
It is used to formally test whether the sample skewness and kurtosis are compatible with the assumption that the returns are normally distributed.
$H_0: Skewness=0 \quad and \quad Kurtosis=3$
The test statistic is
$JB=(T-1)\left(\frac{\hat{s}^2}{6}+\frac{(\hat{K}-3)^2}{24}\right)$
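A sketch of the JB statistic on simulated normal returns, using the $(T-1)$ scaling from the formula above (the $5\%$ critical value of $\chi^2_2$, $5.99$, is the usual comparison point):

```python
import numpy as np

rng = np.random.default_rng(10)
r = rng.normal(size=10_000)  # simulated returns, normal by construction
T = len(r)

z = (r - r.mean()) / r.std()
skew = np.mean(z**3)  # sample skewness
kurt = np.mean(z**4)  # sample kurtosis

# JB statistic; asymptotically chi-squared with 2 degrees of freedom under H0
jb = (T - 1) * (skew**2 / 6 + (kurt - 3) ** 2 / 24)

# Compare against the 5% critical value of chi2(2), 5.99, to decide on H0
print(round(jb, 2))
```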
Power Laws
Normal random variables have thin tails, so the probability of a return larger than $K\sigma$ declines rapidly as $K$ increases, whereas many other distributions have tails that decline less quickly for large deviations.