2.5 Regression

2.5 Regression

Question 1

Which of the following items does not apply to the sum of squared residuals (SSR) from an ordinary least square (OLS) regression?

A. SSR is equal to ∑ e i 2 \sum e_i^2 ei2
B. SSR is equal to ∑ [ Y i − ( b 0 + b 1 X i ) ] \sum[Y_i-(b_0+b_1X_i)] [Yi(b0+b1Xi)]
C. When using OLS, SSR is minimized
D. SSR can indicate how well the regression model explains the dependent variable

Answer: B
Total sum of squares(TSS): sum of squared deviations of Y i Y_i Yi from its average.
T S S = ∑ i = 1 n ( Y i − Y ‾ ) 2 TSS=\sum^n_{i=1}(Y_i-\overline{Y})^2 TSS=i=1n(YiY)2
Explained sum of squares(ESS): sum of squared deviations of the predicted values Y i ^ \hat{Y_i} Yi^.
E S S = ∑ i = 1 n ( Y i ^ − Y ‾ ) 2 ESS=\sum^n_{i=1}(\hat{Y_i}-\overline{Y})^2 ESS=i=1n(Yi^Y)2
Residual Sum of squares(RSS): sum of squared residuals or the residual of squares.
R S S = ∑ i = 1 n ( Y i − Y i ^ ) 2 = ∑ i = 1 n e i 2 = ∑ i = 1 n [ Y i − ( b 0 + b 1 X i ) ] 2 RSS=\sum^n_{i=1}(Y_i-\hat{Y_i})^2=\sum^n_{i=1}e_i^2=\sum^n_{i=1}[Y_i-(b_0+b_1X_i)]^2 RSS=i=1n(YiYi^)2=i=1nei2=i=1n[Yi(b0+b1Xi)]2

Ordinary least squares(OLS) estimation is a process that minimize the square deviations between the realizations of the dependent variable Y Y Y and their expected values given the respective realization of X X X. In turn, these estimators minimize the residual sum of squares.


Question 2

An analyst is using a statistical package to perform a linear regression between the risk and return from securities in an emerging market country. The original data and intermediate statistics are shown on the right. The value of coefficient of determination for this regression is closest to:

Risk% ( X i X_i Xi) 1.1 1.1 1.1 1.5 1.5 1.5 2.2 2.2 2.2 3.6 3.6 3.6 4.3 4.3 4.3 5.1 5.1 5.1 6.7 6.7 6.7
Return% ( Y i Y_i Yi) 3.2 3.2 3.2 3.5 3.5 3.5 4.1 4.1 4.1 4.5 4.5 4.5 4.8 4.8 4.8 5.1 5.1 5.1 5.2 5.2 5.2

A. 0.043 0.043 0.043
B. 0.084 0.084 0.084
C. 0.916 0.916 0.916
D. 0.957 0.957 0.957

Answer: C: Coefficient determination

μ x = 3.5 \mu_x=3.5 μx=3.5, μ y = 4.3 \mu_y=4.3 μy=4.3

∑ i = 1 7 ( x i − μ x ) 2 = 24.9 \sum^7_{i=1} (x_i − \mu_x)^2 = 24.9 i=17(xiμx)2=24.9, ∑ j = 1 7 ( y j − μ y ) 2 = 3.63 \sum^7_{j=1} (y_j− \mu_y)^2 = 3.63 j=17(yjμy)2=3.63, ∑ i = j 1 7 ( x i − μ x ) ( y j − μ y ) = 9.1 \sum^7_{i=j1} (x_i − \mu_x)(y_j− \mu_y) = 9.1 i=j17(xiμx)(yjμy)=9.1

R 2 = ρ 2 = C o v ( X , Y ) 2 σ x 2 σ y 2 = 0.916 R^2 =\rho^2 = \cfrac{Cov(X,Y)^2}{\sigma_x^2\sigma_y^2}=0.916 R2=ρ2=σx2σy2Cov(X,Y)2=0.916

  • For simple linear regression, R 2 R^2 R2 is equal to the correlation coefficient.
  • R 2 R^2 R2 measures the percentage of the variation in the data that can be explained by the model.
  • Because OLS estimates parameters by finding the values that minimize the RSS, the OLS estimator also maximizes R 2 R^2 R2.

Question 3

PE2018Q17 / PE2019Q17 / PE2020Q17 / PE2021Q17 / PE2022PSQ16 / PE2022Q17
The proper selection of factors to include in an ordinary least squares estimation is critical to the accuracy of the result. When does omitted variable bias occur?

A. Omitted variable bias occurs when the omitted variable is correlated with the included regressor and is a determinant of the dependent variable.
B. Omitted variable bias occurs when the omitted variable is correlated with the included regressor but is not a determinant of the dependent variable.
C. Omitted variable bias occurs when the omitted variable is independent of the included regressor and is a determinant of the dependent variable,
D. Omitted variable bias occurs when the omitted variable is independent of the included regressor but is not a determinant of the dependent variable.

Answer: A
Learning Objective: Describe the consequences of excluding a relevant explanatory variable from a model and contrast those with the consequences of including an irrelevant regressor.

Omitted variable bias occurs when a model improperly omits one or more variables that are critical determinants of the dependent variable and are correlated with one or more of the other included independent variables. Omitted variable bias results in an over- or under-estimation of the regression parameters.


Question 4

Using standard Monte Carlo methods, an analyst prices a European call options using 1 , 000 1,000 1,000 simulations of underlying stock price. The option price is estimated at USD 1.00 1.00 1.00 with a standard error USD 0.40 0.40 0.40. All other things kept constant. If the analyst increases the number of simulations to 3 , 000 3,000 3,000, the resulting standard error is likely to be:

A. 0.13 0.13 0.13
B. 0.21 0.21 0.21
C. 0.23 0.23 0.23
D. 0.69 0.69 0.69

Answer: C
S E = μ / n SE=\mu/\sqrt{n} SE=μ/n

  • The standard deviation of the mean estimator (or any other estimator) is known as a standard error.
  • Standard errors are important in hypothesis testing and when performing Monte Carlo simulations. In both of these applications, the standard error provides an indication of the accuracy of the estimate.

Question 5

Which of the following is most likely in the case of high multicollinearity?

A. Low F ratio and insignificant partial slope coefficients
B. High F ratio and insignificant partial slope coefficients
C. Low F ratio and significant partial slope coefficients
D. High F ratio and significant partial slope coefficients

Answer: B
Classic symptom of high multicollinearity is high R2 (corresponds to high F ratio) but insignificant partial slope coefficients.


Question 6

Which of the following statements is correct regarding the use of the F-test and the F-statistic?

I. For simple linear regression, the F-test tests the same hypothesis as the t-test.
II. The F-statistic is used to find which items in a set of independent variables explain a significant portion of the variation of the dependent variable.

A. I only
B. II only
C. Both I and II
D. Neither I nor II

Answer: A
The simple linear regression F-test tests the same hypothesis as the t-test because there is only one independent variable.

An F-test is used to test whether at least one slope coefficient
is significantly different from zero.

The F-statistic is used to tell you if at least one independent variable in a set of independent variables explains a significant portion of the variation of the dependent variable. It tests the independent variables as a group, and thus won’t tell you which variable has significant explanatory power.


Question 7

Greg Barns, FRM, and Jill Tillman, FRM, are discussing the hypothesis they wish to test with respect to the model represented by Y i = B 0 + B 1 X i + ε i Y_i = B_0 + B_1X_i + ε_i Yi=B0+B1Xi+εi. They wish to use the standard statistical methodology in their test. Barns thinks an appropriate hypothesis would be that B 1 = 0 B_1 = 0 B1=0 with the goal of proving it to be true. Tillman thinks an appropriate hypothesis to test is B 1 = 1 B_1 = 1 B1=1 with the goal of rejecting it. With respect to these hypotheses:

A. The hypothesis of neither researcher is appropriate.
B. The hypothesis of Barns is appropriate but not that of Tillman.
C. The hypothesis of Tillman is appropriate but not that of Barns.
D. More information is required before a hypothesis can be set up.

Answer: C
The usual approach is to specify a hypothesis that the researcher wishes to disprove.


Question 8

Which one of the following statements about the correlation coefficient is false?

A. It always ranges from − 1 -1 1 to + 1 +1 +1.
B. A correlation coefficient of zero means that two random variables are independent.
C. It is a measure of linear relationship between two random variables.
D. It can be calculated by scaling the covariance between two random variables.

Answer: B
Correlation describes the linear relationship between two variables. While we would expect to find a correlation of zero for independent variables, finding a correlation of zero does not mean that two variables are independent.


Question 9

The correlation coefficient for two dependent random variables is equal to:

A. the product of the standard deviations for the two random variables divided by the covariance.
B. the covariance between the random variables divided by the product of the variances.
C. the absolute value of the difference between the means of the two variables divided by the product of the variances.
D. the covariance between the random variables divided by the product of the standard deviations

Answer: D
ρ = C o v ( x , y ) σ x σ y \rho=\cfrac{Cov(x,y)}{\sigma_x\sigma_y} ρ=σxσyCov(x,y)


Question 10

Consider two shock, A and B. Assume their annual returns are jointly normally distributed, the marginal distribution of each stock has mean 2 % 2\% 2% and standard deviation 20 % 20\% 20%, and the correlation is 0.9 0.9 0.9. What is the expected annual return of stock A if the annual return of stock B is 3 % 3\% 3%.

A. 2 % 2\% 2%
B. 2.9 % 2.9\% 2.9%
C. 4.7 % 4.7\% 4.7%
D. 1.1 % 1.1\% 1.1%

Answer: B
The information in this question can be used to construct a regression model of A on B.

We have R A = b 0 + b 1 R B + ϵ R_A=b_0+b_1R_B + \epsilon RA=b0+b1RB+ϵ

b 1 = C o v ( A , B ) σ A 2 = ρ σ A σ B σ A 2 = 0.9 b_1=\cfrac{Cov(A,B)}{\sigma_A^2}=\cfrac{\rho\sigma_A \sigma_B}{\sigma_A^2}=0.9 b1=σA2Cov(A,B)=σA2ρσAσB=0.9, b 0 = R A ‾ − b 1 R B ‾ = 0.002 b_0=\overline{R_A}-b_1\overline{R_B}=0.002 b0=RAb1RB=0.002


Question 11

PE2018Q8 / PE2019Q8
An analyst is trying to get some insight into the relationship between the return on stock LMD and the return on the S&P 500 index. Using historical data she estimates the following:

Annual mean return for LMD is 11 % 11\% 11%
Annual mean return for S&P 500 index is 7 % 7\% 7%
Annual volatility for S&P 500 index is 18 % 18\% 18%
Covariance between the returns of LMD and S&P 500 index is 6 % 6\% 6%

Assuming she uses the same data to estimate the regression model given by:

R LMD,t = α + β × R SP500,t + ϵ t R_{\text{LMD,t}}= \alpha+\beta\times R_{\text{SP500,t}}+\epsilon_t RLMD,t=α+β×RSP500,t+ϵt

Using the ordinary least squares technique, which of the following models will the analyst obtain?

A. R LMD,t = − 0.02 + 0.54 × R SP500,t + ϵ R_{\text{LMD,t}}= -0.02+0.54\times R_{\text{SP500,t}}+\epsilon RLMD,t=0.02+0.54×RSP500,t+ϵ
B. R LMD,t = − 0.02 + 1.85 × R SP500,t + ϵ R_{\text{LMD,t}}= -0.02+1.85 \times R_{\text{SP500,t}}+\epsilon RLMD,t=0.02+1.85×RSP500,t+ϵ
C. R LMD,t = 0.04 + 0.54 × R SP500,t + ϵ R_{\text{LMD,t}}= 0.04+ 0.54 \times R_{\text{SP500,t}}+\epsilon RLMD,t=0.04+0.54×RSP500,t+ϵ
D. R LMD,t = 0.04 + 1.85 × R SP500,t + ϵ R_{\text{LMD,t}}= 0.04+1.85 \times R_{\text{SP500,t}}+\epsilon RLMD,t=0.04+1.85×RSP500,t+ϵ

Answer: B
Learning Objective: Explain how regression analysis in econometrics measures the relationship between dependent and independent variables.

The regression coefficients for a model specified by Y = b X + a + ϵ Y = bX + a+ \epsilon Y=bX+a+ϵ are obtained using the formulas:

b = C o v ( X , Y ) / σ X 2 b=Cov(X,Y) / \sigma_X^2 b=Cov(X,Y)/σX2, a = E ( Y ) − b × E ( X ) a = E(Y) - b\times E(X) a=E(Y)b×E(X)

Then: b = 0.06 / ( 0.18 ) 2 = 1.85 b = 0.06 / (0.18)^2 = 1.85 b=0.06/(0.18)2=1.85, a = 0.11 − ( 1.85 × 0.07 ) = − 0.02 a = 0.11-(1.85\times0.07) = -0.02 a=0.11(1.85×0.07)=0.02


Question 12

Which of the following is assumed in the multiple least squares regression model?

A. The dependent variable is stationary.
B. The independent variables are not perfectly multicollinear.
C. The error terms are heteroskedastic.
D. The independent variables are homoskedastic.

Answer:B
Extending the model to multiple regressors requires one additional assumption, along with some modifications, to the five assumptions in simple linear regression.

  • The explanatory variables are not perfectly linearly dependent.
  • All variables must have positive variances so that σ X j 2 > 0 \sigma^2_{X_j}>0 σXj2>0 for j = 1 , 2 , ⋯   , k j=1,2,\cdots,k j=1,2,,k.
  • The random variables ( Y i , X 1 i , X 2 i , ⋯   , X K i Y_i,X_{1i},X_{2i},\cdots, X_{Ki} Yi,X1i,X2i,,XKi) are assumed to be iid.
  • The error is assumed to have mean zero conditional on the explanatory variables (i.e. E [ ϵ i ∣ X 1 i , X 2 i , ⋯   , X K i = 0 E[\epsilon_i| X_{1i},X_{2i},\cdots, X_{Ki} = 0 E[ϵiX1i,X2i,,XKi=0).
  • The constant variance assumption is similarly extended to hold for all explanatory variables (i.e. E [ ϵ i 2 ∣ X 1 i , X 2 i , ⋯   , X K i = σ 2 E[\epsilon_i^2| X_{1i},X_{2i},\cdots,X_{Ki} = \sigma^2 E[ϵi2X1i,X2i,,XKi=σ2).
  • No outliers.

Question 13

Which of the following statements about the ordinary least squares regression model (or simple regression model) with one independent variable are correct?

I. In the ordinary least squares (OLS) model, the random error term is assumed to have zero mean and constant variance.
II. In the OLS model, the variance of the independent variable is assumed to be positively correlated with the variance of the error term.
III. In the OLS model, it is assumed that the correlation between the dependent variable and the random error term is zero.
IV. In the OLS model, the variance of the dependent variable is assumed to be constant.

A. I, II, III and IV
B. Il and IV only
C. I and IV only
D. I, II, and Ill only


Question 14

PE2018Q42 / PE2019Q42 / PE2020Q42 / PE2021Q42 / P2022Q42
A risk manager performs an ordinary least squares (OLS) regression to estimate the sensitivity of a stock’s return to the return on the S&P 500. This OLS procedure is designed to:

A. Minimize the square of the sum of differences between the actual and estimated S&P 500 returns.
B. Minimize the square of the sum of differences between the actual and estimated stock returns.
C. Minimize the sum of differences between the actual and estimated squared S&P 500 returns.
D. Minimize the sum of squared differences between the actual and estimated stock returns.

Answer: D
Learning Objective: Define an ordinary least squares (OLS) regression and calculate the intercept and slope of the regression.

The OLS procedure is a method for estimating the unknown parameters in a linear regression model.

The method minimizes the sum of squared differences between the actual, observed, returns and the returns estimated by the linear approximation. The smaller the sum of the squared differences between observed and estimated values, the better the estimated regression line fits the observed data points.


Question 15

PE2018Q44 / PE2019Q44
Using data from a pool of mortgage borrowers, a credit risk analyst performed an ordinary least squares regression of annual savings (in GBP) against annual household income (in GBP) and obtained the following relationship:

Annual Savings = 0.24 × Household Income − 25.66 , R 2 = 0.50 \text{Annual Savings} = 0.24\times\text{Household Income} - 25.66, R^2= 0.50 Annual Savings=0.24×Household Income25.66,R2=0.50

Assuming that all coefficients are statistically significant, which interpretation of this result is correct?

A. For this sample data, the average error term is GBP − 25.66 -25.66 25.66.
B. For a household with no income, annual savings is GBP 0 0 0.
C. For an increase of GBP 1,000 in income, expected annual savings will increase by GBP 240 240 240.
D. For a decrease of GBP 2,000 in income, expected annual savings will increase by GBP 480 480 480.

Answer: C
Learning Objective: Interpret a population regression function, regression coefficients, parameters, slope, intercept, and the error term.

An estimated coefficient of 0.24 0.24 0.24 from a linear regression indicates a positive relationship between income and savings, and more specifically means that a one unit increase in the independent variable (household income) implies a 0.24 0.24 0.24 unit increase in the dependent variable (annual savings). Given the equation provided, a household with no income would be expected to have negative annual savings of GBP 25.66 25.66 25.66. The error term mean is assumed to be equal to 0 0 0.


Question 16

You are conducting an ordinary least squares regression of the returns on stocks Y Y Y and X X X as Y = a + b x X + ϵ Y = a + bxX + \epsilon Y=a+bxX+ϵ based on the past three year’s daily adjusted closing price data. Prior to conducting the regression, you calculated the following information from the data:

Sample covariance is 0.000181 0.000181 0.000181
Sample Variance of Stock X X X is 0.000308 0.000308 0.000308
Sample Variance of Stock Y Y Y is 0.000525 0.000525 0.000525
Sample mean return of stock X X X is − 0.03 % -0.03\% 0.03%
Sample mean return of stock Y Y Y is 0.03 % 0.03\% 0.03%

What is the slope of the resulting regression line?

A. 0.35 0.35 0.35
B. 0.45 0.45 0.45
C. 0.59 0.59 0.59
D. 0.77 0.77 0.77


Question 17

Paul Graham, FRM is analyzing the sales growth of a baby product launched three years ago by a regional company. He assesses that three factors contribute heavily towards the growth and comes up with the following results:

Y = b + 1.5 × X 1 + 1.2 × X 2 + 3 × X 3 Y=b+1.5 \times X_1+ 1.2 \times X_2 + 3\times X_3 Y=b+1.5×X1+1.2×X2+3×X3
Sum of Squared Regression, S S R = 869.76 SSR = 869.76 SSR=869.76
Sum of Squared Errors, S E E = 22.12 SEE = 22.12 SEE=22.12

Determine what proportion of sales growth is explained by the regression results.

A. 0.36 0.36 0.36
B. 0.98 0.98 0.98
C. 0.64 0.64 0.64
D. 0.55 0.55 0.55

Answer: B
Sum of squared regression also means explained sum of squares.

R 2 = E S S / T S S = 1 − S S R / T S S R^2=ESS/TSS=1-SSR/TSS R2=ESS/TSS=1SSR/TSS


Question 18

A regression of a stock’s return (in percent) on an industry index’s return (in percent) provides the following results:

CoefficientStandard error
Intercept 2.1 2.1 2.1 2.01 2.01 2.01
Industry Index 1.9 1.9 1.9 0.31 0.31 0.31
Degrees of FreedomSS
Explained 1 1 1 92.648 92.648 92.648
Residual 3 3 3 24.512 24.512 24.512
Total 4 4 4 117.160 117.160 117.160

Which of the following statements regarding the regression is correct?

I. The correlation coefficient between the X and Y variables is 0.889 0.889 0.889.
II. The industry index coefficient is significant at the 99 % 99\% 99% confidence interval.
III. If the return on the industry index is 4 % 4\% 4%, the stock’s expected return is 10.3 % 10.3\% 10.3%.
IV. The variability of industry returns explains 21 % 21\% 21% of the variation of company returns.

A. IIl only
B. I and II only
C. Il and IV only
D. I, II, and IV

Answer: B
The R 2 R^2 R2 of the regression is calculated as E S S / T S S = ( 92.648 / 117.160 ) = 0.79 ESS/TSS = (92.648/117.160) = 0.79 ESS/TSS=(92.648/117.160)=0.79, which means that the variation in industry returns explains 79 % 79\% 79% of the variation in the stock return. By taking the square root of R 2 R^2 R2, we can calculate that the correlation coefficient ρ = 0.889 \rho= 0.889 ρ=0.889.

The t-statistic for the industry return coefficient is 1.91 / 0.31 = 6.13 1.91/0.31 = 6.13 1.91/0.31=6.13, which is sufficiently large enough for the coefficient to be significant at the 99 % 99\% 99% confidence interval.

Since we have the regression coefficient and intercept, we know that the regression equation is R s t o c k = 1.9 × X + 2.1 R_{stock} = 1.9 \times X + 2.1 Rstock=1.9×X+2.1. Plugging in a value of 4% for the industry return, we get a stock return of 1.9 × 4 % + 2.1 = 9.7 % 1.9\times 4\% + 2.1 = 9.7\% 1.9×4%+2.1=9.7%.


Question 19

An analyst is given the data in the following table for a regression of the annual sales for Company XYZ, a maker of paper products, on paper product industry sales.

ParametersCoefficientStandard Error of the Coefficient
Intercept − 94.88 -94.88 94.88 32.97 32.97 32.97
Slope (industry sales) 0.2796 0.2796 0.2796 0.0363 0.0363 0.0363

The correlation between company and industry sales is 0.9757. Which of the following is closest to the value and reports the most likely interpretation of the R 2 R^2 R2?

A. 0.048 0.048 0.048, indicating that the variability of industry sales explains about 4.8 % 4.8\% 4.8% of the variability of company sales.
B. 0.048 0.048 0.048, indicating that the variability of company sales explains about 4.8 % 4.8\% 4.8% of the variability of industry sales.
C. 0.952 0.952 0.952, indicating that the variability of industry sales explains about 95.2 % 95.2\% 95.2% of the variability of company sales.
D. 0.952 0.952 0.952, indicating that the variability of company sales explains about 95.2 % 95.2\% 95.2% of the variability of industry sales.


Question 20

PE2018Q9 / PE2019Q9 / PE2020Q9 / PE2021Q9 / PE2022Q9
For a sample of 400 400 400 firms, the relationship between corporate revenue ( Y i Y_i Yi) and the average years of experience per employee ( X i X_i Xi) is modeled as follows:

Y i = β 1 + β 2 X i + ϵ i ,    i = 1 , 2 , ⋯   , 400 Y_i= \beta_1 + \beta_2 X_i+\epsilon_i,\;i=1,2,\cdots, 400 Yi=β1+β2Xi+ϵi,i=1,2,,400

An analyst wish to test the joint null hypothesis that at the 95 % 95\% 95% confidence level. The p-value for the t-statistic for β 1 \beta_1 β1 is 0.07 0.07 0.07, and the p-value for the t-statistic for β 2 \beta_2 β2 is 0.06 0.06 0.06. The p-value for the F-statistic for the regression is 0.045 0.045 0.045. Which of the following statements is correct?

A. The analyst can reject the null hypothesis because each β \beta β is different from 0 0 0 at the 95 % 95\% 95% confidence level.
B. The analyst cannot reject the null hypothesis because neither β \beta β is different from 0 0 0 at the 95 % 95\% 95% confidence level.
C. The analyst can reject the null hypothesis because the F-statistic is significant at the 95 % 95\% 95% confidence level.
D. The analyst cannot reject the null hypothesis because the F-statistic is not significant at the 95 % 95\% 95% confidence level.

Answer: C
Learning Objective: Construct, apply and interpret joint hypothesis tests and confidence intervals for multiple coefficients in a regression.

The T-test would not be sufficient to test the joint hypothesis. In order to test the joint null hypothesis, examine the F-statistic, which in this case is statistically significant at the 95 % 95\% 95% confidence level. Thus the null can be rejected.


Question 21

Use the following information to answer the following question.

RegressionR squaredR sq. adj.Std. errorNum obs.
Statistics 0.8537 0.8537 0.8537 0.8120 0.8120 0.8120 10.3892 10.3892 10.3892 10 10 10
dfSSMSFP-value
Explained 2 2 2 4410.4500 4410.4500 4410.4500 2205.2250 2205.2250 2205.2250 20.4309 20.4309 20.4309 0.0012 0.0012 0.0012
Residual 7 7 7 755.5500 755.5500 755.5500 107.9357 107.9357 107.9357
Total 9 9 9 5166.0000 5166.0000 5166.0000
CoefficientStd. Errort-StatP-value
Intercept 35.5875 35.5875 35.5875 6.1737 6.1737 6.1737 5.7644 5.7644 5.7644 0.0007 0.0007 0.0007
X 1 X_1 X1 1.8563 1.8563 1.8563 1.6681 1.6681 1.6681 1.1128 1.1128 1.1128 0.3026 0.3026 0.3026
X 2 X_2 X2$7.4250 $1.1615$ 6.3923 6.3923 6.3923 0.0004 0.0004 0.0004

Based on the results and a 5 % 5\% 5% level of significance, which of the following hypotheses can be
rejected?
I. H 0 : B 0 = 0 H_0 : B_0 = 0 H0:B0=0
II. H 0 : B 1 = 0 H_0 : B_1 = 0 H0:B1=0
III. H 0 : B 2 = 0 H_0 : B_2 = 0 H0:B2=0
IV. H 0 : B 1 = B 2 = 0 H_0 : B_1 = B_2 = 0 H0:B1=B2=0

A. I, II, and III
B. I and IV
C. Ill and IV
D. I, IIl, and IV

Answer: D
The t-statistics for the intercept and coefficient on are significant as indicated by the associated p-values being less than 0.05 0.05 0.05: 0.0007 0.0007 0.0007 and 0.0004 0.0004 0.0004 respectively. Therefore, H 0 : B 0 = 0 H_0:B_0 = 0 H0:B0=0 and H 0 : B 2 = 0 H_0:B_2 = 0 H0:B2=0 can be rejected.

The F-statistic on the ANOVA table has a p-value equal to 0.0012 0.0012 0.0012; therefore, H 0 : B 1 = B 2 = 0 H_0: B_1 = B_2 = 0 H0:B1=B2=0 can be rejected.


Question 22

PE2018Q26 / PE2019Q26 / PE2020Q26 / PE2021Q26 / PE2022Q26
A risk manager has estimated a regression of a firm’s monthly portfolio returns against the returns of three U.S. domestic equity indexes: the Russell 1000 index, the Russell 2000 index, and the Russell 3000 index. The results are shown below.

RegressionMultiple RR squaredR sq. adj.Std. errorNum obs.
Statistics 0.951 0.951 0.951 0.905 0.905 0.905 0.903 0.903 0.903 0.009 0.009 0.009 192 192 192
Regression OutputCoefficientsStandard Errort-StatP-value
Intercept 0.0023 0.0023 0.0023 0.0006 0.0006 0.0006 3.530 3.530 3.530 0.0005 0.0005 0.0005
Russell 1000 0.1093 0.1093 0.1093 1.5895 1.5895 1.5895 0.068 0.068 0.068 0.9452 0.9452 0.9452
Russell 2000 0.1055 0.1055 0.1055 0.1384 0.1384 0.1384 0.762 0.762 0.762 0.4470 0.4470 0.4470
Russell 3000 0.3533 0.3533 0.3533 1.7274 1.7274 1.7274 0.204 0.204 0.204 0.8382 0.8382 0.8382
Correlation MatrixPortfolio ReturnsRussell 1000Russell 2000Russell 3000
Portfolio 1.000 1.000 1.000
Russell 1000 0.937 0.937 0.937 1.000 1.000 1.000
Russell 2000 0.856 0.856 0.856 0.813 0.813 0.813 1.000 1.000 1.000
Russell 3000 0.945 0.945 0.945 0.998 0.998 0.998 0.845 0.845 0.845 1.000 1.000 1.000

Based on the regression results, which statement is correct?

A. The estimated coefficient of 0.3533 0.3533 0.3533 indicates that the returns of the Russell 3000 index are more statistically significant in determining the portfolio returns than the other two indexes.
B. The high adjusted R 2 R^2 R2 indicates that the estimated coefficients on the Russell 1000, Russell 2000, and Russell 3000 indexes are statistically significant.
C. The high p-value of 0.9452 0.9452 0.9452 indicates that the regression coefficient of the returns of Russell 1000 is more statistically significant than the other two indexes.
D. The high correlations between each pair of index returns indicate that multicollinearity exists between the variables in this regression.

Answer: D
Learning Objective:

  • Interpret regression coefficients in a multiple regression,
  • Interpret goodness of fit measures for single and multiple regressions, including R 2 R^2 R2 and adjusted R 2 R^2 R2.
  • Characterize multicollinearity and its consequences; distinguish between multicollinearity and perfect collinearity.

This is an example of multicollinearity, which arises when one of the regressors is very highly correlated with the other regressors. In this case, all three regressors are highly correlated with each other, so multicollinearity exists between all three. Since the variables are not perfectly correlated with each other this is a case of imperfect, rather than perfect, multicollinearity.


Question 23

PE2018Q28 / PE2019Q28 / PE2020Q28 / PE2021Q28 / PE2022Q28
An analyst is testing a hypothesis that the beta, β \beta β, of stock CDM is 1 1 1. The analyst runs an ordinary least squares regression of the monthly returns of CDM, R C D M R_{CDM} RCDM, on the monthly returns of the S&P 500 index, R m R_m Rm, and obtains the following relation:
R C D M = 0.86 × R m − 0.32 R_{CDM} = 0.86\times R_m - 0.32 RCDM=0.86×Rm0.32

The analyst also observes that the standard error of the coefficient of R m R_m Rm is 0.80 0.80 0.80. In order to test the hypothesis H 0 : β = 1 H_0: \beta = 1 H0:β=1 against H 1 : β ≠ 1 H_1: \beta \neq 1 H1:β=1, what is the correct statistic to calculate?

A. t-statistic
B. Chi-square test statistic
C. Jarque-bera test statistic
D. Sum of squared residuals

Answer: A
Learning Objective:

  • Construct, apply and interpret hypothesis tests and confidence intervals for a single regression coefficient in a regression.
  • Explain the steps needed to perform a hypothesis test in a linear regression.

The t-statistic is defined by:
t = β estimated − β S E estimated  β = 0.86 − 1 0.8 = − 0.175 t=\frac{\beta^{\text{estimated}}-\beta}{SE_{\text{estimated }\beta}}=\frac{0.86-1}{0.8}=-0.175 t=SEestimated ββestimatedβ=0.80.861=0.175

In the case t = − 0.175 t=-0.175 t=0.175. Since ∣ t ∣ < 1.96 |t|<1.96 t<1.96, we cannot reject the null hypothesis.


Question 24

A bank analyst run an ordinary least squares regression of the daily returns of the stock on the daily returns on the S&P 500 index using the last 750 750 750 trading days of data. The regression results are summarized in the following tables:

PredictorCoefficientStandard Errort-statisticp-value
Constant0.05610.0029419.097100.00000
Return on the S&P 5001.20540.00298404.252250.00000
SourceDegree of FreedomSum of SquaresMean SquareF-statisticp-value
Regression111.4393911.43939163419.879710.00000
Residual Error7490.054250.00007
Total7500.44677

The bank analyst wants to test the null hypothesis that the beta of the portfolio is 1.3 1.3 1.3 at a 5 % 5\% 5% significance level. According to the regression results, the analyst would:

A. Reject the null hypothesis because the t-statistic is greater than 1.96 1.96 1.96.
B. Fail to reject the null hypothesis because the t-statistic is greater than 1.96 1.96 1.96.
C. Reject the null hypothesis because the p-value is greater than 5 % 5\% 5%.
D. Fail to reject the null hypothesis because the p-value is greater than 5 % 5\% 5%.


Question 25

Many statistical problems arise when estimating relationships using regression analysis. Some of these problems are due to the assumptions behind the regression model. Which one of the following is NOT one of these problems?

A. Stratification
B. Multicollinearity
C. Heteroscedasticity
D. Autocorrelation

Answer: A
Stratification is not related to regression analysis. Choices B, C, and D describe situations that can produce inaccurate descriptions of the relationship between the independent and dependent variables.

Multicollinearity occurs when the independent variables are themselves correlated.
Heteroscedasticity occurs when the variances are different across observations.
Autocorrelation occurs when successive observations are influenced by the proceeding observations.


Question 26

An analyst is performing a regression. The dependent variable is portfolio return while the independent variable is the years of experience of the portfolio manager. In his analysis, the resulting scatter plot is as follow; The analyst can conclude that the portfolio returns exhibit:
在这里插入图片描述

A. Heteroskedasticity
B. Homoskedasticity
C. Perfect multicollinearity
D. Non-perfect multicollinearity


Question 27

A risk manager is examining the relationship between portfolio manager’s years of working experience and the returns of their portfolios. He performs an ordinary least squares(OLS) regression of last year’s portfolio returns ( Y Y Y) on the portfolio managers’ years of working experience( X X X) and provides the following scatter plot to his supervisor:
在这里插入图片描述

Which of the following assumptions of the OLS regression has most likely been violated?

A. Perfect multicollonearity
B. Expectation of zero for the error terms
C. Normally distributed error terms
D. Homoskedasticity


Question 28

PE2022PSQ12
A risk manager at Firm SPC is testing a portfolio for heteroskedasticity using the White test. The portfolio is modeled as follows:

Y = α + β × X + ϵ Y = \alpha+\beta\times X+\epsilon Y=α+β×X+ϵ

The residuals are computed as follows:

ϵ ^ = Y i − a ^ − b × X 1 i \hat{\epsilon}=Y_i-\hat{a}-b\times X_{1i} ϵ^=Yia^b×X1i

Which of the following correctly depicts the second step in the White test for the portfolio?

A. ϵ ^ 2 = γ 0 + γ 1 X 1 i + γ 2 X 1 i 2 + η \hat{\epsilon}^2=\gamma_0+\gamma_1X_{1i}+\gamma_2X_{1i}^2+\eta ϵ^2=γ0+γ1X1i+γ2X1i2+η
B. ϵ ^ 2 = γ 1 X 1 i + γ 2 X 1 i 2 + η \hat{\epsilon}^2=\gamma_1X_{1i}+\gamma_2X_{1i}^2+\eta ϵ^2=γ1X1i+γ2X1i2+η
C. ϵ ^ 2 = γ 0 + γ 1 X 1 i + η \hat{\epsilon}^2=\gamma_0+\gamma_1X_{1i}+\eta ϵ^2=γ0+γ1X1i+η
D. ϵ ^ 2 = γ 0 + γ 1 X 1 i 2 + η \hat{\epsilon}^2=\gamma_0+\gamma_1X_{1i}^2+\eta ϵ^2=γ0+γ1X1i2+η

Answer: A
Learning Objective: Explain how to test whether a regression is affected by heteroskedasticity.

A is correct. The second step in the White test is to regress the squared residuals on a constant, on all explanatory variables, and on the cross product of all explanatory variables.

B is incorrect. It is missing the constant.

C is incorrect. It is missing the cross-product.

D is incorrect. It is missing the explanatory variable.


Question 29

The portfolio manager is interested in the systematic risk of a stock portfolio, so he estimates the linear regression: R p − R f = α + β ( R m − R f ) + ϵ R_p-R_f=\alpha+\beta(R_m-R_f)+\epsilon RpRf=α+β(RmRf)+ϵ, where R p R_p Rp is the return of the portfolio at time t t t, R m R_m Rm is the return of the market portfolio at time t t t, and R f R_f Rf is the risk-free rate, which is constant over time. Suppose that α = 0.0008 \alpha=0.0008 α=0.0008, β = 0.977 \beta=0.977 β=0.977, σ ( R p ) = 0.167 \sigma(R_p)=0.167 σ(Rp)=0.167 and σ ( R m ) = 0.156 \sigma(R_m)=0.156 σ(Rm)=0.156, What is the approximate coefficient of determination in this regression?

A. 0.913 0.913 0.913
B. 0.834 0.834 0.834
C. 0.977 0.977 0.977
D. 0.955 0.955 0.955

Answer: B
R 2 = ρ 2 = [ Cov ( R p ,    R m ) σ R p σ R m ] 2 R^2=\rho^2=\left[\cfrac{\text{Cov}(R_p,\;R_m)}{\sigma_{R_p}\sigma_{R_m}}\right]^2 R2=ρ2=[σRpσRmCov(Rp,Rm)]2, and Cov ( R p ,    R m ) = β × σ R m 2 \text{Cov}(R_p,\;R_m)=\beta\times\sigma_{R_m}^2 Cov(Rp,Rm)=β×σRm2

Thus, R 2 = [ β × σ R m σ R p ] 2 = [ 0.977 × 0.156 0.167 ] 2 = 0.83 R^2=\left[\cfrac{\beta\times\sigma_{R_m}}{\sigma_{R_p}}\right]^2=\left[\cfrac{0.977\times0.156}{0.167}\right]^2=0.83 R2=[σRpβ×σRm]2=[0.1670.977×0.156]2=0.83


Question 30

Consider the following linear regression model: Y = a + b X + ϵ Y=a+bX+\epsilon Y=a+bX+ϵ. Suppose a = 0.05 a=0.05 a=0.05, b = 1.2 b=1.2 b=1.2, S D ( Y ) = 0.26 SD(Y)=0.26 SD(Y)=0.26, S D ( e ) = 0.1 SD(e)=0.1 SD(e)=0.1. What is the correlation between X X X and Y Y Y?

A. 0.923 0.923 0.923
B. 0.852 0.852 0.852
C. 0.701 0.701 0.701
D. 0.462 0.462 0.462

Answer: A
V ( Y ) = b 2 V ( X ) + V ( ϵ ) → V ( X ) = 0.04 → S E ( X ) = 0 / 2 V(Y)=b^2V(X)+V(\epsilon)\to V(X)=0.04\to SE(X)=0/2 V(Y)=b2V(X)+V(ϵ)V(X)=0.04SE(X)=0/2

ρ = Cov ( X , Y ) σ x σ y = b × S E ( X ) S E ( Y ) = 0.923 \rho=\cfrac{\text{Cov}(X,Y)}{\sigma_x\sigma_y}=\cfrac{b\times SE(X)}{SE(Y)}=0.923 ρ=σxσyCov(X,Y)=SE(Y)b×SE(X)=0.923


Question 31

Our linear regression produces a high coefficient of determination ( R 2 R^2 R2 ) but few significant t ratios. Which assumption is most likely violated?

A. Homoscedasticity
B. Multicollinearity
C. Error term is normal with mean = 0 and constant variance = sigma2
D. No autocorrelation between error terms

Answer: B
High R 2 R^2 R2 (implies F test will probably reject null) but low significance of partial slope coefficients is the classic symptom of multicollinearity.

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值