UN3412 Introduction to Econometrics Problem Set 3 Spring 2019


yyddccff


Article link: https://blog.csdn.net/yyddccff/article/details/140736011


Department of Economics

UN3412

Spring 2019

Problem Set 3

Introduction to Econometrics

This problem set is graded out of 60 points.

1. [20 points] Use the data in hprice1.dta to estimate the following model (the variables in the data set are described in Table 1 below):

price = β0 + β1sqrft + β2bdrms + u

where price = the (selling) price of the house (in 1000 dollars), sqrft = size of house (square feet) and bdrms = number of bedrooms in the house.

(a) Write out the estimation result in equation form. [2 points]

(b) What is the estimated increase in price for a house with one more bedroom, keeping square footage constant? [2 points]

(c) What is the estimated increase in price for a house with an additional 1400-square-foot bedroom added? Compare this to your answer in (b). [4 points]

(d) What percentage of the variation in price is explained by square footage and number of bedrooms? Compare your answer to the adjusted R2. Explain the difference. [4 points]

(e) Consider the first house in the sample. Report the square footage and number of bedrooms for this house. Find the predicted selling price for this house from the OLS regression line. [4 points]

(f) What is the actual selling price of the first house in the sample? Find the residual of this house. Does it suggest that the buyer underpaid or overpaid for the house? Explain. [4 points]

Table 1

DATA DESCRIPTION, FILE: hprice1.dta

Variable	Definition
price	House price, in $1000.
assess	Assessed value, in $1000.
bdrms	Number of bedrooms.
lotsize	Size of lot, in square feet.
sqrft	Size of house, in square feet.
colonial	= 1 if house is in Colonial style, = 0 otherwise.
lprice	log(price)
lassess	log(assess)
llotsize	log(lotsize)
lsqrft	log(sqrft)
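The steps in Question 1 (estimate, read off R² and adjusted R², predict for the first observation, compute its residual) can be sketched as follows. This is only an illustration under stated assumptions: the six observations are made up and are not the values in hprice1.dta, the helper names `solve` and `ols` are hypothetical, and Python is used here simply because it needs no extra software; the course itself may expect another package.

```python
# Hypothetical toy data, NOT the contents of hprice1.dta.
price = [300.0, 370.0, 191.0, 195.0, 373.0, 466.0]        # in $1000
sqrft = [2438.0, 2076.0, 1374.0, 1448.0, 2514.0, 2754.0]  # square feet
bdrms = [4.0, 3.0, 3.0, 3.0, 4.0, 5.0]                    # bedrooms

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(y, *regressors):
    """Return (coefficients, design matrix) from the normal equations X'X b = X'y."""
    X = [[1.0] + [x[i] for x in regressors] for i in range(len(y))]
    k = len(X[0])
    XtX = [[sum(row[a] * row[c] for row in X) for c in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty), X

beta, X = ols(price, sqrft, bdrms)            # beta = [b0, b1 (sqrft), b2 (bdrms)]
fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
resid = [y - f for y, f in zip(price, fitted)]

n, k = len(price), len(beta) - 1              # k = number of slope coefficients
ybar = sum(price) / n
ssr = sum(e * e for e in resid)               # sum of squared residuals
tss = sum((y - ybar) ** 2 for y in price)     # total sum of squares
r2 = 1 - ssr / tss
adj_r2 = 1 - (n - 1) / (n - k - 1) * ssr / tss  # penalizes extra regressors

print("coefficients:", beta)
print("R2:", r2, "adjusted R2:", adj_r2)
print("first house: predicted", fitted[0], "actual", price[0], "residual", resid[0])
```

A positive residual means the actual price lies above the regression line, and a negative one means it lies below; whether that reflects overpaying or unobserved features of the house is exactly what part (f) asks you to discuss.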

2. [20 points] Allcott and Gentzkow (2017) conducted an online survey of US adults regarding fake news after the 2016 presidential election. In their survey, they showed survey respondents news headlines about the 2016 election and asked whether the news headlines were true or false. Some of the news headlines were fake and others were true.

Their dependent variable Yi takes the value 1 if survey respondent i correctly identifies whether the headline is true or false, the value 0.5 if the respondent is “not sure”, and the value 0 otherwise.

Suppose that one conducts a similar survey and obtains the following regression result:

Ŷi = 0.65 + 0.012 college + 0.015 ln(Daily media time) + 0.003 Age,  R² = 0.14, n = 828,
     (0.02)  (0.004)          (0.003)                       (0.001)

where college is a binary indicator that equals 1 if a survey respondent is a college graduate and 0 otherwise, ln(Daily media time) is the logarithm of the daily time spent consuming media, and Age is age in years. Standard errors are reported in parentheses below the corresponding coefficients.

(a) Suppose that you would like to test that people with higher education have more accurate beliefs about news at the 1% level. State your null hypothesis precisely and report your test result. [4 points]

(b) The estimated coefficient for ln(Daily media time) is significantly positive. Interpret this result. Explain why this is plausible. [4 points]

(c) Even if Age is omitted, there will be little concern about the omitted variable bias problem. Do you agree? Explain briefly. [6 points]

(d) Suppose that you now conjecture that Republicans may have different beliefs about news than Democrats. Assume that there are three groups in the data: Democrats, Republicans and Independents. How would you change the specification of the linear regression model by adding or subtracting regressors? Explain briefly. [6 points]
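For the testing and interpretation parts of Question 2, the arithmetic on the reported coefficients and standard errors can be sketched as below. It assumes the large-sample normal approximation (reasonable with n = 828) and a one-sided alternative for college; the dictionary keys are labels invented here, and the sketch is an aid for checking calculations rather than a model answer.

```python
import math
from statistics import NormalDist

# Coefficients and standard errors as reported in the regression above.
coef = {"college": 0.012, "ln_media_time": 0.015, "age": 0.003}
se = {"college": 0.004, "ln_media_time": 0.003, "age": 0.001}

# t-statistic for H0: coefficient = 0.
t_stat = {name: coef[name] / se[name] for name in coef}

# One-sided 1% critical value from the standard normal (assumed approximation).
crit_one_sided_1pct = NormalDist().inv_cdf(0.99)
reject_college = t_stat["college"] > crit_one_sided_1pct

# Log regressor: a 1% increase in daily media time changes the predicted
# Y by coef * ln(1.01), which is roughly coef / 100.
effect_of_1pct_more_media = coef["ln_media_time"] * math.log(1.01)

print(t_stat)
print("critical value:", round(crit_one_sided_1pct, 3), "reject H0:", reject_college)
print("effect of 1% more media time:", effect_of_1pct_more_media)
```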

3. [20 points] Consider the following Population Linear Regression Function (PLRF):

yi = β0 + β1x1i + β2x2i + β3x3i + β4x4i + β5x5i + ui    (1)

where yi = average hourly earnings (wage) in dollars, x1 = years of education, x2 = years of potential experience, x3 = years with current employer (tenure), x4 = 1 if female, x5 = 1 if nonwhite, and ui = the usual error term of the model.

For this question, use the WAGE data set that you used in PS#2. The variables in the data set are described below for your reference. We might use this data set in the coming problem sets too.

Obs: 526

1. wage average hourly earnings

2. educ years of education

3. exper years potential experience

4. tenure years with current employer

5. nonwhite =1 if nonwhite

6. female =1 if female

7. married =1 if married

8. numdep number of dependents

9. smsa =1 if live in SMSA

10. northcen =1 if live in north central U.S

11. south =1 if live in southern region

12. west =1 if live in western region

13. construc =1 if work in construc. Indus.

14. ndurman =1 if in nondur. Manuf. Indus.

15. trcommpu =1 if in trans, commun, pub ut

16. trade =1 if in wholesale or retail

17. services =1 if in services indus.

18. profserv =1 if in prof. serv. Indus.

19. profocc =1 if in profess. Occupation

20. clerocc =1 if in clerical occupation

21. servocc =1 if in service occupation

22. lwage log(wage)

23. expersq exper^2

24. tenursq tenure^2

(a) Consider the following restricted version of equation (1): yi = β0 + β1x1i + β2x2i + ui. Suppose that x2 is omitted from the model by the researcher. For the omission of x2 to cause omitted variable bias (OVB), what conditions must x2 satisfy? Show mathematically that the OLS estimator of β1 is biased if x2 is omitted from the model. [4 points]

(b) Run a regression of yi = β0 + β4x4 + ui and interpret the slope coefficient β4 . (Hint: x4 is a binary explanatory variable.) [2 Points]

(c) First generate a dummy variable Di such that Di = 1 if male and Di = 0 if female. Then run the regression yi = β0 + β1x1 + β4x4 + β6Di + ui. What do you notice in the result? Explain why. Show mathematically that if x4 and Di are related, this result is inevitable. [6 points]

(d) First run the simple regression yi = β0 + β1x1 + ui, then run yi = β0 + β1x1 + β2x2 + ui. Explain what happened to the estimate of β1 (before and after) and why it happened. [2 points]

(e) Now run the full model (1) using both homoskedasticity-only and heteroskedasticity-robust standard errors, then interpret and compare the results of the two regressions. Why do we care about heteroskedasticity that might exist in the data? [4 points]

(f) Based on the latter regression (i.e., the one with heteroskedasticity-robust standard errors), conduct the following hypothesis tests:

i. H0 : βi = 0 vs H1 : βi ≠ 0 where i = 1, 2, … , 5

ii. H0 : β1 = β2 = β3 = β4 = β5 = 0 vs H1 : At least one βi ≠ 0 [2 points]
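The situation in part (c), where an intercept, a female dummy and a male dummy all enter the same regression, can be illustrated with a small sketch. It assumes a made-up sample of eight workers (not the WAGE data) and leaves out education to keep the matrices tiny; the helper names `cross_product` and `det` are hypothetical. Because the male dummy equals 1 minus the female dummy, the three columns are linearly dependent and X'X cannot be inverted.

```python
# Hypothetical sample of 8 workers; 1 = female, 0 = male (not the WAGE data).
female = [1, 0, 0, 1, 1, 0, 1, 0]
male = [1 - f for f in female]          # Di = 1 - x4 by construction

def cross_product(X):
    """Return X'X for a design matrix given as a list of rows."""
    k = len(X[0])
    return [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]

def det(M):
    """Determinant by cofactor expansion (fine for tiny matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

X_trap = [[1, f, m] for f, m in zip(female, male)]  # intercept + both dummies
X_ok = [[1, f] for f in female]                     # intercept + one dummy

det_trap = det(cross_product(X_trap))   # zero: X'X is singular, OLS is not unique
det_ok = det(cross_product(X_ok))       # non-zero: OLS is well defined
print(det_trap, det_ok)                 # prints: 0 16
```

Statistical packages typically respond to this situation by dropping one of the collinear regressors, which is the behavior part (c) asks you to observe and explain.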

The following questions will not be graded; they are for practice and will be discussed at the recitation:

1. SW Exercise 7.1

2. SW Exercise 7.4

(a) The F-statistic testing that the coefficients on the regional regressors are zero is 6.10. The 1% critical value (from the F3,∞ distribution) is 3.78. Because 6.10 > 3.78, the regional effects are significant at the 1% level.

(bi) The expected difference between Juanita and Molly is (X6,Juanita − X6,Molly) × β6 = β6. Thus a 95% confidence interval is 0.27 ± 1.96 × 0.26.

(bii) The expected difference between Juanita and Jennifer is (X5,Juanita − X5,Jennifer) × β5 + (X6,Juanita − X6,Jennifer) × β6 = −β5 + β6. A 95% confidence interval could be constructed using the general methods discussed in Section 7.3. In this case, an easy way to do this is to omit Midwest from the regression and replace it with X5 = West. In this new regression the coefficient on South measures the difference in wages between the South and the Midwest, and a 95% confidence interval can be computed directly.

3. SW Empirical Exercises 7.1

Regressor	Model (a)	Model (b)
Age	0.60 (0.04)	0.59 (0.04)
Female		−3.66 (0.21)
Bachelor		8.08 (0.21)
Intercept	1.08 (1.17)	−0.63 (1.08)

SER	9.99	9.07
R²	0.029	0.200
Adjusted R²	0.029	0.199
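As a small sketch of how the model (b) column turns into a predicted value, the coefficients from the table can be plugged into a function. The function name `predict_ahe` and the dictionary `COEF` are hypothetical names introduced here; the numbers are the rounded estimates shown in the table, so results may differ slightly from those based on unrounded estimates.

```python
# Model (b) coefficients copied from the table above (rounded estimates).
COEF = {"intercept": -0.63, "age": 0.59, "female": -3.66, "bachelor": 8.08}

def predict_ahe(age, female, bachelor):
    """Predicted average hourly earnings (dollars) under model (b)."""
    return (COEF["intercept"] + COEF["age"] * age
            + COEF["female"] * female + COEF["bachelor"] * bachelor)

# A 26-year-old man without a bachelor's degree,
# and a 30-year-old woman with a bachelor's degree.
man_26_no_degree = predict_ahe(age=26, female=0, bachelor=0)
woman_30_degree = predict_ahe(age=30, female=1, bachelor=1)
print(round(man_26_no_degree, 2), round(woman_30_degree, 2))  # prints: 14.71 21.49
```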

(a) The estimated slope is 0.60. The estimated intercept is 1.08.

(b) The estimated marginal effect of Age on AHE is 0.59 dollars per year. The 95% confidence interval is 0.59 ± 1.96 × 0.04 or 0.51 to 0.66.

(c) … important omitted variable bias.

(d) Bob’s predicted average hourly earnings = (0.59 × 26) + (−3.66 × 0) + (8.08 × 0) − 0.63 = $14.71. Alexis’s predicted average hourly earnings = (0.59 × 30) + (−3.66 × 1) + (8.08 × 1) − 0.63 = $21.49.

(e) The regression in (b) fits the data much better. Gender and education are important predictors of earnings. The R² and adjusted R² are similar because the sample size is large (n = 7711).

(f) Gender and education are important. The F-statistic is 781, which is (much) larger than the 1% critical value of 4.61.

(g) The omitted variables must have non-zero coefficients and must be correlated with the included regressor. From (f), Female and Bachelor have non-zero coefficients; yet there does not seem to be important omitted variable bias, suggesting that the correlations between Age and Female and between Age and Bachelor are small. (The sample correlations are Cor(Age, Female) = −0.03 and Cor(Age, Bachelor) = 0.00.)
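The two conditions stated in (g) can be read off the standard omitted variable bias formula. The derivation below is a sketch for the simplest case, assuming a true model with two regressors and a short regression that omits the second one; the symbols X1, X2 and the population moments are notation introduced here for illustration.

```latex
% Assumed true model: Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i,
% with E(u_i \mid X_{1i}, X_{2i}) = 0. The short regression omits X_2.
\begin{align*}
\hat{\beta}_1
  &= \frac{\sum_i (X_{1i} - \bar{X}_1)(Y_i - \bar{Y})}{\sum_i (X_{1i} - \bar{X}_1)^2} \\
  &= \beta_1
   + \beta_2 \, \frac{\sum_i (X_{1i} - \bar{X}_1)(X_{2i} - \bar{X}_2)}{\sum_i (X_{1i} - \bar{X}_1)^2}
   + \frac{\sum_i (X_{1i} - \bar{X}_1)\, u_i}{\sum_i (X_{1i} - \bar{X}_1)^2} \\
  &\xrightarrow{\;p\;} \beta_1 + \beta_2 \, \frac{\operatorname{Cov}(X_1, X_2)}{\operatorname{Var}(X_1)} .
\end{align*}
```

The bias term vanishes if either β2 = 0 (the omitted variable does not affect Y) or Cov(X1, X2) = 0 (it is uncorrelated with the included regressor), which matches the near-zero sample correlations reported in (g).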