UN3412 Introduction to Econometrics Problem Set 3 Spring 2019

Department of Economics

This problem set is graded out of 60 points.

1.   [20 points] Use the data in hprice1.dta to estimate the following model (the variables in the data set are described in Table 1 below):

$\text{price} = \beta_0 + \beta_1\,\text{sqrft} + \beta_2\,\text{bdrms} + u$

where price is the selling price of the house (in thousands of dollars), sqrft is the size of the house (in square feet), and bdrms is the number of bedrooms in the house.

(a) Write out the estimation result in equation form. [2 points]

(b) What is the estimated increase in price for a house with one more bedroom, holding square footage constant? [2 points]

(c) What is the estimated increase in price for a house when an additional bedroom of 1400 square feet is added? Compare this to your answer in (b). [4 points]

(d) What percentage of the variation in price is explained by square footage and number of bedrooms? Compare your answer to the adjusted R². Explain the difference. [4 points]

(e) Consider the first house in the sample. Report the square footage and number of bedrooms for this house. Find the predicted selling price for this house from the OLS regression line. [4 points]

(f) What is the actual selling price of the first house in the sample? Find the residual for this house. Does it suggest that the buyer underpaid or overpaid for the house? Explain. [4 points]

Table 1

DATA DESCRIPTION, FILE: hprice1.dta

Variable    Definition
price       House price, in thousands of dollars
assess      Assessed value, in thousands of dollars
bdrms       Number of bedrooms
lotsize     Size of lot, in square feet
sqrft       Size of house, in square feet
colonial    = 1 if house is in Colonial style, = 0 otherwise
lprice      log(price)
lassess     log(assess)
llotsize    log(lotsize)
lsqrft      log(sqrft)
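All quantities asked for in (a) to (f) come from a single OLS fit. The sketch below is one possible NumPy illustration, not a prescribed course method: `ols_fit` is a hypothetical helper, and the five-house data set is made up purely to show the mechanics. It returns the coefficients, fitted values, residuals (actual minus predicted), $R^2 = 1 - SSR/TSS$, and the adjusted $\bar{R}^2 = 1 - \frac{n-1}{n-k-1}\frac{SSR}{TSS}$, where $k$ is the number of slope regressors.

```python
import numpy as np

def ols_fit(X, y):
    """OLS of y on a constant and the columns of the n-by-k array X.

    Returns (beta, fitted, resid, r2, adj_r2); beta[0] is the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape                          # k = number of slope regressors
    Z = np.column_stack([np.ones(n), X])    # prepend the intercept column
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    fitted = Z @ beta
    resid = y - fitted                      # residual = actual - predicted
    ssr = resid @ resid
    tss = ((y - y.mean()) ** 2).sum()
    r2 = 1 - ssr / tss
    adj_r2 = 1 - (n - 1) / (n - k - 1) * ssr / tss
    return beta, fitted, resid, r2, adj_r2

# Hypothetical toy data (NOT hprice1.dta): columns are sqrft and bdrms.
X = np.array([[1000, 2], [1500, 3], [2000, 3], [2500, 4], [1800, 3]])
y = np.array([123.0, 173.0, 226.0, 276.0, 207.0])   # price in $1000s

beta, fitted, resid, r2, adj_r2 = ols_fit(X, y)
print("beta0, beta_sqrft, beta_bdrms:", beta)
print("one more bedroom, same sqrft:", beta[2])
print("one more bedroom of 1400 sqft:", 1400 * beta[1] + beta[2])
print("first house: predicted", fitted[0], "residual", resid[0])
print("R^2:", r2, "adjusted R^2:", adj_r2)
```

To run it on the real file, one option is `pandas.read_stata("hprice1.dta")`, passing the `sqrft` and `bdrms` columns as `X` and `price` as `y` (assuming the columns carry those names, as in the model above). The change in (c) is $1400\hat\beta_1 + \hat\beta_2$ because square footage and the bedroom count both increase; a positive residual means the house sold for more than the regression predicts.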

2.   [20 points] Allcott and Gentzkow (2017) conducted an online survey of US adults regarding fake news after the 2016 presidential election. In their survey, they showed respondents news headlines about the 2016 election and asked whether each headline was true or false. Some of the headlines were fake and others were true.

Their dependent variable $Y_i$ takes the value 1 if survey respondent $i$ correctly identifies whether the headline is true or false, 0.5 if the respondent is “not sure”, and 0 otherwise.

Suppose that one conducts a similar survey and obtains the following regression result:

$\hat{Y}_i = \underset{(0.02)}{0.65} + \underset{(0.004)}{0.012}\,\text{college} + \underset{(0.003)}{0.015}\,\ln(\text{Daily media time}) + \underset{(0.001)}{0.003}\,\text{Age}, \qquad R^2 = 0.14, \quad n = 828,$

where college is a binary indicator that equals 1 if the survey respondent is a college graduate and 0 otherwise, ln(Daily media time) is the logarithm of daily time spent consuming media, and Age is age in years.

(a)   Suppose that you would like to test, at the 1% level, whether people with higher education have more accurate beliefs about news. State your null hypothesis precisely and report your test result. [4 points]

(b)   The estimated coefficient on ln(Daily media time) is significantly positive. Interpret this result. Explain why this is plausible. [4 points]

(c)   Even if Age is omitted, there will be little concern about the omitted variable bias problem. Do you agree? Explain briefly. [6 points]

(d)   Suppose that you now conjecture that Republicans may have different beliefs about news than Democrats. Assume that there are three groups in the data: Democrats, Republicans and Independents. How would you change the specification of the linear regression model by adding or subtracting regressors? Explain briefly. [6 points]
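Parts (a) and (b) rest on two standard calculations: a large-sample t-test, whose critical values come from the standard normal distribution (a reasonable approximation with n = 828), and the lin-log interpretation, under which a p% increase in X changes $\hat{Y}$ by $\hat\beta\ln(1 + p/100) \approx \hat\beta\,p/100$. The sketch below uses only Python's standard library; the helper names `t_test` and `linlog_effect` are hypothetical.

```python
from math import log
from statistics import NormalDist

def t_test(coef, se, alpha=0.05, alternative="two-sided", null=0.0):
    """Large-sample t-test of H0: beta = null using normal critical values.

    alternative: "greater" (H1: beta > null), "less" (H1: beta < null),
    or "two-sided" (H1: beta != null). Returns (t, critical value, reject).
    """
    t = (coef - null) / se
    z = NormalDist()                      # standard normal
    if alternative == "greater":
        crit = z.inv_cdf(1 - alpha)
        reject = t > crit
    elif alternative == "less":
        crit = z.inv_cdf(alpha)
        reject = t < crit
    else:
        crit = z.inv_cdf(1 - alpha / 2)
        reject = abs(t) > crit
    return t, crit, reject

def linlog_effect(beta, pct_change):
    """Change in Y-hat when X rises by pct_change percent in Y = ... + beta*ln(X)."""
    return beta * log(1 + pct_change / 100)

t, crit, reject = t_test(0.012, 0.004, alpha=0.01, alternative="greater")
print(f"t = {t:.2f}, critical value = {crit:.2f}, reject H0: {reject}")
print(f"effect of a 1% rise in media time: {linlog_effect(0.015, 1):.6f}")
```

The call above treats the alternative in (a) as one-sided ($H_1: \beta_{college} > 0$), which matches the wording "more accurate"; a two-sided test would use `alternative="two-sided"` and the larger critical value.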

3.   [20 points] Consider the following Population Linear Regression Function (PLRF):

$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_4 x_{4i} + \beta_5 x_{5i} + u_i \qquad (1)$

where $y_i$ = average hourly earnings (wage, in dollars), $x_1$ = years of education, $x_2$ = years of potential experience, $x_3$ = years with current employer (tenure), $x_4$ = 1 if female, $x_5$ = 1 if nonwhite, and $u_i$ is the usual error term of the model.

For this question, use the WAGE data set that you used in PS#2. The variables in the data set are described below for your reference. We may use this data set in future problem sets as well.

Obs:   526

1. wage                     average hourly earnings

2. educ                     years of education

3. exper                    years potential experience

4. tenure                   years with current employer

5. nonwhite                 =1 if nonwhite

6. female                   =1 if female

7. married                  =1 if married

8. numdep                   number of dependents

9. smsa                     =1 if live in SMSA

10. northcen                 =1 if live in north central U.S

11. south                    =1 if live in southern region

12. west                     =1 if live in western region

13. construc                 =1 if work in construc. Indus.

14. ndurman                  =1 if in nondur. Manuf. Indus.

15. trcommpu                 =1 if in trans, commun, pub ut

16. trade                    =1 if in wholesale or retail

17. services                 =1 if in services indus.

18. profserv                 =1 if in prof. serv. Indus.

19. profocc                  =1 if in profess. Occupation

20. clerocc                  =1 if in clerical occupation

21. servocc                  =1 if in service occupation

22. lwage                    log(wage)

23. expersq                  exper^2

24. tenursq                  tenure^2

(a) Consider the following restricted version of equation (1): $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i$. Suppose that the researcher omits $x_2$ from the model. What conditions must $x_2$ satisfy for its omission to cause omitted variable bias (OVB)? Show mathematically that the OLS estimator $\hat\beta_1$ is biased if $x_2$ is omitted from the model. [4 points]

(b) Run a regression of $y_i = \beta_0 + \beta_4 x_{4i} + u_i$ and interpret the slope coefficient $\beta_4$. (Hint: $x_4$ is a binary explanatory variable.) [2 points]

(c) First generate a dummy variable $D_i$ such that $D_i = 1$ if male and $D_i = 0$ if female. Then run a regression of $y_i = \beta_0 + \beta_1 x_{1i} + \beta_4 x_{4i} + \beta_6 D_i + u_i$. What do you notice in the result? Explain why. Show mathematically that, given how $x_4$ and $D_i$ are related, this result is inevitable. [6 points]

(d) First run the simple regression $y_i = \beta_0 + \beta_1 x_{1i} + u_i$, then run $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i$. Explain what happens to $\hat\beta_1$ (before and after) and why. [2 points]

(e) Now run the full model (1) using both homoskedasticity-only and heteroskedasticity-robust standard errors, and interpret and compare the results of the two regressions. Why do we care about heteroskedasticity that might be present in the data? [4 points]

(f) Based on the regression with heteroskedasticity-robust standard errors (the latter), conduct the following hypothesis tests:

i.   $H_0: \beta_i = 0$ vs. $H_1: \beta_i \neq 0$, for $i = 1, 2, \dots, 5$

ii.  $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$ vs. $H_1$: at least one $\beta_i \neq 0$ [2 points]
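Several parts of question 3 rest on mechanical facts that can be checked numerically. The sketch below runs on hypothetical synthetic data rather than the WAGE file, and all helper names are made up. It (i) computes homoskedasticity-only and heteroskedasticity-robust standard errors, using the HC1 variant that scales by n/(n − k) (other robust variants such as HC0 or HC3 differ slightly in small samples); (ii) verifies the in-sample omitted-variable identity $\tilde\beta_1 = \hat\beta_1 + \hat\beta_2\tilde\delta_1$, where $\tilde\beta_1$ is the short-regression slope and $\tilde\delta_1$ is the slope from regressing $x_2$ on $x_1$; (iii) forms a heteroskedasticity-robust F-statistic; and (iv) shows that a constant plus both a female and a male dummy is rank deficient.

```python
import numpy as np

def ols_with_se(X, y):
    """OLS of y on a constant and the columns of X.

    Returns (beta, se_homo, se_robust, V_robust). Assumes [1, X] has full
    column rank; np.linalg.inv fails (or is numerically meaningless) otherwise.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    Z = np.column_stack([np.ones(n), np.asarray(X, dtype=float)])
    k = Z.shape[1]                          # coefficients incl. intercept
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    beta = ZtZ_inv @ Z.T @ y
    u = y - Z @ beta                        # OLS residuals
    # Homoskedasticity-only: s^2 (Z'Z)^{-1}, with s^2 = SSR / (n - k)
    se_homo = np.sqrt(np.diag(u @ u / (n - k) * ZtZ_inv))
    # HC1 robust: n/(n-k) * (Z'Z)^{-1} [sum_i u_i^2 z_i z_i'] (Z'Z)^{-1}
    meat = (Z * (u ** 2)[:, None]).T @ Z
    V_robust = n / (n - k) * ZtZ_inv @ meat @ ZtZ_inv
    return beta, se_homo, np.sqrt(np.diag(V_robust)), V_robust

def robust_f(beta, V, idx):
    """Robust F-statistic for H0: beta[j] = 0 for all j in idx."""
    idx = list(idx)
    b = beta[idx]
    return float(b @ np.linalg.solve(V[np.ix_(idx, idx)], b)) / len(idx)

def ovb_check(x1, x2, y):
    """Short slope of y on x1, and b1 + b2 * d1 from the long regression."""
    short = ols_with_se(x1, y)[0][1]
    long_ = ols_with_se(np.column_stack([x1, x2]), y)[0]
    d1 = ols_with_se(x1, x2)[0][1]          # slope of x2 on x1
    return short, long_[1] + long_[2] * d1

def perfectly_collinear(Z):
    """True if the design matrix Z (constant column included) is rank deficient."""
    Z = np.asarray(Z, dtype=float)
    return np.linalg.matrix_rank(Z) < Z.shape[1]

# Hypothetical synthetic data: x2 is correlated with x1, and the error
# variance grows with |x1 - 12| (heteroskedasticity).
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(12, 2, n)
x2 = 0.5 * x1 + rng.normal(0, 1, n)
y = 1 + 0.6 * x1 + 0.3 * x2 + rng.normal(0, 1, n) * (1 + 0.5 * np.abs(x1 - 12))

short, via_identity = ovb_check(x1, x2, y)
beta, se_h, se_r, V = ols_with_se(np.column_stack([x1, x2]), y)
print("short slope:", short, "via identity:", via_identity)
print("homoskedastic SE:", se_h, "robust SE:", se_r)
print("robust F (all slopes = 0):", robust_f(beta, V, [1, 2]))

female = np.array([1, 0, 1, 0, 0, 1])
male = 1 - female                           # a D_i-style dummy
print(perfectly_collinear(np.column_stack([np.ones(6), female, male])))
```

The short and identity-based slopes agree up to rounding error, and the short slope differs from the long-regression coefficient on $x_1$ by $\hat\beta_2\tilde\delta_1$, which is nonzero exactly when both $\hat\beta_2$ and $\tilde\delta_1$ are nonzero.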

The following questions will not be graded; they are for practice and will be discussed in recitation:

1.   SW Exercise 7.1

 

2.   SW Exercise 7.4

(a) The F-statistic testing that the coefficients on the regional regressors are zero is 6.10. The 1% critical value (from the $F_{3,\infty}$ distribution) is 3.78. Because 6.10 > 3.78, the regional effects are significant at the 1% level.

(bi) The expected difference between Juanita and Molly is $(X_{6,\text{Juanita}} - X_{6,\text{Molly}}) \times \beta_6 = \beta_6$. Thus a 95% confidence interval is $-0.27 \pm 1.96 \times 0.26$.

(bii) The expected difference between Juanita and Jennifer is $(X_{5,\text{Juanita}} - X_{5,\text{Jennifer}}) \times \beta_5 + (X_{6,\text{Juanita}} - X_{6,\text{Jennifer}}) \times \beta_6 = -\beta_5 + \beta_6$. A 95% confidence interval could be constructed using the general methods discussed in Section 7.3. In this case, an easy way to do this is to omit Midwest from the regression and replace it with $X_5$ = West. In this new regression, the coefficient on South measures the difference in wages between the South and the Midwest, and a 95% confidence interval can be computed directly.
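The general method for a linear combination $w'\beta$ uses $SE(w'\hat\beta) = \sqrt{w'Vw}$, where $V$ is the estimated covariance matrix of the coefficients; for $-\beta_5 + \beta_6$ this is $\sqrt{\operatorname{var}(\hat\beta_5) + \operatorname{var}(\hat\beta_6) - 2\operatorname{cov}(\hat\beta_5, \hat\beta_6)}$. The covariance term is not reported in the exercise, which is why re-running the regression with a different omitted region is convenient. The sketch below assumes the covariance is available; the function names and all numbers are hypothetical.

```python
from math import sqrt

def ci(est, se, z=1.96):
    """Confidence interval est ± z*se (z = 1.96 gives 95% in large samples)."""
    return est - z * se, est + z * se

def ci_linear_combination(weights, est, cov, z=1.96):
    """Confidence interval for sum_j weights[j] * beta[j].

    est: coefficient estimates; cov: their covariance matrix (list of lists).
    The standard error is sqrt(w' V w).
    """
    m = len(weights)
    value = sum(weights[j] * est[j] for j in range(m))
    var = sum(weights[i] * weights[j] * cov[i][j]
              for i in range(m) for j in range(m))
    return ci(value, sqrt(var), z)

# Hypothetical numbers only: estimates of (beta5, beta6) and their covariance.
est = [0.50, -0.30]
cov = [[0.04, 0.01],
       [0.01, 0.09]]
print(ci(est[1], sqrt(cov[1][1])))               # single coefficient, as in (bi)
print(ci_linear_combination([-1, 1], est, cov))  # -beta5 + beta6, as in (bii)
```

With a single coefficient, `ci` is just estimate ± 1.96 × SE; the linear-combination version only adds the covariance bookkeeping.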

3.   SW Empirical Exercises 7.1

 

Regressor      Model (a)       Model (b)
Age            0.60 (0.04)     0.59 (0.04)
Female                         −3.66 (0.21)
Bachelor                       8.08 (0.21)
Intercept      1.08 (1.17)     −0.63 (1.08)
SER            9.99            9.07
R²             0.029           0.200
Adjusted R²    0.029           0.199

(a) The estimated slope is 0.60. The estimated intercept is 1.08.

(b) The estimated marginal effect of Age on AHE is 0.59 dollars per year. The 95% confidence interval is 0.59 ± 1.96 × 0.04, or 0.51 to 0.66.

(c) The results are quite similar. Evidently, the regression in (a) does not suffer from important omitted variable bias.

(d) Bob’s predicted average hourly earnings = (0.59 × 26) + (−3.66 × 0) + (8.08 × 0) − 0.63 = $14.71. Alexis’s predicted average hourly earnings = (0.59 × 30) + (−3.66 × 1) + (8.08 × 1) − 0.63 = $21.49.

(e) The regression in (b) fits the data much better. Gender and education are important predictors of earnings. The R² and adjusted R² are similar because the sample size is large (n = 7711).

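(f) Gender and education are important: the F-statistic testing that the coefficients on Female and Bachelor are jointly zero is 781, which is much larger than the 1% critical value of 4.61 (from the $F_{2,\infty}$ distribution).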

(g) The omitted variables must have non-zero coefficients and must be correlated with the included regressor. From (f), Female and Bachelor have non-zero coefficients; yet there does not seem to be important omitted variable bias, suggesting that the correlations of Age with Female and of Age with Bachelor are small. (The sample correlations are Cor(Age, Female) = −0.03 and Cor(Age, Bachelor) = 0.00.)
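The arithmetic in (d) and the critical value in (f) can be reproduced in a few lines. The sketch uses the rounded column (b) coefficients from the table, so predictions based on unrounded estimates could differ slightly; `predict_ahe` and `f_crit_2_inf` are hypothetical helper names. For (f), with q = 2 restrictions and a large sample, the F-statistic is compared with the $F_{2,\infty}$ distribution, which equals $\chi^2_2/2$; since $\chi^2_2$ is exponential with mean 2, $P(F > c) = e^{-c}$, so the critical value has the closed form $c = -\ln(\alpha)$.

```python
from math import log

# Rounded coefficients from column (b) of the table above.
COEF_B = {"Age": 0.59, "Female": -3.66, "Bachelor": 8.08, "Intercept": -0.63}

def predict_ahe(age, female, bachelor, coef=COEF_B):
    """Predicted average hourly earnings from regression (b)."""
    return (coef["Intercept"] + coef["Age"] * age
            + coef["Female"] * female + coef["Bachelor"] * bachelor)

def f_crit_2_inf(alpha):
    """Critical value of F(2, infinity): P(F > c) = exp(-c), so c = -ln(alpha)."""
    return -log(alpha)

bob = predict_ahe(age=26, female=0, bachelor=0)
alexis = predict_ahe(age=30, female=1, bachelor=1)
print(f"Bob: ${bob:.2f}, Alexis: ${alexis:.2f}")
print(f"1% critical value of F(2, inf): {f_crit_2_inf(0.01):.2f}")
```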

4.   How would you construct a confidence interval for a single coefficient in multiple regression? 

5.   Describe how to obtain a confidence set for two parameters in the multiple regression model.

6.   What is a control variable in multiple regression? Give an example and explain why it can be useful in practice.
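For question 5, a large-sample 95% confidence set for two coefficients can be described as the set of hypothesized pairs $(b_1, b_2)$ that a (heteroskedasticity-robust) F-test of $H_0: \beta_1 = b_1, \beta_2 = b_2$ does not reject at the 5% level, i.e. those with $F \le 3.00$, the 5% critical value of $F_{2,\infty}$ ($-\ln 0.05 \approx 2.996$). Geometrically this is an ellipse centered at the estimates. The sketch below checks membership for one point; the function name and numbers are hypothetical, and in practice the 2×2 covariance matrix would come from the fitted regression.

```python
from math import log

def in_confidence_set(b0, beta_hat, V, alpha=0.05):
    """True if the pair b0 = (b1, b2) lies in the (1 - alpha) confidence set.

    beta_hat: the two estimated coefficients; V: their 2x2 covariance matrix.
    """
    d1 = beta_hat[0] - b0[0]
    d2 = beta_hat[1] - b0[1]
    (v11, v12), (_, v22) = V
    det = v11 * v22 - v12 ** 2
    # Wald statistic d' V^{-1} d, using the explicit inverse of a 2x2 matrix
    wald = (v22 * d1 ** 2 - 2 * v12 * d1 * d2 + v11 * d2 ** 2) / det
    f_stat = wald / 2                   # q = 2 restrictions
    crit = -log(alpha)                  # F(2, inf) critical value
    return f_stat <= crit

# Hypothetical estimates with standard errors 0.2 and 0.3 and zero covariance
beta_hat = (1.0, 2.0)
V = [[0.04, 0.0], [0.0, 0.09]]
print(in_confidence_set((1.2, 2.3), beta_hat, V))  # True: inside the set
print(in_confidence_set((1.0, 2.9), beta_hat, V))  # False: outside the set
```

Note that the point (1.0, 2.9) lies outside the joint set even though 2.9 is exactly three standard errors from 2.0 along one axis only; the joint set is not the rectangle formed by two separate confidence intervals.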
