[ITS] Elon Musk's Impact on the GameStop Share Price

Code from the course slides plus my own notes. I haven't fully figured everything out yet, so there should be updates later. Feedback and discussion are welcome!

Data Description

When Elon Musk tweeted "Gamestonk!!" along with a link to the WallStreetBets Reddit thread, GameStop shares surged.

Objective

Use an Interrupted Time Series (ITS) analysis to investigate how Elon Musk's tweet affected GME's share price.

Measured metrics:

  • Level Change = Start Level of Post Interval - End Level of Pre Interval
    • i.e. Post-Interval-pred[0] – Pre-Interval-pred[-1]
  • Slope Change = Slope of Post Interval - Slope of Pre Interval
    • i.e. Post-Interval-Coefficient – Pre-Interval-Coefficient

About Level

Note: according to some definitions, Level is the average of the values in an interval. However, under the definition we use here, Level is the first and last value of the fitted model. Our definition is closer to the immediate effect.
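A minimal sketch of the difference between the two definitions (the fitted values below are made-up numbers):

import numpy as np

# Hypothetical fitted values of a model over one interval
fitted = np.array([10.0, 12.0, 15.0, 19.0, 24.0])

# Definition used in this post: levels are the endpoints of the fit,
# which is what matters when measuring an immediate effect
level_start, level_end = fitted[0], fitted[-1]  # 10.0, 24.0

# Alternative definition: level as the average over the interval
level_avg = fitted.mean()                       # 16.0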

The Algorithm (Step by Step)

  1. Fit a linear regression in each interval separately (Pre Interval and Post Interval).
  • Linear regression: the simplest, most basic way to quantify the change.
  2. Check for autocorrelation in the residuals of the linear model in each interval with the Durbin-Watson test. If no autocorrelation is found, jump to step 6.
  • If there is autocorrelation in the residuals of a model, some complexity in the real data has not been captured by that model → the model must be replaced.
  3. If autocorrelation was found, then for that interval we put the linear model away and instead try SARIMAX.
  4. Check for autocorrelation in the residuals of the SARIMAX model in each interval with the Durbin-Watson test.
  5. If autocorrelation is still found, then for that interval we put SARIMAX away as well and conclude that linearity is not a workable option for this interval.
  6. In the accepted model (either the linear model or SARIMAX), compute pred[0] (the first fitted value) and pred[-1] (the last fitted value) to obtain the levels at the beginning and end of the interval, and take the coefficient of the "row number" variable as the slope of the model in the interval.
  7. Calculate (steps 1–6 are sketched as one function right after this list):
    ➢ Level Change = Post-Interval-pred[0] – Pre-Interval-pred[-1]
    ➢ Slope Change = Post-Interval-Coefficient – Pre-Interval-Coefficient
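The per-interval flow can be sketched as a single function. This is only a sketch, assuming statsmodels, the (1.5, 2.5) Durbin-Watson acceptance band used below, and order/seasonal_order taken from auto_arima as in the later steps; it also omits the significance check on the coefficients that this post applies afterwards.

from statsmodels.regression import linear_model
from statsmodels.stats.stattools import durbin_watson
from statsmodels.tsa.statespace.sarimax import SARIMAX

def no_autocorrelation(resid):
    """Durbin-Watson gate: residuals pass if the statistic is in (1.5, 2.5)."""
    return 1.5 < durbin_watson(resid) < 2.5

def fit_interval(y_train, x_train, order, seasonal_order):
    """Steps 1-6 for one interval: try OLS first; if its residuals are
    autocorrelated, fall back to SARIMAX with the same regressors.
    Returns (level_start, level_end, slope)."""
    ols = linear_model.OLS(y_train, x_train).fit()
    if no_autocorrelation(ols.resid):
        preds = ols.predict(x_train)
        return preds.iloc[0], preds.iloc[-1], ols.params["row_number"]
    sarimax = SARIMAX(endog=y_train, exog=x_train, order=order,
                      seasonal_order=seasonal_order,
                      enforce_stationarity=False).fit()
    if not no_autocorrelation(sarimax.resid):
        raise ValueError("linearity is not a workable option for this interval")
    preds = sarimax.predict(start=y_train.index.min(), end=y_train.index.max())
    return preds.iloc[0], preds.iloc[-1], sarimax.params["row_number"]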

Import data and visualize it

import yfinance as yf
import matplotlib.pyplot as plt
from pandas_datareader import data as web

yf.pdr_override()  # needed so pandas_datareader fetches the data via yfinance

start_date = '2020-10-01'
end_date = '2021-02-11'
data = web.DataReader('GME', data_source='yahoo', start=start_date, end=end_date)
close = data['Close']
ax = close.plot(title='GameStop Share Price')
ax.set_xlabel('Date')
ax.set_ylabel('Close (US$)')
ax.grid()
plt.show()
[*********************100%***********************]  1 of 1 completed

[Figure: GameStop close price, 2020-10-01 through 2021-02-11]

Getting the Level and Slope of the Pre and Post Intervals

Slicing Data into Pre and Post Intervals

import numpy as np

data_pre = data['2020-10-01':'2021-01-26']
data_pre = data_pre.copy()  # avoids a SettingWithCopyWarning (explained below)
data_pre['row_number'] = np.arange(data_pre.shape[0])
# row_number plays the role of x in the linear regression
data_pre
| Date       | Open      | High       | Low       | Close      | Adj Close  | Volume    | row_number |
|------------|-----------|------------|-----------|------------|------------|-----------|------------|
| 2020-10-01 | 10.090000 | 10.250000  | 9.690000  | 9.770000   | 9.770000   | 4554100   | 0          |
| 2020-10-02 | 9.380000  | 9.780000   | 9.300000  | 9.390000   | 9.390000   | 4340500   | 1          |
| 2020-10-05 | 9.440000  | 9.590000   | 9.250000  | 9.460000   | 9.460000   | 2805000   | 2          |
| 2020-10-06 | 9.560000  | 9.840000   | 9.100000  | 9.130000   | 9.130000   | 4535400   | 3          |
| 2020-10-07 | 9.230000  | 9.560000   | 9.170000  | 9.360000   | 9.360000   | 3308600   | 4          |
| ...        | ...       | ...        | ...       | ...        | ...        | ...       | ...        |
| 2021-01-20 | 37.369999 | 41.189999  | 36.060001 | 39.119999  | 39.119999  | 33471800  | 75         |
| 2021-01-21 | 39.230000 | 44.750000  | 37.000000 | 43.029999  | 43.029999  | 56216900  | 76         |
| 2021-01-22 | 42.590000 | 76.760002  | 42.320000 | 65.010002  | 65.010002  | 197157900 | 77         |
| 2021-01-25 | 96.730003 | 159.179993 | 61.130001 | 76.790001  | 76.790001  | 177874000 | 78         |
| 2021-01-26 | 88.559998 | 150.000000 | 80.199997 | 147.979996 | 147.979996 | 178588000 | 79         |

80 rows × 7 columns

row_number plays the role of x in the linear regression.

Why `data_pre = data_pre.copy()`? Without it you get a red warning: `data_pre` is a slice of `data`, and pandas cannot tell whether assigning a new column to that slice should also modify the original frame, so it raises a SettingWithCopyWarning.

The warning in question:

<ipython-input-14-a1b62b3d68ff>:4: SettingWithCopyWarning:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead


See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
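For reference, here are the two standard ways to avoid this warning, shown on a toy frame (the names here are illustrative only):

import numpy as np
import pandas as pd

df = pd.DataFrame({"Close": [1.0, 2.0, 3.0]},
                  index=pd.date_range("2020-10-01", periods=3))

# Pattern 1 (used in this post): make the slice an explicit copy,
# so adding a column can no longer be mistaken for writing into df
sub = df["2020-10-01":"2020-10-02"].copy()
sub["row_number"] = np.arange(sub.shape[0])

# Pattern 2 (what the warning message suggests): write through .loc on df itself
df.loc["2020-10-01":"2020-10-02", "flag"] = 1.0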

data_post = data['2021-01-27':'2021-02-11']
data_post = data_post.copy()
data_post['row_number'] = np.arange(data_post.shape[0])
data_post
| Date       | Open       | High       | Low        | Close      | Adj Close  | Volume   | row_number |
|------------|------------|------------|------------|------------|------------|----------|------------|
| 2021-01-27 | 354.829987 | 380.000000 | 249.000000 | 347.510010 | 347.510010 | 93396700 | 0          |
| 2021-01-28 | 265.000000 | 483.000000 | 112.250000 | 193.600006 | 193.600006 | 58815800 | 1          |
| 2021-01-29 | 379.709991 | 413.980011 | 250.000000 | 325.000000 | 325.000000 | 50566100 | 2          |
| 2021-02-01 | 316.559998 | 322.000000 | 212.000000 | 225.000000 | 225.000000 | 37382200 | 3          |
| 2021-02-02 | 140.759995 | 158.000000 | 74.220001  | 90.000000  | 90.000000  | 78183100 | 4          |
| 2021-02-03 | 112.010002 | 113.400002 | 85.250000  | 92.410004  | 92.410004  | 42698500 | 5          |
| 2021-02-04 | 91.190002  | 91.500000  | 53.330002  | 53.500000  | 53.500000  | 62427300 | 6          |
| 2021-02-05 | 54.040001  | 95.000000  | 51.090000  | 63.770000  | 63.770000  | 81345000 | 7          |
| 2021-02-08 | 72.410004  | 72.660004  | 58.020000  | 60.000000  | 60.000000  | 25687300 | 8          |
| 2021-02-09 | 56.610001  | 57.000000  | 46.520000  | 50.310001  | 50.310001  | 26843100 | 9          |
| 2021-02-10 | 50.770000  | 62.830002  | 46.549999  | 51.200001  | 51.200001  | 36455000 | 10         |

Pre Interval

Linear Regression
from patsy import dmatrices

expr = 'Close' + ' ~ ' + 'row_number'  # a patsy "formula_like" string
y_train, x_train = dmatrices(expr, data_pre, return_type='dataframe')
y_train
| Date       | Close      |
|------------|------------|
| 2020-10-01 | 9.770000   |
| 2020-10-02 | 9.390000   |
| 2020-10-05 | 9.460000   |
| 2020-10-06 | 9.130000   |
| 2020-10-07 | 9.360000   |
| ...        | ...        |
| 2021-01-20 | 39.119999  |
| 2021-01-21 | 43.029999  |
| 2021-01-22 | 65.010002  |
| 2021-01-25 | 76.790001  |
| 2021-01-26 | 147.979996 |

80 rows × 1 columns

expr (the expression) = 'Close' + ' ~ ' + 'row_number' → this is the formula_like argument.

dmatrices():

  • Construct two design matrices given a formula_like and data.
  • By convention, the first matrix is the “outcome” or “y” data, and the second is the “predictor” or “x” data.

    So y_train corresponds to 'Close' and x_train corresponds to 'row_number'.
x_train
| Date       | Intercept | row_number |
|------------|-----------|------------|
| 2020-10-01 | 1.0       | 0.0        |
| 2020-10-02 | 1.0       | 1.0        |
| 2020-10-05 | 1.0       | 2.0        |
| 2020-10-06 | 1.0       | 3.0        |
| 2020-10-07 | 1.0       | 4.0        |
| ...        | ...       | ...        |
| 2021-01-20 | 1.0       | 75.0       |
| 2021-01-21 | 1.0       | 76.0       |
| 2021-01-22 | 1.0       | 77.0       |
| 2021-01-25 | 1.0       | 78.0       |
| 2021-01-26 | 1.0       | 79.0       |

80 rows × 2 columns

The model's intercept contribution = this Intercept column (all 1s) × the estimated coefficient of Intercept.
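In other words, the fitted values are just the design matrix times the coefficient vector; the column of ones is what carries the intercept into every prediction. A small numeric sketch (the coefficient values are the ones the OLS fit below produces):

import numpy as np

# First three rows of the design matrix: (Intercept, row_number)
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
beta = np.array([2.3585, 0.4237])  # intercept and slope from the fit below

# fitted value = 1 * intercept_coef + row_number * slope_coef
preds = X @ beta                   # [2.3585, 2.7822, 3.2059]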

Linear Model
Any Auto-Correlation in the Residuals: Durbin Watson Test
from statsmodels.regression import linear_model
olsr_results = linear_model.OLS(y_train, x_train).fit()
fs = olsr_results.summary()
fs
                            OLS Regression Results
==============================================================================
Dep. Variable:                  Close   R-squared:                       0.287
Model:                            OLS   Adj. R-squared:                  0.278
Method:                 Least Squares   F-statistic:                     31.45
Date:                Sun, 30 Jan 2022   Prob (F-statistic):           2.98e-07
Time:                        14:38:15   Log-Likelihood:                -332.31
No. Observations:                  80   AIC:                             668.6
Df Residuals:                      78   BIC:                             673.4
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      2.3585      3.457      0.682      0.497      -4.523       9.240
row_number     0.4237      0.076      5.608      0.000       0.273       0.574
==============================================================================
Omnibus:                      122.674   Durbin-Watson:                   0.312
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             3971.926
Skew:                           5.078   Prob(JB):                         0.00
Kurtosis:                      35.992   Cond. No.                         90.7
==============================================================================


Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
AutocorrelationValue_For_OLS_Errors = float(fs.tables[2].data[0][3])
# the Durbin-Watson statistic
preds = olsr_results.predict(x_train)
print(AutocorrelationValue_For_OLS_Errors)
0.312

OLS: ordinary least squares

Recall: p-value > 0.05 → not significant.

If the Durbin-Watson statistic lies in (1.5, 2.5), there is no autocorrelation in the residuals and the model is acceptable.

import pandas as pd

if 1.5 < AutocorrelationValue_For_OLS_Errors < 2.5:
    """no autocorrelation in the residuals"""
    Level_Start = preds.iloc[0]   # .iloc: positional access works with the DatetimeIndex
    Level_End = preds.iloc[-1]
    Slope = olsr_results.params.row_number

    result_tab1_html = olsr_results.summary().tables[1].as_html()  # as_html is a method
    result_tab1_pandas = pd.read_html(result_tab1_html, header=0, index_col=0)[0]
    if (result_tab1_pandas["P>|t|"].iloc[0] < 0.05) and (result_tab1_pandas["P>|t|"].iloc[1] < 0.05):
        Significance_Status = 3
        """
        Both intercept and coefficient are significant,
        so Significance_Status = 3 is the ideal case.
        """
    elif result_tab1_pandas["P>|t|"].iloc[1] < 0.05:
        """coefficient is significant"""
        Significance_Status = 2
    elif result_tab1_pandas["P>|t|"].iloc[0] < 0.05:
        """intercept is significant"""
        Significance_Status = 1
    else:
        Significance_Status = 0
else:
    """autocorrelation present, model not acceptable"""
    print("We have to go for SARIMAX model")
We have to go for SARIMAX model

Understanding

elif result_tab1_pandas["P>|t|"].iloc[1] < 0.05:

    """coefficient is significant"""

The slope contribution of the model = row_number × the coefficient of row_number,
and in this branch row_number is significant (its p-value < 0.05).

elif result_tab1_pandas["P>|t|"].iloc[0] < 0.05:

    """intercept is significant"""

The intercept contribution of the model = the Intercept column (all 1s) × the coefficient of Intercept,
and in this branch the intercept is significant (its p-value < 0.05).

Question: is the coefficient really more important than the intercept? Why?
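As a side note, parsing the summary's HTML is not strictly necessary: statsmodels results expose the p-values directly through the pvalues attribute, so the same status coding can be written as a short sketch (assuming the olsr_results object from above):

pvals = olsr_results.pvalues                     # Series indexed by term name
slope_sig = pvals["row_number"] < 0.05
intercept_sig = pvals["Intercept"] < 0.05
Significance_Status = int(slope_sig) * 2 + int(intercept_sig)  # same 0/1/2/3 coding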

Go to SARIMAX
Find best Parameters
from pmdarima.arima import auto_arima
stepwise_fit = auto_arima(y_train, m=12, seasonal=True, d=None, D=1, trace=True, error_action='ignore', suppress_warnings=True, stepwise=True)
Performing stepwise search to minimize aic
 ARIMA(2,2,2)(1,1,1)[12]             : AIC=456.685, Time=0.26 sec
 ARIMA(0,2,0)(0,1,0)[12]             : AIC=471.333, Time=0.01 sec
 ARIMA(1,2,0)(1,1,0)[12]             : AIC=460.265, Time=0.11 sec
 ARIMA(0,2,1)(0,1,1)[12]             : AIC=469.429, Time=0.05 sec
 ARIMA(2,2,2)(0,1,1)[12]             : AIC=454.714, Time=0.21 sec
 ARIMA(2,2,2)(0,1,0)[12]             : AIC=453.718, Time=0.09 sec
 ARIMA(2,2,2)(1,1,0)[12]             : AIC=455.137, Time=0.15 sec
 ARIMA(1,2,2)(0,1,0)[12]             : AIC=455.900, Time=0.06 sec
 ARIMA(2,2,1)(0,1,0)[12]             : AIC=456.842, Time=0.04 sec
 ARIMA(3,2,2)(0,1,0)[12]             : AIC=455.130, Time=0.10 sec
 ARIMA(2,2,3)(0,1,0)[12]             : AIC=inf, Time=0.20 sec
 ARIMA(1,2,1)(0,1,0)[12]             : AIC=459.986, Time=0.03 sec
 ARIMA(1,2,3)(0,1,0)[12]             : AIC=455.256, Time=0.14 sec
 ARIMA(3,2,1)(0,1,0)[12]             : AIC=453.133, Time=0.05 sec
 ARIMA(3,2,1)(1,1,0)[12]             : AIC=452.891, Time=0.15 sec
 ARIMA(3,2,1)(2,1,0)[12]             : AIC=453.425, Time=0.44 sec
 ARIMA(3,2,1)(1,1,1)[12]             : AIC=453.520, Time=0.30 sec
 ARIMA(3,2,1)(0,1,1)[12]             : AIC=451.675, Time=0.19 sec
 ARIMA(3,2,1)(0,1,2)[12]             : AIC=453.401, Time=0.39 sec
 ARIMA(3,2,1)(1,1,2)[12]             : AIC=455.210, Time=0.54 sec
 ARIMA(2,2,1)(0,1,1)[12]             : AIC=457.550, Time=0.11 sec
 ARIMA(3,2,0)(0,1,1)[12]             : AIC=451.074, Time=0.11 sec
 ARIMA(3,2,0)(0,1,0)[12]             : AIC=451.292, Time=0.04 sec
 ARIMA(3,2,0)(1,1,1)[12]             : AIC=453.066, Time=0.15 sec
 ARIMA(3,2,0)(0,1,2)[12]             : AIC=453.061, Time=0.20 sec
 ARIMA(3,2,0)(1,1,0)[12]             : AIC=451.767, Time=0.09 sec
 ARIMA(3,2,0)(1,1,2)[12]             : AIC=inf, Time=0.81 sec
 ARIMA(2,2,0)(0,1,1)[12]             : AIC=460.167, Time=0.10 sec
 ARIMA(4,2,0)(0,1,1)[12]             : AIC=451.863, Time=0.19 sec
 ARIMA(4,2,1)(0,1,1)[12]             : AIC=453.671, Time=0.27 sec
 ARIMA(3,2,0)(0,1,1)[12] intercept   : AIC=452.064, Time=0.13 sec

Best model:  ARIMA(3,2,0)(0,1,1)[12]          
Total fit time: 5.740 seconds
SARIMAX
Any Auto-Correlation in the Residuals: Durbin Watson Test
from statsmodels.tsa.statespace.sarimax import SARIMAX
sarimax_model = SARIMAX(endog=y_train, exog=x_train, order=(3,2,0),seasonal_order=(0,1,1,12),enforce_stationarity=False)
results = sarimax_model.fit()
from statsmodels.stats.stattools import durbin_watson
AutocorrelationValue_For_ARIMAX_Errors = durbin_watson(results.resid)
AutocorrelationValue_For_ARIMAX_Errors
1.5611495127853794

Since 1.5 < Durbin-Watson value = 1.56 < 2.5:

The model has been successful enough not to leave any meaningful structure in the residuals.


exog=x_train: the exogenous variable is x_train (mainly row_number).

This is the extra argument compared with the model used while searching for the best parameters:

  • before: SARIMA
  • now: SARIMAX, where the X stands for the exogenous variable

Why?

  • A time series has its own internal structure → lags
  • The linear regression contributes the exogenous variable → row_number (i.e., x)

→ SARIMAX captures how the time series relates to the linear regression part.

results.summary()
                                     SARIMAX Results
==========================================================================================
Dep. Variable:                              Close   No. Observations:                   80
Model:           SARIMAX(3, 2, 0)x(0, 1, [1], 12)   Log Likelihood                -178.900
Date:                            Sun, 30 Jan 2022   AIC                            371.799
Time:                                    14:38:22   BIC                            385.591
Sample:                                         0   HQIC                           377.103
                                             - 80
Covariance Type:                              opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    6.13e-08   5.77e+05   1.06e-13      1.000   -1.13e+06    1.13e+06
row_number  5.887e-07   7.23e+05   8.14e-13      1.000   -1.42e+06    1.42e+06
ar.L1         -0.3810      0.337     -1.129      0.259      -1.042       0.280
ar.L2          1.0854      0.188      5.761      0.000       0.716       1.455
ar.L3          1.1965      0.262      4.559      0.000       0.682       1.711
ma.S.L12      -0.5392      0.565     -0.954      0.340      -1.647       0.568
sigma2        48.5960      6.597      7.367      0.000      35.666      61.526
==============================================================================
Ljung-Box (L1) (Q):              0.18   Jarque-Bera (JB):               199.64
Prob(Q):                         0.67   Prob(JB):                         0.00
Heteroskedasticity (H):         47.87   Skew:                             0.91
Prob(H) (two-sided):             0.00   Kurtosis:                        12.33
==============================================================================


Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
[2] Covariance matrix is singular or near-singular, with condition number 2.26e+16. Standard errors may be unstable.

The p-values of Intercept and row_number are > 0.05:

the intercept and coefficient of the linear part are not significant

→ take Slope = 0 and Intercept = 0
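The same conclusion can be checked programmatically; a sketch using the pvalues attribute of the fitted results object from above:

# Linear (exogenous) part of the SARIMAX fit: keep the slope only if significant
if results.pvalues["row_number"] < 0.05:
    Slope = results.params["row_number"]
else:
    Slope = 0.0  # p-value = 1.000 here, so the slope is treated as zero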

Getting the results of the pre interval
preds = results.predict(start=min(y_train.index), end=max(y_train.index))
Level_Start = preds.iloc[0]
Level_End = preds.iloc[-1]
print('Level_Start:', Level_Start)
print('Level_End:', Level_End)
Level_Start: 6.129752604487791e-08
Level_End: 116.31853590382661

Post Interval

Linear Regression
expr = 'Close' + ' ~ ' + 'row_number'  # the same patsy formula as before
y_train, x_train = dmatrices(expr,data_post,return_type='dataframe')
y_train
| Date       | Close      |
|------------|------------|
| 2021-01-27 | 347.510010 |
| 2021-01-28 | 193.600006 |
| 2021-01-29 | 325.000000 |
| 2021-02-01 | 225.000000 |
| 2021-02-02 | 90.000000  |
| 2021-02-03 | 92.410004  |
| 2021-02-04 | 53.500000  |
| 2021-02-05 | 63.770000  |
| 2021-02-08 | 60.000000  |
| 2021-02-09 | 50.310001  |
| 2021-02-10 | 51.200001  |
Linear Model
Any Auto-Correlation in the Residuals: Durbin Watson Test
olsr_results = linear_model.OLS(y_train, x_train).fit()
fs = olsr_results.summary()
fs
                            OLS Regression Results
==============================================================================
Dep. Variable:                  Close   R-squared:                       0.733
Model:                            OLS   Adj. R-squared:                  0.703
Method:                 Least Squares   F-statistic:                     24.66
Date:                Sun, 30 Jan 2022   Prob (F-statistic):           0.000774
Time:                        14:38:22   Log-Likelihood:                -59.834
No. Observations:                  11   AIC:                             123.7
Df Residuals:                       9   BIC:                             124.5
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    286.9668     34.752      8.257      0.000     208.352     365.582
row_number   -29.1697      5.874     -4.966      0.001     -42.458     -15.881
==============================================================================
Omnibus:                        1.306   Durbin-Watson:                   1.819
Prob(Omnibus):                  0.520   Jarque-Bera (JB):                0.734
Skew:                           0.117   Prob(JB):                        0.693
Kurtosis:                       1.756   Cond. No.                         11.3
==============================================================================


Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
  • row_number's p-value < 0.05 → row_number is significant → slope = the coefficient of row_number
  • 1.5 < Durbin-Watson < 2.5 → the model is acceptable
preds = olsr_results.predict(x_train)
Level_Start = preds.iloc[0]
Level_End = preds.iloc[-1]
print('Level_Start:', Level_Start)
print('Level_End:', Level_End)
Level_Start: 286.9668230576948
Level_End: -4.730455398559627

Results

Level Change = Start Level of Post Interval - End Level of Pre Interval

Level Change = 286.96 - 116.31 = 170.65

Viewed as a whole, GME's close price jumped by about 170.65 at the interruption.

Slope Change = Slope of Post Interval - Slope of Pre Interval

Slope Change = -29.17 - 0 = -29.17

The slope change is negative because the tweet raised the price immediately, after which the price fell:
a short-term success but a long-term failure.
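Putting the two intervals together, the final metrics reduce to two subtractions. A sketch with the values computed above:

pre_level_end = 116.31853590382661     # pred[-1] of the pre-interval SARIMAX
pre_slope = 0.0                        # linear part was not significant

post_level_start = 286.9668230576948   # pred[0] of the post-interval OLS
post_slope = -29.1697                  # OLS coefficient of row_number

level_change = post_level_start - pre_level_end  # ≈ 170.65
slope_change = post_slope - pre_slope            # ≈ -29.17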
