Causal Impact R Package in Python

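This article shows how to use R's CausalImpact package from within Python by calling it through an R interface, giving data scientists a way to run this causal inference method from a Python environment.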

Causality

CausalImpact is an R package developed by Google for causal inference using Bayesian structural time-series models. The R version is available at http://google.github.io/CausalImpact/.

In short, this package makes counterfactual predictions. In other words: what would have happened in a (sort of) parallel universe if the intervention had never happened? Here is a quick example straight from Google's website: “Given a response time series (e.g., clicks) and a set of control time series (e.g., clicks in non-affected markets or clicks on other sites), the package constructs a Bayesian structural time-series model. This model is then used to try and predict the counterfactual, i.e., how the response metric would have evolved after the intervention if the intervention had never occurred.”

Figure: example CausalImpact plot with three panels (original, pointwise, cumulative). Source: http://google.github.io/CausalImpact/

Image description: Part A of the image (original) shows, as the dark continuous line, the time series we are monitoring. The blue dotted line is the counterfactual prediction. The vertical grey line marks the moment the intervention was made. From that moment onwards, the blue and black lines drift apart. Part B (pointwise) shows the difference between those two lines over time, which in essence is the causal effect we are interested in, while Part C (cumulative) shows the cumulative difference over time.
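The relationship between the three panels can be sketched in a few lines of plain Python. This is only an illustration: it swaps the Bayesian structural time-series model for an ordinary least-squares fit of the response on a single control series, and the function name `naive_effects` and the toy numbers are made up for this example.

```python
from itertools import accumulate
from statistics import mean


def naive_effects(y, x, n_pre):
    """Fit y ~ a + b*x on the first n_pre points, then compare the
    observed post-period y with the fitted counterfactual.

    Stand-in for illustration only: CausalImpact fits a Bayesian
    structural time-series model, not a least-squares line.
    """
    x_pre, y_pre = x[:n_pre], y[:n_pre]
    x_bar, y_bar = mean(x_pre), mean(y_pre)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x_pre, y_pre))
             / sum((xi - x_bar) ** 2 for xi in x_pre))
    intercept = y_bar - slope * x_bar

    # Panel A, dotted line: what y would have been without the intervention
    counterfactual = [intercept + slope * xi for xi in x[n_pre:]]
    # Panel B: observed minus counterfactual at each post-period point
    pointwise = [obs - cf for obs, cf in zip(y[n_pre:], counterfactual)]
    # Panel C: running total of the pointwise differences
    cumulative = list(accumulate(pointwise))
    return counterfactual, pointwise, cumulative


# Toy data: y = 2*x exactly before the intervention, then lifted by 5.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.0, 6.0, 8.0, 15.0, 17.0]
counterfactual, pointwise, cumulative = naive_effects(y, x, n_pre=4)
print(pointwise)   # [5.0, 5.0]
print(cumulative)  # [5.0, 10.0]
```

The real package also attaches uncertainty intervals to each of these series, which this sketch leaves out.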

I know you can work with R, but for Python lovers I am not aware of an equivalent package. There are, of course, some libraries implementing parts of the original paper, but when I checked some of those Python implementations I noticed differences in the results. Long story short, here is how to run the R package from Python. The same approach probably generalises to any R package.

What worked for me was to create a new Conda environment with both the Python libraries and the core R packages pre-installed. Here is an example:

conda create -n r_env numpy pandas statsmodels r-essentials r-base

Creating the environment will take some time. Also note that Jupyter notebooks require further configuration, so I tend to edit the code in an ordinary programming editor and run it from the command line.

We also need rpy2, a Python interface to the R language, which does all the work for us. Running pip install rpy2 is enough.
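Before going further, it can help to confirm that the environment really has everything the script imports. The helper below is a small sketch using only the standard library; `missing_modules` is a name made up for this example, and it only checks that the Python modules can be found, not that R itself is working.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Python-side requirements of the script in this article
required = ["numpy", "pandas", "statsmodels", "rpy2"]
missing = missing_modules(required)
if missing:
    print("Missing Python modules:", ", ".join(missing))
else:
    print("All required Python modules were found.")
```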

Load all the libraries as below:

# rpy2: Python interface to R
import rpy2.robjects as robjects
import rpy2.robjects.packages as rpackages
from rpy2.robjects import numpy2ri, pandas2ri
from rpy2.robjects.packages import importr
from rpy2.robjects.vectors import StrVector

# Typical Python libraries
import datetime
import numpy as np
import pandas as pd

# ARMA process generator for the synthetic data
from statsmodels.tsa.arima_process import ArmaProcess

Create a Pandas dataframe with at least three columns: one for the datetime (which becomes the index below), one or more predictors (x1, x2, ..., xn), and a response variable y. This assumes your data is ready for consumption; otherwise, you can create some as below:

# Create synthetic data; skip this if you are running on your own data.
# The data generation is taken from https://github.com/dafiti/causalimpact
ar = np.array([1, 0.9])
ma = np.array([1])
arma_process = ArmaProcess(ar, ma)
X = 100 + arma_process.generate_sample(nsample=100)
y = 1.2 * X + np.random.normal(size=100)
y[70:] += 5  # simulated intervention effect from row 70 onwards

# 100 consecutive daily timestamps in ascending order, ending today
base = datetime.datetime.today()
dt = [base - datetime.timedelta(days=x) for x in range(99, -1, -1)]
df = pd.DataFrame({'y': y, 'x1': X, 'date': dt}, columns=['y', 'x1', 'date'])
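For readers unfamiliar with `ArmaProcess`: its `ar` argument holds the lag-polynomial coefficients including the zero lag, so `ar = [1, 0.9]` describes x_t = -0.9 * x_{t-1} + e_t. The recursion can be written out by hand as a rough sketch; `simulate_ar1` is a hypothetical helper using only the standard library, not the statsmodels implementation (which also handles MA terms and a burn-in period).

```python
import random


def simulate_ar1(phi, n, seed=0):
    """Simulate x_t = phi * x_{t-1} + e_t with standard normal noise e_t."""
    rng = random.Random(seed)
    values = []
    previous = 0.0
    for _ in range(n):
        current = phi * previous + rng.gauss(0.0, 1.0)
        values.append(current)
        previous = current
    return values


# ar = [1, 0.9] in lag-polynomial form corresponds to phi = -0.9.
noise = simulate_ar1(phi=-0.9, n=100)
X = [100 + value for value in noise]  # same shift of 100 as in the article's code
print(len(X), round(min(X), 2), round(max(X), 2))
```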

Make sure the datetime column is the index, as discussed above.

#make datetime an index to the df
df.set_index('date',inplace=True)

Define the period before the intervention and the period after the intervention. Here both are given as row numbers into the dataframe rather than as dates.

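# Set pre- and post-intervention periods as row numbers.
# R counts rows from 1, so Python positions 0-69 and 70-99 become 1-70 and 71-100.
pre_period = [1, 70]
post_period = [71, 100]

# R conversion
pre_period = robjects.FloatVector(pre_period)
post_period = robjects.FloatVector(post_period)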

This gives you two intervals: from the start of the data up to the intervention (pre_period), and from the intervention to the last day in your data (post_period).
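One detail is easy to trip over: Python positions start at 0, while R counts rows from 1. Assuming the dataframe reaches R without a time index that CausalImpact recognises (so rows are simply numbered 1 to n), the conversion can be sketched as below; `r_periods` is a hypothetical helper, not part of rpy2 or CausalImpact.

```python
def r_periods(n_rows, first_post_pos):
    """Turn the 0-based Python position of the first post-intervention row
    into 1-based, inclusive [start, end] periods as R expects them."""
    if not 0 < first_post_pos < n_rows:
        raise ValueError("the intervention must fall strictly inside the data")
    pre_period = [1, first_post_pos]
    post_period = [first_post_pos + 1, n_rows]
    return pre_period, post_period


# y[70:] was shifted in the synthetic data, so position 70 is the first post row.
pre_period, post_period = r_periods(n_rows=100, first_post_pos=70)
print(pre_period, post_period)  # [1, 70] [71, 100]
```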

# Load R libraries from within Python (R interface)
utils = rpackages.importr('utils')
utils.chooseCRANmirror(ind=1)
packnames = ('CausalImpact', 'bsts')  # add any other R library required
names_to_install = [x for x in packnames if not rpackages.isinstalled(x)]

# StrVector is needed to pass the package names to R
from rpy2.robjects.vectors import StrVector
if len(names_to_install) > 0:
    utils.install_packages(StrVector(names_to_install))

This might take some time; sip some coffee, and some more.

But now here we are, ready to run causal inference on our data.

robjects.numpy2ri.activate()
pandas2ri.activate()
rdf = robjects.conversion.py2rpy(df)
causalimpact = importr('CausalImpact')
impact = causalimpact.CausalImpact(rdf, pre_period, post_period)

# Summary table
summary_func = robjects.r('function(x) summary(x)')
summary_func(impact)

# Summary with descriptive report
summary_report_func = robjects.r('function(x) summary(x, "report")')
summary_report_func(impact)

# Create causality plot and save it to a PNG file
img_file = 'causalimpact.png'
rstr = """
library(ggplot2)
function(x, y) {
  p <- plot(x)
  ggsave(y, plot = p)
}"""
rfunc = robjects.r(rstr)
rfunc(impact, img_file)
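The headline figures in the summary are simple functions of the observed and predicted post-period values. The sketch below computes point-estimate versions of them in plain Python; `summarise_effect` and the toy numbers are made up for illustration, and the real package additionally reports intervals and a posterior probability derived from its posterior samples.

```python
from statistics import mean


def summarise_effect(actual, predicted):
    """Average and cumulative absolute/relative effect (point estimates only)."""
    abs_avg = mean(actual) - mean(predicted)
    abs_cum = sum(actual) - sum(predicted)
    return {
        "average": {
            "actual": mean(actual),
            "prediction": mean(predicted),
            "abs_effect": abs_avg,
            "rel_effect": abs_avg / mean(predicted),
        },
        "cumulative": {
            "actual": sum(actual),
            "prediction": sum(predicted),
            "abs_effect": abs_cum,
            "rel_effect": abs_cum / sum(predicted),
        },
    }


actual = [15.0, 17.0, 19.0]     # observed post-intervention values
predicted = [10.0, 12.0, 14.0]  # counterfactual predictions
report = summarise_effect(actual, predicted)
print(report["average"]["abs_effect"])     # 5.0
print(report["cumulative"]["abs_effect"])  # 15.0
```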

For more options and discussion on the CausalImpact package, see below:

Translated from: https://towardsdatascience.com/causal-impact-r-package-in-python-bc005f63f4c2
