Installing R and modifying the configuration file
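The Python package CausalDiscoveryToolbox (cdt) depends on R packages, so R has to be installed first. Once R is installed, modify cdt's configuration file. For an Anaconda installation, this is the Settings.py file in the Lib\site-packages\cdt\utils directory: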
def __init__(self):
    """Define here the default values of the parameters."""
    super(ConfigSettings, self).__init__()
    self.NJOBS = 8
    self.GPU = 0
    self.autoset_config = True
    self.verbose = False
    self.default_device = 'cpu'
    self.rpath = 'D:\\Program Files\\R\\R-4.3.1\\bin\\x64\\Rscript.exe'  # change this to the path of your Rscript.exe
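If you are not sure where Rscript.exe ended up, a small helper can look for it. This is a minimal sketch that assumes the default Windows layout `<Program Files>\R\R-<version>\bin\x64\Rscript.exe`; `find_rscript` is a hypothetical helper, not part of cdt. cdt also exposes this setting at runtime as `cdt.SETTINGS.rpath`, so assigning the found path there may spare you from editing site-packages (worth verifying against your installed cdt version).

```python
import glob
import os
import shutil


def find_rscript(search_roots=None, use_path=True):
    """Return a path to Rscript(.exe), or None if none is found.

    Assumes the default Windows layout <root>\\R\\R-<version>\\bin\\x64\\Rscript.exe.
    """
    # 1. If Rscript is already on PATH, prefer that.
    if use_path:
        on_path = shutil.which("Rscript")
        if on_path:
            return on_path
    # 2. Otherwise scan the assumed install roots (default: %ProgramFiles%).
    if search_roots is None:
        search_roots = [os.environ.get("ProgramFiles", r"C:\Program Files")]
    candidates = []
    for root in search_roots:
        pattern = os.path.join(root, "R", "R-*", "bin", "x64", "Rscript.exe")
        candidates.extend(glob.glob(pattern))
    # Lexicographically last match, which is usually the newest R version.
    return sorted(candidates)[-1] if candidates else None
```

Running `print(find_rscript())` prints the path to paste into `self.rpath` (remember to double the backslashes inside the Python string literal).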
Installing the R dependency packages
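After R is installed, the R dependency packages are still needed. The error message is somewhat misleading: it suggests that installing RCIT alone is enough. The source code, however, checks for three packages, RCIT, pcalg and kpcalg: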
if not (RPackages.pcalg and RPackages.kpcalg and RPackages.RCIT):
    raise ImportError("R Package (k)pcalg/RCIT is not available. "
                      "RCIT has to be installed from "
                      "https://github.com/Diviyan-Kalainathan/RCIT")
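To see up front which of the three packages are missing, you can ask R directly. This is a minimal sketch, assuming Rscript can be invoked from Python; it runs R's `requireNamespace()` for each package through `Rscript -e` and parses what R prints. The helper names (`missing_r_packages` and friends) are hypothetical and not part of cdt.

```python
import subprocess

# The three packages the cdt check above tests for.
REQUIRED_R_PACKAGES = ("RCIT", "pcalg", "kpcalg")


def build_check_expr(packages):
    """R one-liner that prints '<name> TRUE|FALSE' for each package."""
    names = ", ".join("'%s'" % p for p in packages)
    return ("for (p in c(%s)) cat(p, requireNamespace(p, quietly = TRUE), '\\n')"
            % names)


def parse_check_output(text):
    """Turn the lines printed by build_check_expr into {name: bool}."""
    status = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            status[parts[0]] = parts[1] == "TRUE"
    return status


def missing_r_packages(rscript, packages=REQUIRED_R_PACKAGES, runner=subprocess.run):
    """Return the packages that the given Rscript cannot load."""
    result = runner([rscript, "-e", build_check_expr(packages)],
                    capture_output=True, text=True, check=True)
    status = parse_check_output(result.stdout)
    return [p for p in packages if not status.get(p, False)]
```

For example, `missing_r_packages(r'D:\Program Files\R\R-4.3.1\bin\x64\Rscript.exe')` should return an empty list once all three packages are installed. The `runner` parameter only exists so the subprocess call can be swapped out when testing.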
Installing RCIT
Installing RCIT is fairly simple: just follow the steps at https://github.com/Diviyan-Kalainathan/RCIT.
library(devtools)
install_github("Diviyan-Kalainathan/RCIT")
After installing it, run a quick test:
library(RCIT)
RCIT(rnorm(1000),rnorm(1000),rnorm(1000))
RCoT(rnorm(1000),rnorm(1000),rnorm(1000))
If the devtools package is not installed yet, install it first:
install.packages("devtools")
Installing pcalg and kpcalg
Before installing these two packages, their own dependencies have to be installed; installing them directly fails:
> install.packages("pcalg")
Warning: dependencies ‘graph’, ‘RBGL’ are not available
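The packages graph and RBGL cannot be installed directly with install.packages because they are distributed through Bioconductor rather than CRAN. On R 3.5 or later, install them with BiocManager: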
install.packages("BiocManager")
BiocManager::install("graph")
BiocManager::install("RBGL")
Once these two are installed, pcalg and kpcalg can be installed:
install.packages("pcalg")
install.packages("kpcalg")
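The whole sequence can also be scripted from Python so the steps always run in the right order (graph and RBGL before pcalg and kpcalg). This is a sketch under a few assumptions: Rscript is callable, https://cloud.r-project.org is used as the CRAN mirror (install.packages() typically needs an explicit `repos` when run non-interactively), and `install_r_dependencies` is a hypothetical helper name.

```python
import subprocess

CRAN = "https://cloud.r-project.org"  # assumed CRAN mirror; any mirror works

# R commands in dependency order, mirroring the manual steps:
# devtools + BiocManager first, then the Bioconductor packages graph/RBGL,
# then pcalg/kpcalg from CRAN, and finally RCIT from GitHub.
INSTALL_STEPS = [
    "install.packages(c('devtools', 'BiocManager'), repos = '%s')" % CRAN,
    "BiocManager::install(c('graph', 'RBGL'), update = FALSE, ask = FALSE)",
    "install.packages(c('pcalg', 'kpcalg'), repos = '%s')" % CRAN,
    "devtools::install_github('Diviyan-Kalainathan/RCIT')",
]


def install_r_dependencies(rscript, steps=INSTALL_STEPS, runner=subprocess.run):
    """Run each R command in a fresh `Rscript -e` process, in order."""
    for expr in steps:
        # check=True only catches R errors (non-zero exit status).
        # install.packages() merely warns when a package fails to install,
        # so confirm the result afterwards with library(...) in R.
        runner([rscript, "-e", expr], check=True)
```

Each step runs in its own R process, so a package installed in one step (such as BiocManager) is available to the next.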
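At this point all of CausalDiscoveryToolbox's R dependencies are installed, and code like the following runs normally. For R newcomers, the process is still fairly tedious.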
import cdt
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt
from cdt.causality.graph import PC
data = pd.read_csv('http://www.causality.inf.ethz.ch/data/lucas0_train.csv')
# Infer the causal diagram
pc_output = PC().create_graph_from_data(data)
# Visualize the diagram
nx.draw_networkx(pc_output)
plt.show()
Handling an error
The code logic is roughly as follows:
obj = PC()  # one instance, reused on every iteration
for cause in causes:
    obj.create_graph_from_data(tmp[['a', 'b', 'c', 'label']])
At around the 9th iteration, it fails with the following error:
R Python Error Output
-----------------------
[Errno 2] No such file or directory: 'C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\cdt_pc_38b36fd3-c895-40bb-a5fa-24d784fbf88e\\result.csv'
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Input In [32], in <cell line: 1>()
----> 1 obj.create_graph_from_data(tmp[['is_offgrid', 'is_dl_weaksinr', 'is_overlapping', 'label']])
File D:\anaconda\lib\site-packages\cdt\causality\graph\PC.py:278, in PC.create_graph_from_data(self, data, **kwargs)
275 self.arguments['{NJOBS}'] = str(self.njobs)
276 self.arguments['{VERBOSE}'] = str(self.verbose).upper()
--> 278 results = self._run_pc(data, verbose=self.verbose)
280 return nx.relabel_nodes(nx.DiGraph(results),
281 {idx: i for idx, i in enumerate(data.columns)})
File D:\anaconda\lib\site-packages\cdt\causality\graph\PC.py:315, in PC._run_pc(self, data, fixedEdges, fixedGaps, verbose)
313 except Exception as e:
314 rmtree(run_dir)
--> 315 raise e
316 except KeyboardInterrupt:
317 rmtree(run_dir)
File D:\anaconda\lib\site-packages\cdt\causality\graph\PC.py:310, in PC._run_pc(self, data, fixedEdges, fixedGaps, verbose)
307 else:
308 self.arguments['{E_EDGES}'] = 'FALSE'
--> 310 pc_result = launch_R_script(Path("{}/R_templates/pc.R".format(os.path.dirname(os.path.realpath(__file__)))),
311 self.arguments, output_function=retrieve_result, verbose=verbose)
312 # Cleanup
313 except Exception as e:
File D:\anaconda\lib\site-packages\cdt\utils\R.py:221, in launch_R_script(template, arguments, output_function, verbose, debug)
219 print("\nR Python Error Output \n-----------------------\n")
220 print(e)
--> 221 raise RuntimeError("RProcessError \nR Process Error Output \n-----------------------\n" + str(err, "ISO-8859-1")) from None
222 print("\nR Python Error Output \n-----------------------\n")
223 print(e)
RuntimeError: RProcessError
R Process Error Output
-----------------------
Loading required package: momentchi2
Loading required package: MASS
Error in file(file, "rt") : cannot open the connection
Calls: read.csv -> read.table -> file
In addition: Warning message:
In file(file, "rt") :
cannot open file 'C:\Users\ADMINI~1\AppData\Local\Temp\cdt_pc_38b36fd3-c895-40bb-a5fa-24d784fbf88e\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\data.csv': No such file or directory
Execution halted
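This is an R error: R cannot open a file. Searching online turned up essentially no solution, the author has not replied on GitHub (GitHub link), and one post on CSDN suggests reinstalling (its installation steps are well written): 因果发现工具 Causal Discovery Toolbox(cdt)安装指南 ("Causal Discovery Toolbox (cdt) installation guide").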
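On closer inspection, the error only occurs when obj.create_graph_from_data is called inside the for loop, and only after several iterations. The long run of backslashes before data.csv in the R error also hints that the file path gets mangled a little more with each call. My guess was that the problem comes from calling create_graph_from_data repeatedly on the same instance. Based on this, I changed the code to create a new PC instance on every iteration, and the problem went away, haha~~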
for cause in causes:
    obj = PC()  # fresh instance on every iteration
    obj.create_graph_from_data(tmp[['a', 'b', 'c', 'label']])
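The same workaround can be wrapped in a small helper that builds a new model object for every dataset. This is a sketch assuming the model class is passed in as a zero-argument factory (for example `PC` from `cdt.causality.graph`); `run_fresh_each_time` is a hypothetical name, and the optional retry on `RuntimeError` is an extra safeguard beyond the fix described above (it is off by default).

```python
def run_fresh_each_time(make_model, datasets, retries=0):
    """Fit each dataset with a newly constructed model instance.

    make_model: zero-argument callable returning a new model, e.g. cdt's PC (assumed).
    retries: extra attempts per dataset if the call raises RuntimeError.
    """
    graphs = []
    for data in datasets:
        for attempt in range(retries + 1):
            model = make_model()  # new instance on every attempt, as in the workaround
            try:
                graphs.append(model.create_graph_from_data(data))
                break
            except RuntimeError:
                if attempt == retries:
                    raise  # out of attempts: surface the original error
    return graphs
```

With cdt, a call might look like `run_fresh_each_time(PC, [tmp[cols] for cols in column_sets])`, where `column_sets` stands in for however your own loop picks the columns.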