Notes on installing the R packages required by CausalDiscoveryToolbox

Installing R and editing the configuration file

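The Python package CausalDiscoveryToolbox depends on R packages, so R has to be installed first. Once R is installed, edit CausalDiscoveryToolbox's configuration file. For an Anaconda installation, that is Settings.py in the Lib\site-packages\cdt\utils directory: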

    def __init__(self):
        """Define here the default values of the parameters."""
        super(ConfigSettings, self).__init__()
        self.NJOBS = 8
        self.GPU = 0
        self.autoset_config = True
        self.verbose = False
        self.default_device = 'cpu'
        self.rpath = 'D:\\Program Files\\R\\R-4.3.1\\bin\\x64\\Rscript.exe'  # Change this to the path of your Rscript.exe.
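As an alternative to editing the installed file, the path can probably be set at runtime as well. The sketch below is only an illustration: find_rscript and apply_to_cdt are hypothetical helper names, and it assumes that cdt exposes the settings object as cdt.SETTINGS with the rpath attribute shown above.

```python
import shutil
from pathlib import Path


def find_rscript(candidates=()):
    """Return the first candidate that is an existing file, else Rscript from PATH, else None."""
    for candidate in candidates:
        if Path(candidate).is_file():
            return str(candidate)
    return shutil.which("Rscript")


def apply_to_cdt(rscript_path):
    """Point cdt at the given Rscript for the current Python session only."""
    import cdt  # imported lazily so find_rscript works without cdt installed
    cdt.SETTINGS.rpath = rscript_path  # assumption: SETTINGS is exposed at package level


# Hypothetical candidate list; the entry is the path used in the config above.
rscript = find_rscript([r"D:\Program Files\R\R-4.3.1\bin\x64\Rscript.exe"])
print("Rscript found at:", rscript)
```

If a path is found, calling apply_to_cdt(rscript) before using any R-backed algorithm should have the same effect as the edit to Settings.py, without touching site-packages.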

Installing the R dependency packages

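With R installed, the dependency packages still have to be installed. The error message on its own is somewhat misleading: it suggests that installing RCIT alone is enough. The source code shows that three packages are actually required, RCIT, pcalg, and kpcalg: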

        if not (RPackages.pcalg and RPackages.kpcalg and RPackages.RCIT):
            raise ImportError("R Package (k)pcalg/RCIT is not available. "
                              "RCIT has to be installed from "
                              "https://github.com/Diviyan-Kalainathan/RCIT")
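To see which of the three packages is actually missing before cdt raises this ImportError, one option is to ask R directly. The following is a rough sketch, not cdt's own check: the helper names are made up for illustration, and it assumes an Rscript executable that accepts an expression via -e.

```python
import subprocess

REQUIRED = ("pcalg", "kpcalg", "RCIT")


def build_check_expr(packages):
    """Build an R expression that prints one 'name=TRUE/FALSE' line per package."""
    names = ", ".join("'{}'".format(p) for p in packages)
    return ("pk <- c({}); "
            "ok <- sapply(pk, requireNamespace, quietly = TRUE); "
            "writeLines(paste(pk, ok, sep = '='))").format(names)


def parse_check_output(text):
    """Turn the 'name=TRUE/FALSE' lines into a dict, ignoring anything else."""
    result = {}
    for line in text.splitlines():
        name, sep, flag = line.strip().partition("=")
        if sep and flag in ("TRUE", "FALSE"):
            result[name] = (flag == "TRUE")
    return result


def check_r_packages(rscript, packages=REQUIRED):
    """Run the check through the given Rscript executable (needs a working R)."""
    done = subprocess.run([rscript, "-e", build_check_expr(packages)],
                          capture_output=True, text=True, check=True)
    return parse_check_output(done.stdout)


print(build_check_expr(REQUIRED))
```

Calling check_r_packages with the Rscript.exe path from the configuration above should return a dict such as the package name mapped to True or False, which shows at a glance what still needs installing.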

Installing RCIT

Installing RCIT is fairly simple: just follow the steps at https://github.com/Diviyan-Kalainathan/RCIT.

library(devtools)
install_github("Diviyan-Kalainathan/RCIT")
After installing, run a quick test:
library(RCIT)
RCIT(rnorm(1000),rnorm(1000),rnorm(1000))
RCoT(rnorm(1000),rnorm(1000),rnorm(1000))

If the devtools library is not installed yet, install it first:

install.packages("devtools")

Installing pcalg and kpcalg

Both packages have dependencies that must be installed beforehand; installing them directly fails:

> install.packages("pcalg")
Warning: dependencies ‘graph’, ‘RBGL’ are not available

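The two packages graph and RBGL cannot be installed directly with install.packages, because they are distributed through Bioconductor rather than CRAN. On R 3.5 or later they have to be installed with BiocManager: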

install.packages("BiocManager")
BiocManager::install("graph")
BiocManager::install("RBGL")

Once these two packages are installed, pcalg and kpcalg can be installed:

install.packages("pcalg")
install.packages("kpcalg")

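At this point all of CausalDiscoveryToolbox's R dependencies are installed and the code runs normally. For someone new to R, the whole process is still fairly tedious.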

import cdt
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt
from cdt.causality.graph import PC
data = pd.read_csv('http://www.causality.inf.ethz.ch/data/lucas0_train.csv')

# Infer the causal diagram
pc_output = PC().create_graph_from_data(data)
# Visualize the diagram
nx.draw_networkx(pc_output)
plt.show()


Handling an error

The code logic is roughly as follows:

obj = PC()
for cause in causes:
	obj.create_graph_from_data(tmp[['a', 'b', 'c', 'label']])

Around the ninth iteration of the loop, the following error is raised:

R Python Error Output 
-----------------------

[Errno 2] No such file or directory: 'C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\cdt_pc_38b36fd3-c895-40bb-a5fa-24d784fbf88e\\result.csv'
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [32], in <cell line: 1>()
----> 1 obj.create_graph_from_data(tmp[['is_offgrid', 'is_dl_weaksinr', 'is_overlapping', 'label']])

File D:\anaconda\lib\site-packages\cdt\causality\graph\PC.py:278, in PC.create_graph_from_data(self, data, **kwargs)
    275 self.arguments['{NJOBS}'] = str(self.njobs)
    276 self.arguments['{VERBOSE}'] = str(self.verbose).upper()
--> 278 results = self._run_pc(data, verbose=self.verbose)
    280 return nx.relabel_nodes(nx.DiGraph(results),
    281                         {idx: i for idx, i in enumerate(data.columns)})

File D:\anaconda\lib\site-packages\cdt\causality\graph\PC.py:315, in PC._run_pc(self, data, fixedEdges, fixedGaps, verbose)
    313 except Exception as e:
    314     rmtree(run_dir)
--> 315     raise e
    316 except KeyboardInterrupt:
    317     rmtree(run_dir)

File D:\anaconda\lib\site-packages\cdt\causality\graph\PC.py:310, in PC._run_pc(self, data, fixedEdges, fixedGaps, verbose)
    307     else:
    308         self.arguments['{E_EDGES}'] = 'FALSE'
--> 310     pc_result = launch_R_script(Path("{}/R_templates/pc.R".format(os.path.dirname(os.path.realpath(__file__)))),
    311                                 self.arguments, output_function=retrieve_result, verbose=verbose)
    312 # Cleanup
    313 except Exception as e:

File D:\anaconda\lib\site-packages\cdt\utils\R.py:221, in launch_R_script(template, arguments, output_function, verbose, debug)
    219     print("\nR Python Error Output \n-----------------------\n")
    220     print(e)
--> 221     raise RuntimeError("RProcessError \nR Process Error Output \n-----------------------\n" + str(err, "ISO-8859-1")) from None
    222 print("\nR Python Error Output \n-----------------------\n")
    223 print(e)

RuntimeError: RProcessError 
R Process Error Output 
-----------------------
Loading required package: momentchi2
Loading required package: MASS
Error in file(file, "rt") : cannot open the connection
Calls: read.csv -> read.table -> file
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'C:\Users\ADMINI~1\AppData\Local\Temp\cdt_pc_38b36fd3-c895-40bb-a5fa-24d784fbf88e\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\data.csv': No such file or directory
Execution halted

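The error comes from R: it cannot open a file. A web search turned up essentially no solution, and the author has not replied on GitHub either. One CSDN post suggests reinstalling (its installation steps are well written): 因果发现工具 Causal Discovery Toolbox(cdt)安装指南 (an installation guide for cdt).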

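Some analysis showed that the error only occurs when obj.create_graph_from_data is executed inside the for loop, and only after several iterations, so my guess is that it is related to calling obj.create_graph_from_data repeatedly on the same instance. The long run of backslashes before data.csv in the R warning above is consistent with that guess, since it suggests the path is being modified again on every call. Based on this, I changed the code to create a new instance on each iteration, and the problem went away: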


for cause in causes:
	obj = PC()
	obj.create_graph_from_data(tmp[['a', 'b', 'c', 'label']])
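The same workaround can be wrapped in a small helper so that the "one instance per call" rule is hard to break by accident. This is only a sketch under stated assumptions: run_with_fresh_instances is a hypothetical name, and StubModel is a stand-in that merely mimics the reuse problem so the example runs without cdt or R.

```python
def run_with_fresh_instances(make_model, frames):
    """Call create_graph_from_data once per frame, each time on a brand-new model."""
    results = {}
    for name, frame in frames.items():
        model = make_model()  # new instance on every iteration, never reused
        results[name] = model.create_graph_from_data(frame)
    return results


class StubModel:
    """Stand-in for PC: fails on the second call, loosely imitating the error above."""

    def __init__(self):
        self.calls = 0

    def create_graph_from_data(self, data):
        self.calls += 1
        if self.calls > 1:
            raise RuntimeError("instance was reused")
        return sorted(data)


# With the stub, every frame is processed because no instance is used twice.
print(run_with_fresh_instances(StubModel, {"first": [3, 1, 2], "second": [2, 1]}))
```

With cdt installed, the real call would presumably look like run_with_fresh_instances(PC, {cause: tmp[['a', 'b', 'c', 'label']] for cause in causes}), passing the PC class itself as the factory.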