R语言做贝叶斯网络结构学习

文章目录


本文使用pcalg包做贝叶斯网络结构的学习

#实例分析
##pcalg包的安装
安装pcalg包时由于有些依赖包已经被CRAN撤掉,所以安装起来有些技巧。
以下是安装代码:

install.packages('BiocManager', repos = 'https://mirrors.tuna.tsinghua.edu.cn/CRAN')
options('BioC_mirror'='https://mirrors.tuna.tsinghua.edu.cn/bioconductor/')
BiocManager::install(c('graph','RBGL','Rgraphviz'))
install.packages(c('kpcalg'), repos = 'https://mirrors.tuna.tsinghua.edu.cn/CRAN')

##贝叶斯网络结构学习
###iris数据集
我们使用iris数据集进行学习,这是一个针对连续变量的结构学习的例子:

suffStat <-list(C = cor(iris[,1:4]), n = nrow(iris))
pc.fit <- pc(suffStat, indepTest = gaussCItest,
             alpha = 0.01,labels =names(iris[1:4]),verbose = TRUE)
pc.fit
plot(pc.fit)

这里写图片描述

###Adult数据集
该数据从美国1994年人口普查数据库抽取而来,可以用来预测居民收入是否超过50K/year。该数据集类变量为年收入是否超过50k$,属性变量包含年龄,工种,学历,职业,人种等重要信息,值得一提的是,14个属性变量中有7个类别型变量。

  • age – The age of the individual
  • type_employer – The type of employer the individual has. Whether they are government, military, private, an d so on.
  • fnlwgt – The # of people the census takers believe that observation - represents. We will be ignoring this variable
  • education – The highest level of education achieved for that individual
  • education_num – Highest level of education in numerical form
  • marital – Marital status of the individual
  • occupation – The occupation of the individual
  • relationship – A bit more difficult to explain. Contains family relationship values like husband, father, and so on, but only contains one per observation. I’m not sure what this is supposed to represent
  • race – descriptions of the individuals race. Black, White, Eskimo, and so on
  • sex – Biological Sex
  • capital_gain – Capital gains recorded
  • capital_loss – Capital Losses recorded
  • hr_per_week – Hours worked per week
  • country – Country of origin for person
  • income – Boolean Variable. Whether or not the person makes more than $50,000 per annum income.

对于离散数据,我们使用 G 2 G^2 G2统计量来检验条件独立性。
接来下探索一下,教育,婚姻状态,职业,收入的关系是怎样的:

url <- "http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
adult <- read.csv(url, strip.white = TRUE, header = FALSE,
    col.names=c("age", "type_employer", "fnlwgt", "education", 
    "education_num","marital", "occupation", "relationship", "race","sex",
    "capital_gain", "capital_loss", "hr_per_week","country", "income"))

> adult<-adult[,c(4,6,7,15)] 
> head(adult)
  education            marital        occupation income
1 Bachelors      Never-married      Adm-clerical  <=50K
2 Bachelors Married-civ-spouse   Exec-managerial  <=50K
3   HS-grad           Divorced Handlers-cleaners  <=50K
4      11th Married-civ-spouse Handlers-cleaners  <=50K
5 Bachelors Married-civ-spouse    Prof-specialty  <=50K
6   Masters Married-civ-spouse   Exec-managerial  <=50K
adult2<-sapply(adult,function(x){
  temp<-as.integer(factor(x))-1
})
nlev<-sapply(adult,function(x){
  temp<-factor(x)
  length(levels(temp))
})
adult2<-as.data.frame(adult2)
suffStat <- list(dm = adult2, nlev = nlev, adaptDF = FALSE)

pc.fit <- pc(suffStat, indepTest = disCItest,
             alpha = 0.01,labels =names(adult2),verbose = TRUE)
pc.fit

plot(pc.fit)

这里写图片描述

该包的详细说明:

作为分享主义者(sharism),本人所有互联网发布的图文均遵从CC版权,转载请保留作者信息并注明作者a358463121专栏:http://blog.csdn.net/a358463121,如果涉及源代码请注明GitHub地址:https://github.com/358463121/。商业使用请联系作者。

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值