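HDF5 is used to store large datasets and structured datasets, and it supports storing data of different types. HDF stands for Hierarchical Data Format. The name comes from the way it stores zero or more datasets, together with their metadata, in groups. Each group has a group header containing the group name and a list of attributes, plus a group symbol table that lists the objects in the group. Source: <https://www.coursera.org/learn/data-cleaning/lecture/q7OsM/reading-from-hdf5>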
Installing the rhdf5 package (Bioconductor instructions: https://bioconductor.org/install/):
# install BiocManager first if it is not available yet
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install()
BiocManager::install("rhdf5")
library(rhdf5)
Create the HDF5 file:
created=h5createFile("example.h5")
created
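Create groups (the session was reopened at this point and a new file, example1.h5, was created, so the remaining code uses that file):
created=h5createFile("example1.h5")
created=h5createGroup("example1.h5","foo")  # create a group named "foo" in example1.h5
created=h5createGroup("example1.h5","baa")
created=h5createGroup("example1.h5","foo/foobaa")  # foobaa is a subgroup of foo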
h5ls("example1.h5")
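The listing returned by h5ls can also be filtered like any other data frame. The following is a minimal sketch, assuming h5ls returns a data frame with `group`, `name` and `otype` columns; the scratch file `ls_file` is a hypothetical name used only so that the example does not touch example1.h5.

```r
library(rhdf5)

# Hypothetical scratch file, separate from example1.h5
ls_file <- tempfile(fileext = ".h5")
h5createFile(ls_file)
h5createGroup(ls_file, "foo")
h5createGroup(ls_file, "foo/foobaa")

# h5ls lists the file contents recursively, one row per object
contents <- h5ls(ls_file)

# Keep only the rows that describe groups (assumes otype is "H5I_GROUP" for groups)
groups_only <- contents[contents$otype == "H5I_GROUP", c("group", "name")]
groups_only
```

Here `foo` should appear under the root group `/` and `foobaa` under `/foo`, which mirrors the hierarchy described above.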
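Write data into the groups:
A=matrix(1:10,nrow=5,ncol=2)
h5write(A,"example1.h5","foo/A")
B=array(seq(0.1,2.0,by=0.1),dim=c(5,2,2))
attr(B,"scale")<-"liter"
# note: by default h5write does not store R attributes such as "scale"; pass write.attributes=TRUE to keep them
h5write(B,"example1.h5","foo/foobaa/B")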
h5ls("example1.h5")
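The array B above carries a "scale" attribute. As a hedged sketch of how such an attribute might be stored and recovered, the example below assumes the `write.attributes` argument of h5write and the `read.attributes` argument of h5read behave as described in the rhdf5 documentation. The file `attr_file` and the variable `readB_attr` are hypothetical names introduced for illustration.

```r
library(rhdf5)

# Hypothetical scratch file, separate from example1.h5
attr_file <- tempfile(fileext = ".h5")
h5createFile(attr_file)

B <- array(seq(0.1, 2.0, by = 0.1), dim = c(5, 2, 2))
attr(B, "scale") <- "liter"

# write.attributes = TRUE asks h5write to store the R attributes with the dataset
h5write(B, attr_file, "B", write.attributes = TRUE)

# read.attributes = TRUE asks h5read to attach the stored attributes to the result
readB_attr <- h5read(attr_file, "B", read.attributes = TRUE)
attr(readB_attr, "scale")
```

Without `write.attributes = TRUE`, only the numeric values of B are written and the unit label is lost on reading.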
Write a data frame:
df=data.frame(1L:5L,seq(0,1,length.out=5),
              c("ab","cde","fghi","a","s"),stringsAsFactors = FALSE)
h5write(df,"example1.h5","df")
h5ls("example1.h5")
Read the data:
readA=h5read("example1.h5","foo/A")
readB=h5read("example1.h5","foo/foobaa/B")
readdf=h5read("example1.h5","df")
readA
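Reading does not have to load a whole dataset. The sketch below reads only the first three rows of a matrix, assuming the `index` argument of h5read takes one entry per dimension, with NULL meaning "everything along this dimension". The file `subset_file` and the variable `first_rows` are hypothetical names.

```r
library(rhdf5)

# Hypothetical scratch file, separate from example1.h5
subset_file <- tempfile(fileext = ".h5")
h5createFile(subset_file)
h5write(matrix(1:10, nrow = 5, ncol = 2), subset_file, "A")

# One index entry per dimension: rows 1 to 3, all columns
first_rows <- h5read(subset_file, "A", index = list(1:3, NULL))
first_rows
```

For large datasets this kind of partial read keeps memory use proportional to the requested slice instead of the full array.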
Write data into part of an existing dataset: store 12, 13, 14 in foo/A of example1.h5, at rows 1 to 3 of the first column:
h5write(c(12,13,14),"example1.h5","foo/A",index=list(1:3,1))
h5read("example1.h5","foo/A")