HBase Java image storage: How to Use HBase to Store Images

百丈游丝

Article link: https://blog.csdn.net/weixin_42296571/article/details/114082665


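This article shows how to use Java to pack image files into a SequenceFile and store them in HBase. Each image is read and its content is stored as bytes in one HBase column, with the image file name as the rowkey. It includes the Java code, a launch script, and a way to verify the stored images in Hue.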

Fayson's GitHub: https://github.com/fayson/cdhproject


1. Purpose of This Document

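In earlier articles, "How to Use HBase to Store Text Files" and "How to Use the Lily HBase Indexer to Index HBase Data in Solr", Fayson showed how to store text files in HBase and how to run full-text search over them with Solr. But what if the files are images: how should they be stored? This article describes how to pack image files into a SequenceFile and then load it into HBase.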

Overview

1. Image processing flow

2. Preparing the Java upload code

3. Running the code

4. Verifying in Hue

Test environment

1. RedHat 7.4

2. CM 5.14.3

3. CDH 5.14.2

4. Kerberos is not enabled on the cluster

2. Image Processing Flow

[Figure: image processing flow]

1. As shown in the figure above, Fayson first prepared a set of image files locally and uploaded them to HDFS.

[Screenshot: uploading the images to HDFS]

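2. A Java program then walks through all the images, writes them into a single SequenceFile, and loads that SequenceFile into HBase. During loading, the image file name is used as the rowkey, and the entire image content is converted to bytes and stored in one column of the HBase table.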

3. Finally, the images can be viewed in Hue; you could also connect the table to your own query system.
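Step 2 above boils down to two operations: walk a directory tree collecting every file, and derive the rowkey from the file name. The sketch below reproduces only that logic on a local directory with `java.nio.file`, so it can be tried without a cluster. The class `PicIngestSketch` and its methods are hypothetical names, not part of Fayson's project; the real program performs the walk on HDFS through `FileSystem.listStatus`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PicIngestSketch {

    // Rowkey = the part of the path after the last '/', i.e. the bare file name.
    // This is the same expression the article applies to each SequenceFile key.
    static String rowKeyFor(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Hypothetical helper: walks root recursively (local disk, not HDFS) and maps
    // rowkey -> full file content, the same pairs the article writes to HBase.
    static Map<String, byte[]> collect(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        Map<String, byte[]> rows = new LinkedHashMap<>();
        for (Path file : files) {
            // Normalize Windows separators so rowKeyFor always sees '/'
            String key = rowKeyFor(file.toString().replace('\\', '/'));
            // A later file with the same name replaces an earlier one, just as a second
            // Put to the same rowkey and column supersedes the first in HBase
            rows.put(key, Files.readAllBytes(file));
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Build a small throwaway directory tree to walk
        Path root = Files.createTempDirectory("pics");
        Files.createDirectories(root.resolve("sub"));
        Files.write(root.resolve("a.jpg"), new byte[] {1, 2, 3});
        Files.write(root.resolve("sub").resolve("b.png"), new byte[] {4, 5});
        for (Map.Entry<String, byte[]> row : collect(root).entrySet()) {
            System.out.println(row.getKey() + " -> " + row.getValue().length + " bytes");
        }
    }
}
```

Running `main` prints `a.jpg -> 3 bytes` and `b.png -> 2 bytes`. Because the rowkey keeps only the file name, two images with the same name in different subdirectories would land in the same HBase row; if your data can contain such duplicates, keeping part of the directory in the rowkey is one way around it.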

3. Preparing the Java Upload Code

1. First, prepare the Maven POM file

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.cloudera</groupId>
  <artifactId>hbase-exmaple</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>hbase-exmaple</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <!-- Cloudera repository, needed for the *-cdh5.14.2 artifacts -->
  <repositories>
    <repository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
      <name>Cloudera Repositories</name>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
  </repositories>

  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.6.0-cdh5.14.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-client</artifactId>
      <version>1.2.0-cdh5.14.2</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>
```


2. Prepare the Java code that uploads the files to HBase

```java
package com.cloudera;

import java.net.URI;
import java.util.Arrays;

import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileTest {

    // HDFS paths: directory holding the images, and the SequenceFile to generate
    static String inpath = "/fayson/picHbase";
    static String outpath = "/fayson/out";

    static SequenceFile.Writer writer = null;

    public static void main(String[] args) throws Exception {
        // HBase client configuration: point it at the cluster's ZooKeeper quorum
        Configuration hbaseConf = HBaseConfiguration.create();
        hbaseConf.set("hbase.zookeeper.property.clientPort", "2181");
        hbaseConf.set("hbase.zookeeper.quorum", "ip-172-31-5-38.ap-southeast-1.compute.internal");

        // HDFS client, accessing the file system as user "hdfs"
        Configuration conf = new Configuration();
        FileSystem fileSystem = FileSystem.get(new URI(inpath), conf, "hdfs");

        // Step 1: walk the input directory recursively and pack every file into one
        // SequenceFile (key = full file path as Text, value = file content as BytesWritable)
        writer = SequenceFile.createWriter(fileSystem, conf, new Path(outpath), Text.class, BytesWritable.class);
        listFileAndWriteToSequenceFile(fileSystem, inpath);
        org.apache.hadoop.io.IOUtils.closeStream(writer);

        // Step 2: read the SequenceFile back and write one HBase row per image
        SequenceFile.Reader reader = new SequenceFile.Reader(fileSystem, new Path(outpath), conf);
        try (Connection connection = ConnectionFactory.createConnection(hbaseConf);
             Table table = connection.getTable(TableName.valueOf("picHbase"))) {
            Text key = new Text();
            BytesWritable val = new BytesWritable();
            while (reader.next(key, val)) {
                // Rowkey design: the file name (the part of the path after the last '/')
                String path = key.toString();
                String rowKey = path.substring(path.lastIndexOf('/') + 1);
                System.out.println(rowKey);

                // getBytes() returns the reusable backing buffer, which can be longer than
                // the current record; copy exactly getLength() bytes so no padding is stored
                byte[] content = Arrays.copyOf(val.getBytes(), val.getLength());

                // Column family "picinfo", qualifier "content", value = raw image bytes
                Put put = new Put(Bytes.toBytes(rowKey));
                put.addColumn(Bytes.toBytes("picinfo"), Bytes.toBytes("content"), content);
                table.put(put);
            }
        } finally {
            org.apache.hadoop.io.IOUtils.closeStream(reader);
        }
    }

    /**
     * Recursively walks a directory and appends every file to the SequenceFile.
     *
     * @param fileSystem the HDFS file system
     * @param path       the directory to walk
     * @throws Exception on I/O errors
     */
    public static void listFileAndWriteToSequenceFile(FileSystem fileSystem, String path) throws Exception {
        final FileStatus[] listStatuses = fileSystem.listStatus(new Path(path));
        for (FileStatus fileStatus : listStatuses) {
            if (fileStatus.isFile()) {
                Text fileText = new Text(fileStatus.getPath().toString());
                System.out.println(fileText.toString());
                // Read the whole file into memory; toByteArray consumes the stream to EOF
                FSDataInputStream in = fileSystem.open(fileStatus.getPath());
                byte[] buffer;
                try {
                    buffer = IOUtils.toByteArray(in);
                } finally {
                    in.close();
                }
                // Append (file path, file bytes) as one SequenceFile record
                writer.append(fileText, new BytesWritable(buffer));
            }
            if (fileStatus.isDirectory()) {
                listFileAndWriteToSequenceFile(fileSystem, fileStatus.getPath().toString());
            }
        }
    }
}
```

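When turning a `BytesWritable` value into an HBase cell, only the first `getLength()` bytes of `getBytes()` belong to the current record. `BytesWritable` reuses one backing array across `reader.next` calls and grows it with spare capacity, so the array can be longer than the record; storing it verbatim would append stale or zero bytes to the image. The JDK-only sketch below models that behavior with a hypothetical `ReusableBuffer` class (a simplified stand-in, not Hadoop's class; its 1.5x growth rule is an assumption modeled on Hadoop 2.x) to show the difference.

```java
import java.util.Arrays;

public class BufferPaddingSketch {

    // Hypothetical stand-in for a reusable value buffer such as BytesWritable:
    // the backing array only grows (to 1.5x the requested size) and is never trimmed.
    static final class ReusableBuffer {
        private byte[] bytes = new byte[0];
        private int size = 0;

        void set(byte[] record) {
            if (record.length > bytes.length) {
                bytes = Arrays.copyOf(bytes, record.length * 3 / 2);
            }
            System.arraycopy(record, 0, bytes, 0, record.length);
            size = record.length;
        }

        // Whole backing array, possibly with leftover bytes after the record
        byte[] getBytes() { return bytes; }

        // Number of bytes that belong to the current record
        int getLength() { return size; }

        // Exactly the valid bytes: what should go into the HBase cell
        byte[] validBytes() { return Arrays.copyOf(bytes, size); }
    }

    public static void main(String[] args) {
        ReusableBuffer buf = new ReusableBuffer();
        buf.set(new byte[] {10, 20, 30, 40, 50, 60}); // first, larger record
        buf.set(new byte[] {1, 2});                   // second, smaller record
        System.out.println("getBytes(): " + Arrays.toString(buf.getBytes()));
        System.out.println("trimmed:    " + Arrays.toString(buf.validBytes()));
    }
}
```

Here `getBytes()` still returns nine bytes, the last seven left over from the first record or padding, while the trimmed copy is just `[1, 2]`. With the real class, `Arrays.copyOf(val.getBytes(), val.getLength())` gives the trimmed copy; Hadoop 2.x also provides `BytesWritable.copyBytes()` for the same purpose.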

4. Running the Code

1. First, create a table in HBase to hold the images

```
create 'picHbase', {NAME=>'picinfo'}
```


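2. Adjust the configuration in the code to match your environment, such as the HDFS directory holding the image files and the cluster's ZooKeeper address. Then package the code as a jar and upload it to a node of the cluster (these steps are omitted here).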

3. Prepare the launch script

```sh
#!/bin/sh
# Add every jar shipped with the CDH parcel to HADOOP_CLASSPATH so the HBase client classes are found
for file in /opt/cloudera/parcels/CDH/jars/*jar
do
    HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$file
done
export HADOOP_CLASSPATH
echo $HADOOP_CLASSPATH
hadoop jar picHbase.jar com.cloudera.SequenceFileTest
```


4. Run the script


The script finished and the data was loaded successfully.


5. A check in the HBase shell shows 12 rows, so every image was loaded.

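Step 2 above requires editing the HDFS paths and the ZooKeeper address in the source and rebuilding the jar for every cluster. One alternative is to read them from the command line and fall back to the article's values. The sketch below is a hypothetical `JobSettings` helper, not part of the original project; the argument order (input directory, SequenceFile path, ZooKeeper quorum) is an assumption.

```java
public class JobSettings {

    // Defaults are the values hard-coded in the article's SequenceFileTest
    static final String DEFAULT_INPATH = "/fayson/picHbase";
    static final String DEFAULT_OUTPATH = "/fayson/out";
    static final String DEFAULT_ZK_QUORUM = "ip-172-31-5-38.ap-southeast-1.compute.internal";

    final String inpath;
    final String outpath;
    final String zkQuorum;

    JobSettings(String inpath, String outpath, String zkQuorum) {
        this.inpath = inpath;
        this.outpath = outpath;
        this.zkQuorum = zkQuorum;
    }

    // Hypothetical positional arguments: [inpath] [outpath] [zkQuorum];
    // any missing or empty argument falls back to its default
    static JobSettings fromArgs(String[] args) {
        return new JobSettings(
                arg(args, 0, DEFAULT_INPATH),
                arg(args, 1, DEFAULT_OUTPATH),
                arg(args, 2, DEFAULT_ZK_QUORUM));
    }

    private static String arg(String[] args, int index, String fallback) {
        return (args.length > index && !args[index].isEmpty()) ? args[index] : fallback;
    }

    public static void main(String[] args) {
        JobSettings s = fromArgs(args);
        System.out.println("inpath=" + s.inpath + " outpath=" + s.outpath + " zk=" + s.zkQuorum);
    }
}
```

If `SequenceFileTest.main` were changed to use such a helper, the launch script's last line could pass the values directly, for example `hadoop jar picHbase.jar com.cloudera.SequenceFileTest /fayson/picHbase /fayson/out zk1,zk2,zk3` (`hbase.zookeeper.quorum` accepts a comma-separated host list).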

5. Verifying in Hue

1. Open the HBase browser in Hue


Clicking a cell in the content column displays the whole image.


2. Query a single rowkey as a test

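Besides viewing the images in Hue, a quick programmatic check is to inspect the first bytes of a value read back from `picinfo:content` (for example after saving it to a local file). Common image formats start with fixed signatures: JPEG with `FF D8 FF`, PNG with `89 50 4E 47 0D 0A 1A 0A`, GIF with `GIF8`. The hypothetical `ImageSignature` helper below, which is not from the article, classifies a byte array that way.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ImageSignature {

    private static final byte[] JPEG = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};
    private static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};
    private static final byte[] GIF = "GIF8".getBytes(StandardCharsets.US_ASCII);

    // Returns "jpeg", "png", "gif" or "unknown" based on the leading magic bytes
    static String detect(byte[] data) {
        if (startsWith(data, JPEG)) return "jpeg";
        if (startsWith(data, PNG)) return "png";
        if (startsWith(data, GIF)) return "gif";
        return "unknown";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Each argument is a local file, e.g. a cell value exported from HBase for checking
        for (String arg : args) {
            byte[] data = Files.readAllBytes(Paths.get(arg));
            System.out.println(arg + ": " + detect(data) + ", " + data.length + " bytes");
        }
    }
}
```

A signature check only confirms that a value starts like an image; comparing the stored byte count with the size of the original file in HDFS catches truncated or padded values that this check would miss.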

Source code used in this article, on GitHub:

https://github.com/fayson/cdhproject/blob/master/hbasedemo/src/main/java/com/cloudera/hbase/SequenceFileTest.java


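To ordain conscience for Heaven and Earth, to secure life and fortune for the people, to continue lost teachings for past sages, to establish peace for all future generations.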
