Installation package downloads (jdk1.8, hadoop-3.1.4, zookeeper-3.4.6, hbase-1.2.0)
I. Environment Preparation
- jdk-8u221-linux-x64.rpm (the JDK runtime; the JDK path must not contain spaces)
- hadoop-3.1.4.tar.gz
- hbase-1.2.0-bin.tar.gz
- zookeeper-3.4.6.tar.gz (the external ZooKeeper needed to run the HBase cluster)
1. Extract the tarballs above
Command: tar -zxvf <package-name>
2. Hadoop configuration
# Edit the global profile
vim /etc/profile
export HADOOP_HOME=/usr/hadoop/hadoop-3.1.4
export PATH=$HADOOP_HOME/bin:$PATH
# Reload the profile so the changes take effect
source /etc/profile
3. JDK configuration
Edit etc/hadoop/hadoop-env.sh to point Hadoop at the JDK:
Find the export JAVA_HOME= line and set it to the absolute JDK path, e.g. export JAVA_HOME=/usr/java/jdk1.8.0_221-amd64 (the default location of the rpm install; adjust to your system).
4. core-site.xml
This file is under hadoop/etc/hadoop (in the snippets below, replace host-ip with your machine's IP address):
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://host-ip:9000</value>
    </property>
</configuration>
5. hdfs-site.xml (create dfs/name and dfs/data under the hadoop-3.1.4 directory to hold the HDFS data)
<configuration>
    <!-- Replication factor: 1 for a single node; raise it for a cluster -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <!-- HTTP address of the NameNode web UI (dfs.http.address is the old, deprecated name for this property) -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>host-ip:9870</value>
    </property>
    <!-- Where the NameNode stores its data -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/hadoop/hadoop-3.1.4/dfs/name</value>
    </property>
    <!-- Where the DataNode stores its data -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/hadoop/hadoop-3.1.4/dfs/data</value>
    </property>
</configuration>
6. Start Hadoop
- Go into hadoop-3.1.4/bin and run hdfs namenode -format (formats the NameNode; only needed once)
- Go into the sbin directory and run ./start-all.sh
- On success, jps lists the Hadoop processes:
- NameNode, DataNode, ResourceManager, NodeManager (a SecondaryNameNode typically appears as well)
7. Upload a file
- hadoop fs -mkdir -p /user/input (creates a directory; -p also creates missing parent directories)
- hdfs dfs -put <local-file-path> <hdfs-target-path> (uploads a local file)
- hadoop fs -ls / (lists files to verify the upload)
With that, the Hadoop installation is complete.
II. Installing ZooKeeper
1. Extract ZooKeeper
- bin directory
Executable ZooKeeper scripts, including the server process, the client, and so on. The .sh scripts are for Linux, the .cmd scripts for Windows.
- conf directory
Configuration files. zoo_sample.cfg is a sample configuration that should be copied under your own name, conventionally zoo.cfg. log4j.properties configures logging.
- The ZooKeeper startup script looks for zoo.cfg by default, so go into conf and copy zoo_sample.cfg to zoo.cfg
- vim zoo.cfg and set dataDir= (directory for data) and dataLogDir= (directory for logs)
- Start ZooKeeper: go into zookeeper/bin and run ./zkServer.sh start
- Connect with the client (./zkCli.sh) and create the HBase root node: create /hbase "" (a Java sketch of the same check follows this list)
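If you prefer to verify the node from code rather than the shell, here is a minimal sketch using the official ZooKeeper Java client. The address 192.168.73.128:2181 matches the host IP used later in this guide (substitute your own), and the class name is illustrative:

import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZkHBaseNodeCheck {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);
        // 192.168.73.128:2181 is an assumption; use your ZooKeeper host and clientPort
        ZooKeeper zk = new ZooKeeper("192.168.73.128:2181", 30000, event -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await(); // block until the session is established
        if (zk.exists("/hbase", false) == null) {
            // same effect as: create /hbase ""  in zkCli.sh (a persistent node with empty data)
            zk.create("/hbase", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }
        System.out.println("/hbase exists: " + (zk.exists("/hbase", false) != null));
        zk.close();
    }
}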
III. Installing HBase
1. Extract: tar -zxvf hbase-1.2.0-bin.tar.gz
2. Configure hbase-env.sh
# Point HBase at the JDK
export JAVA_HOME=<absolute JDK path>
# Set HBASE_MANAGES_ZK to false: HBase ships with a built-in ZooKeeper, but a multi-node setup needs the externally managed ZooKeeper installed above, so the built-in one is disabled
export HBASE_MANAGES_ZK=false
3. hbase-site.xml
<configuration>
    <!-- HBase root directory in HDFS: hdfs://host-ip:9000 is the HDFS address, /hbase the directory holding HBase's files -->
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://host-ip:9000/hbase</value>
    </property>
    <!-- Distributed mode. Because we use an external ZooKeeper, this must be true even for a single-node (pseudo-distributed) deployment; only a fully standalone HBase uses false -->
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <!-- ZooKeeper client port -->
    <property>
        <name>hbase.zookeeper.property.clientPort</name>
        <value>2181</value>
    </property>
    <!-- ZooKeeper hosts; for a ZooKeeper cluster, separate multiple hosts with commas. The port can also be appended to each host, e.g. 192.168.1.1:2181,192.168.1.2:2181 -->
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>host-ip:2181</value>
    </property>
    <!-- ZooKeeper data directory; keep it consistent with the dataDir set during the ZooKeeper installation -->
    <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/usr/hadoop/zk/data</value>
    </property>
    <!-- Clock skew between nodes makes RegionServers exit. hbase.master.maxclockskew raises the tolerance (default 30s). Do not set it too high: skew is an abnormal condition, so prefer syncing all nodes against a single server or an NTP time server -->
    <property>
        <name>hbase.master.maxclockskew</name>
        <value>180000</value>
    </property>
    <!-- After startup the HBase web UI is reachable at http://host-ip:60010 -->
    <property>
        <name>hbase.master.info.port</name>
        <value>60010</value>
    </property>
</configuration>
4. Start HBase
- Start HBase: go into bin and run ./start-hbase.sh
- On success, jps shows two more processes: HMaster and HRegionServer
- Enter the HBase shell: hbase shell
- list shows all tables
- create '<table-name>', '<column-family>' creates a table (the shell requires at least one column family)
The environment is now ready; next we connect Spring Boot to Hadoop and HBase.
IV. Connecting to HBase with the Java API
1. Dependencies (pom.xml)
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.4.3</version>
    </parent>
    <groupId>com.gzso.media</groupId>
    <artifactId>mediaserver</artifactId>
    <version>1.0.0</version>
    <packaging>jar</packaging>
    <name>mediaserver</name>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <java.version>1.8</java.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <!-- hadoop -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>3.1.4</version>
            <exclusions>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>log4j-over-slf4j</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-api</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.1.4</version>
            <exclusions>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>log4j-over-slf4j</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-api</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>3.1.4</version>
        </dependency>
        <!-- hbase -->
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-client</artifactId>
            <version>1.3.0</version>
        </dependency>
        <!-- JSON -->
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.46</version>
        </dependency>
        <!-- jsoup HTML parsing -->
        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.13.1</version>
        </dependency>
    </dependencies>
    <profiles>
        <profile>
            <id>dev</id>
            <!-- active by default -->
            <activation>
                <activeByDefault>true</activeByDefault>
            </activation>
            <properties>
                <activeProfile>dev</activeProfile>
            </properties>
        </profile>
        <profile>
            <id>test</id>
            <properties>
                <activeProfile>test</activeProfile>
            </properties>
        </profile>
        <profile>
            <id>pro</id>
            <properties>
                <activeProfile>pro</activeProfile>
            </properties>
        </profile>
    </profiles>
</project>
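A note on the exclusions above: the Hadoop artifacts transitively pull in slf4j-log4j12, which binds SLF4J to log4j and clashes with the logback binding that spring-boot-starter-web brings in; excluding the slf4j artifacts on the Hadoop side avoids the duplicate-binding conflict and keeps logging on Spring Boot's logback.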
2. HDFSService connects to Hadoop
import java.io.IOException;
import javax.annotation.PostConstruct;
import javax.annotation.PreDestroy;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.springframework.stereotype.Service;
import com.gzso.media.constant.Settings;

@Service
public class HDFSService {
    private FileSystem fs = null;

    // Runs after the bean is constructed: open the HDFS connection
    @PostConstruct
    public void init() throws IOException {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", Settings.hdfsUrl);
        fs = FileSystem.get(conf);
    }

    // Runs before the application shuts down: close the connection
    @PreDestroy
    public void close() throws IOException {
        fs.close();
    }
}
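With the fs handle open, the shell commands from section I map directly onto FileSystem calls. Here is a minimal sketch of helper methods that could be added to HDFSService (the method names are illustrative, not from the original project; it additionally imports java.util.ArrayList, java.util.List, org.apache.hadoop.fs.FileStatus and org.apache.hadoop.fs.Path):

// Equivalent of: hadoop fs -mkdir -p <dir>
public void mkdir(String dir) throws IOException {
    fs.mkdirs(new Path(dir));
}

// Equivalent of: hdfs dfs -put <local-file> <hdfs-dir>
public void upload(String localFile, String hdfsDir) throws IOException {
    fs.copyFromLocalFile(new Path(localFile), new Path(hdfsDir));
}

// Equivalent of: hadoop fs -ls <dir>
public List<String> list(String dir) throws IOException {
    List<String> names = new ArrayList<>();
    for (FileStatus status : fs.listStatus(new Path(dir))) {
        names.add(status.getPath().getName());
    }
    return names;
}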
3. HBaseService connects to the HBase database
import java.io.IOException;
import javax.annotation.PostConstruct;
import javax.annotation.PreDestroy;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import com.gzso.media.constant.Settings;

@Service
public class HBaseService {
    private Logger log = LoggerFactory.getLogger(HBaseService.class);

    /**
     * Zero byte appended to a start row key when paging scan results
     */
    private final byte[] POSTFIX = new byte[] { 0x00 };

    @Autowired
    private HDFSService hdfsService;

    /**
     * Admin handles administrative operations: creating and dropping tables, and managing their data
     */
    private Admin admin = null;
    private Connection connection = null;

    // Runs after the bean is constructed: open the HBase connection
    @PostConstruct
    public void init() throws IOException {
        // HBaseConfiguration.create() loads the HBase defaults; a bare new Configuration() would not
        Configuration conf = HBaseConfiguration.create();
        // Point the client at the external ZooKeeper quorum
        conf.set("hbase.zookeeper.quorum", Settings.zooKeeperUrl);
        conf.set("hbase.zookeeper.property.clientPort", Settings.zooKeeperPort);
        conf.set("hbase.master", Settings.master);
        connection = ConnectionFactory.createConnection(conf);
        admin = connection.getAdmin();
        // createTable(Settings.hdfsName, "_");
    }

    // Runs before the application shuts down: close the connection
    @PreDestroy
    public void close() throws IOException {
        admin.close();
        connection.close();
    }
}
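With connection and admin in hand, table creation and basic reads and writes use the HBase 1.x client API. Here is a minimal sketch of methods that could be added to HBaseService (the method names and arguments are illustrative, not from the original project; it additionally imports org.apache.hadoop.hbase.HColumnDescriptor, HTableDescriptor and TableName, org.apache.hadoop.hbase.client.Get, Put, Result and Table, and org.apache.hadoop.hbase.util.Bytes):

// Create a table with one column family if it does not exist yet
public void createTable(String tableName, String columnFamily) throws IOException {
    TableName name = TableName.valueOf(tableName);
    if (!admin.tableExists(name)) {
        HTableDescriptor desc = new HTableDescriptor(name); // the pre-2.x descriptor API, matching hbase-client 1.x
        desc.addFamily(new HColumnDescriptor(columnFamily));
        admin.createTable(desc);
    }
}

// Write a single cell; equivalent to: put 'table', 'row', 'family:qualifier', 'value' in the shell
public void put(String tableName, String rowKey, String family, String qualifier, String value) throws IOException {
    try (Table table = connection.getTable(TableName.valueOf(tableName))) {
        Put p = new Put(Bytes.toBytes(rowKey));
        p.addColumn(Bytes.toBytes(family), Bytes.toBytes(qualifier), Bytes.toBytes(value));
        table.put(p);
    }
}

// Read a single cell back; returns null if the row or column is absent
public String get(String tableName, String rowKey, String family, String qualifier) throws IOException {
    try (Table table = connection.getTable(TableName.valueOf(tableName))) {
        Result result = table.get(new Get(Bytes.toBytes(rowKey)));
        byte[] cell = result.getValue(Bytes.toBytes(family), Bytes.toBytes(qualifier));
        return cell == null ? null : Bytes.toString(cell);
    }
}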
4. YAML configuration (application.yml)
server:
  port: 3000
  servlet:
    context-path: /media
hdfs:
  url: hdfs://192.168.73.128:9000
  name: test
hbase:
  zookeeper:
    quorum: master
    port: 2181
  master: 192.168.73.128:16000
# The two keys below are read by the Settings class in the next step; the values here are examples only
setting:
  apiServer: http://localhost:3000
  hasResultPrint: true
5. The Settings class maps the values from the YAML file
package com.gzso.media.constant;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

@Component
public class Settings {
    public static String contextPath;

    @Value("${server.servlet.context-path}")
    public void setContextPath(String contextPath) {
        Settings.contextPath = contextPath;
    }

    public static String apiServer;

    @Value("${setting.apiServer}")
    public void setApiServer(String apiServer) {
        Settings.apiServer = apiServer;
    }

    public static String hdfsName;

    @Value("${hdfs.name}")
    public void setHDFSRoot(String hdfsName) {
        Settings.hdfsName = hdfsName;
    }

    public static String hdfsUrl;

    @Value("${hdfs.url}")
    public void setHDFSUrl(String hdfsUrl) {
        Settings.hdfsUrl = hdfsUrl;
    }

    public static Boolean hasResultPrint;

    @Value("${setting.hasResultPrint}")
    public void setHasResultPrint(Boolean hasResultPrint) {
        Settings.hasResultPrint = hasResultPrint;
    }

    /**
     * ZooKeeper quorum address
     */
    public static String zooKeeperUrl;

    @Value("${hbase.zookeeper.quorum}")
    public void setZooKeeperUrl(String zooKeeperUrl) {
        Settings.zooKeeperUrl = zooKeeperUrl;
    }

    /**
     * ZooKeeper client port
     */
    public static String zooKeeperPort;

    @Value("${hbase.zookeeper.port}")
    public void setZooKeeperPort(String zooKeeperPort) {
        Settings.zooKeeperPort = zooKeeperPort;
    }

    /**
     * HBase master address
     */
    public static String master;

    @Value("${hbase.master}")
    public void setMaster(String master) {
        Settings.master = master;
    }
}
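Note: the @Value annotations sit on instance setters rather than on the static fields themselves because Spring does not inject values into static fields. The singleton Settings bean receives each value once at startup and copies it into the corresponding static field, which is what lets HDFSService and HBaseService read Settings.hdfsUrl and the other values without injecting the bean.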