This article covers testing a basic big data stack. For big data theory, please study on your own; this is a hands-on environment walkthrough. The configuration shown here is only enough to verify the environment end to end; a real production deployment needs far more detailed and specific settings. I encourage readers to follow along and try it themselves.

1. Machine and OS preparation

CentOS 7, minimal install. Configure networking and the firewall; the machines need outbound internet access.

Three machines: one master and two slaves, with hostnames master, slave01, and slave02.

Add the hosts entries

cat /etc/hosts

192.168.1.10   master

192.168.1.11   slave01

192.168.1.12   slave02
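
Once /etc/hosts is identical on all three machines, a quick ping confirms that name resolution works (hostnames as defined above):

ping -c 1 slave01

ping -c 1 slave02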


Disable the firewall and SELinux (run as root)

setenforce 0

systemctl stop firewalld

systemctl disable firewalld

sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config


Configure the yum repositories

yum install wget -y

cd /etc/yum.repos.d/

wget http://mirrors.aliyun.com/repo/Centos-7.repo

wget http://mirrors.aliyun.com/repo/epel-7.repo

yum -y install epel-release

yum install net-tools -y

yum install tree -y



Configure passwordless SSH between the three machines

Enable public key authentication in sshd (as root; RSAAuthentication only applies to the legacy SSH protocol 1, PubkeyAuthentication is the option that matters)

vi /etc/ssh/sshd_config

RSAAuthentication yes

PubkeyAuthentication yes

Then restart sshd (as root)

systemctl restart sshd


Create the hadoop user

groupadd hadoop

useradd -m -g hadoop hadoop

echo "hadoop" | passwd --stdin hadoop   # or simply run passwd hadoop and enter the password hadoop


Switch to the hadoop user

su hadoop

cd /home/hadoop/

ssh-keygen -t rsa   # generates an RSA key pair; just press Enter through the prompts to accept the defaults

After the key is generated you will have:

.ssh

├── id_rsa

└── id_rsa.pub   # public key; the server checks its contents to verify the connecting client

cd .ssh/

touch authorized_keys

cat id_rsa.pub >> authorized_keys

chmod 600 authorized_keys

chmod 600 id_rsa*

Generate key pairs on slave01 and slave02 in the same way, then append their id_rsa.pub contents to master's authorized_keys.

Then copy the authorized_keys file, which now holds all three public keys, to the slave machines:

scp authorized_keys hadoop@slave01:/home/hadoop/.ssh/

scp authorized_keys hadoop@slave02:/home/hadoop/.ssh/

Then restart sshd on all three machines (as root):   systemctl restart sshd

After that, the nodes can SSH to each other without a password.
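
It is worth verifying this before moving on; a minimal check from master (repeat from the slaves as well):

ssh hadoop@slave01 hostname

ssh hadoop@slave02 hostname

Each command should print the slave's hostname without asking for a password.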


Make sure the base environment above is configured consistently and correctly on all three machines.


##############################################

2. Install the JDK and Hadoop


This walkthrough uses jdk-8u151-linux-x64.tar.gz (the latest release at the time, downloaded from Oracle's site).

Create /usr/java as root (a regular user cannot write to /usr) and unpack the JDK tarball there:

cd /usr/

mkdir java

cd java/

tar zxf jdk-8u151-linux-x64.tar.gz

Download Hadoop (as the hadoop user)

cd /home/hadoop/

mkdir bigdata

cd bigdata/

wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.5/hadoop-2.7.5.tar.gz

tar -zxf hadoop-2.7.5.tar.gz

mv hadoop-2.7.5 hadoop


Set the user's environment variables

vi /home/hadoop/.bashrc

export JAVA_HOME=/usr/java/jdk1.8.0_151

export HADOOP_HOME=/home/hadoop/bigdata/hadoop

export HADOOP_USER_NAME=hadoop

export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH


source /home/hadoop/.bashrc   # load the new variables
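
A quick check that both are picked up correctly:

java -version     # should report 1.8.0_151

hadoop version    # should report Hadoop 2.7.5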


Next, edit the Hadoop configuration files

Create the data directories

## The JDK, the environment variables, and these directories also have to be set up on the slaves; once the Hadoop configuration below is finished, it is copied over to the slave machines.

cd /home/hadoop/bigdata/

mkdir -p data/hadoop/tmp

mkdir -p data/hadoop/hdfs/datanode

mkdir -p data/hadoop/hdfs/namenode


vi /home/hadoop/bigdata/hadoop/etc/hadoop/core-site.xml

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000/</value>
</property>
<property>
	<name>hadoop.tmp.dir</name>
	<value>/home/hadoop/bigdata/data/hadoop/tmp</value>
</property>
</configuration>

core-site.xml sets each node's temporary directory (hadoop.tmp.dir); it is best to create it ahead of time as shown above.


vi /home/hadoop/bigdata/hadoop/etc/hadoop/hdfs-site.xml

<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:9001</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/bigdata/data/hadoop/hdfs/datanode</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/bigdata/data/hadoop/hdfs/namenode</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>

hdfs-site.xml sets the NameNode and DataNode data directories (create them ahead of time, as above). With only two DataNodes, the replication factor is set to 2.


vi /home/hadoop/bigdata/hadoop/etc/hadoop/mapred-site.xml   # if this file does not exist, copy it from mapred-site.xml.template

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

vi /home/hadoop/bigdata/hadoop/etc/hadoop/yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>

vi /home/hadoop/bigdata/hadoop/etc/hadoop/slaves

slave01
slave02

The settings above are only the basic parameters; add resource-allocation and performance tuning options as your workload requires.


Once the master is configured, copy hosts, .bashrc, and /home/hadoop/bigdata/hadoop to the corresponding locations on slave01 and slave02, and recreate the data directories on the slaves (the same mkdir commands as above):

scp /etc/hosts root@slave01:/etc/hosts      # /etc/hosts is root-owned, so copy it as root

scp /etc/hosts root@slave02:/etc/hosts

#scp -r /usr/java/jdk1.8.0_151 root@slave01:/usr/java/   # /usr/java needs root on the slaves as well

#scp -r /usr/java/jdk1.8.0_151 root@slave02:/usr/java/

scp /home/hadoop/.bashrc hadoop@slave01:/home/hadoop/

scp /home/hadoop/.bashrc hadoop@slave02:/home/hadoop/

scp -r /home/hadoop/bigdata/hadoop hadoop@slave01:/home/hadoop/bigdata/

scp -r /home/hadoop/bigdata/hadoop hadoop@slave02:/home/hadoop/bigdata/

Finally, on each slave run  source /home/hadoop/.bashrc   # load the variables
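
To confirm the copies landed correctly, you can check from master using the absolute paths set above (this assumes the slaves pick up .bashrc for non-interactive ssh, as CentOS bash does):

ssh slave01 /usr/java/jdk1.8.0_151/bin/java -version

ssh slave01 /home/hadoop/bigdata/hadoop/bin/hadoop version

Repeat for slave02; both should print the expected versions.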


Start the Hadoop cluster. On master, format the NameNode once (first start only), then start everything:

hdfs namenode -format

cd /home/hadoop/bigdata/hadoop/sbin

sh start-all.sh


[hadoop@master sbin]$ jps 
2713 ResourceManager
2362 NameNode
5053 Jps
2558 SecondaryNameNode

[hadoop@slave01 sbin]$ jps 
2769 NodeManager
3897 Jps
2565 DataNode

At this point the Hadoop cluster is up.
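
As a quick sanity check you can exercise HDFS and YARN; the examples jar path below matches the stock 2.7.5 tarball, adjust it if your layout differs:

hdfs dfsadmin -report    # should list both DataNodes

hdfs dfs -mkdir -p /tmp/test

hdfs dfs -ls /

hadoop jar /home/hadoop/bigdata/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.5.jar pi 2 10    # small MapReduce job that should complete on YARN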


##########################################


Hive deployment
wget https://mirrors.aliyun.com/apache/hive/stable/apache-hive-1.2.2-bin.tar.gz


cd /home/hadoop/bigdata/

tar zxf apache-hive-1.2.2-bin.tar.gz

mv apache-hive-1.2.2-bin hive


Edit the configuration

cd /home/hadoop/bigdata/hive/conf

cp hive-default.xml.template hive-site.xml

cp hive-env.sh.template hive-env.sh

cp hive-log4j.properties.template hive-log4j.properties


vi hive-env.sh

export HADOOP_HOME=/home/hadoop/bigdata/hadoop

export HIVE_CONF_DIR=/home/hadoop/bigdata/hive/conf


vi hive-log4j.properties

hive.log.threshold=ALL

hive.root.logger=INFO,DRFA

hive.log.dir=/home/hadoop/bigdata/hive/log

hive.log.file=hive.log


vi hive-site.xml

<property>  
    <name>hive.metastore.warehouse.dir</name> 
    <value>hdfs://master:9000/user/hive/warehouse</value> 
  </property> 
<property>
    <name>hive.exec.scratchdir</name>
    <value>hdfs://master:9000/user/hive/scratchdir</value>
  </property>
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/home/hadoop/bigdata/hive/tmp</value>
    <description>Local scratch space for Hive jobs</description>
  </property>
  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/home/hadoop/bigdata/hive/tmp</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
  </property>
  <property>
    <name>hive.server2.logging.operation.log.location</name>
    <value>/home/hadoop/bigdata/hive/tmp</value>
    <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
  </property>
  <property>
    <name>hive.querylog.location</name>
    <value>/home/hadoop/bigdata/hive/logs</value>
    <description>Location of Hive run time structured log file</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://master:3306/hivemeta?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>Username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
    <description>password to use against metastore database</description>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://master:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property>

The Hive configuration is where mistakes most often creep in; adjust it based on the error messages you see at startup. Note that the settings above assume a MySQL server running on master with a hivemeta database reachable by user hive (password 123456), and the MySQL JDBC driver jar (mysql-connector-java) must be placed in /home/hadoop/bigdata/hive/lib.

hadoop fs -mkdir  -p /user/hive/scratchdir

hadoop fs -mkdir -p /user/hive/warehouse 

hadoop fs -chmod g+w  /user/hive/scratchdir

hadoop fs -chmod g+w /user/hive/warehouse
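
A quick listing confirms the warehouse directories exist in HDFS with the expected permissions:

hadoop fs -ls /user/hive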


Start the metastore and hiveserver2 services


nohup hive --service metastore&

nohup hive --service hiveserver2&

[hadoop@master bin]$ hive
Logging initialized using configuration in file:/home/hadoop/bigdata/hive/conf/hive-log4j.properties
hive> show databases;
OK
default
fucktime
Time taken: 1.14 seconds, Fetched: 2 row(s)
hive>
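
To test hiveserver2 as well, you can connect with beeline; port 10000 and the connection string below are the defaults, adjust them if you changed the configuration:

beeline -u jdbc:hive2://master:10000 -n hadoop    # then run: show databases;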



##########################################


3. Spark, ZooKeeper, and HBase deployment


Configure everything on master first, then scp it to the slaves.

First download the software, using the Aliyun mirror

cd /home/hadoop/bigdata/

wget https://mirrors.aliyun.com/apache/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz

wget https://mirrors.aliyun.com/apache/zookeeper/stable/zookeeper-3.4.10.tar.gz

wget https://mirrors.aliyun.com/apache/hbase/stable/hbase-1.2.6-bin.tar.gz

wget https://www.scala-lang.org/files/archive/scala-2.10.4.tgz


Unpack everything into the bigdata directory

cd /home/hadoop/bigdata/

tar zxf spark-2.2.1-bin-hadoop2.7.tgz

tar zxf scala-2.10.4.tgz

tar zxf zookeeper-3.4.10.tar.gz

tar zxf hbase-1.2.6-bin.tar.gz


mv spark-2.2.1-bin-hadoop2.7 spark

mv scala-2.10.4 scala

mv zookeeper-3.4.10 zk

mv hbase-1.2.6 hbase



Add the corresponding environment variables to the hadoop user's .bashrc

export HIVE_HOME=/home/hadoop/bigdata/hive

export PATH=$PATH:$HIVE_HOME/bin

export SCALA_HOME=/home/hadoop/bigdata/scala

export PATH=$PATH:$SCALA_HOME/bin

export SPARK_HOME=/home/hadoop/bigdata/spark

export PATH=$PATH:$SPARK_HOME/bin

export ZK_HOME=/home/hadoop/bigdata/zk

export PATH=$PATH:$ZK_HOME/bin

export HBASE_HOME=/home/hadoop/bigdata/hbase

export PATH=$PATH:$HBASE_HOME/bin


source /home/hadoop/.bashrc

Copy it to the slave machines

scp /home/hadoop/.bashrc hadoop@slave01:/home/hadoop/

scp /home/hadoop/.bashrc hadoop@slave02:/home/hadoop/

source /home/hadoop/.bashrc   # run this on each slave


******************************************************

Edit the Spark configuration

cd /home/hadoop/bigdata/spark/conf

cp spark-env.sh.template spark-env.sh


vi spark-env.sh

export SCALA_HOME=/home/hadoop/bigdata/scala

export JAVA_HOME=/usr/java/jdk1.8.0_151

export HADOOP_HOME=/home/hadoop/bigdata/hadoop

export HADOOP_CONF_DIR=/home/hadoop/bigdata/hadoop/etc/hadoop

SPARK_MASTER_IP=master

SPARK_LOCAL_DIRS=/home/hadoop/bigdata/spark

SPARK_DRIVER_MEMORY=512M


cp slaves.template slaves

vi slaves

slave01

slave02


Copy spark to the slave machines

cd /home/hadoop/bigdata/

scp -r spark hadoop@slave01:/home/hadoop/bigdata/

scp -r spark hadoop@slave02:/home/hadoop/bigdata/


cd /home/hadoop/bigdata/spark/sbin

sh start-all.sh

[hadoop@master sbin]$ jps 
2713 ResourceManager
2362 NameNode
1268  Master
5053 Jps
2558 SecondaryNameNode

[hadoop@slave01 sbin]$ jps 
2769 NodeManager
3897 Jps
25623 Worker
2565 DataNode
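
To verify the standalone cluster accepts jobs, you can submit the bundled SparkPi example; the examples jar name below matches the 2.2.1 prebuilt package, adjust it if yours differs:

spark-submit --master spark://master:7077 --class org.apache.spark.examples.SparkPi /home/hadoop/bigdata/spark/examples/jars/spark-examples_2.11-2.2.1.jar 10

The master web UI at http://master:8080 should also show both workers registered.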

*********************************************

Edit the ZooKeeper configuration

cd  /home/hadoop/bigdata/zk/conf/

cp zoo_sample.cfg zoo.cfg


vi zoo.cfg

dataDir=/home/hadoop/bigdata/zk/zkdata

dataLogDir=/home/hadoop/bigdata/zk/zkdatalog

server.1=master:2888:3888

server.2=slave01:2888:3888

server.3=slave02:2888:3888


echo "1" > /home/hadoop/bigdata/zkdata/myid


Copy zk to the slave machines

cd /home/hadoop/bigdata/

scp -r zk hadoop@slave01:/home/hadoop/bigdata/

scp -r zk hadoop@slave02:/home/hadoop/bigdata/


On each slave, set its own myid

echo "2" > /home/hadoop/bigdata/zkdata/myid

echo "3" > /home/hadoop/bigdata/zkdata/myid


Start zkServer on each node

cd /home/hadoop/bigdata/zk/bin/

./zkServer.sh start

Check the status

sh zkServer.sh status
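
One node should report Mode: leader and the other two Mode: follower. You can also connect with the bundled CLI to confirm the ensemble responds (any node works as the target; 2181 is the default client port):

zkCli.sh -server master:2181    # then run: ls /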


******************************************************

Edit the HBase configuration

cd /home/hadoop/bigdata/hbase/conf

vi hbase-site.xml

<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://master:9000/hbase</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>master,slave01,slave02</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/bigdata/zk/zkdata</value>
</property>
</configuration>


Since ZooKeeper runs as a separate ensemble here, also tell HBase not to start its own instance:

vi hbase-env.sh

export HBASE_MANAGES_ZK=false


vi regionservers

slave01

slave02


Copy hbase to the slave machines

cd  /home/hadoop/bigdata/


scp -r hbase hadoop@slave01:/home/hadoop/bigdata/


scp -r hbase hadoop@slave02:/home/hadoop/bigdata/


Start HBase on master

cd  /home/hadoop/bigdata/hbase/bin

sh start-hbase.sh

Check the status

hbase shell

status
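
The status command should report one active master and two region servers. Inside the shell you can also run a quick create/put/scan round trip; the table and column family names below are arbitrary examples:

create 't1', 'cf'

put 't1', 'row1', 'cf:a', 'value1'

scan 't1'

disable 't1'

drop 't1'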


Summary: very few problems came up during this walkthrough. When something does go wrong, it is almost always an issue with the XML configuration, and fixing the configuration usually gets things working. This article can also be used together with https://blog.51cto.com/superleedo/1894519 when building out the environment.

Good luck with your own setup!