Setting Up a NoSQL Computing Platform
1. Install the operating system
Omitted.
2. Update software packages
In particular, check whether the gcc and glibc packages are installed:
rpm -qa | grep gcc
rpm -qa | grep glibc
Then update the installed packages with: yum update
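If either package turns out to be missing, it can be installed through yum before updating, for example:
yum install -y gcc glibc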
3. Create a user
Run:
groupadd hadoop (create the hadoop group)
useradd -g hadoop -d /home/hadoop hadoop (create the hadoop user; useradd's -p option expects an already-encrypted password, so set the password afterwards with: passwd hadoop)
4. Set up passwordless SSH login
Run:
rm -rf /root/.ssh/* (clear the key directory)
ssh-keygen -t rsa (generate an RSA key pair)
ssh-copy-id -i yidu1 (copy the public key to every node; passwordless login is needed for the local machine as well)
ssh-copy-id -i yidu2
ssh-copy-id -i yidu3
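As a quick check (using the hostnames above), each of the following should log in and print the remote hostname without asking for a password:
ssh yidu1 hostname
ssh yidu2 hostname
ssh yidu3 hostname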
5. Modify the hostname
Run: vim /etc/hosts
Add the IP address and hostname of every node, for example:
192.168.18.150 yidu1
Then run: vim /etc/sysconfig/network
and change the local hostname:
HOSTNAME=yidu1
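A quick way to confirm that the new host entries resolve (using the example hostnames above):
ping -c 1 yidu2
ping -c 1 yidu3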
6. Create local software directories (optional, but recommended)
mkdir -p /usr/local/software
mkdir -p /usr/local/bigdata
Upload the software packages into this directory. The packages needed here are:
apache-hive-2.0.0-bin.tar.gz
hbase-1.1.4-bin.tar.gz (optional)
Python-2.7.11.tar.xz
spark-1.6.2-bin-hadoop2.6.tgz
apache-maven-3.3.9-bin.tar.gz
jdk-8u77-linux-x64.tar.gz
scala-2.12.0-RC1.tgz
zookeeper-3.4.8.tar.gz
hadoop-2.6.4.tar.gz
pip-8.1.2.tar.gz (optional, Python installation tooling)
setuptools-26.0.0.zip (optional, Python installation tooling)
Extract the packages into the target directory. Since tar handles one archive at a time, and the .xz and .zip packages need different tools or options, extract them one by one rather than with a wildcard:
tar -zxvf <package>.tar.gz -C /usr/local/bigdata
(extracts a .tar.gz package into the target directory)
Rename the extracted directories to short names so the environment variables in the next step are easier to write; see the example below.
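A worked example for Hadoop (the same pattern applies to jdk, scala, spark, zookeeper, maven and the optional packages; the source path assumes the packages were uploaded to /usr/local/software):
cd /usr/local/software
tar -zxvf hadoop-2.6.4.tar.gz -C /usr/local/bigdata
mv /usr/local/bigdata/hadoop-2.6.4 /usr/local/bigdata/hadoop
After renaming, the paths used in step 7, such as /usr/local/bigdata/jdk and /usr/local/bigdata/spark, all exist.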
7. Modify environment variables
Run:
vim /etc/profile
and add the following at the end of the file:
HADOOP_HOME=/usr/local/bigdata/hadoop
ZOOKEEPER_HOME=/usr/local/bigdata/zookeeper
JAVA_HOME=/usr/local/bigdata/jdk
MAVEN_HOME=/usr/local/bigdata/maven
SCALA_HOME=/usr/local/bigdata/scala
SPARK_HOME=/usr/local/bigdata/spark
PATH=$PATH:$HOME/bin:$HADOOP_HOME/bin:$ZOOKEEPER_HOME/bin:$JAVA_HOME/bin:$MAVEN_HOME/bin:$SCALA_HOME/bin:$HADOOP_HOME/sbin:$SPARK_HOME/bin
export PATH
export JAVA_LIBRARY_PATH=/usr/local/bigdata/hadoop/lib/native
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export SPARK_EXAMPLES_JAR=/usr/local/bigdata/spark/lib/spark-examples-1.6.2-hadoop2.6.0.jar
Then run:
source /etc/profile
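A quick check that the variables took effect (these commands become available once the packages have been extracted and renamed as in step 6):
java -version
hadoop version
scala -version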
8. Modify the software configuration
ZooKeeper:
Enter the ZooKeeper configuration directory:
cd $ZOOKEEPER_HOME/conf
cp zoo_sample.cfg zoo.cfg
vim zoo.cfg
Change: dataDir=/usr/local/bigdata/zookeeper/data
Add at the end:
server.1=yidu1:2888:3888
server.2=yidu2:2888:3888
server.3=yidu3:2888:3888
mkdir $ZOOKEEPER_HOME/data
touch $ZOOKEEPER_HOME/data/myid
echo '1' > $ZOOKEEPER_HOME/data/myid (node 1 writes 1 into myid, node 2 writes 2, node 3 writes 3)
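The same zoo.cfg must be present on all three nodes; only the contents of myid differ. If the file was edited on yidu1 only, one way to distribute it (a sketch, assuming the same directory layout on every node and the passwordless root login from step 4):
scp /usr/local/bigdata/zookeeper/conf/zoo.cfg root@yidu2:/usr/local/bigdata/zookeeper/conf/
scp /usr/local/bigdata/zookeeper/conf/zoo.cfg root@yidu3:/usr/local/bigdata/zookeeper/conf/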
Hadoop:
Enter the configuration directory: cd /usr/local/bigdata/hadoop/etc/hadoop/
The main files to modify are:
core-site.xml
hadoop-env.sh
hdfs-site.xml
mapred-site.xml
slaves
yarn-env.sh
yarn-site.xml
core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/bigdata/hadoop/tmp</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://yidu1:9000</value>
</property>
<property>
<name>hadoop.proxyuser.spark.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.spark.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.native.lib</name>
<value>true</value>
</property>
</configuration>
hadoop-env.sh
export JAVA_HOME=/usr/local/bigdata/jdk
hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>yidu1:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/bigdata/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/bigdata/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
mapred-site.xml (Hadoop ships only mapred-site.xml.template; create the file first with: cp mapred-site.xml.template mapred-site.xml)
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>yidu1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>yidu1:19888</value>
</property>
</configuration>
slaves
yidu2 (slave node hostname)
yidu3 (slave node hostname)
yarn-env.sh
JAVA_HOME=/usr/local/bigdata/jdk
yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>yidu1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>yidu1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>yidu1:8035</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>yidu1:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>yidu1:8088</value>
</property>
</configuration>
mkdir -p /usr/local/bigdata/hadoop/dfs/name
mkdir -p /usr/local/bigdata/hadoop/dfs/data
mkdir -p /usr/local/bigdata/hadoop/tmp
hdfs namenode -format (format HDFS; run this once, on the master node only)
The following two directories live in HDFS and are used by Hive later, so create them after HDFS has been started in step 9:
hadoop fs -mkdir -p /usr/hive/warehouse
hadoop fs -mkdir -p /usr/hive/tmp
Modify the Spark configuration
cd /usr/local/bigdata/spark/conf
cp slaves.template slaves
vim slaves
Add:
yidu2
yidu3
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
Add at the end: export SCALA_HOME=/usr/local/bigdata/scala
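Depending on the environment, a few more variables are commonly set in spark-env.sh as well; the values below are a sketch that matches the paths and hostnames used in this guide, not part of the original steps:
export JAVA_HOME=/usr/local/bigdata/jdk
export SPARK_MASTER_IP=yidu1
export HADOOP_CONF_DIR=/usr/local/bigdata/hadoop/etc/hadoop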
Reboot the machine.
9. Start the software
After rebooting, switch to the hadoop user (if the directories under /usr/local/bigdata were created as root, change their owner first: chown -R hadoop:hadoop /usr/local/bigdata).
Run:
zkServer.sh start (run on all three machines)
zkServer.sh status (check the ZooKeeper server status)
start-all.sh (starts HDFS and YARN; this is Hadoop's script from $HADOOP_HOME/sbin)
$SPARK_HOME/sbin/start-all.sh (starts the Spark standalone master and workers; use the full path so it is not confused with Hadoop's start-all.sh)
jps (check which daemons are running)
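Typically (the exact list depends on which services are installed), jps on the master shows NameNode, SecondaryNameNode, ResourceManager, QuorumPeerMain and the Spark Master, while the slave nodes show DataNode, NodeManager, QuorumPeerMain and Worker.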
In a browser, open http://yidu1:8088 (the YARN ResourceManager web UI) and check that the page loads.
In a browser, open http://yidu1:8080 (the Spark master web UI) and check that the page loads.
If both pages open, the software has been installed successfully.
10. Run a computation with Spark
Run:
hadoop fs -mkdir /data (create a directory in HDFS)
Upload the README.md from the Spark directory to HDFS:
hadoop fs -put README.md /data
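To confirm that the file landed in HDFS:
hadoop fs -ls /data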
Compute:
Run: spark-shell
sc
val file = sc.textFile("hdfs://yidu1:9000/data/README.md")
val sparks=file.filter(line => line.contains("Spark"))
sparks.count
Returns:
res1: Long = 17
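As a further sketch (standard RDD operations, not part of the original steps), the same RDD can be reused for a simple word count in the spark-shell session:
val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts.take(5)
take(5) just prints the first few (word, count) pairs so the result is easy to inspect.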