5-1 - Course Outline
Hands-on environment setup:
JDK installation        Zookeeper installation
Scala installation      HBase installation
Maven installation      Spark installation
Hadoop installation     IDEA+Maven+Spark Streaming
JDK and Zookeeper were already installed in an earlier course, so they are not covered again here.
5-2 - Scala Installation
1. Download
wget https://downloads.lightbend.com/scala/2.11.8/scala-2.11.8.tgz
2. Extract
tar -zxvf scala-2.11.8.tgz -C /home/hadoop/app/
3. Configure environment variables
vi ~/.bash_profile
export SCALA_HOME=/home/hadoop/app/scala-2.11.8
export PATH=$SCALA_HOME/bin:$PATH
source ~/.bash_profile
4. Verify the installation
Type scala; if the Scala REPL prompt appears, the installation succeeded.
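A quick sanity check from the shell (assuming scala is now on the PATH):
scala -version              # should report version 2.11.8
scala -e 'println(1 + 1)'   # should print 2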
5-3 - Maven Installation
1. Download
wget http://mirror.bit.edu.cn/apache/maven/maven-3/3.5.4/binaries/apache-maven-3.5.4-bin.tar.gz
2. Extract
tar -zxvf apache-maven-3.5.4-bin.tar.gz -C /home/hadoop/app/
3. Configure environment variables
vi ~/.bash_profile
export MAVEN_HOME=/home/hadoop/app/apache-maven-3.5.4
export PATH=$MAVEN_HOME/bin:$PATH
source ~/.bash_profile
4. Verify the installation
mvn -v
5. Change the default local repository path
vi /home/hadoop/app/apache-maven-3.5.4/conf/settings.xml
<localRepository>/path/to/local/repo</localRepository>
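For illustration, pointing the local repository at a directory of your own (the path below is a hypothetical example, not from the course):
mkdir -p /home/hadoop/maven_repo    # hypothetical repository directory
# then in settings.xml:
#   <localRepository>/home/hadoop/maven_repo</localRepository>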
5-4 - Hadoop Environment Setup
I. Hadoop environment setup
1. Set up passwordless SSH before installing
Passwordless SSH login (this step can be skipped, but then you must type the password by hand every time the Hadoop processes are restarted); a full sketch follows after the commands below.
ssh-keygen -t rsa
After running it, inspect the generated keys: cd ~/.ssh/
cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
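A minimal sketch of the whole step; the chmod and the login test are additions beyond the original notes (sshd rejects key files that other users can write to):
ssh-keygen -t rsa                                  # accept the defaults
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys    # append, preserving existing keys
chmod 600 ~/.ssh/authorized_keys                   # tighten permissions so sshd accepts the file
ssh localhost                                      # should log in without a password prompt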
2. Download
wget http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.7.0.tar.gz
3. Extract
tar -zxvf hadoop-2.6.0-cdh5.7.0.tar.gz -C /home/hadoop/app/
4. Configure the parameters
1) hadoop-env.sh
Check JAVA_HOME: echo $JAVA_HOME
Add the JDK setting:
export JAVA_HOME=/root/java/jdk1.8.0_161
2) core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/app/tmp</value>
</property>
3) hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
For details, see the official guide: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
5. Configure environment variables
export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0
export PATH=$HADOOP_HOME/bin:$PATH
source ~/.bash_profile
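A quick check that the new PATH entry took effect:
hadoop version    # should report Hadoop 2.6.0-cdh5.7.0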
6. Format HDFS
Note: run this step only the first time you set up the filesystem.
1) Format the filesystem:
$ bin/hdfs namenode -format
(or, from inside the bin directory: ./hdfs namenode -format)
7. Start HDFS
Start NameNode daemon and DataNode daemon:
$ sbin/start-dfs.sh
8. Verify the installation
jps
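jps should list the HDFS daemons (NameNode, DataNode, SecondaryNameNode). As an optional smoke test against the filesystem (the /test path is just an example):
hdfs dfs -mkdir /test    # create an example directory
hdfs dfs -ls /           # /test should be listed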
II. YARN environment setup
Reference documentation:
http://hadoop.apache.org/docs/r3.0.3/hadoop-project-dist/hadoop-common/SingleCluster.html
1. Configuration
YARN on a Single Node
Configure parameters as follows:
etc/hadoop/mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
etc/hadoop/yarn-site.xml:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
2. Start
Start the ResourceManager daemon and NodeManager daemon:
$ sbin/start-yarn.sh
3. Verify
jps
ResourceManager
NodeManager
Web UI:
Browse the web interface for the ResourceManager; by default it is available at:
- ResourceManager - http://localhost:8088/
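As an optional smoke test, submit the bundled example job to YARN; the jar path follows the CDH tarball layout, so adjust the version suffix if yours differs:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar pi 2 3
# the running job should appear at http://localhost:8088/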
4. Stop YARN:
$ sbin/stop-yarn.sh
5-5 - HBase Installation
1. Download
wget http://archive.cloudera.com/cdh5/cdh/5/hbase-1.2.0-cdh5.15.0.tar.gz
2. Extract
tar -zxvf hbase-1.2.0-cdh5.15.0.tar.gz -C /home/hadoop/app/
3. Configure environment variables
vi ~/.bash_profile
export HBASE_HOME=/home/hadoop/app/hbase-1.2.0-cdh5.15.0
export PATH=$HBASE_HOME/bin:$PATH
source ~/.bash_profile
Check the variable:
echo $HBASE_HOME
4. Edit the configuration files
1) hbase-env.sh: export the JDK path
export JAVA_HOME=/root/java/jdk1.8.0_161
2) hbase-env.sh: set export HBASE_MANAGES_ZK=false (HBase will then use the external Zookeeper instead of managing its own)
3) hbase-site.xml (vim hbase-site.xml):
<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoop000:8020/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop000:2181</value>
</property>
4) regionservers (vim regionservers): list the region server hostnames, one per line
5. Start Zookeeper (./zkServer.sh start), then start HBase with ./start-hbase.sh from $HBASE_HOME/bin
6. Verify startup
1) jps
Two new processes should appear:
HMaster HRegionServer
2) Browser: http://hadoop000:60010
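An optional smoke test from the HBase shell; the table name and column family below are arbitrary examples:
hbase shell
# inside the shell:
#   status               -- shows servers and load
#   create 'test', 'cf'  -- create an example table
#   list                 -- 'test' should appear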
5-6 - Spark Environment Setup
1. Download from the official site (source release, to be built from source)
(http://spark.apache.org/downloads.html)
wget https://archive.apache.org/dist/spark/spark-2.1.0/spark-2.1.0.tgz
Build instructions:
http://spark.apache.org/docs/latest/building-spark.html
Prerequisites:
1)The Maven-based build is the build of reference for Apache Spark. Building Spark using Maven requires Maven 3.3.9 or newer and Java 8+. Note that support for Java 7 was removed as of Spark 2.2.0.
2) export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
Maven build command:
./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package
Prerequisite: this assumes some working knowledge of Maven.
./build/mvn -Pyarn -Phive -Phive-thriftserver -DskipTests clean package
Spark source build options: the plain mvn build above, or make-distribution.sh, which wraps mvn and packages a deployable tarball (see the sketch below).
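A sketch of a make-distribution.sh invocation that would produce the spark-2.2.0-bin-2.6.0-cdh5.7.0.tgz tarball extracted in the next step; the profiles and Hadoop version are assumptions inferred from the CDH version used throughout this course:
./dev/make-distribution.sh \
  --name 2.6.0-cdh5.7.0 --tgz \
  -Pyarn -Phadoop-2.6 \
  -Dhadoop.version=2.6.0-cdh5.7.0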
2. Extract
tar -zxvf spark-2.2.0-bin-2.6.0-cdh5.7.0.tgz -C /home/hadoop/app/
3. Configure environment variables
export SPARK_HOME=/home/hadoop/app/spark-2.2.0-bin-2.6.0-cdh5.7.0
export PATH=$SPARK_HOME/bin:$PATH
source ~/.bash_profile
4. Verify the installation
./spark-shell --master local[2]
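Once the shell is up, run a tiny job as a sanity check; the heredoc simply feeds one line to the REPL non-interactively:
spark-shell --master local[2] <<'EOF'
sc.parallelize(1 to 100).sum   // should print res0: Double = 5050.0
EOF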
5-7 - Development Environment Setup
Use IDEA with Maven to set up a Spark Streaming development environment.
Add the corresponding dependencies to pom.xml:
<properties>
<scala.version>2.11.8</scala.version>
<kafka.version>0.9.0.0</kafka.version>
<spark.version>2.2.0</spark.version>
<hadoop.version>2.6.0-cdh5.7.0</hadoop.version>
<hbase.version>1.2.0-cdh5.7.0</hbase.version>
</properties>
<!-- Add the Cloudera repository -->
<repositories>
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
</repository>
</repositories>
<dependencies>
<!-- Hadoop dependency -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>${hadoop.version}</version>
</dependency>
<!-- HBase dependencies -->
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>${hbase.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-server</artifactId>
<version>${hbase.version}</version>
</dependency>
<!-- Spark Streaming dependency -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.11</artifactId>
<version>${spark.version}</version>
</dependency>
</dependencies>
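With the pom in place, a quick way to confirm that every artifact resolves from the Cloudera repository (run from the project root):
mvn clean compile    # first run downloads the CDH artifacts; BUILD SUCCESS means they resolved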