Table of Contents
- 1. JDK installation
- 2. Scala installation
- 3. Maven installation
- 4. Hadoop installation
  - 1. Download
  - 2. Extract
  - 3. Configure SSH
  - 4. Edit the configuration files
    - 1. Edit hadoop-env.sh
    - 2. Edit core-site.xml
    - 3. Edit hdfs-site.xml
    - 4. Edit slaves
    - 5. Format the NameNode
  - 5. Configure environment variables
  - 6. Start HDFS
  - 7. Verify the installation
  - 8. Set up YARN
  - 9. Verify that Hadoop works
  - 10. Verify that YARN works
- 5. Zookeeper installation
- 6. HBase installation
- 7. Spark installation
  - 3. Extract
- 8. IDEA + Maven + Spark Streaming
1. JDK installation
(omitted)
2. Scala installation
1. Download
From the Scala website: Download -> "Or are you looking for previous releases of Scala?" -> scala-2.11.8.tgz, and save it to the /home/hadoop/software directory, for example as sketched below.
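A minimal download sketch; the mirror URL is an assumption, so verify it against the release page first:
cd /home/hadoop/software
# assumed download URL for the Scala 2.11.8 tarball
wget https://downloads.lightbend.com/scala/2.11.8/scala-2.11.8.tgz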
2. Extract
tar -zxvf scala-2.11.8.tgz -C ~/app/
3. Configure environment variables
vi ~/.bash_profile
Add:
export SCALA_HOME=/home/hadoop/app/scala-2.11.8
export PATH=$SCALA_HOME/bin:$PATH
Apply the changes:
source ~/.bash_profile
4. Verify the installation
[hadoop@hadoop000 scala-2.11.8]$ scala
Welcome to Scala 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_144).
Type in expressions for evaluation. Or try :help.
scala> 1+1
res0: Int = 2
3. Maven installation
1. Download
From the Maven website: Download -> Previous Releases -> archives -> 3.3.9 -> binaries ->
apache-maven-3.3.9-bin.tar.gz; either download it in the browser or copy its link and fetch it with wget into the /home/hadoop/software directory, for example as below.
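A hedged example of fetching the binary tarball from the Apache archive (check the archive URL in a browser if the download fails):
cd /home/hadoop/software
# assumed Apache archive path for the Maven 3.3.9 binaries
wget https://archive.apache.org/dist/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz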
2. Extract
tar -zxvf apache-maven-3.3.9-bin.tar.gz -C ~/app/
3. Configure environment variables
vi ~/.bash_profile
Add:
export MAVEN_HOME=/home/hadoop/app/apache-maven-3.3.9
export PATH=$MAVEN_HOME/bin:$PATH
Apply the changes:
source ~/.bash_profile
4. Verify the installation
[hadoop@hadoop000 scala-2.11.8]$ mvn -v
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-11T00:41:47+08:00)
Maven home: /home/hadoop/app/apache-maven-3.3.9
Java version: 1.8.0_144, vendor: Oracle Corporation
Java home: /home/hadoop/app/jdk1.8.0_144/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "2.6.32-358.el6.x86_64", arch: "amd64", family: "unix"
5. Edit settings.xml under the conf directory
Create a local repository directory with mkdir /home/hadoop/maven_repos.
Then point localRepository in settings.xml at it:
<localRepository>/home/hadoop/maven_repos/</localRepository>
4. Hadoop installation
1. Download
http://archive.apache.org/dist/
Download hadoop-2.6.0-cdh5.7.0.tar.gz (a CDH build, which is normally served from the Cloudera archive rather than the Apache archive) into the /home/hadoop/software directory.
2. Extract
tar -zxvf hadoop-2.6.0-cdh5.7.0.tar.gz -C ~/app/
3. Configure SSH
In the /home/hadoop directory, generate a key pair:
ssh-keygen -t rsa
ll -a then shows a .ssh folder containing id_rsa and id_rsa.pub. Authorize the public key for this user:
cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
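If SSH still asks for a password later (as it does during the first start-dfs.sh below), permissions on these files are a common cause. A minimal sketch of the permission fix; the chmod values are standard OpenSSH requirements rather than something from the original steps:
# sshd refuses keys whose files are group/world writable
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
# this should now log in without prompting for a password
ssh hadoop000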
4. Edit the configuration files
In the /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop directory:
1. Edit hadoop-env.sh: change the JAVA_HOME entry from its default value ${JAVA_HOME} to the directory where the JDK actually lives.
export JAVA_HOME=/home/hadoop/app/jdk1.8.0_144
2. Edit core-site.xml. Create the /home/hadoop/app/tmp directory first if it does not exist. The first property sets the address of the HDFS NameNode (hostname:port); the second sets the directory where Hadoop keeps the files it produces at runtime.
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop000:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/app/tmp</value>
  </property>
</configuration>
3. Edit hdfs-site.xml, Hadoop's underlying storage configuration file. Here you can set the directory where the NameNode stores the HDFS namespace metadata, the directory where a DataNode physically stores its data blocks, and the number of replicas HDFS keeps for each block (1 here because this is a pseudo-distributed setup; a real cluster defaults to 3).
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
4. Edit the slaves file and list the hostnames of the slave (worker) nodes:
hadoop000
5. Format the NameNode
In the /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin directory, run
./hdfs namenode -format
5. Configure environment variables
vi ~/.bash_profile
Add:
export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0
export PATH=$HADOOP_HOME/bin:$PATH
Apply the changes:
source ~/.bash_profile
6. Start HDFS
In the /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/sbin directory, run
./start-dfs.sh
The first attempt failed:
[hadoop@hadoop000 sbin]$ ./start-dfs.sh
21/06/14 21:30:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [hadoop000]
The authenticity of host 'hadoop000 (192.168.121.131)' can't be established.
RSA key fingerprint is de:44:de:ee:0e:02:9c:2b:73:99:94:2c:af:4a:8a:ad.
Are you sure you want to continue connecting (yes/no)? y
Please type 'yes' or 'no': yes
hadoop000: Warning: Permanently added 'hadoop000,192.168.121.131' (RSA) to the list of known hosts.
hadoop@hadoop000's password: hadoop000: Agent admitted failure to sign using the key.
hadoop000: starting namenode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-namenode-hadoop000.out
hadoop@hadoop000's password: hadoop000: Agent admitted failure to sign using the key.
hadoop000: Connection closed by UNKNOWN
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
RSA key fingerprint is de:44:de:ee:0e:02:9c:2b:73:99:94:2c:af:4a:8a:ad.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (RSA) to the list of known hosts.
hadoop@0.0.0.0's password: 0.0.0.0: Agent admitted failure to sign using the key.
hadoop@0.0.0.0's password: 0.0.0.0: Permission denied, please try again.
hadoop@0.0.0.0's password: 0.0.0.0: Permission denied, please try again.
0.0.0.0: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
21/06/14 21:33:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop000 sbin]$ jps
11140 Jps
10874 NameNode
[hadoop@hadoop000 sbin]$ ssh hadoop000
Agent admitted failure to sign using the key.
hadoop@hadoop000's password:
Last login: Sun Apr 4 17:14:46 2021 from 192.168.107.2
So SSH still asks for a password, which means passwordless login was not set up successfully.
Go into the .ssh directory and add the key to the agent:
[hadoop@hadoop000 .ssh]$ ssh-add
Identity added: /home/hadoop/.ssh/id_rsa (/home/hadoop/.ssh/id_rsa)
[hadoop@hadoop000 .ssh]$ ssh hadoop000
Last login: Mon Jun 14 21:45:37 2021 from hadoop000
That works now, and start-dfs.sh succeeds:
[hadoop@hadoop000 sbin]$ ./start-dfs.sh
21/06/14 21:56:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [hadoop000]
hadoop000: starting namenode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-namenode-hadoop000.out
hadoop000: starting datanode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-datanode-hadoop000.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/hadoop-hadoop-secondarynamenode-hadoop000.out
21/06/14 21:56:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop000 sbin]$ jps
12065 NameNode
12197 DataNode
12373 SecondaryNameNode
12520 Jps
I also hit a case where jps showed no DataNode process after ./start-dfs.sh. The likely cause is that I had run hadoop namenode -format several times: formatting clears the NameNode's data but not the DataNode's, so the clusterIDs of the NameNode and the DataNode no longer match.
When hadoop namenode -format runs, it writes a current/VERSION file into the NameNode data directory (the path configured as dfs.name.dir) recording a new clusterID, while the DataNode's current/VERSION still holds the clusterID saved at the first format.
So I edited the clusterID in the VERSION file under /home/hadoop/app/tmp/dfs/name/current to make it match the clusterID in the VERSION file under /home/hadoop/app/tmp/dfs/data/current.
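A quick way to compare the two IDs before editing anything (the paths assume the hadoop.tmp.dir configured above):
# print the clusterID recorded on the NameNode side and the DataNode side
grep clusterID /home/hadoop/app/tmp/dfs/name/current/VERSION
grep clusterID /home/hadoop/app/tmp/dfs/data/current/VERSION
# if they differ, edit one VERSION file so the IDs match, then restart HDFS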
7. Verify the installation
[hadoop@hadoop000 sbin]$ jps
12065 NameNode
12197 DataNode
12373 SecondaryNameNode
12520 Jps
In a browser inside the VM, open http://hadoop000:50070/
It should report one live DataNode.
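The same check can be done from the command line with a standard HDFS admin command (not part of the original steps):
# reports capacity plus the list of live and dead DataNodes
hdfs dfsadmin -report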
8. Set up YARN
In the /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop directory:
1. Edit mapred-site.xml (create it from the template first):
cp mapred-site.xml.template mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
2. Edit yarn-site.xml:
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
3. Start YARN
In the /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/sbin directory, run
./start-yarn.sh
Two new processes, NodeManager and ResourceManager, appear:
[hadoop@hadoop000 sbin]$ ./start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-resourcemanager-hadoop000.out
hadoop000: starting nodemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-nodemanager-hadoop000.out
[hadoop@hadoop000 sbin]$ jps
4292 ResourceManager
3960 SecondaryNameNode
3757 DataNode
4398 NodeManager
3647 NameNode
4447 Jps
Open http://hadoop000:8088/cluster
Click Active Nodes to see the list of nodes.
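The node list is also available from the command line via the YARN CLI (not part of the original steps):
# lists the NodeManagers registered with the ResourceManager
yarn node -list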
9. Verify that Hadoop works
hadoop fs -ls / lists the root directory of the HDFS file system:
[hadoop@hadoop000 sbin]$ hadoop fs -ls /
21/06/20 18:26:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop000 sbin]$ hadoop fs -mkdir /data
21/06/20 18:31:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop000 sbin]$ hadoop fs -ls /
21/06/20 18:31:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2021-06-20 18:31 /data
[hadoop@hadoop000 sbin]$ hadoop fs -ls /data
21/06/20 18:33:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop000 sbin]$ hadoop fs -put mr-jobhistory-daemon.sh /data
21/06/20 18:34:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop000 sbin]$ hadoop fs -ls /data
21/06/20 18:34:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r-- 1 hadoop supergroup 4080 2021-06-20 18:34 /data/mr-jobhistory-daemon.sh
[hadoop@hadoop000 sbin]$ hadoop fs -text /data/mr-jobhistory-daemon.sh
21/06/20 18:34:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#...
10. Verify that YARN works
Go to the /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/mapreduce directory and run
hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar pi 2 3
The output:
[hadoop@hadoop000 mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar pi 2 3
Number of Maps = 2
Samples per Map = 3
21/06/20 18:40:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Starting Job
21/06/20 18:40:10 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
21/06/20 18:40:12 INFO input.FileInputFormat: Total input paths to process : 2
21/06/20 18:40:12 INFO mapreduce.JobSubmitter: number of splits:2
21/06/20 18:40:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1624184518582_0001
21/06/20 18:40:14 INFO impl.YarnClientImpl: Submitted application application_1624184518582_0001
21/06/20 18:40:14 INFO mapreduce.Job: The url to track the job: http://hadoop000:8088/proxy/application_1624184518582_0001/
21/06/20 18:40:14 INFO mapreduce.Job: Running job: job_1624184518582_0001
21/06/20 18:40:32 INFO mapreduce.Job: Job job_1624184518582_0001 running in uber mode : false
21/06/20 18:40:32 INFO mapreduce.Job: map 0% reduce 0%
21/06/20 18:40:44 INFO mapreduce.Job: map 50% reduce 0%
21/06/20 18:40:45 INFO mapreduce.Job: map 100% reduce 0%
21/06/20 18:40:54 INFO mapreduce.Job: map 100% reduce 100%
21/06/20 18:40:56 INFO mapreduce.Job: Job job_1624184518582_0001 completed successfully
21/06/20 18:40:56 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=50
FILE: Number of bytes written=335406
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=532
HDFS: Number of bytes written=215
HDFS: Number of read operations=11
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=21358
Total time spent by all reduces in occupied slots (ms)=7753
Total time spent by all map tasks (ms)=21358
Total time spent by all reduce tasks (ms)=7753
Total vcore-seconds taken by all map tasks=21358
Total vcore-seconds taken by all reduce tasks=7753
Total megabyte-seconds taken by all map tasks=21870592
Total megabyte-seconds taken by all reduce tasks=7939072
Map-Reduce Framework
Map input records=2
Map output records=4
Map output bytes=36
Map output materialized bytes=56
Input split bytes=296
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=56
Reduce input records=4
Reduce output records=0
Spilled Records=8
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=1000
CPU time spent (ms)=8100
Physical memory (bytes) snapshot=741052416
Virtual memory (bytes) snapshot=8293314560
Total committed heap usage (bytes)=740294656
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=236
File Output Format Counters
Bytes Written=97
Job Finished in 45.464 seconds
Estimated value of Pi is 4.00000000000000000000
5. Zookeeper installation
(omitted)
6. HBase installation
1. Download HBase into the /home/hadoop/software directory:
hbase-1.2.0-cdh5.7.0.tar.gz
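For example (the CDH archive URL is an assumption; adjust it to wherever you normally fetch CDH tarballs):
cd /home/hadoop/software
# assumed Cloudera CDH5 archive location for this tarball
wget http://archive.cloudera.com/cdh5/cdh/5/hbase-1.2.0-cdh5.7.0.tar.gz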
2. Extract
tar -zxvf hbase-1.2.0-cdh5.7.0.tar.gz -C ~/app/
3. Configure environment variables
vi ~/.bash_profile
Add:
export HBASE_HOME=/home/hadoop/app/hbase-1.2.0-cdh5.7.0
export PATH=$HBASE_HOME/bin:$PATH
Apply the changes:
source ~/.bash_profile
Check that it took effect:
[hadoop@hadoop000 software]$ echo $HBASE_HOME
/home/hadoop/app/hbase-1.2.0-cdh5.7.0
4. Edit the configuration files
In the /home/hadoop/app/hbase-1.2.0-cdh5.7.0/conf directory:
1. Edit hbase-env.sh:
export JAVA_HOME=/home/hadoop/app/jdk1.8.0_144
export HBASE_MANAGES_ZK=false
2. Edit hbase-site.xml
- hbase.rootdir: the directory shared by the region servers, where HBase persists its data. By default HBase writes to /tmp, so without changing this setting the data is lost on restart. It has to be consistent with the NameNode address configured in Hadoop's core-site.xml.
- hbase.cluster.distributed: HBase's run mode. false means standalone, true means distributed. If false, HBase and Zookeeper run in the same JVM.
- hbase.zookeeper.quorum: the Zookeeper quorum; separate multiple hosts with commas.
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop000:8020/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop000:2181</value>
  </property>
</configuration>
3. Edit the regionservers file
This file plays the same role as Hadoop's slaves file: it tells the HBase cluster which hosts are worker nodes.
hadoop000
5. Start
Start Zookeeper first:
zkServer.sh start
A QuorumPeerMain process appears:
[hadoop@hadoop000 conf]$ jps
4292 ResourceManager
7829 QuorumPeerMain
3960 SecondaryNameNode
7852 Jps
3757 DataNode
4398 NodeManager
3647 NameNode
Then start HBase.
In the /home/hadoop/app/hbase-1.2.0-cdh5.7.0/bin directory, run
./start-hbase.sh
The HMaster and HRegionServer processes appear:
[hadoop@hadoop000 bin]$ ./start-hbase.sh
starting master, logging to /home/hadoop/app/hbase-1.2.0-cdh5.7.0/logs/hbase-hadoop-master-hadoop000.out
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
hadoop000: starting regionserver, logging to /home/hadoop/app/hbase-1.2.0-cdh5.7.0/bin/../logs/hbase-hadoop-regionserver-hadoop000.out
hadoop000: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
hadoop000: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
[hadoop@hadoop000 bin]$ jps
8643 Jps
4292 ResourceManager
7829 QuorumPeerMain
8263 HRegionServer
3960 SecondaryNameNode
8105 HMaster
3757 DataNode
4398 NodeManager
3647 NameNode
6. Verify the startup
Besides checking for the two processes above, you can also verify by opening http://hadoop000:60010.
Test it with the HBase shell:
In /home/hadoop/app/hbase-1.2.0-cdh5.7.0/bin, run
./hbase shell
[hadoop@hadoop000 bin]$ ./hbase shell
2021-06-20 21:08:39,715 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2021-06-20 21:08:41,324 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/app/hbase-1.2.0-cdh5.7.0/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.0-cdh5.7.0, rUnknown, Wed Mar 23 11:46:29 PDT 2016
hbase(main):001:0> version
1.2.0-cdh5.7.0, rUnknown, Wed Mar 23 11:46:29 PDT 2016
hbase(main):002:0> status
1 active master, 0 backup masters, 1 servers, 0 dead, 2.0000 average load
hbase(main):003:0> list
TABLE
0 row(s) in 0.0400 seconds
=> []
hbase(main):001:0> create 'member','info','address'
0 row(s) in 1.7180 seconds
=> Hbase::Table - member
hbase(main):002:0> list
TABLE
member
1 row(s) in 0.0280 seconds
=> ["member"]
hbase(main):003:0> describe 'member'
Table member is ENABLED
member
COLUMN FAMILIES DESCRIPTION
{NAME => 'address', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS =>
'0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0'
, BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 0.1090 seconds
A problem I ran into:
list in the hbase shell returned an empty table list [], yet creating a table failed with ERROR: Table already exists.
References:
https://blog.csdn.net/huashao0602/article/details/77050929
https://www.jianshu.com/p/e1767d57f972?utm_campaign=maleskine&utm_content=note&utm_medium=seo_notes&utm_source=recommendation
The cause: the table had been created before, and although it was force-deleted from HBase, Zookeeper still kept its metadata.
(1) Enter the Zookeeper client with ./hbase zkcli
This hit an error:
2021-06-20 21:31:25,649 INFO [main-SendThread(hadoop000:2181)] zookeeper.ClientCnxn: Session establishment complete on server hadoop000/192.168.121.131:2181, sessionid = 0x17a297cb8020008, negotiated timeout = 30000
JLine support is enabled
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[ERROR] Terminal initialization failed; falling back to unsupported
java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
at jline.TerminalFactory.create(TerminalFactory.java:101)
at jline.TerminalFactory.get(TerminalFactory.java:159)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:227)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:219)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:207)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:311)
at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:282)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperMainServer.main(ZooKeeperMainServer.java:108)
JLine support is disabled
The fix:
Edit the bin/hbase script with vim so that its zkcli branch looks like the block below (the CLASSPATH line, which strips the bundled jruby jar, is the addition):
elif [ "$COMMAND" = "zkcli" ] ; then
  CLASS="org.apache.hadoop.hbase.zookeeper.ZooKeeperMainServer"
  CLASSPATH=`echo $CLASSPATH | sed 's/jruby-cloudera-1\.0\.0\.jar//g'`
After restarting the pseudo-distributed HBase environment, entering the zk client no longer reports the error.
(2) ls /hbase/table lists the tables Zookeeper knows about:
[zk: hadoop000:2181(CONNECTED) 0] ls /hbase/table
[hbase:meta, hbase:namespace, imooc_course_search_clickcount, member, course_clickcount, imooc_course_clickcount, course_search_clickcount]
(3) rmr /hbase/table/<table name> removes the zombie table entry.
(4) Restart HBase; I did not need this step.
7. Spark installation
1. Download the source
On the Spark website: Download -> archived releases -> spark-2.2.0.tgz, making sure to pick the source package (the last entry in the package-type list).
Download the source rather than a prebuilt package: a prebuilt package may clash with what runs in production, whereas the source can be compiled against the exact Hadoop version used in production to produce a Spark build that matches it.
How to build is described on the website: Documentation -> Latest Release -> More (navigation bar) -> Building Spark.
I skipped the build step myself and used an already compiled package.
2. Build
Spark website: Documentation -> Latest Release -> More -> Building Spark
In the source folder, extract the source:
tar -xzvf spark-2.1.0.tgz
cd spark-2.1.0
Note that the requirements here differ between versions, including the Maven and Java versions needed; see the official documentation for details.
Taking the 3.2.0 documentation as an example, set Maven's memory:
export MAVEN_OPTS="-Xss64m -Xmx2g -XX:ReservedCodeCacheSize=1g"
cd into the build directory and you can see the Maven bundled with the source.
We do not use it here; install your own Maven instead.
Go back to the source root directory, spark-2.1.0.
The mvn build command specifies the Hadoop version, enables YARN, and adds Hive and JDBC support (-Phive -Phive-thriftserver); reading the pom.xml in this directory shows which profiles and properties are available.
To build against Hadoop 2.x, enable the hadoop-2.7 profile:
./build/mvn -Phadoop-2.7 -Pyarn -Dhadoop.version=2.8.5 -Phive -Phive-thriftserver -DskipTests clean package
This command can take an hour or two, because the first run downloads a lot of dependencies.
-D overrides a property defined in pom.xml from the outside, as with hadoop.version in the command above.
-P activates a profile by the id it has in pom.xml; if Hive is needed, for example, the hive-thriftserver profile has to be pulled in with -P.
echo $HADOOP_HOME
This prints the Hadoop installation path, which includes the version in use, e.g. 2.6.0-cdh5.7.0.
You may also run into HDFS and YARN being on different versions. In production the HDFS version is sometimes too old to support newer Spark features, yet upgrading the whole Hadoop cluster is too unpredictable to risk, so only HDFS is bumped a little, leaving HDFS and YARN on different versions. By default yarn.version follows hadoop.version, but the two versions can be specified separately with -D, as in the sketch below.
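A hedged example of pinning the two versions separately (the specific version numbers are placeholders, not values from the original post):
# hadoop.version and yarn.version are separate properties; override each with -D
./build/mvn -Phadoop-2.7 -Pyarn -Dhadoop.version=2.8.5 -Dyarn.version=2.7.3 -Phive -Phive-thriftserver -DskipTests clean package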
When another Scala version (e.g. 2.13) is to be supported, the main Scala version has to be changed first so the build targets it. Change it like this (e.g. for 2.13):
./dev/change-scala-version.sh 2.13
Then enable the matching profile (e.g. for 2.13):
# For Maven
./build/mvn -Pscala-2.13 compile
A plain Maven build does not leave you with an archive ready to deploy, so we can build a runnable distribution instead.
Recommended: build a runnable distribution package.
So there are two build approaches: the mvn build above, and the make-distribution build described next.
To create a Spark distribution like those on the Spark downloads page, laid out so that it is runnable, use ./dev/make-distribution.sh in the project root. It accepts the same Maven profile settings as a direct Maven build. For example:
./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes
This builds the Spark distribution together with the Python pip and R packages. For usage details, run ./dev/make-distribution.sh --help.
When building this way, give the package a name with --name; using the Hadoop version, e.g. 2.6.0-cdh5.7.0, is convenient. --tgz packs the result into a .tgz. Then append the same options as the Maven build, e.g. -Phadoop-2.7 -Pyarn -Dhadoop.version=2.8.5 -Phive -Phive-thriftserver.
Compare with the earlier Maven command:
./build/mvn -Phadoop-2.7 -Pyarn -Dhadoop.version=2.8.5 -Phive -Phive-thriftserver -DskipTests clean package
When building with Maven directly, exporting MAVEN_OPTS is mandatory, but with make-distribution it is not: the script already sets it, and -DskipTests clean package is already baked in as well. A combined example is sketched below.
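Putting this together for the environment above, a hedged sketch of the full invocation (the -Phadoop-2.6 profile name and the CDH version are assumptions; check the profiles defined in your Spark source's pom.xml):
# name the package after the Hadoop version so the tarball shows both versions
./dev/make-distribution.sh --name 2.6.0-cdh5.7.0 --tgz -Phadoop-2.6 -Pyarn -Dhadoop.version=2.6.0-cdh5.7.0 -Phive -Phive-thriftserver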
The resulting package is named spark-$VERSION-bin-$NAME.tgz, where $NAME is the value passed to --name and $VERSION is the Spark version. If --name is set to the Hadoop version, the file name shows both the Spark and the Hadoop version at once, e.g. spark-2.1.0-bin-2.6.0-cdh5.7.0.tgz. This package can then be copied to the machines where Spark is to be deployed.
A problem you may run into: dependencies fail to resolve during the build.
Fix: switch the Maven repository.
Add an extra repository entry to pom.xml, inside the existing <repositories> section, below the entry that is already there, for example:
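A hedged example of such an entry, assuming the build targets a CDH Hadoop version and therefore needs the Cloudera repository (the repository id and URL are my assumption, not taken from the original post):
<!-- goes inside the existing <repositories> section of Spark's pom.xml -->
<repository>
  <id>cloudera</id>
  <name>Cloudera Repository</name>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>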
A second thing to watch out for is insufficient memory and similar build failures; the official documentation covers the details.
If an exception during the build is not obvious or hard to understand, append -X to the build command to get much more detailed output.
3. Extract
tar -zxvf spark-2.2.0-bin-2.6.0-cdh5.7.0.tgz -C ~/app/
4. Configure environment variables
pwd shows the Spark location:
/home/hadoop/app/spark-2.2.0-bin-2.6.0-cdh5.7.0
vi ~/.bash_profile
export SPARK_HOME=/home/hadoop/app/spark-2.2.0-bin-2.6.0-cdh5.7.0
export PATH=$SPARK_HOME/bin:$PATH
source ~/.bash_profile
5. Verify the installation
For a local test Spark runs with a local master; to run on a cluster use the yarn master.
In the /home/hadoop/app/spark-2.2.0-bin-2.6.0-cdh5.7.0/bin directory, run
./spark-shell --master local[2]
8. IDEA + Maven + Spark Streaming
8.1 Add the corresponding dependencies to pom.xml
Because CDH builds are used, the repository has to be added manually; check that its URL is actually reachable, for example by opening it in a browser.
Dependencies needed: scala, kafka, spark, hadoop, hbase.
Spark website https://spark.apache.org/: Documentation -> Latest Release, Programming Guides -> Spark Streaming.
The _2.12 suffix in an artifactId is the Scala version; if your project uses Scala 2.10, write 2.10 there instead.
Integrating Kafka with Spark Streaming requires an additional dependency; a sketch follows.
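A hedged sketch of what the relevant pom.xml pieces might look like, assuming Scala 2.11 and the CDH versions installed above (the artifact versions and the Cloudera repository URL are assumptions to adapt, not taken from the original post):
<!-- repository for the CDH artifacts; verify the URL in a browser -->
<repositories>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>

<dependencies>
  <!-- Scala library, matching the installed Scala version -->
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.11.8</version>
  </dependency>
  <!-- Spark Streaming; the _2.11 suffix is the Scala version -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.2.0</version>
  </dependency>
  <!-- extra dependency needed for the Kafka integration -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-8_2.11</artifactId>
    <version>2.2.0</version>
  </dependency>
  <!-- Hadoop client, CDH build -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.6.0-cdh5.7.0</version>
  </dependency>
  <!-- HBase client, CDH build -->
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.2.0-cdh5.7.0</version>
  </dependency>
</dependencies>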