Installing Hadoop and Spark on Ubuntu

Hardware and software environment

 
 

Item         Value
System       Ubuntu 18.04.4 LTS
Memory       7.5 GiB
Processor    Intel Core i7-8565U CPU @ 1.80GHz × 8
Graphics     Intel UHD Graphics (Whiskey Lake 3x8 GT2)
GNOME        3.28.2
OS type      64-bit
Disk         251.0 GB
Hadoop       2.10.0
Spark        2.3.4

Steps

① Install SSH

acat@acat-xx:~$ sudo apt-get install openssh-server
[sudo] password for acat:
Reading package lists... Done
Building dependency tree
Reading state information... Done
openssh-server is already the newest version (1:7.6p1-4ubuntu0.3).
The following packages were automatically installed and are no longer required:
  fonts-wine gir1.2-geocodeglib-1.0 libfwup1 libglade2.0-cil libglib2.0-cil
  libgtk2.0-cil libmono-cairo4.0-cil libstartup-notification0:i386 libwine
  libwine:i386 libxcb-util1:i386 ubuntu-web-launchers wine32:i386 wine64
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 83 not upgraded.
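If you want to confirm that the SSH daemon is actually running after installation, a quick check looks like this (on Ubuntu the service unit is named ssh):

# Check that the OpenSSH server is active
sudo systemctl status ssh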

② Configure passwordless SSH login

acat@acat-xx:~$ cd ~/.ssh/
acat@acat-xx:.ssh$ ls
authorized_keys  id_rsa  id_rsa.pub  known_hosts
acat@acat-xx:.ssh$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/acat/.ssh/id_rsa):
/home/acat/.ssh/id_rsa already exists.
Overwrite (y/n)?
acat@acat-xx:.ssh$ ls
authorized_keys  id_rsa  id_rsa.pub  known_hosts
acat@acat-xx:.ssh$ cat ./id_rsa.pub >> ./authorized_keys
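To verify the passwordless setup, log in to localhost; it should no longer ask for a password. If it still prompts, overly permissive permissions on authorized_keys are a common culprit:

# authorized_keys must not be group/world writable for sshd to accept it
chmod 600 ~/.ssh/authorized_keys
# Should log in without a password prompt; exit returns to the original shell
ssh localhost
exit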

③ Configure the Java environment

Download the Java (JDK) package for Linux and extract it to /home/acat/softwares/jdk1.8.0_161. Then edit the .bashrc file in your home directory and add the following:

export JAVA_HOME=/home/acat/softwares/jdk1.8.0_161
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH

Then save and exit with :wq.
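For the new variables to take effect in the current shell, re-source .bashrc (assuming the default bash shell) and confirm that they are picked up:

# Reload the environment and check that the JDK is on the PATH
source ~/.bashrc
echo $JAVA_HOME
which java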
Check whether the Java configuration succeeded:

acat@acat-xx:~$ java -version
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)

④ Install Hadoop 2
Download the Hadoop 2.10.0 binary distribution (hadoop-2.10.0.tar.gz rather than the -src source tarball, which would first have to be built), extract it to /usr/local, and rename the extracted folder hadoop-2.10.0 to hadoop.
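A minimal sketch of the download and extraction steps, assuming the Apache archive as the download location and the user acat from above (any Apache mirror works):

# Download and unpack the binary release into /usr/local
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.10.0/hadoop-2.10.0.tar.gz
sudo tar -zxf hadoop-2.10.0.tar.gz -C /usr/local
sudo mv /usr/local/hadoop-2.10.0 /usr/local/hadoop
# Give the current user ownership so HDFS can later write under this directory
sudo chown -R acat:acat /usr/local/hadoop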
Then configure the Hadoop-related environment variables in .bashrc:
 

export PATH=/usr/local/hadoop/sbin:$PATH
export PATH=/usr/local/hadoop/bin:$PATH
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

Check the Hadoop version:

acat@acat-xx:~$ hadoop version
Hadoop 2.10.0
Subversion ssh://git.corp.linkedin.com:29418/hadoop/hadoop.git -r e2f1f118e465e787d8567dfa6e2f3b72a0eb9194
Compiled by jhung on 2019-10-22T19:10Z
Compiled with protoc 2.5.0
From source with checksum 7b2d8877c5ce8c9a2cca5c7e81aa4026
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.10.0.jar

⑤ Hadoop pseudo-distributed configuration
Hadoop can run on a single node in pseudo-distributed mode: the Hadoop daemons run as separate Java processes, the node acts as both NameNode and DataNode, and files are read from HDFS.
Hadoop's configuration files live in /usr/local/hadoop/etc/hadoop/. Pseudo-distributed mode requires modifying two of them, core-site.xml and hdfs-site.xml. Hadoop configuration files are in XML format; each setting is declared as a property with a name and a value.
First, modify core-site.xml to:
 

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

Then modify hdfs-site.xml to:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>
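One caveat: daemons started over ssh may not source .bashrc, so if start-dfs.sh later complains that JAVA_HOME is not set, set it explicitly in Hadoop's environment script as well (a sketch, assuming the JDK path used earlier):

# In /usr/local/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/home/acat/softwares/jdk1.8.0_161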

⑥ Next, format the NameNode

acat@acat-xx:hadoop$ stop-dfs.sh
acat@acat-xx:hadoop$ rm -r ./tmp
acat@acat-xx:hadoop$ hdfs namenode -format
... (several lines omitted) ...
20/05/27 23:46:49 INFO util.GSet: capacity      = 2^15 = 32768 entries
20/05/27 23:46:49 INFO namenode.FSImage: Allocated new BlockPoolId: BP-335173629-127.0.1.1-1590594409666
20/05/27 23:46:49 INFO common.Storage: Storage directory /usr/local/hadoop/tmp/dfs/name has been successfully formatted.
20/05/27 23:46:49 INFO namenode.FSImageFormatProtobuf: Saving image file /usr/local/hadoop/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
20/05/27 23:46:49 INFO namenode.FSImageFormatProtobuf: Image file /usr/local/hadoop/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 323 bytes saved in 0 seconds .
20/05/27 23:46:49 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
20/05/27 23:46:49 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid = 0 when meet shutdown.
20/05/27 23:46:49 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at acat-xx/127.0.1.1
************************************************************/

⑦ Next, start the NameNode and DataNode daemons.

acat@acat-xx:hadoop$ start-dfs.sh
20/05/27 23:47:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-acat-namenode-acat-xx.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-acat-datanode-acat-xx.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-acat-secondarynamenode-acat-xx.out
20/05/27 23:47:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
acat@acat-xx:hadoop$ jps
8729 Jps
8588 SecondaryNameNode
8332 DataNode
8126 NameNode

As you can see, after starting the daemons, three new Java processes appear: NameNode, DataNode, and SecondaryNameNode. Note that despite its name, the SecondaryNameNode is not a standby that takes over when the NameNode fails; it periodically merges the NameNode's edit log into the fsimage (checkpointing) so that the namespace image stays compact and NameNode restarts are faster.
After a successful start, the HDFS web UI is available at http://localhost:50070/.

⑧ Run a Hadoop pseudo-distributed example
First create the HDFS directories and upload the input files.
 

acat@acat-xx:hadoop$ hdfs dfs -mkdir -p /usr/local/hadoop/
20/05/28 00:17:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
acat@acat-xx:hadoop$ hdfs dfs -mkdir input
20/05/28 00:17:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
acat@acat-xx:hadoop$ hdfs dfs -ls input
20/05/28 00:17:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
acat@acat-xx:hadoop$ hdfs dfs -put ./etc/hadoop/*.xml ./input/
20/05/28 00:18:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
acat@acat-xx:hadoop$ hdfs dfs -ls input
20/05/28 00:18:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 8 items
-rw-r--r--   1 acat supergroup       8814 2020-05-28 00:18 input/capacity-scheduler.xml
-rw-r--r--   1 acat supergroup       1076 2020-05-28 00:18 input/core-site.xml
-rw-r--r--   1 acat supergroup      10206 2020-05-28 00:18 input/hadoop-policy.xml
-rw-r--r--   1 acat supergroup       1133 2020-05-28 00:18 input/hdfs-site.xml
-rw-r--r--   1 acat supergroup        620 2020-05-28 00:18 input/httpfs-site.xml
-rw-r--r--   1 acat supergroup       3518 2020-05-28 00:18 input/kms-acls.xml
-rw-r--r--   1 acat supergroup       5939 2020-05-28 00:18 input/kms-site.xml
-rw-r--r--   1 acat supergroup        690 2020-05-28 00:18 input/yarn-site.xml

Run the example job:

acat@acat-xx:hadoop$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep input output 'dfs[a-z.]+'
... (several lines omitted) ...
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=219
    File Output Format Counters
        Bytes Written=77

View the results:

acat@acat-xx:hadoop$ hdfs dfs -cat output/*
20/05/28 00:19:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1   dfsadmin
1   dfs.replication
1   dfs.namenode.name.dir
1   dfs.datanode.data.dir

Copy the results to the local filesystem:

acat@acat-xx:hadoop$ ls
abc  bin  etc  include  lib  libexec  LICENSE.txt  logs  NOTICE.txt  README.txt  sbin  share  test.txt  tmp
acat@acat-xx:hadoop$ hdfs dfs -get output ./output
20/05/28 00:20:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
acat@acat-xx:hadoop$ cat ./output/*
1   dfsadmin
1   dfs.replication
1   dfs.namenode.name.dir
1   dfs.datanode.data.dir
acat@acat-xx:hadoop$ ls
abc  bin  etc  include  lib  libexec  LICENSE.txt  logs  NOTICE.txt  output  README.txt  sbin  share  test.txt  tmp
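Note that Hadoop refuses to write to an output directory that already exists, so if you want to re-run the example, delete the previous results first:

# Remove the old results from HDFS and the local copy before re-running the job
hdfs dfs -rm -r output
rm -r ./output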

⑨ Install Spark
First download spark-2.3.4-bin-without-hadoop.tgz, extract it to /usr/local, and rename the folder to spark.
Then create the script file spark-env.sh in /usr/local/spark/conf and add the following line to it:
 

export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)
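A common way to create spark-env.sh is to copy the bundled template and append the line above (a sketch, assuming the standard layout of the Spark binary package):

cd /usr/local/spark
cp ./conf/spark-env.sh.template ./conf/spark-env.sh
# Append SPARK_DIST_CLASSPATH so Spark can find Hadoop's jars and HDFS
echo 'export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)' >> ./conf/spark-env.sh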

Once this is configured, Spark can be used directly; unlike Hadoop, there is no service to start first.
Verify that Spark is installed correctly by running one of its bundled examples.
 

acat@acat-xx:spark$ ./bin/run-example SparkPi | grep "Pi is"
Pi is roughly 3.1446357231786157

As you can see, Spark has been configured successfully.
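You can also poke at the installation interactively with the bundled Scala shell; a minimal check (sc is the SparkContext that spark-shell creates automatically):

acat@acat-xx:spark$ ./bin/spark-shell
scala> sc.parallelize(1 to 100).sum()   // distributes 1..100 and sums them, yielding 5050.0
scala> :quit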
