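Getting Started with Big Data (1): Installing Hadoop

Originally published on my personal website: https://www.imhou.com/%e5%a4%a7%e6%95%b0%e6%8d%ae%e5%85%a5%e9%97%a81-%e5%ae%89%e8%a3%85hadoop/

Environment: Ubuntu 16, JDK 8, Hadoop 3.1.2

I won't cover installing Ubuntu here. For the JDK, I installed OpenJDK directly with apt: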

# Search for available JDK versions
$ apt search openjdk
# Install JDK 8
$ apt install openjdk-8-jdk
# After installation, check the version
$ java -version

Hadoop's configuration needs the Java installation path later on, so we have to find out where Java was installed.

# Use which to locate the java executable
$ which java
/usr/bin/java
# Use ls -l to see where that file links to
$ ls -l /usr/bin/java
/usr/bin/java -> /etc/alternatives/java
# Run ls -l once more on the link target
$ ls -l /etc/alternatives/java
/etc/alternatives/java -> /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java

So the real path of the java executable is /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java.
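
The two ls -l steps can also be collapsed into a single call, because readlink -f follows a chain of symbolic links to its final target. The following is a minimal sketch, assuming GNU coreutils (the default on Ubuntu). The helper name java_home_from is made up for this illustration and is not part of the JDK or Hadoop.

```shell
# Hypothetical helper: given the path of a java launcher (possibly a symlink),
# resolve every link and print the directory two levels above the real file,
# i.e. the resolved path with the trailing "/bin/java" removed.
java_home_from() {
  local real
  real="$(readlink -f "$1")" || return 1
  dirname "$(dirname "$real")"
}

# Usage on a machine with Java installed:
#   java_home_from "$(which java)"
```

With the link chain shown above, this would print /usr/lib/jvm/java-8-openjdk-amd64/jre, which is the directory that hadoop-env.sh needs later.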

Step 2: download Hadoop from https://hadoop.apache.org/releases.html

I chose the binary download of version 3.1.2. Download it with wget, then extract it:

$ wget http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.1.2/hadoop-3.1.2.tar.gz
$ tar -xzvf hadoop-3.1.2.tar.gz

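The hadoop-3.1.2 folder now appears in the current directory. Enter it; the main subdirectories are:

  • bin: executables for standalone use
  • etc: configuration files
  • sbin: scripts for running Hadoop in a distributed environment
  • share/hadoop: all the bundled jars, which you reference when writing code

Edit ~/.bash_profile and append the following to the end of the file to set the environment variables: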

HADOOP_HOME=/root/software/hadoop-3.1.2
export HADOOP_HOME

PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export PATH

Save the file, then run the following command to make the environment variables take effect:

$ source ~/.bash_profile
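
One quick way to confirm the change took effect is to check that the Hadoop directories now appear in PATH. A minimal sketch follows; path_contains is a helper name invented here, not a bash builtin or a Hadoop command.

```shell
# Hypothetical helper: succeed if directory $1 is one of the
# colon-separated entries of PATH, fail otherwise.
path_contains() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Usage after sourcing ~/.bash_profile:
#   path_contains "$HADOOP_HOME/bin"  && echo "bin is on PATH"
#   path_contains "$HADOOP_HOME/sbin" && echo "sbin is on PATH"
```

Running hadoop version is another simple check: if the shell finds the command and it prints a version, PATH is set correctly.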

Go to the Hadoop installation directory, edit etc/hadoop/hadoop-env.sh, set the following, and save:

  # set to the root of your Java installation
  export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre

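The standalone Hadoop environment is now essentially installed. The hadoop-3.1.2/share/hadoop/mapreduce directory contains an example program, hadoop-mapreduce-examples-3.1.2.jar. Change into that directory and run it with the following command: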

hadoop jar hadoop-mapreduce-examples-3.1.2.jar

If you see the following output, Hadoop is installed correctly.

An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.

This jar bundles many example programs, including the word counter wordcount, so let's try it.

hadoop jar hadoop-mapreduce-examples-3.1.2.jar wordcount /root/data/input/data.txt /root/data/output/test1

Output like the following indicates success:

2019-08-08 11:10:47,100 INFO mapreduce.Job:  map 0% reduce 0%
2019-08-08 11:10:52,173 INFO mapreduce.Job:  map 100% reduce 0%
2019-08-08 11:10:58,210 INFO mapreduce.Job:  map 100% reduce 100%
2019-08-08 11:10:58,218 INFO mapreduce.Job: Job job_1565165510892_0005 completed successfully
2019-08-08 11:10:58,337 INFO mapreduce.Job: Counters: 53

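Note the two paths after wordcount: /root/data/input/data.txt and /root/data/output/test1. They are the input file and the output directory, respectively. The contents of data.txt are shown below; you can create and edit the file yourself: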

I love Chongqing
I love China
Chongqing is a province city of China
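
One way to create this input file from the shell is a here-document. The sketch below writes it under a directory taken from a variable named INPUT_DIR, which is introduced here only for illustration; set it to /root/data/input to match the paths used in this post.

```shell
# INPUT_DIR is an illustrative variable, not something Hadoop reads.
# It defaults to a scratch directory so the snippet is safe to try anywhere.
INPUT_DIR="${INPUT_DIR:-$(mktemp -d)}"
mkdir -p "$INPUT_DIR"

# Quoting 'EOF' keeps the shell from expanding anything inside the text.
cat > "$INPUT_DIR/data.txt" <<'EOF'
I love Chongqing
I love China
Chongqing is a province city of China
EOF

# Show what was written.
cat "$INPUT_DIR/data.txt"
```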

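Because this is a local environment without the HDFS distributed file system, the job works on local files.

Finally, on the command line you can see that two files were generated in the test1 directory: _SUCCESS and part-r-00000. Running cat part-r-00000 shows the word counts, sorted by word: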

China   2
Chongqing       2
I       2
a       1
city    1
is      1
love    2
of      1
province        1
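
For a file this small, the result can be cross-checked without Hadoop using standard Unix tools. This is only a sanity check, assuming words are separated by whitespace; it is not how Hadoop computes the result. The function name count_words is made up for this sketch.

```shell
# Hypothetical helper: print "word<TAB>count" for each distinct word in file $1,
# sorted by byte value (uppercase before lowercase), like the listing above.
count_words() {
  # Turn every run of whitespace into one newline, so each word is on its own line.
  tr -s '[:space:]' '\n' < "$1" |
    grep -v '^$' |          # drop an empty first line if the file starts with whitespace
    LC_ALL=C sort |         # byte-order sort so identical words are adjacent
    uniq -c |               # prefix each distinct word with its count
    awk '{ printf "%s\t%s\n", $2, $1 }'   # reorder to word, then count
}

# Usage:
#   count_words /root/data/input/data.txt
```

Run against the data.txt shown earlier, this produces the same nine word and count pairs as part-r-00000.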
