I. System Installation and Operation
This section describes the experimental environment and the Hadoop installation and operation.
1. Experimental Environment
Processor: Intel Core i5-2500S CPU @ 2.70GHz
OS: Ubuntu 16.04 LTS 64-bit
Memory: 2 GB
Disk: 20 GB
Java environment:
java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)
JDK path: /usr/java/jdk1.8.0_111
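A quick way to confirm that the shell resolves this JDK is to print the version; the output should match the build listed above:
wup@ubuntu:~$ java -version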
2. Hadoop Installation and Operation
The Hadoop installation procedure is described step by step below.
(1) Create the hadoop-user group and the hadoop user
Create the hadoop-user group and the hadoop user:
wup@ubuntu:~$ sudo addgroup hadoop-user
wup@ubuntu:~$ sudo adduser hadoop
wup@ubuntu:~$ sudo adduser hadoop hadoop-user
Note: adduser creates the user's home directory and related files automatically, whereas useradd does not.
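For comparison, a roughly equivalent useradd invocation would have to request the home directory and shell explicitly (shown only as an illustration; this was not a step in the actual installation):
wup@ubuntu:~$ sudo useradd -m -d /home/hadoop -s /bin/bash hadoop
wup@ubuntu:~$ sudo usermod -aG hadoop-user hadoop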
(2) Configure environment variables
First switch to the hadoop user:
wup@ubuntu:~$ su hadoop
Create the Hadoop installation directory, download Hadoop and move it there, then unpack and inspect it:
hadoop@ubuntu:~$ mkdir ~/hadoop_installs; cd ~/hadoop_installs
hadoop@ubuntu:~/hadoop_installs$ tar -zxvf hadoop-2.7.3.tar.gz
Configure the environment variables: open the ~/.bash_profile file with gedit and append the following:
PATH=$PATH:$HOME/bin
export JAVA_HOME=/usr/java/jdk1.8.0_111
export HADOOP_HOME=/home/hadoop/hadoop_installs/hadoop-2.7.3
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
export CLASSPATH=$JAVA_HOME/lib:.
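The new variables only take effect in a fresh login shell; to apply and sanity-check them immediately, something along these lines can be used:
hadoop@ubuntu:~$ source ~/.bash_profile
hadoop@ubuntu:~$ echo $JAVA_HOME
hadoop@ubuntu:~$ hadoop version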
(3) Configure passphrase-free SSH access
Generate the key pair with the following command:
hadoop@ubuntu:~$ ssh-keygen -t rsa
This generates the public-key file id_rsa.pub under ~/.ssh/; copying it to authorized_keys enables passphrase-free SSH login:
hadoop@ubuntu:~$ cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
hadoop@ubuntu:~/hadoop_installs$ ssh localhost
Welcome to Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-42-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

129 packages can be updated.
0 updates are security updates.

*** System restart required ***
Last login: Sat Oct 22 06:09:01 2016 from 127.0.0.1
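If ssh localhost still prompts for a password, sshd is usually rejecting authorized_keys because the file permissions are too loose; tightening them normally resolves it:
hadoop@ubuntu:~$ chmod 700 ~/.ssh
hadoop@ubuntu:~$ chmod 600 ~/.ssh/authorized_keys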
(4) Configure Hadoop
The Hadoop configuration files are located in /home/hadoop/hadoop_installs/hadoop-2.7.3/etc/hadoop.
core-site.xml is as follows:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop/hadoop-${user.name}</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
hdfs-site.xml is as follows:
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/hadoop_installs/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/hadoop_installs/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
mapred-site.xml is as follows:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>mapreduce.cluster.local.dir</name>
    <value>/home/hadoop/hadoop_installs/mapred/local</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.system.dir</name>
    <value>/home/hadoop/hadoop_installs/mapred/system</value>
  </property>
</configuration>
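After editing the three files, the values that Hadoop actually resolves can be spot-checked with hdfs getconf; for example, with the settings above:
hadoop@ubuntu:~$ hdfs getconf -confKey fs.default.name
hdfs://localhost:9000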
(5) Format the NameNode and start HDFS and MapReduce
For convenience, append /home/hadoop/hadoop_installs/hadoop-2.7.3/bin and /home/hadoop/hadoop_installs/hadoop-2.7.3/sbin to the PATH variable in /etc/profile. Then format the NameNode and start HDFS and MapReduce with the following two commands:
hadoop@wup:~$ hadoop namenode -format
hadoop@wup:~$ /home/hadoop/hadoop_installs/hadoop-2.7.3/sbin/start-all.sh
The jps command shows whether Hadoop is running:
hadoop@ubuntu:~/hadoop_installs/hadoop-2.7.3$ jps
26384 DataNode
26754 ResourceManager
26260 NameNode
26600 SecondaryNameNode
31950 Jps
26878 NodeManager
This shows that Hadoop and HDFS have started successfully.
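Beyond jps, the state of HDFS can also be confirmed with dfsadmin (in Hadoop 2.x the NameNode web UI is likewise available at http://localhost:50070):
hadoop@ubuntu:~$ hdfs dfsadmin -report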
II. WordCount Experiment
This section describes the data used in the WordCount experiment, the commands run, and the results; the results include the file output in bash and a screenshot from the Hadoop web UI.
1. Experimental Data
The data come from a Google News corpus; three articles were merged into a single txt file.
2. Experimental Procedure
First create the input directory test on HDFS:
hadoop@ubuntu:~$ hadoop fs -mkdir test
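Note that a relative HDFS path such as test resolves to /user/hadoop/test; if that home directory does not exist yet, it may need to be created first (a precaution, not a step recorded in this run):
hadoop@ubuntu:~$ hadoop fs -mkdir -p /user/hadoop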
Then put the local files into the test directory on HDFS and list them:
hadoop@ubuntu:~/txtfile$ hadoop fs -put -f *.txt test
hadoop@ubuntu:~$ hadoop fs -ls test
Found 3 items
-rw-r--r--   1 hadoop supergroup         24 2016-10-22 21:43 test/file1.txt
-rw-r--r--   1 hadoop supergroup         24 2016-10-22 21:43 test/file2.txt
-rw-r--r--   1 hadoop supergroup     114957 2016-10-23 19:57 test/news.txt
Note: -f overwrites the destination files if they already exist.
Create a new Java project in IntelliJ IDEA and write the WordCount program shown below:
package example;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

/**
 * Created by hadoop on 10/22/16.
 */
public class WordCount {

    // Mapper: maps <LongWritable, Text> pairs to <Text, IntWritable>
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // reporter reports progress and signals that the task is alive
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                // set() converts the String token into a Text
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer: called once per key; IntWritable wraps a plain int
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        //conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        // input and output formats
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
Note: the project needs Hadoop's JARs on its classpath. In IntelliJ: File -> Project Structure -> Modules -> click the green "+" on the right and add the lib folder under the Hadoop directory.
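As an alternative to the IDE, the program can also be compiled and packaged on the command line, using the classpath reported by the hadoop command itself (a sketch assuming the source file sits at example/WordCount.java):
hadoop@ubuntu:~/src$ mkdir classes
hadoop@ubuntu:~/src$ javac -classpath `hadoop classpath` -d classes example/WordCount.java
hadoop@ubuntu:~/src$ jar -cvf WordCount.jar -C classes .
A JAR built this way has no Main-Class in its manifest, so the class name must be passed explicitly when running it: hadoop jar WordCount.jar example.WordCount test test-out.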
Export the JAR:
File -> Project Structure -> Artifacts -> green "+" -> JAR -> From modules with dependencies -> Apply
Build -> Build Artifacts
Run the JAR:
hadoop@ubuntu:~/txtfile$ hadoop jar WordCount.jar test test-out
Finally, inspect the result on HDFS and get it to the local filesystem:
hadoop@ubuntu:~$ hadoop fs -ls test-out
Found 2 items
-rw-r--r--   1 hadoop supergroup          0 2016-10-23 00:59 test-out/_SUCCESS
-rw-r--r--   1 hadoop supergroup         40 2016-10-23 00:59 test-out/part-00000
hadoop@ubuntu:~/exp$ hadoop fs -get /user/hadoop/test-out/part-00000 .
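The counts can also be previewed directly on HDFS without copying the file locally, e.g.:
hadoop@ubuntu:~$ hadoop fs -cat test-out/part-00000 | head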
3. Experimental Results
After the JAR finishes, open http://localhost:8088 in a browser and click Node Labels in the left-hand menu; the running state is shown in Figure 1.
Figure 1. Job running status
The experiment's output as viewed in bash is shown in Figure 2.
Figure 2. Experiment output in bash
P.S. Thanks to senior classmate Zhao for the lab report template and guidance.