MapReduce Case 2: CountByDate

土味霸总

Last modified 2022-11-23 15:24:51

Categories: Cloud Computing and Big Data Study, Hadoop. Tags: mapreduce, hadoop, big data, intellij idea, hdfs

First published 2022-11-23 15:24:43

Article link: https://blog.csdn.net/tuweibazong/article/details/127838740

Word count over a log file:

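First, create a new project named CountByDate in IntelliJ IDEA. The IDE is also used later to compile the project and generate the jar automatically:

Then add the following dependencies to the project's pom.xml: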

<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>3.1.4</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>3.1.4</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>3.1.4</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-mapreduce-examples -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-examples</artifactId>
        <version>3.1.4</version>
        <scope>test</scope>
    </dependency>
</dependencies>

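Make sure the project contains a src\main\java directory, as shown:

If it does not, create it via File > New > Directory:

Once the java directory exists, create a package named demo under src\main\java:

Then create a new class in demo:

Then add the following code to the class file: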

package demo;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

import java.io.IOException;

public class CountByDate {
    public static class SplitMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // Each input line (value) has the form: email | date
        @Override
        public void map(Object key, Text value, Context context
        ) throws IOException, InterruptedException {
            String[] data = value.toString().split("\\|", -1);
            if (data.length < 2) {
                return; // skip malformed lines that have no date field
            }
            word.set(data[1]); // the date becomes the output key
            context.write(word, one);
        }
    }
    public static class IntSumReducer
            extends Reducer<Text,IntWritable,Text,IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context
        ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: CountByDate <in> [<in>...] <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "count by date");
        job.setJarByClass(CountByDate.class);
        job.setMapperClass(SplitMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job,
                new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
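
To make the data flow concrete, here is a minimal plain-Java sketch of what the job computes, with no Hadoop classes involved. The class name CountByDateLocal, the helper names, and the sample lines (including the date format) are assumptions for illustration; only the email|date layout and the split on "|" come from the code above. The sketch skips lines that have no second field. The merge method hints at why IntSumReducer can also serve as the combiner: adding up partial sums gives the same totals as summing everything in one pass.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CountByDateLocal {

    // Map step: pull the date field out of one "email|date" line.
    // Returns null when the line has no second field.
    static String mapLine(String line) {
        String[] data = line.split("\\|", -1);
        return data.length < 2 ? null : data[1];
    }

    // Reduce step: sum a count of 1 per record for each date.
    // A TreeMap keeps the dates sorted by key.
    static Map<String, Integer> countByDate(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String date = mapLine(line);
            if (date != null) {
                counts.merge(date, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Combiner idea: partial counts from two input splits can be added together.
    static Map<String, Integer> merge(Map<String, Integer> a, Map<String, Integer> b) {
        Map<String, Integer> out = new TreeMap<>(a);
        b.forEach((date, n) -> out.merge(date, n, Integer::sum));
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample records, not taken from the real log file.
        List<String> split1 = List.of("a@example.com|2016-01-01", "b@example.com|2016-01-01");
        List<String> split2 = List.of("c@example.com|2016-01-02");

        Map<String, Integer> merged = merge(countByDate(split1), countByDate(split2));
        System.out.println(merged); // prints {2016-01-01=2, 2016-01-02=1}
    }
}
```

Running this locally on a few lines of the real file is a quick way to check the expected counts before submitting the job to the cluster.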

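The result is shown below:

Then click OK.

A small pop-up appears; click Build:

The output console at the bottom then updates, an out folder appears, and the jar is generated automatically:

Then use Xftp to transfer the jar and the required txt file to the virtual machine:

I have put the txt file on Baidu Netdisk: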

Link: https://pan.baidu.com/s/1S7de2iWSJPZh-_4k3HxZSA
Extraction code: ccjy

I wanted these two files in root/wc00. They were initially transferred to the wrong location, so I moved them into wc00:

Then continue in Xshell:

First start the Hadoop cluster:

cd /opt/hadoop-3.1.4/sbin
./start-dfs.sh
./start-yarn.sh
./mr-jobhistory-daemon.sh start historyserver
jps
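
The jps listing is only useful if you know which daemons to look for. As a rough sketch, the snippet below parses jps-style output and reports which expected daemons are missing. The class name JpsCheck, the sample output, and the PIDs are made up for illustration, and the list of daemons expected on the master is an assumption that depends on how your cluster is laid out.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class JpsCheck {

    // Returns the expected daemon names that do not appear in the jps output.
    // Each jps line has the form "<pid> <process name>".
    static Set<String> missingDaemons(String jpsOutput, List<String> expected) {
        Set<String> running = new LinkedHashSet<>();
        for (String line : jpsOutput.split("\\R")) {
            String[] parts = line.trim().split("\\s+", 2);
            if (parts.length == 2) {
                running.add(parts[1]);
            }
        }
        Set<String> missing = new LinkedHashSet<>(expected);
        missing.removeAll(running);
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical jps output for a master node; the PIDs are invented.
        String sample = String.join("\n",
                "2305 NameNode",
                "2511 SecondaryNameNode",
                "2760 ResourceManager",
                "3190 Jps");
        // Assumed set of daemons for the master after the three start scripts.
        List<String> expected = List.of(
                "NameNode", "SecondaryNameNode", "ResourceManager", "JobHistoryServer");
        System.out.println("Missing: " + missingDaemons(sample, expected));
        // prints Missing: [JobHistoryServer]
    }
}
```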

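Then cd to the root directory, enter the wc00 directory, and run ls to check the files just uploaded:

Now open the web page on port 50070. The files we just uploaded are not there, so the next step is to upload the data file to the Hadoop Distributed File System (HDFS) and run the job: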

hdfs dfs -put email_log_with_date.txt /input
yarn jar CountByDate.jar demo.CountByDate /input /resultsYarn

The 50070 page now shows three new folders along with the results of the run:
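
The job does not set an output format, so Hadoop falls back to TextOutputFormat, which writes one key and value per line separated by a tab. With a single reducer the result normally lands in a file such as /resultsYarn/part-r-00000, which can be printed with hdfs dfs -cat. As a small sketch of reading that layout back, the snippet below parses such lines and totals the counts. The class name ResultParser and the sample content are assumptions, not actual output from this job.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ResultParser {

    // Parses lines of the form "<date>\t<count>", the default layout that
    // TextOutputFormat produces for (Text, IntWritable) pairs.
    static Map<String, Integer> parse(String content) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : content.split("\\R")) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // ignore blank or malformed lines
            }
            String date = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(date, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical file content; real dates and counts will differ.
        String sample = "2016-01-01\t2\n2016-01-02\t1\n";
        Map<String, Integer> counts = parse(sample);
        int total = counts.values().stream().mapToInt(Integer::intValue).sum();
        System.out.println(counts + " total=" + total);
        // prints {2016-01-01=2, 2016-01-02=1} total=3
    }
}
```

The total should equal the number of well-formed lines in the input file, which makes for a simple sanity check on the job.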

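The goal here is to count the number of emails per date in the log file. When writing MapReduce code it helps to consult the official examples, which can be found by searching the Maven repository for hadoop-mapreduce-examples at: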

https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-mapreduce-examples/3.1.4

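The lookup below does not actually need to be carried out:

Copy the dependency and add it to pom.xml:

As shown, the mapreduce-examples dependency appears. In fact, the source I copied at the start already included this dependency, so the step of looking it up and pasting it in can be skipped:

Then configure the time synchronization service in Xshell: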

(1) First install ntp, then open the ntp configuration file for editing:

yum -y install ntp
vi /etc/ntp.conf

(2) Configuration for the master node. Comment out the four existing server lines before adding the following:

restrict 192.168.0.0 mask 255.255.255.0 nomodify notrap
server 127.127.1.0
fudge 127.127.1.0 stratum 10
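
The restrict line applies to clients whose address falls inside the network 192.168.0.0 with mask 255.255.255.0, so it only helps if the cluster nodes actually live in that subnet. As a small sketch of the matching rule, the snippet below ANDs an address with the mask and compares it to the network. The class name SubnetCheck and the two test addresses are hypothetical; substitute the real addresses of your nodes.

```java
class SubnetCheck {

    // Converts a dotted-quad IPv4 string such as "192.168.0.21" to a 32-bit int.
    static int toInt(String ip) {
        String[] parts = ip.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ip);
        }
        int value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + ip);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    // True when ip lies inside the network described by network and mask.
    static boolean matches(String ip, String network, String mask) {
        int m = toInt(mask);
        return (toInt(ip) & m) == (toInt(network) & m);
    }

    public static void main(String[] args) {
        // Hypothetical node addresses, checked against the values from ntp.conf.
        System.out.println(matches("192.168.0.21", "192.168.0.0", "255.255.255.0")); // prints true
        System.out.println(matches("192.168.1.21", "192.168.0.0", "255.255.255.0")); // prints false
    }
}
```

If the nodes sit in a different subnet, the network address in the restrict line would need to be changed to match.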

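The resulting configuration on the master node is shown below:

(3) Do the same on the three cloned machines node1, node2, and node3: install ntp, then edit the ntp configuration file. The configuration differs slightly from the master's, but the four existing server lines must still be commented out: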

restrict 192.168.0.0 mask 255.255.255.0 nomodify notrap
server 127.127.1.0
fudge 127.127.1.0 stratum 10

server master

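(4) The firewalls on the master and the three clones were already disabled earlier, so there is no need to disable them again. Just check the firewall status on master and on node1, node2, and node3: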

systemctl status firewalld.service

(5) Finally, start the ntp service:

On master, run:

service ntpd start && chkconfig ntpd on

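On the three clones node1, node2, and node3, run the following. The first line synchronizes the time with master; the second starts the ntp service and enables it at boot: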

ntpdate master
service ntpd start && chkconfig ntpd on
