Day4----E-commerce Hands-On Project Implementation

Latest recommended article published on 2024-07-27 13:31:57

Upsy Daisy z

Article tags: java, big data

Article link: https://blog.csdn.net/m0_66098020/article/details/139611272

Problem description

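Based on the e-commerce log file, analyze the following:
1 . Count the page views (each log line is one page view)
2 . Count the page views per province (requires resolving the IP address)
3 . Perform ETL on the logs (ETL: the process of extracting (Extract), transforming (Transform), and loading (Load) data from a source into a destination)
Why ETL: there is no need to parse every field; only the valuable fields are needed. This project needs to parse out: ip, url, pageId (the page ID corresponding to topicId), country, province, city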
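
As a rough illustration of the ETL idea above, the sketch below reduces one raw log line to a few useful fields. The source does not show the log format, so the tab-separated layout, the column positions (URL_INDEX, IP_INDEX), and the "topicId=<digits>" URL pattern are all assumptions made for illustration, and LogEtlSketch is a hypothetical name. Resolving country, province, and city from the IP needs an IP lookup library and is left out here.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative ETL projection; the log layout below is hypothetical, not the project's real format. */
class LogEtlSketch {
    // Assumption: fields are tab-separated, with the URL in column 0 and the IP in column 1.
    static final int URL_INDEX = 0;
    static final int IP_INDEX = 1;
    // Assumption: the page id appears in the URL as "topicId=<digits>".
    static final Pattern TOPIC_ID = Pattern.compile("topicId=(\\d+)");

    /** Extracts the pageId from a URL, or returns "-" when no topicId is present. */
    static String pageId(String url) {
        Matcher m = TOPIC_ID.matcher(url);
        return m.find() ? m.group(1) : "-";
    }

    /** Reduces one raw line to "ip<TAB>url<TAB>pageId"; returns empty for malformed lines. */
    static Optional<String> project(String line) {
        String[] fields = line.split("\t", -1);
        if (fields.length <= Math.max(URL_INDEX, IP_INDEX)) {
            return Optional.empty();
        }
        String url = fields[URL_INDEX];
        String ip = fields[IP_INDEX];
        return Optional.of(String.join("\t", ip, url, pageId(url)));
    }

    public static void main(String[] args) {
        String sample = "http://example.com/page?topicId=123\t10.0.0.1\textra";
        System.out.println(project(sample).orElse("malformed"));
    }
}
```

Dropping malformed lines instead of failing keeps one bad record from aborting the whole pass, which is a common choice for log cleaning; whether it suits this data set is a judgment call.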

(1) Counting page views

After importing the log file into the virtual machine and starting the cluster, write the code in IDEA. The key map and reduce code is as follows:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

/**
 * Page view (PV) count
 */
public class PVStatApp {
    public static void main(String[] args) throws Exception{
        // Driver class: standard job setup boilerplate
        Configuration configuration =new Configuration();

        FileSystem fileSystem=FileSystem.get(configuration);
        Path outputPath=new Path(args[1]);
        if(fileSystem.exists(outputPath)){
            fileSystem.delete(outputPath,true);
        }


        Job job =Job.getInstance(configuration);
        job.setJarByClass(PVStatApp.class);

        job.setMapperClass(Mymapper.class);
        job.setReducerClass(MyReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(LongWritable.class);

        FileInputFormat.setInputPaths(job,new Path(args[0]));
        FileOutputFormat.setOutputPath(job,outputPath);

        // Propagate job success or failure through the process exit code
        System.exit(job.waitForCompletion(true) ? 0 : 1);


    }
//Map
    static class Mymapper extends Mapper<LongWritable, Text, Text, LongWritable> {

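        // Every record is emitted under the same fixed key, so a single reduce call sees all of them
        private final Text KEY=new Text("key");
        private final LongWritable ONE=new LongWritable(1);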
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

            context.write(KEY,ONE);
        }
    }
    //Reduce
    static class MyReducer extends Reducer<Text,LongWritable, NullWritable,LongWritable>{
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
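            // Sum the values rather than counting them, so the total stays correct if a combiner pre-aggregates
            long count =0;
            for(LongWritable value :values){
                count+=value.get();
            }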
            context.write(NullWritable.get(),new LongWritable(count));
        }
    }
}


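After packaging the finished program as a jar and running the jar on Hadoop, view the result in the web UI. The result is line 300000, that is, 300000 page views.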
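
To sanity-check the counting logic without a cluster, the same map and reduce steps can be imitated in plain Java. This is only a sketch: PvCountSketch and its methods are hypothetical names, no Hadoop classes are involved, and the in-memory list stands in for the log file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java imitation of the PV job: no Hadoop involved, names are hypothetical. */
class PvCountSketch {
    /** "Map" step: emit a 1 for every input line, regardless of its content. */
    static List<Long> map(List<String> lines) {
        List<Long> ones = new ArrayList<>();
        for (String ignored : lines) {
            ones.add(1L);
        }
        return ones;
    }

    /** "Reduce" step: all values share one key, so the total is their sum. */
    static long reduce(List<Long> values) {
        long count = 0;
        for (long value : values) {
            count += value;
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("record a", "record b", "record c");
        System.out.println(reduce(map(lines))); // prints 3
    }
}
```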
