Linux: Region-Based UV Analysis of a Website, and Packaging the Code as a JAR in Eclipse

云雨寒冰

Category: Linux

Article link: https://blog.csdn.net/zt13258579889/article/details/80100586

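PV (Page View): the number of page views.

UV (Unique Visitor): the number of distinct users who visit the site; each user is counted once.

IP: the number of distinct IP addresses that access the site.

VV (Visit View): the number of visits made by visitors.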
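To make the difference between these metrics concrete, here is a minimal plain-Java sketch that computes PV, UV, and IP over a handful of in-memory records. The `Visit` record and its fields are hypothetical and exist only for this illustration; VV is left out because it depends on how a "visit" (session) is delimited, which the definitions above do not specify.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class TrafficMetrics {
    // Hypothetical, simplified log record: one instance per page request.
    static final class Visit {
        final String userId; // identifies the visitor
        final String ip;     // client IP address
        final String url;    // requested page

        Visit(String userId, String ip, String url) {
            this.userId = userId;
            this.ip = ip;
            this.url = url;
        }
    }

    // PV: every page request counts.
    static int pageViews(List<Visit> visits) {
        return visits.size();
    }

    // UV: each distinct user counts once.
    static int uniqueVisitors(List<Visit> visits) {
        Set<String> users = new HashSet<>();
        for (Visit v : visits) {
            users.add(v.userId);
        }
        return users.size();
    }

    // IP: each distinct client address counts once.
    static int distinctIps(List<Visit> visits) {
        Set<String> ips = new HashSet<>();
        for (Visit v : visits) {
            ips.add(v.ip);
        }
        return ips.size();
    }

    public static void main(String[] args) {
        List<Visit> visits = Arrays.asList(
                new Visit("u1", "10.0.0.1", "/home"),
                new Visit("u1", "10.0.0.1", "/cart"),
                new Visit("u2", "10.0.0.1", "/home"),
                new Visit("u3", "10.0.0.2", "/home"));
        System.out.println("PV=" + pageViews(visits));      // PV=4
        System.out.println("UV=" + uniqueVisitors(visits)); // UV=3
        System.out.println("IP=" + distinctIps(visits));    // IP=2
    }
}
```

Note that two users sharing one address (u1 and u2 here) count as two in UV but as one in IP.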

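Recap: Hadoop is a distributed-systems infrastructure developed by the Apache Foundation. It addresses two problems: storing big data and computing over it.

HDFS, the Hadoop Distributed File System, is how Hadoop stores big data.

MapReduce is a programming model for parallel computation over large data sets (larger than 1 TB). When learning Hadoop, the main thing to learn is the ideas behind it.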

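HDFS is distributed. Its characteristics:

Master/worker architecture: the NameNode is the master node. It stores the metadata, accepts and handles client requests, and manages all worker nodes.

Blocks with replicas: files are split into blocks and each block is replicated, so data is not lost.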

MapReduce V2 distributed computing model: input -》 mapper -》 shuffle -》 reduce -》 output

YARN is also distributed.

ResourceManager: handles resource management and task scheduling, and manages the worker nodes.

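MapReduce execution process:

input: by default, data is read from HDFS using TextInputFormat (a FileInputFormat). The path is given as Path path = new Path(args[0]); each line is converted into one key-value pair, which is the output of this stage.

mapper: its input is the output of the input stage, with key type LongWritable and value type Text. The map() method is called once per line. The approach in map(): 1) design the output and construct the output key-value pair; 2) split the content of each line, then emit the output.

shuffle: partitions, groups, and sorts the map output.

reduce: the reduce() method is called once per key and receives all values that share that key (the List<value>); for example, it can add them up.

output: by default, the reduce output is written to HDFS.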
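The stages above can be imitated in a single JVM to see how data flows between them. The following is only a sketch in plain Java, not Hadoop code: `LocalWordCount` and its methods are hypothetical names, the "shuffle" is a sorted in-memory map, and there is no partitioning across reducers. It uses word counting as the example, since that matches the "split each line, then sum per key" description.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class LocalWordCount {
    // map stage: called once per input line; emits one (word, 1) pair per word.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // shuffle stage: groups values by key; the TreeMap keeps keys sorted.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // reduce stage: called once per key with all values of that key; sums them.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Wires the stages together: input lines -> map -> shuffle -> reduce -> output map.
    static TreeMap<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(map(line));
        }
        TreeMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : shuffle(mapped).entrySet()) {
            result.put(g.getKey(), reduce(g.getKey(), g.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a b a", "b c"))); // {a=2, b=2, c=1}
    }
}
```

In a real job the framework performs the shuffle itself; only the map and reduce logic is written by hand.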

MapReduce development template
-》Driver:
-》neither extends nor implements anything
-》extends and implements
extends Configured implements Tool

-》implements only
implements Tool
-》Mapper

-》Reducer

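Custom data types
-》Needed when a key or value must contain multiple columns
-》In map
output key: the custom data type
output value: NullWritable
-》Interfaces to implement
-》Writable
-》write
-》readFields
-》WritableComparable
-》compareTo
-》Other methods
-》getters and setters
-》constructors
-》toString

-》hashCode and equals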
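The checklist above can be sketched as a two-column key type. To keep the example runnable without Hadoop on the classpath, the class below implements only `java.lang.Comparable` and merely mirrors the `write(DataOutput)` and `readFields(DataInput)` signatures of Hadoop's `Writable`; in an actual job it would presumably declare `implements WritableComparable<CityUser>` instead. The class name `CityUser`, its two fields, and the `roundTrip` helper are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Objects;

// Hypothetical two-column key. In a Hadoop job: implements WritableComparable<CityUser>.
final class CityUser implements Comparable<CityUser> {
    private String cityId = "";
    private String userId = "";

    // A no-argument constructor is needed so an empty instance can be filled by readFields.
    CityUser() {}

    CityUser(String cityId, String userId) {
        this.cityId = cityId;
        this.userId = userId;
    }

    String getCityId() { return cityId; }
    void setCityId(String cityId) { this.cityId = cityId; }
    String getUserId() { return userId; }
    void setUserId(String userId) { this.userId = userId; }

    // Serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(cityId);
        out.writeUTF(userId);
    }

    // Deserialize the fields in the same order they were written.
    public void readFields(DataInput in) throws IOException {
        cityId = in.readUTF();
        userId = in.readUTF();
    }

    // Sort by city first, then by user.
    @Override
    public int compareTo(CityUser o) {
        int c = cityId.compareTo(o.cityId);
        return c != 0 ? c : userId.compareTo(o.userId);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CityUser)) return false;
        CityUser other = (CityUser) o;
        return cityId.equals(other.cityId) && userId.equals(other.userId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(cityId, userId);
    }

    @Override
    public String toString() {
        return cityId + "\t" + userId;
    }

    // Helper for the demo: serialize to bytes and read back into a new instance.
    static CityUser roundTrip(CityUser original) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        original.write(new DataOutputStream(buf));
        CityUser copy = new CityUser();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        CityUser original = new CityUser("bj", "u1");
        System.out.println(roundTrip(original).equals(original)); // true
    }
}
```

The order of fields in `readFields` must match the order in `write`, and `hashCode`/`equals` should agree with `compareTo` so that grouping and partitioning treat equal keys consistently.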

UV (unique visitors): each user is counted only once. The following job counts the UV of each city.

import java.io.IOException;
import java.util.HashSet;
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;


public class UrMrtest extends Configured implements Tool {
	
	public static class WebLogUVMapper extends Mapper<LongWritable, Text, Text, Text>{
		private Text outputKey = new Text();
		private Text outputValue = new Text();

		@Override
		protected void map(LongWritable key, Text value, Context context)
				throws IOException, InterruptedException {
			// Split the tab-separated log line into fields.
			String[] items = value.toString().split("\t");
			// Skip malformed lines with fewer than 36 fields.
			if (items.length < 36) {
				return;
			}
			// Skip records whose user identifier (field 5) is blank.
			if (StringUtils.isBlank(items[5])) {
				return;
			}
			// Key: field 23, used as the city; value: field 5, used as the user identifier.
			this.outputKey.set(items[23]);
			this.outputValue.set(items[5]);

			context.write(outputKey, outputValue);
		}
	}
	
	
	public static class WebLogUVReduce extends Reducer<Text, Text, Text, IntWritable>{

		@Override
		protected void reduce(Text key, Iterable<Text> values, Context context)
				throws IOException, InterruptedException {
			// Collect the distinct user identifiers seen for this city.
			HashSet<String> users = new HashSet<>();
			for (Text value : values) {
				users.add(value.toString());
			}
			// The number of distinct users is the UV of this city.
			context.write(key, new IntWritable(users.size()));
		}
	}
	
	
	@Override
	public int run(String[] args) throws Exception {
		// job
		Job job = Job.getInstance(this.getConf(), "uvtest");
		job.setJarByClass(UrMrtest.class);
		// input: args[0] is the input path
		FileInputFormat.setInputPaths(job, new Path(args[0]));
		// map
		job.setMapperClass(WebLogUVMapper.class);
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(Text.class);
		// reduce
		job.setReducerClass(WebLogUVReduce.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);

		job.setNumReduceTasks(2);
		// output: args[1] is the output path
		FileOutputFormat.setOutputPath(job, new Path(args[1]));

		// Exit status 0 on success, 1 on failure.
		return job.waitForCompletion(true) ? 0 : 1;
	}

	public static void main(String[] args) {
		Configuration conf = new Configuration();

		try {
			int status = ToolRunner.run(conf, new UrMrtest(), args);
			System.exit(status);
		} catch (Exception e) {
			e.printStackTrace();
			// Report failure to the caller instead of exiting with status 0.
			System.exit(1);
		}
	}
}
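Before packaging the job, it can help to check the per-city distinct-count logic on a few lines locally. The sketch below is plain Java with no Hadoop dependency; it roughly mirrors the mapper's filtering (at least 36 tab-separated fields, non-blank field 5) and the reducer's `HashSet` counting. The class `LocalCityUv`, the `line` helper, and the sample values are hypothetical and serve only as synthetic test data.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class LocalCityUv {
    static final int MIN_FIELDS = 36; // same threshold as the mapper
    static final int USER_FIELD = 5;  // field used as the user identifier
    static final int CITY_FIELD = 23; // field used as the city key

    // Returns city -> number of distinct users, for lines that pass the mapper's checks.
    static Map<String, Integer> uvByCity(List<String> lines) {
        Map<String, Set<String>> usersByCity = new TreeMap<>();
        for (String line : lines) {
            String[] items = line.split("\t");
            if (items.length < MIN_FIELDS) {
                continue; // malformed line
            }
            if (items[USER_FIELD].trim().isEmpty()) {
                continue; // no user identifier
            }
            usersByCity.computeIfAbsent(items[CITY_FIELD], k -> new HashSet<>())
                    .add(items[USER_FIELD]);
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : usersByCity.entrySet()) {
            result.put(e.getKey(), e.getValue().size());
        }
        return result;
    }

    // Builds a synthetic 36-field line; all fields other than user and city are "-".
    static String line(String user, String city) {
        String[] fields = new String[MIN_FIELDS];
        Arrays.fill(fields, "-");
        fields[USER_FIELD] = user;
        fields[CITY_FIELD] = city;
        return String.join("\t", fields);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                line("u1", "bj"),
                line("u1", "bj"),   // same user again: counted once
                line("u2", "bj"),
                line("u1", "sh"),
                line("", "sh"),     // blank user: skipped
                "too\tshort");      // fewer than 36 fields: skipped
        System.out.println(uvByCity(lines)); // {bj=2, sh=1}
    }
}
```

Unlike this sketch, the real job spreads the city keys over two reduce tasks, so its result is written as two output files.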

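Packaging the code as a JAR

In the Eclipse export wizard, click Next -》 Next.

Then click Finish.

If the export completes without errors, the JAR is ready.

Copy the JAR to the Linux machine and prepare the input data.

Run the job and check the result.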
