Anhui Province Big Data Online Competition: Big Data Analysis, Question 2

The data set and its fields are described in this earlier post:

Anhui Province Big Data Analysis, Question 1

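Extract the five fields "uid", "platform", "app_version", "pid" and "cityid", together with their values, from the users' raw data. (Write the code and include a screenshot of part of the results; 7 points.)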

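Approach:
First look at the format of the data to decide what to split on. Splitting on commas turned out to work best: each comma-separated piece of the sample record is one "key":"value" pair.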
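One way to check this is to split the sample record and print every piece with its index. The following is only a minimal sketch: the class name SplitDemo and the helper splitLine are made up for illustration, and the record is the sample shown in this post.

```java
class SplitDemo {
    // Sample record copied from this post; in the input file it is a single line.
    static final String SAMPLE =
        "{\"common\":{\"locationcity\":0,\"uid\":\"188495963831271424\",\"uaid\":\"0\","
        + "\"platform\":\"Android\",\"app_version\":\"1007090002\",\"net\":\"WIFI\","
        + "\"pid\":\"5057\",\"identifier\":\"869121033612809\",\"cityid\":\"2503\","
        + "\"iccid\":\"89860077221897301901\",\"snsid\":\"\",\"ts\":\"1557276436920\","
        + "\"versionType\":\"1\",\"pkg\":\"com.moji.mjweather\"}"
        + ",\"event\":{\"key\":\"NEWLIVEVIEW_QUIT_TAB\",\"value\":\"0\",\"du\":\"\"}}";

    // Hypothetical helper: split one record on commas, as the mapper does.
    static String[] splitLine(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        String[] parts = splitLine(SAMPLE);
        // Print each piece with its index to see where the wanted fields land.
        for (int i = 0; i < parts.length; i++) {
            System.out.println(i + ": " + parts[i]);
        }
    }
}
```

On this sample the five wanted pairs land at indexes 1, 3, 4, 6 and 8, which are the positions the mapper checks. This only holds while the field order stays the same and no value contains a comma.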

Step 1: filter in the map phase. A record is written out to the reduce phase only if it contains all five fields.

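Here is the core logic on its own: for each incoming value, convert it to a string, split it, and check whether the pieces at the expected positions contain the fields we want.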
 

String line = value.toString();
// Sample record (a single line in the input):
//{"common":{"locationcity":0,"uid":"188495963831271424","uaid":"0","platform":"Android","app_version":"1007090002","net":"WIFI","pid":"5057","identifier":"869121033612809","cityid":"2503","iccid":"89860077221897301901","snsid":"","ts":"1557276436920","versionType":"1","pkg":"com.moji.mjweather"}
		//,"event":{"key":"NEWLIVEVIEW_QUIT_TAB","value":"0","du":""}}
		
		// Extract "uid", "platform", "app_version", "pid", "cityid" and their values from the raw data.
		
		String[] split = line.split(",");
		// Skip short or malformed lines so that split[8] cannot throw ArrayIndexOutOfBoundsException.
		if(split.length>8&&split[1].contains("uid")&&split[3].contains("platform")&&split[4].contains("app_version")&&split[6].contains("pid")&&split[8].contains("cityid")){
			// Value: the platform, app_version, pid and cityid pairs, separated by spaces.
			String val = " "+split[3]+" "+split[4]+" "+split[6]+" "+split[8];
			// Key: the "uid":"..." pair.
			String keys = split[1];
			context.write(new Text(keys),new Text(val));
		}
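The positional check above depends on every record listing its fields in the same order. As a possible alternative, the fields could be looked up by name with a regular expression. This is only a sketch, not the method used in this post: the class FieldExtractor and its extract method are hypothetical, the sample is a shortened copy of the record above, and it assumes each wanted value is a quoted string with no escaped quotes inside.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldExtractor {
    // The five fields the task asks for.
    static final String[] FIELDS = {"uid", "platform", "app_version", "pid", "cityid"};

    // Shortened copy of the sample record from this post (assumption: same shape as real input).
    static final String SAMPLE =
        "{\"common\":{\"locationcity\":0,\"uid\":\"188495963831271424\",\"uaid\":\"0\","
        + "\"platform\":\"Android\",\"app_version\":\"1007090002\",\"net\":\"WIFI\","
        + "\"pid\":\"5057\",\"identifier\":\"869121033612809\",\"cityid\":\"2503\"}}";

    // Returns field name -> value in FIELDS order, or null if any field is missing.
    static Map<String, String> extract(String line) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String field : FIELDS) {
            // Matches "field":"value"; the leading quote keeps "uid" from matching "uaid".
            Pattern p = Pattern.compile("\"" + Pattern.quote(field) + "\":\"([^\"]*)\"");
            Matcher m = p.matcher(line);
            if (!m.find()) {
                return null;
            }
            out.put(field, m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(extract(SAMPLE));
    }
}
```

In a mapper, a null result would simply mean the record is skipped. For input that may contain escaped quotes or nested values, a real JSON parser would be the safer choice.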

 

 

package jinsai2;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;


public class MapRe {
	public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
		Configuration conf = new Configuration();
		Job job = Job.getInstance(conf,MapRe.class.getSimpleName());
		job.setJarByClass(MapRe.class);
		// args[0]: input path
		FileInputFormat.setInputPaths(job, new Path(args[0]));
		job.setMapperClass(MAp.class);
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(Text.class);

		job.setReducerClass(Red.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(Text.class);

		// args[1]: output path; it must not exist before the job runs
		FileOutputFormat.setOutputPath(job, new Path(args[1]));
		// Report success or failure of the job through the exit code
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}

Map function:
package jinsai2;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;


public class MAp extends Mapper<LongWritable, Text, Text, Text>{
	@Override
	protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, Text>.Context context)
			throws IOException, InterruptedException {
		// Convert the incoming record to a string and split it
		String line = value.toString();
		// Sample record (a single line in the input):
//{"common":{"locationcity":0,"uid":"188495963831271424","uaid":"0","platform":"Android","app_version":"1007090002","net":"WIFI","pid":"5057","identifier":"869121033612809","cityid":"2503","iccid":"89860077221897301901","snsid":"","ts":"1557276436920","versionType":"1","pkg":"com.moji.mjweather"}
		//,"event":{"key":"NEWLIVEVIEW_QUIT_TAB","value":"0","du":""}}
		
		// Extract "uid", "platform", "app_version", "pid", "cityid" and their values from the raw data.
		
		String[] split = line.split(",");
		// Skip short or malformed lines so that split[8] cannot throw ArrayIndexOutOfBoundsException.
		if(split.length>8&&split[1].contains("uid")&&split[3].contains("platform")&&split[4].contains("app_version")&&split[6].contains("pid")&&split[8].contains("cityid")){
			// Value: the platform, app_version, pid and cityid pairs, separated by spaces.
			String val = " "+split[3]+" "+split[4]+" "+split[6]+" "+split[8];
			// Key: the "uid":"..." pair.
			String keys = split[1];
			context.write(new Text(keys),new Text(val));
		}
	}

}

Reduce function:
package jinsai2;

import java.io.IOException;


import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Red extends Reducer<Text, Text, Text, Text>{
	
	@Override
	protected void reduce(Text key, Iterable<Text> values,
			Reducer<Text, Text, Text, Text>.Context context) throws IOException, InterruptedException {
		// Identity reduce: emit every value for this key unchanged
		for (Text text : values) {
			context.write(key, text);
		}
	}
}
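Because the reducer only re-emits what it receives, the job output is the map output grouped and sorted by the uid key, one key, a tab, and a value per line (tab is the default separator of the text output format). The sketch below imitates that in plain Java with a TreeMap, without Hadoop. The class IdentityReduceDemo and its run method are hypothetical, and the keys used in main are shortened for readability.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class IdentityReduceDemo {
    // Imitates shuffle plus identity reduce: group pairs by key, sort keys, emit "key<TAB>value".
    static List<String> run(String[][] mapOutput) {
        // TreeMap keeps keys sorted, similar to how the shuffle orders Text keys (for ASCII keys).
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] kv : mapOutput) {
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            // Values keep insertion order here; Hadoop does not guarantee value order within a key.
            for (String v : e.getValue()) {
                lines.add(e.getKey() + "\t" + v);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String[][] mapOutput = {
            {"\"uid\":\"2\"", " \"platform\":\"Android\""},
            {"\"uid\":\"1\"", " \"platform\":\"iOS\""},
        };
        for (String line : run(mapOutput)) {
            System.out.println(line);
        }
    }
}
```

Since no aggregation happens in the reduce step, its visible effect is the grouping and sorting by key.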

Result screenshot:

 

 
