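Day 6: E-commerce Hands-on Project Implementation, Part 3
Problem description
Based on the e-commerce log file, carry out the following analysis:
1. Count page views (each log line is one page view)
2. Count page views per province (requires resolving the IP address)
3. Perform ETL on the logs (ETL: the process of moving data from a source to a destination through Extract, Transform, and Load)
Why ETL: there is no need to parse every field in the log, only the fields that carry value. This project needs to parse out: ip, url, pageId (the page ID corresponding to topicId), country, province, city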
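Once each record has been reduced to these fields, tasks 1 and 2 become simple aggregations. The following is a minimal plain-Java sketch of that idea, not the project's MapReduce code. It assumes each cleaned record is a tab-separated line with the columns ip, country, province, city, url, time, pageId; the class name `PageViewSketch` and the sample records are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PageViewSketch {
    // Assumed column order: ip, country, province, city, url, time, pageId.
    static final int PROVINCE_COLUMN = 2;

    // Task 1: every record is one page view, so the count is the number of lines.
    static long countPageViews(List<String> records) {
        return records.size();
    }

    // Task 2: group the records by province and count each group.
    static Map<String, Long> countByProvince(List<String> records) {
        Map<String, Long> counts = new TreeMap<>();
        for (String record : records) {
            // limit -1 keeps trailing empty columns
            String[] fields = record.split("\t", -1);
            if (fields.length <= PROVINCE_COLUMN) {
                continue; // skip malformed lines instead of failing
            }
            counts.merge(fields[PROVINCE_COLUMN], 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample records, for illustration only.
        List<String> records = Arrays.asList(
            "10.0.0.1\tChina\tGuangdong\tShenzhen\thttp://example.com/a?topicId=1\t2013-07-21 17:17:39\t1",
            "10.0.0.2\tChina\tBeijing\tBeijing\thttp://example.com/b?topicId=2\t2013-07-21 17:18:02\t2",
            "10.0.0.3\tChina\tGuangdong\tGuangzhou\thttp://example.com/a?topicId=1\t2013-07-21 17:19:45\t1");
        System.out.println("page views: " + countPageViews(records));
        System.out.println("by province: " + countByProvince(records));
    }
}
```

On a cluster the same grouping would be done by a mapper that emits (province, 1) and a reducer that sums the values; the sketch only shows the logic on an in-memory list.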
(3) ETL of the logs
- After importing the log file into the virtual machine, start the cluster and write the code in IDEA. The key code (the job driver and the mapper) is as follows:
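import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// LogParser and ContentUtils are this project's own helper classes;
// import them from wherever they live in your project.
public class ETLApp {
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();

        // Delete the output directory if it already exists; otherwise the job refuses to start.
        FileSystem fileSystem = FileSystem.get(configuration);
        Path outputPath = new Path(args[1]);
        if (fileSystem.exists(outputPath)) {
            fileSystem.delete(outputPath, true);
        }

        Job job = Job.getInstance(configuration);
        job.setJarByClass(ETLApp.class);
        // No custom reducer is set: Hadoop's default identity reducer
        // passes the mapper output through unchanged.
        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(NullWritable.class);
        job.setMapOutputValueClass(Text.class);

        // args[0] is the input log path, args[1] is the output directory.
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, outputPath);

        // Report success or failure through the process exit code.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    static class MyMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        private LogParser logParser;

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            // Create the parser once per map task rather than once per record.
            logParser = new LogParser();
        }

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // Parse one raw log line into its named fields.
            String log = value.toString();
            Map<String, String> info = logParser.parseV2(log);
            String ip = info.get("ip");
            String country = info.get("country");
            String province = info.get("province");
            String city = info.get("city");
            String url = info.get("url");
            String time = info.get("time");
            String pageId = ContentUtils.getPageId(url);

            // Emit only the useful fields as one tab-separated line.
            StringBuilder builder = new StringBuilder();
            builder.append(ip).append("\t");
            builder.append(country).append("\t");
            builder.append(province).append("\t");
            builder.append(city).append("\t");
            builder.append(url).append("\t");
            builder.append(time).append("\t");
            builder.append(pageId);
            context.write(NullWritable.get(), new Text(builder.toString()));
        }
    }
}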
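The mapper calls ContentUtils.getPageId(url), whose source is not shown in this section. Since pageId is described as the page ID corresponding to topicId, one plausible shape for such a helper is sketched below. This is an assumption, not the project's actual implementation: it supposes the ID appears in the URL as a numeric `topicId=<digits>` query parameter and returns an empty string when none is found. The class name `PageIdSketch` is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PageIdSketch {
    // Assumption: the page ID is a numeric "topicId" query parameter in the URL.
    private static final Pattern TOPIC_ID = Pattern.compile("[?&]topicId=(\\d+)");

    // Returns the digits after "topicId=", or "" if the URL has no such parameter.
    static String getPageId(String url) {
        if (url == null || url.isEmpty()) {
            return "";
        }
        Matcher matcher = TOPIC_ID.matcher(url);
        return matcher.find() ? matcher.group(1) : "";
    }

    public static void main(String[] args) {
        // Made-up URLs, for illustration only.
        System.out.println(getPageId("http://example.com/page?topicId=123&from=home"));
        System.out.println(getPageId("http://example.com/index.html").isEmpty());
    }
}
```

Returning an empty string rather than null keeps the mapper's tab-separated output free of the literal text "null" in the last column; whether the real helper behaves this way is not known from the source.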
- Package the finished program into a jar, run the jar on Hadoop, and check the run result in the web UI. Result: