Hi everyone, I'm 曜耀.
Today's post is a piece of Java code for data cleaning in big data, written as a Hadoop MapReduce job.
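import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class accesstMapper extends Mapper<LongWritable, Text, Text, NullWritable>{
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split one input line on single spaces.
        String[] str=value.toString().split(" ");
        // Skip malformed lines; str[9] would otherwise throw ArrayIndexOutOfBoundsException.
        if (str.length<10) return;
        // Keep only fields 0, 8 and 9, joined by commas; the value side is empty.
        String riji=str[0]+","+str[8]+","+str[9];
        context.write(new Text(riji), NullWritable.get());
    }
}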
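To see what the mapper does to a single line without starting Hadoop, here is a minimal plain-Java sketch of the same split-and-select step. `LineCleaner` and `clean` are hypothetical names of mine, and the sample line is a made-up Apache-style access log entry (the post does not say what the input looks like, so treat that as an assumption). The sketch also skips lines with fewer than 10 fields, so a short line cannot cause an ArrayIndexOutOfBoundsException.

```java
import java.util.Optional;

// Plain-Java sketch of the field selection done in accesstMapper.map.
// LineCleaner and clean are hypothetical names used only for illustration.
final class LineCleaner {
    // Returns "field0,field8,field9", or empty when the line has fewer than 10 fields.
    static Optional<String> clean(String line) {
        String[] fields = line.split(" ");
        if (fields.length < 10) {
            return Optional.empty();
        }
        return Optional.of(fields[0] + "," + fields[8] + "," + fields[9]);
    }

    public static void main(String[] args) {
        // Assumed sample input: an Apache-style access log line, invented for this example.
        String sample = "192.168.1.10 - - [10/Oct/2023:13:55:36 +0800] "
                + "\"GET /index.html HTTP/1.1\" 200 2326";
        System.out.println(clean(sample).orElse("<skipped>"));      // prints 192.168.1.10,200,2326
        System.out.println(clean("too short").orElse("<skipped>")); // prints <skipped>
    }
}
```

With a line of that shape, fields 0, 8 and 9 happen to be the client IP, the status code and the response size. For a different log layout the indexes would need to change.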
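import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class accesstSubmitter {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf=new Configuration();
        // Delete the output directory if it already exists; otherwise job submission fails.
        FileSystem fs=FileSystem.get(conf);
        Path output=new Path(args[1]);
        if (fs.exists(output)) {
            fs.delete(output, true);
        }
        Job job=Job.getInstance(conf); // pass conf so the job uses the same configuration
        job.setJarByClass(accesstSubmitter.class);
        job.setMapperClass(accesstMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(NullWritable.class);
        FileInputFormat.setInputPaths(job, new Path(args[0])); // args[0]: input path
        FileOutputFormat.setOutputPath(job, output); // args[1]: output path
        System.exit(job.waitForCompletion(true) ? 0 : 1); // report failure through the exit code
    }
}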
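The driver sets no reducer, so as far as I know Hadoop falls back to its defaults: one reduce task running the identity reducer. That means the cleaned lines come out sorted by key, with duplicates kept. Below is a rough local simulation of that end result in plain Java, just to make the expected output easy to picture. `LocalCleanRun` and `run` are hypothetical names, the length guard is my assumption, and `String` ordering only matches Hadoop's byte-wise `Text` ordering for ASCII data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Local stand-in for the whole job: clean each line, then sort the results.
// LocalCleanRun and run are hypothetical names; this does not call Hadoop at all.
final class LocalCleanRun {
    static List<String> run(List<String> lines) {
        List<String> cleaned = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.split(" ");
            if (f.length >= 10) { // assumed guard: drop lines that are too short
                cleaned.add(f[0] + "," + f[8] + "," + f[9]);
            }
        }
        // Mimics the sort performed by the shuffle; duplicates are kept.
        Collections.sort(cleaned);
        return cleaned;
    }

    public static void main(String[] args) {
        // Made-up input lines with 10 space-separated fields each, plus one bad line.
        List<String> input = Arrays.asList(
                "b 1 2 3 4 5 6 7 8 9",
                "bad line",
                "a 1 2 3 4 5 6 7 8 9");
        for (String row : run(input)) {
            System.out.println(row);
        }
    }
}
```

If sorted output is not needed, a map-only job (zero reduce tasks) is a common alternative, but that changes the behavior of the code above, so I left the driver as it is.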
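That is the simplest template for data cleaning in big data. You don't have to understand every line; memorize the pattern and, in my experience, it will handle about 80% of the exercises you run into.
I'm 曜耀, see you next time.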