A single Job can read from multiple input sources, whether they share a format or not, and give each source its own Mapper.
MultipleInputs registers each input path together with its InputFormat and Mapper, for example:
- MultipleInputs.addInputPath(conf, ncdcInputPath,
- TextInputFormat.class, MaxTemperatureMapper.class);
- MultipleInputs.addInputPath(conf, metOfficeInputPath,
- TextInputFormat.class, MetOfficeMaxTemperatureMapper.class);
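
To make the benefit of per-source Mappers concrete, here is a plain-Java model of the idea with no Hadoop dependency. Each input path is registered with its own parsing function, and both functions emit the same (year, temperature) pair, so a single reducer could consume the merged stream. The class name `PerSourceMapperSketch`, the path strings, and both record layouts are hypothetical, and this is only a sketch of the concept, not how MultipleInputs is implemented.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Plain-Java model of per-source mappers; every name and record layout here is hypothetical.
class PerSourceMapperSketch {
    // One registered "mapper" per input path, mirroring one addInputPath call per source.
    private final Map<String, Function<String, Map.Entry<String, Integer>>> mappers =
            new LinkedHashMap<>();

    void addInputPath(String path, Function<String, Map.Entry<String, Integer>> mapper) {
        mappers.put(path, mapper);
    }

    // Run every record of one source through the mapper registered for that source.
    List<Map.Entry<String, Integer>> map(String path, List<String> records) {
        Function<String, Map.Entry<String, Integer>> mapper = mappers.get(path);
        if (mapper == null) {
            throw new IllegalArgumentException("No mapper registered for " + path);
        }
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String record : records) {
            out.add(mapper.apply(record));
        }
        return out;
    }

    // Assumed layout A: "year,temperature".
    static Map.Entry<String, Integer> parseCsv(String line) {
        String[] fields = line.split(",");
        return new SimpleEntry<>(fields[0].trim(), Integer.parseInt(fields[1].trim()));
    }

    // Assumed layout B: "temperature year", separated by whitespace.
    static Map.Entry<String, Integer> parseSpaceSeparated(String line) {
        String[] fields = line.trim().split("\\s+");
        return new SimpleEntry<>(fields[1], Integer.parseInt(fields[0]));
    }
}
```

Because both parsers produce the same key and value types, the differences between the sources stay inside the mappers and everything downstream sees one uniform stream.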
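MultipleOutputFormat lets you choose the names of the reduce output files by a rule of your own, and so split the output across several files, for example: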
- ...
- static class StationNameMultipleTextOutputFormat
- extends MultipleTextOutputFormat<NullWritable, Text> {
- private NcdcRecordParser parser = new NcdcRecordParser();
- protected String generateFileNameForKeyValue(NullWritable key, Text value,
- String name) {
- parser.parse(value);
- return parser.getStationId();
- }
- }
- ...
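
The routing rule above can be tried out without a cluster. The following plain-Java sketch models only the naming step: a function derives a file name from each record, and records are grouped under that name. The class `OutputSplitSketch` and the record layout (station ID before the first comma) are assumptions made for illustration; the real `NcdcRecordParser` is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of name-based output splitting; the record layout is hypothetical.
class OutputSplitSketch {
    // Counterpart of generateFileNameForKeyValue: derive the file name from the record.
    // Assumes the station ID is the text before the first comma.
    static String fileNameFor(String record) {
        int comma = record.indexOf(',');
        return comma < 0 ? record : record.substring(0, comma);
    }

    // Group records by generated file name; a real job would write one file per name.
    static Map<String, List<String>> split(List<String> records) {
        Map<String, List<String>> files = new TreeMap<>();
        for (String record : records) {
            files.computeIfAbsent(fileNameFor(record), k -> new ArrayList<>()).add(record);
        }
        return files;
    }
}
```

In the Hadoop class, the `name` argument carries the default leaf file name (such as part-00000), and the returned string is interpreted relative to the job's output directory.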