Hadoop: The Definitive Guide, 3rd Edition, Chapter 4 Addenda: MapFile

Reposted from: http://blog.csdn.net/hadoop_

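A MapFile is a sorted SequenceFile with an index added so that records can be looked up by key.

If you look at its directory structure, you can see that a MapFile consists of two parts, data and index. For example: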

Files on the cluster:

hdfs://master:9000/mapfile/numbers.map/data

hdfs://master:9000/mapfile/numbers.map/index

data is the data file: it contains all of the stored key-value pairs, sorted by key.

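index is the file's index: it records the keys of the indexed records together with each such record's byte offset in the data file. When a MapFile is accessed, the index file is loaded into memory, and the key-to-offset mapping lets a lookup jump straight to the part of the data file where the requested record lives. Lookups in a MapFile are therefore much faster than in a plain SequenceFile, which has to be scanned sequentially; the price is the memory needed to hold the index.
Note that a MapFile does not put every record into the index: by default only every 128th record is indexed. The interval can be changed, either with the setIndexInterval() method of MapFile.Writer or by setting the io.map.index.interval property.
Also, unlike a SequenceFile, a MapFile's key class must implement the WritableComparable interface, i.e. keys must be comparable.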
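To make the lookup path concrete, here is a minimal in-memory sketch of a sparse index. It is a toy model, not Hadoop's code: the class SparseIndexDemo and its methods are hypothetical, list positions stand in for byte offsets, and only every interval-th key is kept in the index (128 here, mirroring the default). A lookup binary-searches the in-memory index for the last indexed key that is not greater than the target, then scans forward through the data from that position.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a MapFile-style sparse index (not Hadoop's implementation).
public class SparseIndexDemo {
    // "data": keys and values in ascending key order; list positions play the role of offsets.
    private final List<Integer> keys = new ArrayList<>();
    private final List<String> values = new ArrayList<>();
    // "index": every interval-th key and its position in the data.
    private final List<Integer> indexKeys = new ArrayList<>();
    private final List<Integer> indexPositions = new ArrayList<>();
    private final int interval;

    public SparseIndexDemo(int interval) {
        this.interval = interval;
    }

    // Assumes keys are appended in ascending order, as in a real MapFile.
    public void append(int key, String value) {
        if (keys.size() % interval == 0) {
            indexKeys.add(key);
            indexPositions.add(keys.size());
        }
        keys.add(key);
        values.add(value);
    }

    public int indexSize() {
        return indexKeys.size();
    }

    public String get(int key) {
        // Binary search for the last indexed key <= key.
        int lo = 0, hi = indexKeys.size() - 1, slot = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (indexKeys.get(mid) <= key) {
                slot = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (slot < 0) {
            return null; // smaller than every key in the file
        }
        // Scan forward; the scan stops before the next indexed key, which is > key.
        for (int pos = indexPositions.get(slot); pos < keys.size(); pos++) {
            int k = keys.get(pos);
            if (k == key) {
                return values.get(pos);
            }
            if (k > key) {
                break;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SparseIndexDemo map = new SparseIndexDemo(128);
        for (int i = 0; i < 1000; i++) {
            map.append(i, "value-" + i);
        }
        System.out.println("index entries: " + map.indexSize()); // index entries: 8
        System.out.println(map.get(500));  // value-500
        System.out.println(map.get(5000)); // null
    }
}
```

With 1,000 keys and an interval of 128 the index holds 8 entries instead of 1,000, and each lookup scans at most about one interval of records. A larger interval shrinks the index further but lengthens the scan, which is the trade-off that setIndexInterval() and io.map.index.interval control.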
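Keys have to be comparable because a MapFile must be written in key order. As far as I can tell from the Hadoop source, MapFile.Writer.append() compares each new key with the previous one and throws an IOException ("key out of order") if the new key is smaller. The sketch below models only that check in plain Java. SortedAppendDemo is a hypothetical class, and it uses java.lang.Comparable and IllegalStateException in place of WritableComparable and IOException.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy writer mirroring the key-order check of MapFile.Writer (not Hadoop code).
public class SortedAppendDemo<K extends Comparable<K>, V> {
    private final List<K> keys = new ArrayList<>();
    private final List<V> values = new ArrayList<>();

    public void append(K key, V value) {
        if (!keys.isEmpty()) {
            K last = keys.get(keys.size() - 1);
            // Equal keys are accepted; only a key smaller than the previous one is rejected.
            if (last.compareTo(key) > 0) {
                throw new IllegalStateException("key out of order: " + key + " after " + last);
            }
        }
        keys.add(key);
        values.add(value);
    }

    public int size() {
        return keys.size();
    }

    public static void main(String[] args) {
        SortedAppendDemo<String, Integer> writer = new SortedAppendDemo<>();
        writer.append("apple", 1);
        writer.append("banana", 2);
        try {
            writer.append("aardvark", 3);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // key out of order: aardvark after banana
        }
        System.out.println(writer.size()); // 2
    }
}
```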

The following code rebuilds the index of a MapFile from its data file:

```java
package com.tht.hadoopIO;

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;

// MapFileFixer: re-creates the index for a MapFile
public class MapFileFixer {

    public static void main(String[] args) throws Exception {
        // Take the MapFile directory from the command line, falling back to the example path
        String mapUri = args.length > 0 ? args[0] : "hdfs://master:9000/mapfile/numbers.map";

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(mapUri), conf);
        Path map = new Path(mapUri);
        Path mapData = new Path(map, MapFile.DATA_FILE_NAME);

        // Get the key and value types from the data file, which is an ordinary SequenceFile
        Class<? extends Writable> keyClass;
        Class<? extends Writable> valueClass;
        try (SequenceFile.Reader reader = new SequenceFile.Reader(fs, mapData, conf)) {
            keyClass = reader.getKeyClass().asSubclass(Writable.class);
            valueClass = reader.getValueClass().asSubclass(Writable.class);
        }

        // Rebuild the index file; dryrun = false means the index is actually written.
        // Note: fix() returns -1 without doing anything if an index file already exists.
        long entries = MapFile.fix(fs, map, keyClass, valueClass, false, conf);
        System.out.printf("Created MapFile %s with %d entries%n", map, entries);
    }
}
```