Using DistributedCache

1 What DistributedCache is for

DistributedCache can be used to distribute simple, read-only data or text files as well as more complex types such as archives and jars. Archives (zip, tar and tgz/tar.gz files) are un-archived on the slave nodes. Jars may optionally be added to the classpath of the tasks, which gives a rudimentary software distribution mechanism. Files have execution permissions set. Optionally, users can also direct it to symlink the distributed cache file(s) into the working directory of the task.

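Cached files should not be modified by the application that uses them or by any external program: DistributedCache tracks the timestamp of each cache file, and a modification may cause exceptions.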
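To make the timestamp point concrete, here is a minimal sketch of how a recorded modification time can reveal that a file was changed. It is not Hadoop's implementation; the class name TimestampCheckDemo and the method modifiedSince are hypothetical and use only the JDK.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

class TimestampCheckDemo {
    // True if the file's current modification time differs from the recorded one.
    static boolean modifiedSince(Path file, FileTime recorded) throws IOException {
        return !Files.getLastModifiedTime(file).equals(recorded);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("lookup", ".dat");
        // Record the timestamp once, as a cache would when the file is registered.
        FileTime recorded = Files.getLastModifiedTime(file);
        System.out.println("changed before touch: " + modifiedSince(file, recorded));

        // Simulate an external program touching the file one minute later.
        Files.setLastModifiedTime(file, FileTime.fromMillis(recorded.toMillis() + 60_000));
        System.out.println("changed after touch: " + modifiedSince(file, recorded));

        Files.delete(file);
    }
}
```

A check of this kind is why a cache file should be treated as immutable once the job has been configured.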

2 Example

// Setting up the cache for the application

1. Copy the requisite files to the FileSystem:

$ bin/hadoop fs -copyFromLocal lookup.dat /myapp/lookup.dat

$ bin/hadoop fs -copyFromLocal map.zip /myapp/map.zip

$ bin/hadoop fs -copyFromLocal mylib.jar /myapp/mylib.jar

$ bin/hadoop fs -copyFromLocal mytar.tar /myapp/mytar.tar

$ bin/hadoop fs -copyFromLocal mytgz.tgz /myapp/mytgz.tgz

$ bin/hadoop fs -copyFromLocal mytargz.tar.gz /myapp/mytargz.tar.gz

2. Set up the application's JobConf:

// new URI(...) throws URISyntaxException; declare or handle it in the enclosing method
JobConf job = new JobConf();
// The part after '#' (lookup.dat) is the symlink name for /myapp/lookup.dat
DistributedCache.addCacheFile(new URI("/myapp/lookup.dat#lookup.dat"), job);
DistributedCache.addCacheArchive(new URI("/myapp/map.zip"), job);
DistributedCache.addFileToClassPath(new Path("/myapp/mylib.jar"), job);
DistributedCache.addCacheArchive(new URI("/myapp/mytar.tar"), job);
DistributedCache.addCacheArchive(new URI("/myapp/mytgz.tgz"), job);
DistributedCache.addCacheArchive(new URI("/myapp/mytargz.tar.gz"), job);
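The role of the '#' in a cache URI can be checked with plain java.net.URI, which splits the string into a path and a fragment. The sketch below uses only the JDK. The class CacheUriDemo and the method linkName are hypothetical, and falling back to the last path segment when no fragment is given is an assumption of this sketch, not documented Hadoop behavior.

```java
import java.net.URI;
import java.net.URISyntaxException;

class CacheUriDemo {
    // Returns the fragment if the URI has one; otherwise the last path segment
    // (the fallback is an assumption made for this illustration).
    static String linkName(URI uri) {
        String fragment = uri.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = uri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) throws URISyntaxException {
        URI withAlias = new URI("/myapp/lookup.dat#lookup.dat");
        URI withoutAlias = new URI("/myapp/map.zip");
        // prints "/myapp/lookup.dat -> lookup.dat"
        System.out.println(withAlias.getPath() + " -> " + linkName(withAlias));
        // prints "/myapp/map.zip -> map.zip"
        System.out.println(withoutAlias.getPath() + " -> " + linkName(withoutAlias));
    }
}
```

Because the fragment is separate from the path, the alias does not have to match the file name: "/myapp/lookup.dat#table" would name the link "table".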

3. Use the cached files in the Mapper or Reducer:

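// Imports needed at the top of the enclosing source file:
import java.io.IOException;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// K and V stand for the job's concrete key and value types
public static class MapClass extends MapReduceBase implements Mapper<K, V, K, V> {
    private Path[] localArchives;
    private Path[] localFiles;

    public void configure(JobConf job) {
        try {
            // Get the local paths of the cached archives/files
            localArchives = DistributedCache.getLocalCacheArchives(job);
            localFiles = DistributedCache.getLocalCacheFiles(job);
        } catch (IOException e) {
            // configure() cannot throw a checked exception, so rethrow unchecked
            throw new RuntimeException("Could not read the distributed cache", e);
        }
    }

    public void map(K key, V value, OutputCollector<K, V> output, Reporter reporter) throws IOException {
        // Use data from the cached archives/files here
        // ...
        output.collect(key, value);
    }
}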
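A common use of a cached file is to load it once, in configure(), into an in-memory table that map() then consults. The sketch below shows only the loading step with the plain JDK. The class LookupLoader is hypothetical, and the tab-separated "key<TAB>value" layout of lookup.dat is an assumption, since the format of that file is not specified here. Note that Path in this sketch is java.nio.file.Path, not org.apache.hadoop.fs.Path.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

class LookupLoader {
    // Reads "key<TAB>value" lines into a map; lines without a tab are skipped.
    static Map<String, String> load(Path localFile) throws IOException {
        Map<String, String> table = new HashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(localFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // not a key/value line
                }
                // Split at the first tab only, so values may contain tabs.
                table.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        return table;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the local copy of lookup.dat on a task node.
        Path sample = Files.createTempFile("lookup", ".dat");
        Files.write(sample, "us\tUnited States\nfr\tFrance\n".getBytes(StandardCharsets.UTF_8));
        Map<String, String> table = load(sample);
        System.out.println(table.get("fr")); // prints "France"
        Files.delete(sample);
    }
}
```

Inside a mapper, the local path would presumably come from localFiles, for example java.nio.file.Paths.get(localFiles[0].toString()), assuming lookup.dat is the first cached file.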

 

 

URI reference: http://download.oracle.com/javase/6/docs/api/java/net/URI.html?is-external=true
