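distcp (distributed copy) is a utility for copying large amounts of data to and from Hadoop filesystems in parallel; it is most often used to transfer data between two HDFS clusters. The distcp command accepts the following command-line options: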
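distcp gets its parallelism by running the copy as map tasks, each of which copies a share of the file list. The Python sketch below models size-based splitting, the idea behind the default uniform strategy. It walks the listing and starts a new group once the current one would exceed total_bytes / num_maps. The function uniform_splits is a hypothetical illustration. It approximates the behaviour and is not Hadoop's own implementation.

```python
from typing import List, Tuple


def uniform_splits(files: List[Tuple[str, int]], num_maps: int) -> List[List[str]]:
    """Hypothetical helper: group (path, size) pairs into at most num_maps
    groups whose total sizes are roughly equal, keeping the listing order."""
    if num_maps < 1:
        raise ValueError("num_maps must be >= 1")
    total = sum(size for _, size in files)
    target = total / num_maps  # ideal number of bytes per map
    groups: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for path, size in files:
        # Close the current group if this file would push it past the target,
        # provided the group is non-empty and there are still maps left.
        if current and current_bytes + size > target and len(groups) < num_maps - 1:
            groups.append(current)
            current, current_bytes = [], 0
        current.append(path)
        current_bytes += size
    if current:
        groups.append(current)
    return groups


print(uniform_splits([("a", 100), ("b", 100), ("c", 100), ("d", 100)], 2))
# prints [['a', 'b'], ['c', 'd']]
```

Each inner list stands for the work handed to one map. Note that -m (listed below) caps the number of maps, so the model never produces more groups than num_maps.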
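Only short options are currently supported; long options are not. A short option is written with a single leading "-" (for example -update), whereas a long option would use "--".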
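The single-dash convention matters when distcp is launched from a script. The sketch below builds an argument list for `hadoop distcp` from a dictionary of option names. The helper build_distcp_args and the nn1/nn2 host names are hypothetical. The one special case follows distcp's documented `-p[rbugp]` form, in which the preserve letters are attached directly to `-p`.

```python
from typing import Dict, List, Optional, Union

OptionValue = Union[bool, int, str]


def build_distcp_args(sources: List[str], target: str,
                      options: Optional[Dict[str, OptionValue]] = None) -> List[str]:
    """Hypothetical helper: build argv for `hadoop distcp` with short options.

    True  -> bare flag, e.g. {"update": True} gives "-update"
    other -> flag plus value, e.g. {"m": 20} gives "-m", "20"
    False/None -> option omitted
    """
    args = ["hadoop", "distcp"]
    for name, value in (options or {}).items():
        if value is False or value is None:
            continue
        if name == "p":
            # Preserve letters are attached to the switch, e.g. "-pugp".
            args.append("-p" if value is True else "-p" + str(value))
            continue
        args.append("-" + name)  # short form: a single leading dash
        if value is not True:
            args.append(str(value))
    return args + list(sources) + [target]


print(build_distcp_args(["hdfs://nn1:8020/src"], "hdfs://nn2:8020/dst",
                        {"update": True, "m": 20, "p": "ugp", "i": False}))
```

The resulting list could be passed to `subprocess.run` as-is. Building a list rather than a single string avoids shell-quoting problems with unusual paths.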
[ Options: [ short { (short option definitions)
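update=[ option: update :: Update target, copying only missingfiles or directories ], copy only files or directories that are missing from the target or differ from it; files that already exist and match are not overwritten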
f=[ option: f [ARG] :: List of files that need to be copied ], copy the files named in a list file to the target; each entry in the list file is a full source path
mapredSslConf=[ option: mapredSslConf [ARG] :: Configuration for ssl config file, to use with hftps:// ], SSL configuration file used when reading from an hftps:// source
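strategy=[ option: strategy [ARG] :: Copy strategy to use. Default is dividing work based on file sizes ], how the copy work is split across maps; the default, uniform, gives each map roughly the same number of bytes, and the alternative is dynamic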
skipcrccheck=[ option: skipcrccheck :: Whether to skip CRC checks between source and target paths. ], skip the CRC (checksum) comparison between source and target; only valid together with -update
m=[ option: m [ARG] :: Max number of concurrent maps to use for copy ], maximum number of map tasks; distcp performs the copy as a set of map tasks
log=[ option: log [ARG] :: Folder on DFS where distcp execution logs are saved ], DFS directory in which execution logs are written
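async=[ option: async :: Should distcp execution be blocking ], run distcp asynchronously: the command returns once the job is submitted instead of blocking until the copy finishes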
bandwidth=[ option: bandwidth [ARG] :: Specify bandwidth per map in MB ], bandwidth limit for each map's copy, in MB per second
i=[ option: i :: Ignore failures during copy ], ignore failed copies and continue with the remaining files
atomic=[ option: atomic :: Commit all changes or none ], make the copy atomic: it succeeds only if every file is copied, and if any file fails, nothing is committed
overwrite=[ option: overwrite :: Choose to overwrite target files unconditionally, even if they exist. ], unconditionally overwrite target files, even if they already exist
p=[ option: p [ARG] :: preserve status (rbugp)(replication, block-size, user, group, permission) ], preserve file attributes: r = replication, b = block size, u = user, g = group, p = permission
delete=[ option: delete :: Delete from target, files missing in source ], delete files in the target that do not exist in the source; valid only with -update or -overwrite
tmp=[ option: tmp [ARG] :: Intermediate work path to be used for atomic commit ]} ] temporary work directory used for the atomic commit (see -atomic)
[ long {} ] long options: none at present
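Several of the options above interact: -update, -overwrite, -delete and -skipcrccheck. The Python sketch below models how they decide which files are copied, skipped or deleted. It assumes a simplified model that compares only file length and a checksum string. It ignores directories, block sizes and distcp's target-path layout rules. The names FileInfo, CopyPlan and plan_copy are hypothetical. The option-combination checks mirror restrictions that distcp enforces.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class FileInfo:
    length: int
    checksum: str  # stand-in for the HDFS file checksum


@dataclass
class CopyPlan:
    copy: List[str] = field(default_factory=list)
    skip: List[str] = field(default_factory=list)
    delete: List[str] = field(default_factory=list)


def plan_copy(source: Dict[str, FileInfo], target: Dict[str, FileInfo],
              update: bool = False, overwrite: bool = False,
              delete: bool = False, skip_crc: bool = False) -> CopyPlan:
    """Hypothetical, simplified model of distcp's copy/skip/delete decisions."""
    if update and overwrite:
        raise ValueError("-update and -overwrite are mutually exclusive")
    if delete and not (update or overwrite):
        raise ValueError("-delete requires -update or -overwrite")
    if skip_crc and not update:
        raise ValueError("-skipcrccheck requires -update")
    plan = CopyPlan()
    for path in sorted(source):
        src, dst = source[path], target.get(path)
        if dst is None or overwrite:
            plan.copy.append(path)  # missing at target, or forced overwrite
        elif not update:
            plan.skip.append(path)  # default: existing target files are left alone
        elif src.length == dst.length and (skip_crc or src.checksum == dst.checksum):
            plan.skip.append(path)  # -update: target already matches
        else:
            plan.copy.append(path)  # -update: target differs
    if delete:
        # -delete: remove target files that have no counterpart in the source
        plan.delete = sorted(p for p in target if p not in source)
    return plan


src = {"/a": FileInfo(10, "x"), "/b": FileInfo(20, "y"), "/c": FileInfo(5, "z")}
dst = {"/a": FileInfo(10, "x"), "/b": FileInfo(20, "other"), "/old": FileInfo(1, "q")}
print(plan_copy(src, dst, update=True, delete=True))
```

With update=True and delete=True, /a is skipped because it matches. /b is recopied because its checksum differs, /c is copied because it is missing, and /old is deleted. Adding skip_crc=True would also skip /b, because its length matches and the checksum is no longer compared.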