1. Modify hdfs-site.xml, adding the following property:
<property>
  <name>dfs.support.append</name>
  <value>true</value>
</property>
2. I have not yet found a way to append content to an HDFS file from the command line. However, we can append to a file through the API that Hadoop provides. How? Here is a simple test program I wrote:
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendContent {
    public static void main(String[] args) {
        String hdfs_path = "/sort/sort"; // path of the existing HDFS file to append to
        Configuration conf = new Configuration();
        FileSystem fs = null;
        try {
            fs = FileSystem.get(URI.create(hdfs_path), conf);
            // open an output stream positioned at the end of the existing file
            OutputStream out = fs.append(new Path(hdfs_path));
            Writer writer = new OutputStreamWriter(out);
            BufferedWriter bfWriter = new BufferedWriter(writer);
            bfWriter.write("good!!");
            // close from the outermost writer in; this flushes the appended data
            bfWriter.close();
            writer.close();
            out.close();
            System.out.println("success!!!");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
3. Package the code and run it.
4. If it fails with an error like this:
Exception in thread "main" java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[10.10.22.17:50010, 10.10.22.18:50010], original=[10.10.22.17:50010, 10.10.22.18:50010]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:960)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1026)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1175)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:531)
5. Modify hdfs-site.xml again:
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>NEVER</value>
</property>
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>true</value>
</property>
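The exception message in step 4 notes that "a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration", so instead of (or in addition to) editing hdfs-site.xml, the same two properties can be set on the client's Configuration object before the FileSystem is obtained. The sketch below is only an illustrative variant of the test program from step 2; the class name AppendWithClientConf is my own choice, not something from the original post.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendWithClientConf {
    public static void main(String[] args) throws Exception {
        String hdfs_path = "/sort/sort"; // same test file as in AppendContent above
        Configuration conf = new Configuration();
        // keep pipeline recovery enabled, but never try to replace a failed
        // DataNode -- a reasonable choice on very small clusters (see step 6)
        conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.enable", true);
        conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "NEVER");
        FileSystem fs = FileSystem.get(URI.create(hdfs_path), conf);
        FSDataOutputStream out = fs.append(new Path(hdfs_path));
        out.writeBytes("good!!");
        out.close();
        fs.close();
    }
}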
6. What these two properties mean:
dfs.client.block.write.replace-datanode-on-failure.enable=true
When data is written through a pipeline (the way uploads work), and a DataNode or a disk fails, the client tries to remove the failed DataNode and keep writing to the remaining ones. One consequence is that the pipeline ends up with fewer DataNodes. This feature adds a new DataNode to the pipeline. It is a site-wide option. On a very small cluster, for example 3 nodes or fewer, the administrator may want to disable it.

dfs.client.block.write.replace-datanode-on-failure.policy=DEFAULT
This property only takes effect when dfs.client.block.write.replace-datanode-on-failure.enable is set to true.
- ALWAYS: always add a new DataNode.
- NEVER: never add a new DataNode.
- DEFAULT: let r be the replication factor and n the number of DataNodes to write to. A new DataNode is added when r >= 3 and either floor(r/2) >= n, or r > n and the file is hflushed/appended.
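To make the DEFAULT rule concrete: with replication r = 3, if one DataNode in an append pipeline fails and only n = 2 remain, then r > n and, since the file is being appended, a replacement DataNode is requested; when no spare DataNode exists this is exactly what produces the exception in step 4, which is why the policy is relaxed to NEVER above. The helper below simply transcribes the rule as stated here; it is an illustrative sketch, not the actual Hadoop implementation.

public class ReplaceDatanodePolicyDefault {
    // r = replication factor, n = number of DataNodes still in the pipeline
    static boolean shouldAddNewDatanode(int r, int n, boolean hflushedOrAppended) {
        // r/2 is integer division, i.e. floor(r/2)
        return r >= 3 && (r / 2 >= n || (hflushedOrAppended && r > n));
    }

    public static void main(String[] args) {
        // one failed DataNode during an append with replication 3: a replacement
        // is requested, which fails when no spare DataNode is available (step 4)
        System.out.println(shouldAddNewDatanode(3, 2, true)); // true
    }
}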
Original post: http://crazymatrix.iteye.com/blog/2228496