3. Ma Shibing: Getting Started with Hadoop

Latest recommended article published on 2024-07-09 16:17:35

weixin_34238642

Tags: java, big data, python

Original link: https://my.oschina.net/u/1052786/blog/880421

1. Hadoop has three parts: a storage module (HDFS), a resource scheduling module (YARN), and a compute engine (MapReduce).

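2. HDFS can be thought of as one large disk built from many machines. It scales dynamically: nodes can be added or removed while the cluster is in use. It is configured in core-site.xml; the slaves file lists the datanodes to be managed, so the namenode can manage them centrally.

3. This session accesses HDFS from a Java program, which is the same idea behind cloud drives such as 360 and Baidu Netdisk.

4. If your machine cannot run that many nodes, use a pseudo-distributed setup instead.

5. jps and hdfs dfsadmin -report show whether the cluster has started correctly.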

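6. By default Hadoop stores its data under /tmp. If this is left unchanged, Linux may clear that directory at unpredictable reboots, which can leave the cluster in a broken state, so the location should be changed. Related commands: hdfs namenode -format to format the namenode, and start-dfs.sh / stop-dfs.sh to start and stop HDFS.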
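A minimal sketch of the core-site.xml change that item 6 calls for. hadoop.tmp.dir is the standard property behind the /tmp default; the directory /var/hadoop is an assumed example path, not something the original notes specify, and the fs.defaultFS address is the one used in the code later in these notes.

```xml
<configuration>
  <!-- NameNode address; same host and port as the hdfs:// URLs in these notes -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.56.100:9000</value>
  </property>
  <!-- Base directory for Hadoop's local data; /var/hadoop is an assumed example path -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/hadoop</value>
  </property>
</configuration>
```

Because the namenode's metadata directory is derived from hadoop.tmp.dir by default, moving it means the namenode has to be formatted again with hdfs namenode -format, which discards the existing HDFS metadata.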

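7. Accessing HDFS from a program.

a. Through java.net.URL.

b. A simple way to fetch a file's contents:

// Register the hdfs:// scheme handler; the JVM allows this call only once
URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
URL url = new URL("hdfs://192.168.56.100:9000/hello.txt");
InputStream in = url.openStream();
// Copy to stdout through a 1024-byte buffer; true closes both streams afterwards
IOUtils.copyBytes(in, System.out, 1024, true);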
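To make the last call less of a black box, here is a plain-JDK sketch of what a buffered stream copy with a close flag does. It is an approximation for illustration, not Hadoop's actual implementation: CopyBytesSketch is a hypothetical class name, and the returned byte count is an addition of this sketch (Hadoop's IOUtils.copyBytes returns void).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of Hadoop.
final class CopyBytesSketch {

    /**
     * Copies all bytes from in to out through a buffer of bufSize bytes.
     * If close is true, both streams are closed when the copy ends,
     * whether it succeeded or failed. Returns the number of bytes copied.
     */
    static long copyBytes(InputStream in, OutputStream out, int bufSize, boolean close)
            throws IOException {
        long total = 0;
        try {
            byte[] buf = new byte[bufSize];
            int len;
            // read() returns -1 once the end of the stream is reached
            while ((len = in.read(buf)) != -1) {
                out.write(buf, 0, len);
                total += len;
            }
            out.flush();
        } finally {
            if (close) {
                try {
                    in.close();
                } finally {
                    out.close();
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // In-memory streams stand in for the HDFS input stream and System.out
        byte[] data = "hello hdfs\n".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copyBytes(new ByteArrayInputStream(data), sink, 4, true);
        System.out.println(copied + " bytes copied");
        System.out.print(new String(sink.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

The same read/write loop appears by hand in the upload part of the full listing further down; the helper just packages it with the close handling.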

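c. Creating or writing files may fail with a user permission error. This happens because HDFS permission checking is enabled. To turn the check off, edit the configuration (vi hdfs-site.xml) and add:

<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>

After the change, restarting only the namenode is enough; restarting the whole cluster would be far too costly in a production environment. As an aside, deleting a file with hdfs dfs -rm only moves it to the trash when trash is enabled, rather than removing it immediately.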
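On the trash remark above: to the best of my knowledge, trash is controlled by fs.trash.interval in core-site.xml, measured in minutes, with 0 meaning trash is disabled. A sketch that keeps deleted files for one day (1440 is an illustrative value, not taken from the original notes):

```xml
<property>
  <name>fs.trash.interval</name>
  <!-- minutes to keep deleted files in the trash; 1440 = 24 hours (assumed value) -->
  <value>1440</value>
</property>
```

This applies to deletes issued from the shell. A FileSystem.delete call from Java, like the commented-out one in the listing below, removes the path directly and does not go through the trash.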
Code example. These few core calls already cover the basic principle behind a Baidu-Netdisk-style cloud drive:

import java.io.FileInputStream;
import java.io.InputStream;
import java.net.URL;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HelloHDFS {

   public static void main(String[] args) throws Exception {
       /*URL url = new URL("http://www.baidu.com");
       InputStream in = url.openStream();
       IOUtils.copyBytes(in, System.out, 1024, true);*/
       /*URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
       URL url = new URL("hdfs://192.168.56.100:9000/hello.txt");
       InputStream in = url.openStream();
       IOUtils.copyBytes(in, System.out, 1024, true);*/
       Configuration conf = new Configuration();
       conf.set("fs.defaultFS", "hdfs://192.168.56.100:9000");
       FileSystem fileSys = FileSystem.get(conf);
       // The FileSystem handle supports the usual create, read, update and delete operations

       boolean suc = fileSys.mkdirs(new Path("/gxl"));
       System.out.println(suc);

       suc = fileSys.exists(new Path("/gxl"));
       System.out.println(suc);

//       suc = fileSys.delete(new Path("/gxl"),true);
//       System.out.println(suc);

       suc = fileSys.exists(new Path("/gxl"));
       System.out.println(suc);

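       // Upload a local Windows file; overwrite = true replaces any existing /test.data
       // try-with-resources closes both streams even if the copy fails midway
       try (FileInputStream in = new FileInputStream("F:/BaiduNetdiskDownload/Xftp.exe");
            FSDataOutputStream out = fileSys.create(new Path("/test.data"), true)) {
           // Alternative: IOUtils.copyBytes(in, out, 4096, false);
           // (false because the try block already closes both streams)
           byte[] buf = new byte[4096];
           int len;
           while ((len = in.read(buf)) != -1) {
               out.write(buf, 0, len);
           }
       }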

       // List the entries under the root directory with their metadata
       FileStatus[] fstatus = fileSys.listStatus(new Path("/"));
       for(FileStatus status:fstatus){
           System.out.println(status.getPath());
           System.out.println(status.getPermission());
           System.out.println(status.getReplication());
       }
   }
}
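8. Writing to HDFS from Java is actually fairly simple, but it is rarely done in practice. Likewise, now that Hive and Pig exist, hand-written MapReduce is rare too.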