HDFS API File Read/Write: Personal Practice Notes

Let's look at how HDFS reads and writes work. The basic pattern is to open a FileSystem and obtain an InputStream or an OutputStream from it.

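The main class involved is FileSystem, an abstract class that represents a file system. It extends org.apache.hadoop.conf.Configured and implements the Closeable interface, and it covers many kinds of file systems, such as the local file system (file://), FTP, and HDFS.

So if you want to implement your own file system, you can extend this class, provide the corresponding configuration, and implement its abstract methods.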

public abstract class FileSystem extends Configured implements Closeable {
  // ... (excerpt)
  public static FileSystem get(Configuration conf) throws IOException {
    return get(getDefaultUri(conf), conf);
  }

  public static URI getDefaultUri(Configuration conf) {
    return URI.create(fixName(conf.get(FS_DEFAULT_NAME_KEY, DEFAULT_FS)));
  }

  public static FileSystem get(URI uri, Configuration conf) throws IOException {
    String scheme = uri.getScheme();
    String authority = uri.getAuthority();

    if (scheme == null && authority == null) {     // use default FS
      return get(conf);
    }

    if (scheme != null && authority == null) {     // no authority
      URI defaultUri = getDefaultUri(conf);
      if (scheme.equals(defaultUri.getScheme())    // if scheme matches default
          && defaultUri.getAuthority() != null) {  // & default has authority
        return get(defaultUri, conf);              // return default
      }
    }

    String disableCacheName = String.format("fs.%s.impl.disable.cache", scheme);
    if (conf.getBoolean(disableCacheName, false)) {
      return createFileSystem(uri, conf);
    }

    return CACHE.get(uri, conf);
  }
  // ...
}

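From this partial source: get(conf) reads the default file system URI from conf (the fs.defaultFS key) and delegates to get(uri, conf). get(uri, conf) then picks the file system from the URI's scheme and authority plus conf; unless fs.<scheme>.impl.disable.cache is set to true, it returns a cached instance instead of creating a new one.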
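To make the branching in get(URI, Configuration) concrete, here is a minimal stand-alone sketch that replays the same decisions with java.net.URI and a plain Map standing in for Configuration. FsGetSketch and its string results are hypothetical; the sketch assumes "file:///" as the built-in default for fs.defaultFS, and it ignores that Hadoop's real cache key also includes the current user.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the branch logic in FileSystem.get(URI, Configuration),
 * using a plain Map instead of Hadoop's Configuration. No Hadoop classes involved.
 */
public class FsGetSketch {
    // Assumption: Hadoop's built-in default for fs.defaultFS is "file:///"
    static final String DEFAULT_FS = "file:///";

    static URI getDefaultUri(Map<String, String> conf) {
        return URI.create(conf.getOrDefault("fs.defaultFS", DEFAULT_FS));
    }

    /** Returns "cached <scheme>://<authority>" or "new <scheme>://<authority>". */
    static String get(URI uri, Map<String, String> conf) {
        String scheme = uri.getScheme();
        String authority = uri.getAuthority();

        if (scheme == null && authority == null) {      // plain path: use the default FS
            return get(getDefaultUri(conf), conf);
        }
        if (scheme != null && authority == null) {      // scheme only, e.g. "hdfs:///x"
            URI defaultUri = getDefaultUri(conf);
            if (scheme.equals(defaultUri.getScheme()) && defaultUri.getAuthority() != null) {
                return get(defaultUri, conf);           // borrow the default's authority
            }
        }
        String key = String.format("fs.%s.impl.disable.cache", scheme);
        boolean disableCache = Boolean.parseBoolean(conf.getOrDefault(key, "false"));
        String target = scheme + "://" + (authority == null ? "" : authority);
        return (disableCache ? "new " : "cached ") + target;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.defaultFS", "hdfs://10.192.4.33:9000/");
        System.out.println(get(URI.create("/home/overshow/hehe.txt"), conf));
        System.out.println(get(URI.create("hdfs:///words.txt"), conf));
        System.out.println(get(URI.create("file:///tmp/a.txt"), conf));
    }
}
```

With fs.defaultFS pointing at HDFS, both a bare path and "hdfs:///words.txt" resolve to the cached HDFS instance for 10.192.4.33:9000, while "file:///tmp/a.txt" goes to the local file system.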

Now for a simple application: use the Java API to open a file and print its contents.

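import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OpenMethod {
	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		// Configure conf. The key is case-sensitive: "fs.defaultFS", not "fs.defaultFs"
		// (with the misspelled key the default file:/// would be used instead of HDFS)
		conf.set("fs.defaultFS", "hdfs://10.192.4.33:9000/");
		// Create the FileSystem object from conf; try-with-resources closes fs and the stream
		try (FileSystem fs = FileSystem.get(conf);
		     FSDataInputStream fis = fs.open(new Path("/home/overshow/hehe.txt"))) {
			byte[] b = new byte[200];
			// A single read() returns at most 200 bytes (possibly fewer), or -1 at end of file
			int i = fis.read(b);
			if (i > 0) {
				System.out.println(new String(b, 0, i));
			}
		}
	}
}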

There are three more classes worth noting here: Path, FSDataInputStream, and Configuration (conf).

Let me show you something.

First: Configuration

public class Configuration implements Iterable<Map.Entry<String,String>>, Writable {
  public void set(String name, String value) {
    set(name, value, null);
  }

  /**
   * Set the <code>value</code> of the <code>name</code> property. If
   * <code>name</code> is deprecated, it also sets the <code>value</code> to
   * the keys that replace the deprecated key. Name will be trimmed before put
   * into configuration.
   *
   * @param name property name.
   * @param value property value.
   * @param source the place that this configuration value came from
   * (For debugging).
   * @throws IllegalArgumentException when the value or name is null.
   */
  public void set(String name, String value, String source) {
    // ... (body omitted)
  }
}

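This class holds a job's configuration. Configuration is how information is shared across multiple mapper and reducer tasks, so any job setting has to be passed through it. It implements two interfaces, Iterable and Writable. Iterable lets you iterate over all the name-value pairs loaded into memory, and Writable provides the serialization Hadoop requires, so the in-memory name-value pairs can be serialized (for example, written to disk). The set method assigns a value to a named property; in the example above it sets the default file system address.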
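To illustrate the two interfaces, here is a toy class, MiniConf (hypothetical, not Hadoop code), that iterates over its name-value pairs like Configuration and serializes them with a Writable-style write/readFields pair. The byte format below is made up for the sketch and is not Configuration's real serialization format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical MiniConf: Iterable over name-value pairs plus Writable-style methods. */
public class MiniConf implements Iterable<Map.Entry<String, String>> {
    // TreeMap keeps the iteration order deterministic for this sketch
    private final Map<String, String> props = new TreeMap<>();

    public void set(String name, String value) {
        if (name == null || value == null) {
            throw new IllegalArgumentException("name and value must not be null");
        }
        props.put(name.trim(), value);   // names are trimmed, as the set() Javadoc describes
    }

    public String get(String name, String defaultValue) {
        return props.getOrDefault(name, defaultValue);  // keys are case-sensitive
    }

    @Override
    public Iterator<Map.Entry<String, String>> iterator() {
        return props.entrySet().iterator();
    }

    // Writable-style serialization: entry count, then each name and value
    public void write(DataOutput out) throws IOException {
        out.writeInt(props.size());
        for (Map.Entry<String, String> e : props.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
    }

    public void readFields(DataInput in) throws IOException {
        props.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            props.put(in.readUTF(), in.readUTF());
        }
    }

    public static void main(String[] args) throws IOException {
        MiniConf conf = new MiniConf();
        conf.set(" fs.defaultFS ", "hdfs://10.192.4.33:9000/");
        for (Map.Entry<String, String> e : conf) {       // works because of Iterable
            System.out.println(e.getKey() + " = " + e.getValue());
        }
        // Round-trip through bytes, as a Writable would over a stream
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        conf.write(new DataOutputStream(buf));
        MiniConf copy = new MiniConf();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(copy.get("fs.defaultFS", "file:///"));
    }
}
```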

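The Path class lives in the org.apache.hadoop.fs package. It does not extend another fs class; it implements Comparable: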

public class Path implements Comparable {
  private void checkPathArg( String path ) throws IllegalArgumentException {
    // disallow construction of a Path from an empty string
    if ( path == null ) {
      throw new IllegalArgumentException(
          "Can not create a Path from a null string");
    }
    if( path.length() == 0 ) {
       throw new IllegalArgumentException(
           "Can not create a Path from an empty string");
    }   
  }
  
  /** Construct a path from a String.  Path strings are URIs, but with
   * unescaped elements and some additional normalization. */
  public Path(String pathString) throws IllegalArgumentException {
    checkPathArg( pathString );
    
    // We can't use 'new URI(String)' directly, since it assumes things are
    // escaped, which we don't require of Paths. 
    
    // add a slash in front of paths with Windows drive letters
    if (hasWindowsDrive(pathString) && pathString.charAt(0) != '/') {
      pathString = "/" + pathString;
    }

    // parse uri components
    String scheme = null;
    String authority = null;

    int start = 0;

    // parse uri scheme, if any
    int colon = pathString.indexOf(':');
    int slash = pathString.indexOf('/');
    if ((colon != -1) &&
        ((slash == -1) || (colon < slash))) {     // has a scheme
      scheme = pathString.substring(0, colon);
      start = colon+1;
    }

    // parse uri authority, if any
    if (pathString.startsWith("//", start) &&
        (pathString.length()-start > 2)) {       // has authority
      int nextSlash = pathString.indexOf('/', start+2);
      int authEnd = nextSlash > 0 ? nextSlash : pathString.length();
      authority = pathString.substring(start+2, authEnd);
      start = authEnd;
    }

    // uri path is the rest of the string -- query & fragment not supported
    String path = pathString.substring(start, pathString.length());

    initialize(scheme, authority, path, null);
  }

  /**
   * Construct a path from a URI
   */
  public Path(URI aUri) {
    uri = aUri.normalize();
  }
}

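In short, Path parses a path string into URI parts: an optional scheme (such as hdfs), an optional authority (such as 10.192.4.33:9000), and the path itself. That is how an HDFS address can be carried inside a path.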
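The parsing steps in Path(String) can be replayed outside Hadoop. This sketch (PathParseSketch is a hypothetical name) keeps the same scheme/authority/path split but drops the Windows-drive handling and the final initialize/URI normalization:

```java
import java.util.Arrays;

/** Hypothetical re-implementation of the scheme/authority/path split in Path(String). */
public class PathParseSketch {
    /** Returns {scheme, authority, path}; scheme and authority may be null. */
    static String[] parse(String pathString) {
        if (pathString == null || pathString.isEmpty()) {
            throw new IllegalArgumentException("Can not create a Path from a null or empty string");
        }
        String scheme = null;
        String authority = null;
        int start = 0;

        // A scheme is present if ':' appears before the first '/'
        int colon = pathString.indexOf(':');
        int slash = pathString.indexOf('/');
        if (colon != -1 && (slash == -1 || colon < slash)) {
            scheme = pathString.substring(0, colon);
            start = colon + 1;
        }

        // An authority follows "//" and runs up to the next '/'
        if (pathString.startsWith("//", start) && pathString.length() - start > 2) {
            int nextSlash = pathString.indexOf('/', start + 2);
            int authEnd = nextSlash > 0 ? nextSlash : pathString.length();
            authority = pathString.substring(start + 2, authEnd);
            start = authEnd;
        }

        // Whatever remains is the path (query and fragment are not supported)
        String path = pathString.substring(start);
        return new String[] {scheme, authority, path};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("hdfs://10.192.4.33:9000/words.txt")));
        System.out.println(Arrays.toString(parse("/home/overshow/hehe.txt")));
    }
}
```

For "hdfs://10.192.4.33:9000/words.txt" it returns hdfs, 10.192.4.33:9000 and /words.txt; for "/home/overshow/hehe.txt" both scheme and authority are null, so the file system is taken from fs.defaultFS.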

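The last class is FSDataInputStream. Well, its source is too long to go through here.

fs.open creates an FSDataInputStream instance. Simply put, the read flow is: the NameNode tells the client which DataNodes hold the file's blocks, and the client reads from the nearest one (the NameNode decides the ordering) by calling read on the FSDataInputStream. Each call to read transfers part of the data from the DataNode to the client, so read is called repeatedly until the whole file has been received.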
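The read flow boils down to the standard InputStream contract: read(byte[]) may return fewer bytes than the buffer holds, and returns -1 at end of file. Since FSDataInputStream extends java.io.DataInputStream, the same loop applies to the stream returned by fs.open. The sketch below demonstrates it without a cluster; ChunkyInputStream is a hypothetical stand-in for a remote stream that returns only a few bytes per call.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ReadLoopSketch {
    /** Hypothetical stream that hands out at most maxChunk bytes per read call. */
    static class ChunkyInputStream extends FilterInputStream {
        private final int maxChunk;

        ChunkyInputStream(InputStream in, int maxChunk) {
            super(in);
            this.maxChunk = maxChunk;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, maxChunk));
        }
    }

    /** Reads the whole stream, one buffer at a time, until read() returns -1. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[200];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);  // only the first n bytes of buf are valid
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello hdfs".getBytes(StandardCharsets.UTF_8);
        InputStream in = new ChunkyInputStream(new ByteArrayInputStream(data), 3);
        byte[] once = new byte[200];
        System.out.println("single read() got " + in.read(once) + " bytes");  // 3, not 10
        in = new ChunkyInputStream(new ByteArrayInputStream(data), 3);
        System.out.println(new String(readAll(in), StandardCharsets.UTF_8));  // hello hdfs
    }
}
```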

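One thing worth mentioning is the String constructor used to print the result, new String(b, 0, i). After digging through the JDK source for a while, it seems to be this one; it decodes the bytes with the platform's default charset: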

    // java.lang.String (JDK 8 source, excerpt)
    public String(byte bytes[], int offset, int length) {
        checkBounds(bytes, offset, length);
        this.value = StringCoding.decode(bytes, offset, length);
    }

    /**
     * Constructs a new {@code String} by decoding the specified array of bytes
     * using the platform's default charset.  The length of the new {@code
     * String} is a function of the charset, and hence may not be equal to the
     * length of the byte array.
     *
     * <p> The behavior of this constructor when the given bytes are not valid
     * in the default charset is unspecified.  The {@link
     * java.nio.charset.CharsetDecoder} class should be used when more control
     * over the decoding process is required.
     *
     * @param  bytes
     *         The bytes to be decoded into characters
     *
     * @since  JDK1.1
     */
    public String(byte bytes[]) {
        this(bytes, 0, bytes.length);
    }
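The Javadoc's warning that the String length depends on the charset is easy to check. This small sketch (DecodeSketch is a hypothetical name) decodes the same bytes with two explicit charsets via the String(byte[], int, int, Charset) constructor, which also avoids relying on the platform default:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeSketch {
    /** Decodes the whole array with the given charset and returns the char count. */
    static int decodedLength(byte[] bytes, Charset cs) {
        return new String(bytes, 0, bytes.length, cs).length();
    }

    public static void main(String[] args) {
        // "\u4f60\u597d" is the two-character Chinese word for "hello"
        byte[] utf8 = "\u4f60\u597d".getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length);                                       // 6
        System.out.println(decodedLength(utf8, StandardCharsets.UTF_8));       // 2
        System.out.println(decodedLength(utf8, StandardCharsets.ISO_8859_1));  // 6
    }
}
```

Six bytes become two characters under UTF-8 but six under ISO-8859-1, which maps every byte to one character.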

#######################################################################

Writing a file follows much the same flow, except that it uses another output stream class, FSDataOutputStream.

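import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class Create_Method {
	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		conf.set("fs.defaultFS", "hdfs://10.192.4.33:9000/");
		// try-with-resources closes the stream; until it is closed, the data may not be persisted
		try (FileSystem fs = FileSystem.get(conf);
		     FSDataOutputStream fos = fs.create(new Path("/words.txt"))) {  // overwrites an existing file by default
			// writeChars() would write 2 bytes per char (UTF-16); write UTF-8 bytes instead
			fos.write("hello world".getBytes(StandardCharsets.UTF_8));
		}
	}
}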
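One detail in writing text is easy to miss: FSDataOutputStream extends java.io.DataOutputStream, and DataOutputStream.writeChars writes every char as two bytes (UTF-16, high byte first). The sketch below compares it with writing UTF-8 bytes, using an in-memory stream in place of HDFS; WriteCharsSketch is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class WriteCharsSketch {
    /** Bytes produced by DataOutputStream.writeChars (2 bytes per char). */
    static byte[] viaWriteChars(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeChars(s);
        }
        return buf.toByteArray();
    }

    /** Bytes produced by writing the UTF-8 encoding of the string. */
    static byte[] viaUtf8(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.write(s.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] chars = viaWriteChars("hello world");
        byte[] utf8 = viaUtf8("hello world");
        System.out.println(chars.length + " vs " + utf8.length);  // 22 vs 11
        System.out.println(chars[0] + " " + (char) chars[1]);      // 0 h
    }
}
```

For "hello world", writeChars produces 22 bytes starting with a 0 byte, while the UTF-8 version produces 11, so text written with writeChars shows up with interleaved NUL bytes when read back as a plain text file.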

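Of course, fs can also be used to list directory contents. Note the return type of listFiles: RemoteIterator<LocatedFileStatus>, a special iterable object. Its hasNext() and next() can throw IOException, so it is not a java.util.Iterator and cannot be used in a for-each loop.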

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class listStatus {
	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		// Only the scheme and authority of fs.defaultFS select the file system;
		// a path such as /data here would not change where "/" points
		conf.set("fs.defaultFS", "hdfs://10.192.4.33:9000/");
		try (FileSystem fs = FileSystem.get(conf)) {
			Path path = new Path("/");
			// true = recurse into subdirectories; only files are returned, not directories
			RemoteIterator<LocatedFileStatus> list = fs.listFiles(path, true);
			while (list.hasNext()) {
				System.out.println(list.next());
			}
		}
	}
}
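Hadoop's RemoteIterator<E> declares only hasNext() and next(), and both may throw IOException, which is why the listing code uses an explicit while loop. The sketch below mirrors that shape with a local interface so it runs without Hadoop; RemoteIter, drain, and of are hypothetical names, not Hadoop APIs.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DrainSketch {
    /** Local mirror of the shape of org.apache.hadoop.fs.RemoteIterator. */
    interface RemoteIter<E> {
        boolean hasNext() throws IOException;
        E next() throws IOException;
    }

    /** Collects every element; a while loop is needed because of the checked exceptions. */
    static <E> List<E> drain(RemoteIter<E> it) throws IOException {
        List<E> result = new ArrayList<>();
        while (it.hasNext()) {   // in the real API a call may trigger a remote fetch
            result.add(it.next());
        }
        return result;
    }

    /** Hypothetical list-backed implementation standing in for a remote listing. */
    static <E> RemoteIter<E> of(List<E> items) {
        return new RemoteIter<E>() {
            private int i = 0;

            public boolean hasNext() { return i < items.size(); }

            public E next() { return items.get(i++); }
        };
    }

    public static void main(String[] args) throws IOException {
        System.out.println(drain(of(Arrays.asList("/a.txt", "/d/b.txt"))));
    }
}
```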

Here is the source of the listFiles method:

  public RemoteIterator<LocatedFileStatus> listFiles(
      final Path f, final boolean recursive)
  throws FileNotFoundException, IOException {
    return new RemoteIterator<LocatedFileStatus>() {
      private Stack<RemoteIterator<LocatedFileStatus>> itors =
        new Stack<RemoteIterator<LocatedFileStatus>>();
      private RemoteIterator<LocatedFileStatus> curItor =
        listLocatedStatus(f);
      private LocatedFileStatus curFile;

      @Override
      public boolean hasNext() throws IOException {
        while (curFile == null) {
          if (curItor.hasNext()) {
            handleFileStat(curItor.next());
          } else if (!itors.empty()) {
            curItor = itors.pop();
          } else {
            return false;
          }
        }
        return true;
      }

      /**
       * Process the input stat.
       * If it is a file, return the file stat.
       * If it is a directory, traverse the directory if recursive is true;
       * ignore it if recursive is false.
       * @param stat input status
       * @throws IOException if any IO error occurs
       */
      // ... handleFileStat(...) and next() are omitted from this excerpt
    };
  }
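To see how the stack-based traversal behaves without a cluster, here is a stand-alone sketch that follows the logic of the excerpt and its Javadoc: files are returned, directories are descended into only when recursive is true, and an exhausted subdirectory pops back to its parent. Node and ListFilesSketch are hypothetical; an in-memory tree replaces LocatedFileStatus and listLocatedStatus, and ArrayDeque is used in place of Stack.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ListFilesSketch {
    /** Hypothetical stand-in for LocatedFileStatus: a file (no children) or a directory. */
    static final class Node {
        final String path;
        final List<Node> children; // null for a file

        private Node(String path, List<Node> children) {
            this.path = path;
            this.children = children;
        }

        static Node file(String path) { return new Node(path, null); }

        static Node dir(String path, Node... kids) { return new Node(path, Arrays.asList(kids)); }

        boolean isFile() { return children == null; }
    }

    /** Mirrors listFiles(f, recursive): yields file paths only; root must be a directory. */
    static Iterator<String> listFiles(Node root, boolean recursive) {
        return new Iterator<String>() {
            private final Deque<Iterator<Node>> itors = new ArrayDeque<>();
            private Iterator<Node> curItor = root.children.iterator();
            private Node curFile;

            @Override
            public boolean hasNext() {
                while (curFile == null) {
                    if (curItor.hasNext()) {
                        handleFileStat(curItor.next());
                    } else if (!itors.isEmpty()) {
                        curItor = itors.pop();           // subdirectory exhausted: resume the parent
                    } else {
                        return false;
                    }
                }
                return true;
            }

            private void handleFileStat(Node stat) {
                if (stat.isFile()) {
                    curFile = stat;                      // next result found
                } else if (recursive) {
                    itors.push(curItor);                 // save the parent's position
                    curItor = stat.children.iterator();  // descend into the directory
                }                                        // non-recursive: directories are skipped
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                String result = curFile.path;
                curFile = null;
                return result;
            }
        };
    }

    public static void main(String[] args) {
        Node root = Node.dir("/",
                Node.file("/a.txt"),
                Node.dir("/d", Node.file("/d/b.txt"), Node.dir("/d/e", Node.file("/d/e/c.txt"))));
        for (Iterator<String> it = listFiles(root, true); it.hasNext(); ) {
            System.out.println(it.next());
        }
    }
}
```

With the tree in main, the recursive listing prints /a.txt, /d/b.txt and /d/e/c.txt in that order; with recursive set to false only /a.txt is returned, since directories are skipped rather than reported.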

 
