HDFS NFS integration: a translation of the official documentation

zdkdchao

Article link: https://blog.csdn.net/qq_34224565/article/details/109572217

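The NFS service lets remote clients access a remote HDFS directory as if it were a local file system.
Apache official documentation
CDH official documentation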

Overview

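The NFS gateway supports NFSv3 and lets HDFS be mounted as part of a remote client's local file system. The main use cases are:

Browse the HDFS file system with an NFSv3 client
Download files from HDFS to the local file system
Upload files from the local file system to HDFS
Stream data directly to HDFS through the mount point. File append is supported, but random write is not.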
The NFS gateway machine needs everything required to run an HDFS client, such as the Hadoop JAR files and the HADOOP_CONF directory. The NFS gateway can run on the same host as a DataNode, a NameNode, or any HDFS client.
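
To make the mounting idea concrete, here is a minimal sketch of how a client might mount the gateway export. The source does not show a mount command, so the host name, the mount point, and the option set (vers=3, proto=tcp, nolock) are assumptions based on ordinary NFSv3 client usage. The function only prints the command, so nothing is mounted until you run the printed line as root.

```shell
# Build (but do not run) a mount command for an HDFS NFS gateway export.
# The remote path is "/" because the gateway exports the HDFS root.
nfs_mount_cmd() {
    gateway_host=$1   # hypothetical gateway host name
    mount_point=$2    # local directory that will show the HDFS tree
    printf 'mount -t nfs -o vers=3,proto=tcp,nolock %s:/ %s\n' \
        "$gateway_host" "$mount_point"
}

nfs_mount_cmd nfs-gateway.example.com /mnt/hdfs
# prints: mount -t nfs -o vers=3,proto=tcp,nolock nfs-gateway.example.com:/ /mnt/hdfs
```

Printing the command first makes it easy to review the options before anything touches the client's mount table.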

Configuration

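The NFS gateway uses a proxy user to proxy all users accessing the NFS mounts. In non-secure mode, the user running the gateway is the proxy user; in secure mode, the user in the Kerberos keytab is the proxy user. Suppose the proxy user is nfsserver and users belonging to the groups users-group1 and users-group2 use the NFS mounts. Then the following two properties must be set in the core-site.xml of the NameNode, and only the NameNode needs to be restarted after the configuration change.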

<property>
  <name>hadoop.proxyuser.nfsserver.groups</name>
  <value>root,users-group1,users-group2</value>
  <description>
         The 'nfsserver' user is allowed to proxy all members of the 'users-group1' and 
         'users-group2' groups. Note that in most cases you will need to include the
         group "root" because the user "root" (which usually belongs to "root" group) will
         generally be the user that initially executes the mount on the NFS client system. 
         Set this to '*' to allow nfsserver user to proxy any group.
  </description>
</property>
<property>
  <name>hadoop.proxyuser.nfsserver.hosts</name>
  <value>nfs-client-host1.com</value>
  <description>
         This is the host where the nfs gateway is running. Set this to '*' to allow
         requests from any hosts to be proxied.
  </description>
</property>
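
As a rough illustration of how the groups value is read, the sketch below checks whether a user may be proxied, given the allow-list and the user's groups. It is not Hadoop's implementation. The function name `proxy_allowed` is hypothetical, and the logic assumes only what the description states: a comma-separated list of groups, or '*' for any group.

```shell
# Succeed if at least one of the user's groups appears in the allow-list.
# $1 is the hadoop.proxyuser.<user>.groups value; remaining args are the
# groups of the user being proxied.
proxy_allowed() {
    allowed=$1
    shift
    # '*' allows members of any group.
    [ "$allowed" = "*" ] && return 0
    for group in "$@"; do
        case ",$allowed," in
            *",$group,"*) return 0 ;;
        esac
    done
    return 1
}

if proxy_allowed "root,users-group1,users-group2" staff users-group2; then
    echo "proxying allowed"
fi
```

A user whose groups are all missing from the list fails this check, which is why "root" is usually included: the initial mount on the client is normally executed by root.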

The two proxy-user properties are the only configuration required for the NFS gateway in non-secure mode. For Kerberized Hadoop clusters, the following properties must also be added to hdfs-site.xml on the gateway (NOTE: replace the string "nfsserver" with the proxy user name, and make sure the user contained in the keytab is the same proxy user):

<property>
  <name>nfs.keytab.file</name>
  <value>/etc/hadoop/conf/nfsserver.keytab</value> <!-- path to the nfs gateway keytab -->
</property>
<property>
  <name>nfs.kerberos.principal</name>
  <value>nfsserver/_HOST@YOUR-REALM.COM</value>
</property>

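The rest of the NFS gateway configuration is optional in both secure and non-secure mode.
The AIX NFS client has a few known issues that, by default, prevent it from working correctly with the NFS gateway. To access the HDFS NFS gateway from AIX, set the following property: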

<property>
  <name>nfs.aix.compatibility.mode.enabled</name>
  <value>true</value>
</property>

Note that regular, non-AIX clients should NOT enable AIX compatibility mode. The workarounds implemented by AIX compatibility mode effectively disable the safeguards that ensure directory listings via NFS return consistent results and that all data sent to the NFS server is guaranteed to have been committed.
Users are strongly encouraged to update a few configuration properties to suit their use cases. All of the following properties can be added or updated in hdfs-site.xml.

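If the client mounts the export with access time updates allowed, make sure the following property is not disabled in the configuration file. Only the NameNode needs to be restarted after this property is changed. On some Unix systems, users can disable access time updates by mounting the export with "noatime". If the export is mounted with "noatime", there is no need to change the following property, and therefore no need to restart the NameNode.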

<property>
  <name>dfs.namenode.accesstime.precision</name>
  <value>3600000</value>
  <description>The access time for an HDFS file is precise up to this value.
    The default value is 1 hour. Setting a value of 0 disables
    access times for HDFS.
  </description>
</property>

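Users are expected to update the file dump directory. NFS clients often reorder writes, so sequential writes can arrive at the NFS gateway in random order. This directory is used to temporarily store out-of-order writes before they are written to HDFS. For each file, out-of-order writes are dumped once they accumulate beyond a certain threshold in memory (for example, 1 MB). Make sure the directory has enough space. For example, if an application uploads 10 files of 100 MB each, the directory should have roughly 1 GB of space in case worst-case write reordering happens to every file. Only the NFS gateway needs to be restarted after this property is updated.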

<property>    
  <name>nfs.dump.dir</name>
  <value>/tmp/.hdfs-nfs</value>
</property>
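
The sizing rule for the dump directory can be turned into a quick pre-flight check. This is a minimal sketch under stated assumptions: the function names are hypothetical, the worst case is taken as every concurrently uploaded file being fully buffered, and free space is read with POSIX `df -Pk` on /tmp, the parent of the example nfs.dump.dir value.

```shell
# Worst-case space in MB: every file is completely reordered and dumped.
dump_dir_needed_mb() {
    files=$1      # number of files uploaded concurrently
    size_mb=$2    # size of each file in MB
    echo $((files * size_mb))
}

# Free space in MB on the file system that holds the given directory.
dir_free_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

needed=$(dump_dir_needed_mb 10 100)   # the 10 x 100 MB example: 1000 MB
free=$(dir_free_mb /tmp)
if [ "$free" -lt "$needed" ]; then
    echo "warning: only ${free} MB free, ${needed} MB recommended"
else
    echo "ok: ${free} MB free, ${needed} MB recommended"
fi
```

Checking the parent directory is deliberate, since the dump directory itself may not exist before the gateway first starts.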

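By default, the export can be mounted by any client. To control access more tightly, users can update the following property. The value string contains a machine name and an access privilege, separated by whitespace. The machine name can be a single host, a "*" wildcard, a Java regular expression, or an IPv4 address. The access privilege is rw or ro, granting the machine read/write or read-only access to the export. If no access privilege is given, the default is read-only. Entries are separated by ";". For example: "192.168.0.0/22 rw ; host.*\.example\.com ; host1.test.org ro;". Only the NFS gateway needs to be restarted after this property is updated.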

<property>
  <name>nfs.exports.allowed.hosts</name>
  <value>* rw</value>
</property>
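
To show how such a value breaks down, the following sketch splits an nfs.exports.allowed.hosts string into one "host privilege" pair per line and applies the read-only default. It is not the gateway's own parser. The function name `parse_exports` is hypothetical, and the sketch does not validate host patterns or privileges.

```shell
# Print one "host privilege" pair per export entry; a missing privilege
# falls back to ro, matching the documented default.
parse_exports() {
    printf '%s\n' "$1" | tr ';' '\n' | while read -r host priv _rest; do
        # Skip the empty piece left by a trailing ';'.
        [ -z "$host" ] && continue
        printf '%s %s\n' "$host" "${priv:-ro}"
    done
}

parse_exports '192.168.0.0/22 rw ; host.*\.example\.com ; host1.test.org ro;'
# prints:
# 192.168.0.0/22 rw
# host.*\.example\.com ro
# host1.test.org ro
```

Note that `read -r` keeps the backslashes in the regular expression intact, which matters for entries such as the second one.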

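JVM and log settings. You can export JVM settings (such as heap size and GC logging) in HADOOP_NFS3_OPTS. More NFS-related settings can be found in hadoop-env.sh. To get an NFS debug trace, edit the log4j configuration file and add the following lines. Note that the debug trace, especially for ONCRPC, can be very verbose.
To change the logging level: log4j.logger.org.apache.hadoop.hdfs.nfs=DEBUG
To get more details of ONCRPC requests: log4j.logger.org.apache.hadoop.oncrpc=DEBUG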

Start and stop the NFS gateway service

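Three daemons are required to provide NFS service: rpcbind (or portmap), mountd, and nfsd. The NFS gateway process includes both nfsd and mountd. It shares the HDFS root "/" as its only export. Using the portmap included in the NFS gateway package is recommended. The NFS gateway works with the portmap/rpcbind provided by most Linux distributions, but on some Linux systems (such as RHEL 6.2) the packaged portmap is needed because of an rpcbind bug.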

1. Stop nfs/rpcbind/portmap services provided by the platform (commands can be different on various Unix platforms):

service nfs stop
service rpcbind stop

2. Start the portmap included in the package (needs root privileges):

hdfs portmap
  
OR

hadoop-daemon.sh start portmap

3. Start mountd and nfsd.

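No root privileges are required for this command. In non-secure mode, the NFS gateway should be started by the proxy user mentioned at the beginning of the Configuration section. In secure mode, any user with read access to the Kerberos keytab defined in "nfs.keytab.file" can start the NFS gateway.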

hdfs nfs3

OR

hadoop-daemon.sh start nfs3

Note: if the hadoop-daemon.sh script starts the NFS gateway, its log can be found in the Hadoop log folder.

4. Stop NFS gateway services.

hadoop-daemon.sh stop nfs3

hadoop-daemon.sh stop portmap

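If you start the NFS gateway as the root user, you can optionally skip the Hadoop-provided portmap daemon and use the system portmap daemon instead, on all operating systems. This lets the HDFS NFS gateway work around the rpcbind bug mentioned above while still registering with the system portmap daemon. To do so, start the NFS gateway daemon as you normally would, but make sure to do so as the "root" user, and also set the "HADOOP_PRIVILEGED_NFS_USER" environment variable to an unprivileged user. In this mode the NFS gateway starts as root to perform its initial registration with the system portmap, then drops privileges to the user specified by HADOOP_PRIVILEGED_NFS_USER for the rest of the lifetime of the NFS gateway process. Note that if you choose this route, you should skip steps 1 and 2 above.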
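
Whichever portmap route is used, it helps to confirm that the three services are registered while the gateway is running. The sketch below is one way to do that, assuming the standard `rpcinfo -p` tool is available and that its last column holds the service name. The function name `services_registered` and the host name are hypothetical. The function reads rpcinfo output from standard input, so it can be tried on saved output as well.

```shell
# Read `rpcinfo -p <host>` output on stdin; succeed only if portmapper,
# mountd and nfs all appear in the service (last) column.
services_registered() {
    output=$(cat)
    for svc in portmapper mountd nfs; do
        printf '%s\n' "$output" |
            awk -v s="$svc" '$NF == s { found = 1 } END { exit !found }' ||
            return 1
    done
    return 0
}

# Usage against a running gateway (not executed here):
#   rpcinfo -p nfs-gateway.example.com | services_registered && echo "all registered"
```

If the check fails, the usual suspects are a platform nfs or rpcbind service that was not stopped in step 1, or a gateway that was started before portmap.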
