hadoop/hdfs QJM 笔记

 在hadoop hdfs 中有一个角色叫做 JournalNode, 作用是储存对hdfs修改的日志,如果NN挂掉,通过重新播放日志来恢复。

在原来的模式下,这些日志文件都是放在active的NameNode(NN) 中,又starndy 的NN 定期来合并这些日志文件(压缩等待),然后将合并后的文件合并到active 的NN,关键问题是,如果这个NN挂掉了那么整个集群就挂掉了,为了解决这个问题.

1. 将日志文件由NN写到几台机器上journalnode,几台的原因是担心有机器挂掉.

2. 设置两个NN,一个是active的,一个是backup的,如果active的挂掉了,通过zookeeper,让backup的NN变成active的NN,通过重播journalnode上的日志,让backup的NN的数据同步到active NN 挂掉的状态, 这样集群的容错能力更强。

下面是原文.

Background

Prior to Hadoop 2.0.0, the NameNode was a single point of failure (SPOF) in an HDFS cluster. Each cluster had a single NameNode, and if that machine or process became unavailable, the cluster as a whole would be unavailable until the NameNode was either restarted or brought up on a separate machine.

This impacted the total availability of the HDFS cluster in two major ways:

  • In the case of an unplanned event such as a machine crash, the cluster would be unavailable until an operator restarted the NameNode.

  • Planned maintenance events such as software or hardware upgrades on the NameNode machine would result in windows of cluster downtime.

The HDFS High Availability feature addresses the above problems by providing the option of running two redundant NameNodes in the same cluster in an Active/Passive configuration with a hot standby. This allows a fast failover to a new NameNode in the case that a machine crashes, or a graceful administrator-initiated failover for the purpose of planned maintenance.


从NN 的角度

1. 现在有一台NN, 有多台journalNode,NN要往 journalndoe 写数据,应该有一个类来干这件事情.就是下图的QutomJournalManager.

看到AsyncLoggerSet 这个就是与多台Journalnode 通信的set,实际保存的是由ICPLoggerChannel.Factory 创造的AsyncLogger.如下图



2.既然现在有了与多个journalnode通信的管理类,NN只需要调用 QuorumJournalManager 的api就可以了,接下来看一下这些Api是怎么实现的, 拿doFinalize() 方法来实现。

a. QuorumJournalManager 方法


b. 接着看loggers 具体实现,针对每一个AsyncLogger 都会返回一个ListenableFuture。(AsyncLoggerSet)


(IPCLoggerChannel)


看到这里就没有看了,估计后面也是RPC 的一些东西。

现在解析图b 中的 QuorumCall.create(calls),记录那些与不同的Journalnode 通信的结果.(AsyncLoggerSet)


(QuorumCall)


最后通过与与不同的journalnode的通信结果来处理(QuorumJournalManager)



前面都是NN端的操作,接下来就是Journalnode接收端的具体操作了,主要是类JournalNode,是对本地log操作的一层简单的封装,有相应的操作,如上图的doFinalize(String journalId)




参考链接:https://blog.csdn.net/androidlushangderen/article/details/48415073 

root@job-da8abcdd-9948-4878-9d20-371dceb00ee1-master-0:/home# start-dfs.sh Starting namenodes on [master] /opt/hadoop/hadoop/bin/hdfs: 26: function: not found /opt/hadoop/hadoop/bin/hdfs: 28: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 29: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 30: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 31: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 32: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 33: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 35: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 36: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 37: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 38: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 39: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 40: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 41: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 42: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 43: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 44: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 45: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 46: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 47: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 48: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 49: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 50: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 51: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 52: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 53: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 54: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 55: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 56: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 57: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 58: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 59: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 60: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 61: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 62: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 63: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 64: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 65: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 66: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 67: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 68: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 69: hadoop_generate_usage: not found /opt/hadoop/hadoop/bin/hdfs: 77: function: not found /opt/hadoop/hadoop/bin/hdfs: 218: hadoop_validate_classname: not found /opt/hadoop/hadoop/bin/hdfs: 219: hadoop_exit_with_usage: not found /opt/hadoop/hadoop/bin/hdfs: 226: [[: not found /opt/hadoop/hadoop/bin/hdfs: 235: [[: not found ERROR: Cannot execute /opt/hadoop/hadoop/bin/../libexec/hdfs-config.sh. Starting datanodes /opt/hadoop/hadoop/bin/hdfs: 26: function: not found /opt/hadoop/hadoop/bin/hdfs: 28: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 29: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 30: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 31: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 32: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 33: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 35: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 36: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 37: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 38: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 39: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 40: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 41: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 42: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 43: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 44: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 45: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 46: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 47: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 48: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 49: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 50: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 51: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 52: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 53: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 54: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 55: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 56: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 57: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 58: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 59: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 60: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 61: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 62: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 63: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 64: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 65: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 66: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 67: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 68: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 69: hadoop_generate_usage: not found /opt/hadoop/hadoop/bin/hdfs: 77: function: not found /opt/hadoop/hadoop/bin/hdfs: 218: hadoop_validate_classname: not found /opt/hadoop/hadoop/bin/hdfs: 219: hadoop_exit_with_usage: not found /opt/hadoop/hadoop/bin/hdfs: 226: [[: not found /opt/hadoop/hadoop/bin/hdfs: 235: [[: not found ERROR: Cannot execute /opt/hadoop/hadoop/bin/../libexec/hdfs-config.sh. Starting secondary namenodes [job-da8abcdd-9948-4878-9d20-371dceb00ee1-master-0] /opt/hadoop/hadoop/bin/hdfs: 26: function: not found /opt/hadoop/hadoop/bin/hdfs: 28: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 29: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 30: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 31: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 32: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 33: hadoop_add_option: not found /opt/hadoop/hadoop/bin/hdfs: 35: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 36: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 37: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 38: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 39: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 40: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 41: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 42: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 43: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 44: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 45: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 46: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 47: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 48: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 49: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 50: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 51: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 52: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 53: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 54: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 55: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 56: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 57: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 58: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 59: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 60: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 61: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 62: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 63: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 64: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 65: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 66: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 67: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 68: hadoop_add_subcommand: not found /opt/hadoop/hadoop/bin/hdfs: 69: hadoop_generate_usage: not found /opt/hadoop/hadoop/bin/hdfs: 77: function: not found /opt/hadoop/hadoop/bin/hdfs: 218: hadoop_validate_classname: not found /opt/hadoop/hadoop/bin/hdfs: 219: hadoop_exit_with_usage: not found /opt/hadoop/hadoop/bin/hdfs: 226: [[: not found /opt/hadoop/hadoop/bin/hdfs: 235: [[: not found ERROR: Cannot execute /opt/hadoop/hadoop/bin/../libexec/hdfs-config.sh.
07-11
<think>我们正在处理一个Hadoop启动HDFS服务时出现的错误。用户执行start-dfs.sh脚本时遇到了两个主要问题: 1. 提示各种函数未找到(function not found) 2. 无法执行hdfs-config.sh脚本(Cannot execute /opt/hadoop/hadoop/bin/../libexec/hdfs-config.sh) 根据经验,这类问题通常与脚本的执行环境有关,比如脚本的权限问题、脚本的编码格式(如DOS格式与Unix格式的差异)或者环境变量配置问题。 首先,我们检查脚本的权限。如果脚本没有可执行权限,那么就会出现无法执行的错误。我们可以使用`ls -l`命令检查权限,并使用`chmod`命令添加可执行权限。 其次,检查脚本的编码格式。如果脚本是在Windows环境下编辑过,可能会包含DOS格式的换行符(CRLF)而不是Unix格式的换行符(LF),这可能导致解释器无法正确识别。我们可以使用`dos2unix`工具转换格式。 另外,函数未定义的错误可能是由于环境变量配置问题或者脚本中函数定义未被正确加载。我们需要检查hdfs-config.sh脚本中是否正确定义了这些函数,以及是否在调用前被正确加载。 具体步骤: 1. 检查并修复脚本权限: ```bash # 检查权限 ls -l /opt/hadoop/hadoop/libexec/hdfs-config.sh # 如果没有可执行权限,添加权限 chmod +x /opt/hadoop/hadoop/libexec/hdfs-config.sh ``` 2. 检查并转换脚本格式: ```bash # 检查文件类型(查看是否有CRLF) file /opt/hadoop/hadoop/libexec/hdfs-config.sh # 或者使用cat -v查看是否有^M cat -v /opt/hadoop/hadoop/libexec/hdfs-config.sh | head # 安装dos2unix(如果未安装) sudo apt-get install dos2unix # 对于Debian/Ubuntu # 转换文件 dos2unix /opt/hadoop/hadoop/libexec/hdfs-config.sh ``` 3. 检查脚本中的函数定义: 打开hdfs-config.sh,查看其中是否定义了缺失的函数(比如`hdfs_java_setup`等)。通常,这些函数应该在脚本中定义,或者通过`source`引入其他文件。检查脚本开头是否有引入其他配置文件的语句,例如: ```bash # 通常会有类似这样的语句 if [ -f "${HADOOP_CONF_DIR}/hadoop-env.sh" ]; then . "${HADOOP_CONF_DIR}/hadoop-env.sh" fi ``` 确保这些配置文件存在且路径正确。 4. 检查环境变量: 确保`HADOOP_CONF_DIR`环境变量已正确设置,指向Hadoop的配置目录(通常为`/opt/hadoop/hadoop/etc/hadoop`)。可以在执行start-dfs.sh之前设置环境变量,或者检查`hadoop-env.sh`等配置文件。 5. 执行start-dfs.sh时添加调试信息: 在start-dfs.sh脚本的开头添加`set -x`,这样可以看到执行过程的详细输出,有助于定位问题。 如果以上步骤不能解决问题,可能需要查看具体的错误信息,并进一步分析。 由于用户的问题描述中提到了“各种函数未找到”,因此我们需要具体查看是哪些函数未找到。常见的函数如`hdfs_java_setup`、`hadoop_validate_env`等,这些函数通常在`hdfs-config.sh`或其他配置脚本中定义。 另外,注意Hadoop版本差异,不同版本的脚本可能略有不同。 下面,我们按照上述思路给出详细解答。</think>### Hadoop启动HDFS服务错误解决方案 当执行`start-dfs.sh`脚本出现 **"function not found"** 和 **"Cannot execute /opt/hadoop/hadoop/bin/../libexec/hdfs-config.sh"** 错误时,通常由以下原因引起: #### 1. 脚本权限问题 (最常见原因) ```bash # 检查脚本权限 (应显示 -rwxr-xr-x) ls -l /opt/hadoop/hadoop/libexec/hdfs-config.sh # 修复权限 (755表示所有者可读写执行,其他用户可读执行) sudo chmod 755 /opt/hadoop/hadoop/libexec/*.sh sudo chmod 755 /opt/hadoop/hadoop/sbin/*.sh ``` #### 2. 文件编码格式错误 (Windows/Linux换行符不兼容) ```bash # 检查文件格式 (应显示 "ASCII text",而非"with CRLF") file /opt/hadoop/hadoop/libexec/hdfs-config.sh # 安装转换工具并修复 sudo apt-get install dos2unix # Debian/Ubuntu sudo yum install dos2unix # CentOS/RHEL sudo dos2unix /opt/hadoop/hadoop/libexec/*.sh ``` #### 3. 环境变量配置错误 检查`hadoop-env.sh`中的关键配置: ```bash # 确保JAVA_HOME正确指向JDK安装路径 export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 # 检查Hadoop配置路径 export HADOOP_CONF_DIR=/opt/hadoop/hadoop/etc/hadoop ``` #### 4. 函数未定义问题 在`hdfs-config.sh`开头添加调试信息: ```bash #!/usr/bin/env bash set -x # 启用详细执行追踪 echo "Current PATH: $PATH" source "${HADOOP_CONF_DIR}/hadoop-env.sh" # 确保加载环境 ``` #### 5. 路径链接问题 检查符号链接是否有效: ```bash # 验证实际路径 ls -l /opt/hadoop/hadoop/bin/../libexec/hdfs-config.sh # 重建符号链接 (示例) cd /opt/hadoop/hadoop/bin ln -sfn ../libexec/hdfs-config.sh ``` #### 验证修复步骤 1. 修复权限和文件格式 2. 重新加载环境变量: ```bash source /opt/hadoop/hadoop/etc/hadoop/hadoop-env.sh ``` 3. 手动测试脚本: ```bash /opt/hadoop/hadoop/libexec/hdfs-config.sh ``` 4. 再次启动HDFS: ```bash start-dfs.sh ``` > **注意**:若使用非root用户操作,确保该用户对`/opt/hadoop`目录有所有权: > ```bash > sudo chown -R hadoopuser:hadoopgroup /opt/hadoop > ``` [^1]: Hadoop官方文档 - 启动故障排除指南
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值