While installing CDH I ran into an HDFS DataNode permission problem similar to the Cloudera community threads quoted below, so I am recording it here.
Cause:
A Cloudera Manager 5.x bug, fixed in Cloudera Manager 5.13 and later.
Temporary workaround:
Run chmod 777 on the affected file path.
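For example (illustrative only; the numbered directory and role name come from the thread below and will differ on your cluster, and 777 is deliberately permissive for a quick test):
chmod 777 /run/cloudera-scm-agent/process/468-hbase-REGIONSERVER/config.zip
Note that the agent creates a fresh numbered process directory on every start, so this has to be reapplied each time.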
The official forum replies, recorded below:
Cloudera Community thread 69296: Can't open /run/cloudera-scm-agent/process/468-hbase-REGIONSERVER/config.zip: Permission denied
I am using CDH 5.12.2 and I cannot start the Region Server. I get:
Can't open /run/cloudera-scm-agent/process/468-hbase-REGIONSERVER/config.zip: Permission denied.
The ownership of this zip file belongs to root.
-rw------- 1 root root 7309 Jun 21 12:12 /run/cloudera-scm-agent/process/468-hbase-REGIONSERVER/config.zip
From the command line, I changed the ownership to hbase:
-rwxr-xr-x 1 hbase hbase 7309 Jun 21 12:06 /run/cloudera-scm-agent/process/468-hbase-REGIONSERVER/config.zip
From Cloudera Manager, I started the Region Server and got the Permission denied error again. This time it complains about another folder, 471-hbase-REGIONSERVER; earlier it was 468-hbase-REGIONSERVER.
Can't open /run/cloudera-scm-agent/process/471-hbase-REGIONSERVER/config.zip: Permission denied
-rw------- 1 root root 7309 Jun 21 12:22 /run/cloudera-scm-agent/process/471-hbase-REGIONSERVER/config.zip
How to solve this problem?
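Worth noting before the official reply: each start attempt lays down a brand-new numbered process directory, which is why fixing permissions on 468-hbase-REGIONSERVER did not carry over to 471-hbase-REGIONSERVER. A quick way to find the newest process directory for the role, assuming the path pattern shown above:
ls -dt /run/cloudera-scm-agent/process/*-hbase-REGIONSERVER | head -1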
Official reply:
The "config.zip" error can be ignored. It is expected due to a cosmetic bug that is fixed in Cloudera Manager 5.13 and on.
The first thing we need to know is how you determined that the Region Server did not start. What are you seeing when you try to start HBase?
Also, is it just one Region Server or are other HBase roles also not starting?
The best place to start troubleshooting a Cloudera Manager initiated start of a role is to review the following (a shell sketch follows this list):
- the agent logs on the host where the Region Server failed to start
- the stderr.log and stdout.log files for the process, which give clues about any issues the supervisor is having starting the process
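A minimal sketch of where to look on the affected host (the agent log path is the CM 5 default; the numbered process directory is taken from the question above and will differ on your cluster):
tail -n 100 /var/log/cloudera-scm-agent/cloudera-scm-agent.log
ls /run/cloudera-scm-agent/process/471-hbase-REGIONSERVER/logs/
cat /run/cloudera-scm-agent/process/471-hbase-REGIONSERVER/logs/stderr.log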
Here is the general process of how a service starts:
- You click start in CM
- CM tells the agent to heartbeat
- the agent sends a heartbeat to CM
- CM replies with a heartbeat response
- Agent compares what it has running with what CM says should be running (and decides what to do to match what CM says)
- Agent retrieves the files necessary to start the process from CM and lays down the files
- Agent signals the supervisor process
- Supervisor checks to see if processes need to stop/start
- If starting, the supervisor will execute CM shell scripts to start the process
- Once the shell script completes, the process runs as a child process of the supervisor (you can verify this with the check after this list).
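To confirm that last step, you can inspect the supervisor's process tree on the host. This is generic Linux tooling, not a CM command, and it assumes the agent's supervisor shows up in the process table as supervisord:
pstree -ap $(pgrep -f supervisord | head -1)
Role processes such as the Region Server should appear as children of the supervisord process.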
Hopefully that helps clarify the process so you can start troubleshooting.
The process's stdout.log file (in the process directory's logs directory) is a good place to start.
You can view them in Cloudera Manager by going to the role's status page and clicking the "Log Files" drop-down.
Cloudera Community thread 57616: Can't open /run/cloudera-scm-agent/process/.../config.zip: Permission denied
I found that many commands fail with the following Permission denied errors under Cloudera Manager 5.12:
++ printf '! -name %s ' cloudera-config.sh httpfs.sh hue.sh impala.sh sqoop.sh supervisor.conf '*.log' hdfs.keytab '*jceks'
+ find /run/cloudera-scm-agent/process/77-hdfs-NAMENODE-format -type f '!' -path '/run/cloudera-scm-agent/process/77-hdfs-NAMENODE-format/logs/*' '!' -name cloudera-config.sh '!' -name httpfs.sh '!' -name hue.sh '!' -name impala.sh '!' -name sqoop.sh '!' -name supervisor.conf '!' -name '*.log' '!' -name hdfs.keytab '!' -name '*jceks' -exec perl -pi -e 's#{{CMF_CONF_DIR}}#/run/cloudera-scm-agent/process/77-hdfs-NAMENODE-format#g' '{}' ';'
Can't open /run/cloudera-scm-agent/process/77-hdfs-NAMENODE-format/config.zip: Permission denied.
Can't open /run/cloudera-scm-agent/process/77-hdfs-NAMENODE-format/proc.json: Permission denied.
+ make_scripts_executable
+ find /run/cloudera-scm-agent/process/77-hdfs-NAMENODE-format -regex '.*\.\(py\|sh\)$' -exec chmod u+x '{}' ';'
+ '[' DATANODE_MAX_LOCKED_MEMORY '!=' '' ']'
+ ulimit -l
+ export HADOOP_IDENT_STRING=hdfs
+ HADOOP_IDENT_STRING=hdfs
+ '[' -n '' ']'
+ '[' mkdir '!=' format-namenode ']'
+ acquire_kerberos_tgt hdfs.keytab
+ '[' -z hdfs.keytab ']'
+ '[' -n '' ']'
+ '[' validate-writable-empty-dirs = format-namenode ']'
+ '[' file-operation = format-namenode ']'
+ '[' bootstrap = format-namenode ']'
+ '[' failover = format-namenode ']'
+ '[' transition-to-active = format-namenode ']'
+ '[' initializeSharedEdits = format-namenode ']'
+ '[' initialize-znode = format-namenode ']'
+ '[' format-namenode = format-namenode ']'
+ '[' -z /dfs/nn ']'
+ for dfsdir in '$DFS_STORAGE_DIRS'
+ '[' -e /dfs/nn ']'
+ '[' '!' -d /dfs/nn ']'
+ CLUSTER_ARGS=
+ '[' 2 -eq 2 ']'
+ CLUSTER_ARGS='-clusterId cluster8'
+ '[' 3 = 5 ']'
+ '[' -3 = 5 ']'
+ exec /usr/lib/hadoop-hdfs/bin/hdfs --config /run/cloudera-scm-agent/process/77-hdfs-NAMENODE-format namenode -format -clusterId cluster8 -nonInteractive
/usr/lib64/cmf/service/hdfs/hdfs.sh: line 273: /usr/lib/hadoop-hdfs/bin/hdfs: No such file or directory
This seems to be a problem introduced by a bug fix at some point since 5.9. Has anyone encountered this in recent versions?
[root@c53 ~]# ls -l /var/run/cloudera-scm-agent/process/77-hdfs-NAMENODE-format/
total 88
-rwxr----- 1 hdfs hdfs 2149 Jul 19 09:40 cloudera_manager_agent_fencer.py
-rw-r----- 1 hdfs hdfs 30 Jul 19 09:40 cloudera_manager_agent_fencer_secret_key.txt
-rw-r----- 1 hdfs hdfs 353 Jul 19 09:40 cloudera-monitor.properties
-rw-r----- 1 hdfs hdfs 316 Jul 19 09:40 cloudera-stack-monitor.properties
-rw------- 1 root root 8868 Jul 19 09:40 config.zip
-rw-r----- 1 hdfs hdfs 3460 Jul 19 09:40 core-site.xml
-rw-r----- 1 hdfs hdfs 12 Jul 19 09:40 dfs_hosts_allow.txt
-rw-r----- 1 hdfs hdfs 0 Jul 19 09:40 dfs_hosts_exclude.txt
-rw-r----- 1 hdfs hdfs 1388 Jul 19 09:40 event-filter-rules.json
-rw-r--r-- 1 hdfs hdfs 4 Jul 19 09:40 exit_code
-rw-r----- 1 hdfs hdfs 0 Jul 19 09:40 hadoop-metrics2.properties
-rw-r----- 1 hdfs hdfs 98 Jul 19 09:40 hadoop-policy.xml
-rw------- 1 hdfs hdfs 0 Jul 19 09:40 hdfs.keytab
-rw-r----- 1 hdfs hdfs 4872 Jul 19 09:40 hdfs-site.xml
-rw-r----- 1 hdfs hdfs 0 Jul 19 09:40 http-auth-signature-secret
-rw-r----- 1 hdfs hdfs 2246 Jul 19 09:40 log4j.properties
drwxr-x--x 2 hdfs hdfs 80 Jul 19 09:40 logs
-rw-r----- 1 hdfs hdfs 2470 Jul 19 09:40 navigator.client.properties
-rw------- 1 root root 3879 Jul 19 09:40 proc.json
-rw-r----- 1 hdfs hdfs 315 Jul 19 09:40 ssl-client.xml
-rw-r----- 1 hdfs hdfs 98 Jul 19 09:40 ssl-server.xml
-rw------- 1 root root 3463 Jul 19 09:40 supervisor.conf
-rw-r----- 1 hdfs hdfs 187 Jul 19 09:40 topology.map
-rwxr----- 1 hdfs hdfs 1549 Jul 19 09:40 topology.py
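Two separate things are visible in the trace and listing above. The Permission denied lines are the cosmetic bug already mentioned: config.zip, proc.json, and supervisor.conf are root-owned with mode 0600 by design (see the listing), and the {{CMF_CONF_DIR}} rewrite loop does not exclude all of them, so it prints errors that can safely be ignored. The real failure is the last line of the trace: /usr/lib/hadoop-hdfs/bin/hdfs does not exist. One plausible cause, my assumption rather than anything confirmed in the thread, is a parcel-based install, where the binary lives under the parcel directory instead of /usr/lib. A quick check:
ls -l /usr/lib/hadoop-hdfs/bin/hdfs        # package-based install path
ls -l /opt/cloudera/parcels/CDH/bin/hdfs   # parcel-based install path (assumption)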
I resolved this issue by configuring proper Java heap memory for the NameNode and by configuring HA on the NameNode.
Shafi
I was using the 5.12 Express edition. It was not easy to get the services started after installation; the configuration needed to be changed, including the High Availability setup and JournalNodes. I updated the Java heap size for the HDFS instances and retained only the NameNode, Secondary NameNode, and JournalNodes.
regards
Shafi
I had the same problem until just now, on 5.12.x, and in my case it was caused by a problem with SSL, so if you use SSL, please read the following:
"this might be caused by having more than one oracle java installed, and even worse, any version of openjdk java. Namely, make sure you have added the ca you used for the ssl to the java keystore of the java version you are using (you can find that out in process list). Also, make sure that keytool you are using is belonging to this version of java - so it's best to have only one version installed, or (if that is unavoidable), use the full path to keytool. Hope it helps."
Yes, it's a Java problem; I exported JAVA_HOME again to solve it:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
