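1. The HDP components copied over remotely were incomplete, so rpm packages were missing when installing the client. Copying the missing packages over manually fixed it.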
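2. Installing HAWQ: startup fails with an error about passwordless ssh to the HAWQ hosts, because copying files between the hawq master and the other hosts was blocked by a password prompt. There were two causes. First, the permissions for root's passwordless ssh login were wrong; the correct permissions are chmod 700 /root/.ssh and chmod 600 /root/.ssh/authorized_keys. Second, after su gpadmin, create a hawq_host file under /home/gpadmin, write the node hostnames into it, and run hawq ssh-exkeys -f hawq_host. The log showed that the RSA hostname could not be reached; after fixing /etc/hosts and resetting the hostname, the key exchange succeeded.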
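The permission fix from item 2 can be sketched as a small helper. This is only an illustration: fix_ssh_perms is a hypothetical function name, and it assumes the .ssh directory already exists.

```shell
#!/bin/bash
# Sketch: tighten permissions on an .ssh directory so that sshd accepts key-based login.
# fix_ssh_perms is a hypothetical helper name, not part of HAWQ or OpenSSH.
fix_ssh_perms() {
    local ssh_dir="$1"
    chmod 700 "$ssh_dir" || return 1
    # authorized_keys may not exist yet on a fresh host, so only touch it if present
    if [ -f "$ssh_dir/authorized_keys" ]; then
        chmod 600 "$ssh_dir/authorized_keys" || return 1
    fi
}

# Example (run as root on every node):
#   fix_ssh_perms /root/.ssh
# Then, as gpadmin, exchange keys using the host file from item 2:
#   hawq ssh-exkeys -f hawq_host
```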
3. Uninstalling services after the installation failed partway
Uninstalling a single service
Stop:
curl -s -u admin:admin -H "X-Requested-By: Ambari" -X PUT -d '{"RequestInfo":{"context":"Stop Service"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' http://AMBARI-HOST:8080/api/v1/clusters/CLUSTER_NAME/services/SERVICE_NAME
Delete:
curl -s -u admin:admin -H "X-Requested-By: Ambari" -X DELETE http://AMBARI-HOST:8080/api/v1/clusters/CLUSTER_NAME/services/SERVICE_NAME
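To avoid retyping the URL, the stop and delete requests can be wrapped in functions. This is a sketch, not an official Ambari tool: ambari_service_url, stop_service and delete_service are hypothetical names, and the admin:admin credentials and port 8080 are simply the values used in these notes.

```shell
#!/bin/bash
# Sketch only: wraps the two Ambari REST calls in functions.
# All function names here are hypothetical helpers.
AMBARI_USER="${AMBARI_USER:-admin}"
AMBARI_PASS="${AMBARI_PASS:-admin}"

# Build the service URL. $1 = Ambari host, $2 = cluster name, $3 = service name
ambari_service_url() {
    printf 'http://%s:8080/api/v1/clusters/%s/services/%s\n' "$1" "$2" "$3"
}

# Ask Ambari to move the service to the INSTALLED state, which stops it.
stop_service() {
    curl -s -u "$AMBARI_USER:$AMBARI_PASS" -H "X-Requested-By: Ambari" -X PUT \
        -d '{"RequestInfo":{"context":"Stop Service"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
        "$(ambari_service_url "$1" "$2" "$3")"
}

# Remove the service definition from the cluster.
delete_service() {
    curl -s -u "$AMBARI_USER:$AMBARI_PASS" -H "X-Requested-By: Ambari" -X DELETE \
        "$(ambari_service_url "$1" "$2" "$3")"
}

# Example:
#   stop_service AMBARI-HOST CLUSTER_NAME SERVICE_NAME
#   delete_service AMBARI-HOST CLUSTER_NAME SERVICE_NAME
```

The stop request is likely processed in the background, so it is safer to confirm in the Ambari UI that the service has stopped before calling delete_service.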
Uninstalling the whole cluster (Ambari and Hadoop)
Run this script:
#!/bin/bash
ambari-server stop
ambari-server reset
ambari-agent stop
service mysqld stop
service postgresql stop
python /usr/lib/python2.6/site-packages/ambari_agent/HostCleanup.py
yum remove ambari\* hadoop hdfs bigtop-jsvc bigtop-tomcat hbase\* hadoop\* hdp-select ranger\* zookeeper\* postgresql-libs postgresql postgresql-server
yum remove mysql mysql-server mysql-libs mysql-connector-java
rm -rf /opt/hadoop
rm -rf /opt/app/hadoop
rm -rf /opt/app/ambari-metrics-collector
rm -rf /opt/kafka-logs
rm -rf /usr/hdp
rm -rf /usr/hadoop
rm -rf /usr/kafka-logs
rm -rf /usr/lib/ambari*
rm -rf /usr/lib/hadoop
rm -rf /usr/lib/nagios
rm -rf /usr/lib/ams-hbase
rm -rf /var/nagios
rm -rf /var/kafka-logs
rm -rf /var/lib/ambari*
rm -rf /var/lib/flume
rm -rf /var/lib/ganglia*
rm -rf /var/lib/hadoop*
rm -rf /var/lib/hdfs
rm -rf /var/lib/hive
rm -rf /var/lib/atlas
rm -rf /var/lib/mysql
rm -rf /var/lib/pgsql
rm -rf /var/run/hadoop /var/run/hbase /var/run/zookeeper /var/run/flume /var/run/webhcat /var/run/hadoop-yarn /var/run/hadoop-mapreduce
rm -rf /var/run/accumulo
rm -rf /var/run/ambari*
rm -rf /var/run/atlas
rm -rf /var/run/nagios
rm -rf /var/run/spark
rm -rf /var/log/hbase /var/log/hive /var/log/zookeeper /var/log/flume /var/log/hadoop-yarn /var/log/hadoop-mapreduce
rm -rf /var/log/accumulo
rm -rf /var/log/ambari*
rm -rf /var/log/atlas
rm -rf /var/log/nagios
rm -rf /var/log/spark
rm -rf /var/log/hadoop
rm -rf /tmp/ambari-qa
rm -rf /etc/ambari*
rm -rf /etc/ams-hbase
rm -rf /etc/flume
rm -rf /etc/ganglia
rm -rf /etc/hadoop*
rm -rf /etc/hbase
rm -rf /etc/hive*
rm -rf /etc/nagios
rm -rf /etc/phoenix
rm -rf /etc/pig
rm -rf /etc/tez
rm -rf /etc/zookeeper
rm -rf /etc/accumulo
rm -rf /etc/atlas
rm -rf /etc/spark
rm -rf /etc/mahout
rm -rf /home/accumulo /home/ams /home/atlas /home/mahout /home/nagios /home/spark
rm -rf /etc/yum.repos.d/ambari.repo /etc/yum.repos.d/HDP-2.3.0.0.repo /etc/yum.repos.d/HDP-UTILS.repo /etc/yum.repos.d/HDP.repo
yum clean all
ps -elf | grep java
In addition, delete the service users with userdel.
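The userdel step might look like the sketch below. remove_service_users and DRY_RUN are hypothetical names, and the user list in the example is only a guess based on the /home directories the script removes; check /etc/passwd for the accounts that actually exist on your hosts.

```shell
#!/bin/bash
# Sketch: delete leftover service accounts after the cluster has been uninstalled.
# remove_service_users and DRY_RUN are hypothetical names used for illustration.
# With DRY_RUN=yes (the default) the commands are only printed, not executed.
remove_service_users() {
    local user
    for user in "$@"; do
        if ! id "$user" >/dev/null 2>&1; then
            continue    # user does not exist, nothing to do
        fi
        if [ "${DRY_RUN:-yes}" = "yes" ]; then
            echo "would run: userdel -r $user"
        else
            userdel -r "$user"    # -r also removes the home directory
        fi
    done
}

# Example (user names assumed from the /home directories deleted above):
#   remove_service_users accumulo ams atlas mahout nagios spark
#   DRY_RUN=no remove_service_users accumulo ams atlas mahout nagios spark
```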
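4. After uninstalling all services, yum stopped working. The cause turned out to be Python components removed during the uninstall.
Run whereis python to locate the interpreter, then edit /usr/bin/yum (vi /usr/bin/yum) so that the Python path in it points to that location.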
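A quick way to confirm whether the interpreter path in /usr/bin/yum is the problem is to read the script's first line and test whether that path exists. The helper below is a sketch; check_shebang is a hypothetical name.

```shell
#!/bin/bash
# Sketch: report whether the interpreter named on a script's #! line exists.
# check_shebang is a hypothetical helper name.
check_shebang() {
    local script="$1" first rest interp
    first=$(head -n 1 "$script") || return 2
    case "$first" in
        '#!'*) ;;
        *) echo "$script: no #! line"; return 2 ;;
    esac
    rest=${first#??}                                      # drop the leading "#!"
    interp=$(printf '%s\n' "$rest" | awk '{print $1}')    # first word is the interpreter path
    if [ -x "$interp" ]; then
        echo "$script: interpreter $interp exists"
    else
        echo "$script: interpreter $interp is missing"
        return 1
    fi
}

# Example:
#   check_shebang /usr/bin/yum
```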
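5. Installing the metrics-monitor client fails with: require python-2.6.6-64 while installed python-2.6.6-66
Copy the matching python, python-devel and python-libs packages out of the Packages directory of the mounted ISO image, remove the installed python-2.6.6-66 with rpm -e --nodeps python, then reinstall python 2.6.6-64. This resolved the error.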
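6. The ams service could not be stopped: its processes could not be killed and userdel could not delete the user. Rebooting the machine fixed it.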
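7. The datanode and zookeeper die shortly after starting. The logs show the error Address already in use. Check the component's log under /var/log/..... to find the port in question, find the process holding it with netstat -anp | grep port_name, kill that process, and restart the service. It then starts successfully.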
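The netstat lookup from item 7 can be narrowed to just the PIDs listening on a given TCP port. This is a sketch: pids_on_port is a hypothetical helper, it assumes the usual Linux netstat -anp column layout, and it only covers TCP sockets in the LISTEN state.

```shell
#!/bin/bash
# Sketch: read `netstat -anp` output on stdin and print the PIDs listening on a TCP port.
# pids_on_port is a hypothetical helper name.
# Assumed columns: proto, recv-q, send-q, local address, foreign address, state, pid/program.
pids_on_port() {
    local port="$1"
    awk -v port="$port" '
        $6 == "LISTEN" {
            n = split($4, addr, ":")      # the port is the last ":"-separated piece
            if (addr[n] == port) {
                split($7, proc, "/")      # "1234/java" -> "1234"
                print proc[1]
            }
        }' | sort -u
}

# Example (run as root so that netstat shows the PID column):
#   netstat -anp | pids_on_port 2181
#   kill <pid>        # then restart the service
```

If netstat is run without root privileges the PID column shows "-" for other users' processes, and that is what the helper will print.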
8. The hawq master would not start. After running sysctl -p it started normally.