Pitfalls encountered when deploying a Hadoop cluster offline with Ambari

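1. The HDP components copied over from the remote machine were incomplete, so rpm packages were missing when installing the clients. Copying the missing packages over by hand fixed it.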

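2. Installing HAWQ: startup failed with a passwordless ssh error for the hawq hosts, because the hawq master was being prompted for a password when copying files to the other hosts. There were two causes. First, the permissions for root's passwordless ssh login were wrong; the correct settings are chmod 700 /root/.ssh and chmod 600 /root/.ssh/authorized_keys. Second, the key exchange for gpadmin had not succeeded: su gpadmin, create a hawq_host file under /home/gpadmin listing the hostname of every node, then run hawq ssh-exkeys -f hawq_host. The log showed an RSA-related error saying a hostname could not be reached; after correcting /etc/hosts and resetting the hostname, the command succeeded.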
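The permission fix from item 2 can be sketched as a small shell helper. The function name fix_ssh_perms is hypothetical; it only applies the two chmod settings mentioned above to the .ssh directory of whatever home directory it is given.

```shell
#!/bin/bash
# fix_ssh_perms is a hypothetical helper name, not part of HAWQ or Ambari.
# It applies the modes sshd expects for key-based login:
#   700 on ~/.ssh and 600 on ~/.ssh/authorized_keys
fix_ssh_perms() {
    local home_dir="$1"
    local ssh_dir="$home_dir/.ssh"

    if [ ! -d "$ssh_dir" ]; then
        echo "no such directory: $ssh_dir" >&2
        return 1
    fi

    chmod 700 "$ssh_dir"
    ls -ld "$ssh_dir"    # show the resulting mode for a quick visual check

    if [ -f "$ssh_dir/authorized_keys" ]; then
        chmod 600 "$ssh_dir/authorized_keys"
        ls -l "$ssh_dir/authorized_keys"
    fi
}

# Usage (as root, on every node): fix_ssh_perms /root
```

The same helper could be pointed at /home/gpadmin before running hawq ssh-exkeys, on the assumption that gpadmin's .ssh directory needs the same modes.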

3. Uninstalling services after the installation failed midway

   Uninstalling a single service

stop:

curl -s -u admin:admin -H "X-Requested-By: Ambari" -X PUT -d '{"RequestInfo":{"context":"Stop Service"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' http://AMBARI-HOST:8080/api/v1/clusters/CLUSTER_NAME/services/SERVICE_NAME

delete:

curl -s -u admin:admin -H "X-Requested-By: Ambari" -X DELETE http://AMBARI-HOST:8080/api/v1/clusters/CLUSTER_NAME/services/SERVICE_NAME
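The stop and delete calls can be wrapped in one helper, roughly as follows. This is only a sketch: the function names and the AMBARI_USER / AMBARI_PASS variables are made up for illustration, and the host, cluster and service arguments are placeholders. Because Ambari handles the stop request asynchronously, the sketch pauses for a manual confirmation before it sends the delete.

```shell
#!/bin/bash
# Hypothetical helpers; credentials default to the admin:admin used in this post.
AMBARI_USER="${AMBARI_USER:-admin}"
AMBARI_PASS="${AMBARI_PASS:-admin}"

# Build the REST URL of a service.
# $1 = Ambari host, $2 = cluster name, $3 = service name
ambari_service_url() {
    echo "http://$1:8080/api/v1/clusters/$2/services/$3"
}

stop_and_delete_service() {
    local url
    url="$(ambari_service_url "$1" "$2" "$3")"

    # Stopping a service means setting its state to INSTALLED.
    curl -s -u "$AMBARI_USER:$AMBARI_PASS" -H "X-Requested-By: Ambari" -X PUT \
        -d '{"RequestInfo":{"context":"Stop Service"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
        "$url" || return 1

    # The stop runs in the background on the server side, so wait until the
    # service is really stopped (for example by watching the Ambari UI).
    read -r -p "Press Enter once the service has stopped... " _

    curl -s -u "$AMBARI_USER:$AMBARI_PASS" -H "X-Requested-By: Ambari" -X DELETE "$url"
}

# Usage: stop_and_delete_service AMBARI-HOST CLUSTER_NAME SERVICE_NAME
```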

Uninstalling the whole cluster (Ambari and Hadoop)

Run the following script:


#!/bin/bash

# Stop Ambari and the databases behind it, and reset the Ambari server database.
ambari-server stop
ambari-server reset
ambari-agent stop
service mysqld stop
service postgresql stop


# Run the host cleanup script shipped with the Ambari agent.
python /usr/lib/python2.6/site-packages/ambari_agent/HostCleanup.py


# Remove the Ambari, Hadoop stack and database packages.
yum remove ambari\* hadoop hdfs bigtop-jsvc bigtop-tomcat hbase\* hadoop\* hdp-select ranger\* zookeeper\* postgresql-libs postgresql postgresql-server
yum remove mysql mysql-server mysql-libs mysql-connector-java


rm -rf /opt/hadoop
rm -rf /opt/app/hadoop
rm -rf /opt/app/ambari-metrics-collector
rm -rf /opt/kafka-logs


rm -rf /usr/hdp
rm -rf /usr/hadoop
rm -rf /usr/kafka-logs


rm -rf /usr/lib/ambari*
rm -rf /usr/lib/hadoop
rm -rf /usr/lib/nagios
rm -rf /usr/lib/ams-hbase


rm -rf /var/nagios
rm -rf /var/kafka-logs


rm -rf /var/lib/ambari*
rm -rf /var/lib/flume
rm -rf /var/lib/ganglia*
rm -rf /var/lib/hadoop*
rm -rf /var/lib/hdfs
rm -rf /var/lib/hive
rm -rf /var/lib/atlas
rm -rf /var/lib/mysql
rm -rf /var/lib/pgsql




rm -rf /var/run/hadoop /var/run/hbase /var/run/zookeeper /var/run/flume /var/run/webhcat /var/run/hadoop-yarn /var/run/hadoop-mapreduce
rm -rf /var/run/accumulo
rm -rf /var/run/ambari*
rm -rf /var/run/atlas
rm -rf /var/run/nagios
rm -rf /var/run/spark


rm -rf /var/log/hbase /var/log/hive /var/log/zookeeper /var/log/flume /var/log/hadoop-yarn /var/log/hadoop-mapreduce
rm -rf /var/log/accumulo
rm -rf /var/log/ambari*
rm -rf /var/log/atlas
rm -rf /var/log/nagios
rm -rf /var/log/spark
rm -rf /var/log/hadoop


rm -rf /tmp/ambari-qa


rm -rf /etc/ambari*
rm -rf /etc/ams-hbase
rm -rf /etc/flume
rm -rf /etc/ganglia
rm -rf /etc/hadoop*
rm -rf /etc/hbase
rm -rf /etc/hive*
rm -rf /etc/nagios
rm -rf /etc/phoenix
rm -rf /etc/pig
rm -rf /etc/tez
rm -rf /etc/zookeeper
rm -rf /etc/accumulo
rm -rf /etc/atlas
rm -rf /etc/spark
rm -rf /etc/mahout


rm -rf /home/accumulo /home/ams /home/atlas /home/mahout /home/nagios /home/spark


# Remove the Ambari and HDP yum repo definitions.
rm -rf /etc/yum.repos.d/ambari.repo /etc/yum.repos.d/HDP-2.3.0.0.repo /etc/yum.repos.d/HDP-UTILS.repo /etc/yum.repos.d/HDP.repo

yum clean all


# List any Java processes that are still running after the cleanup.
ps -elf | grep java


One more thing to add: the service users still have to be removed with userdel.
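A minimal sketch of that userdel step is shown below. The user list is an assumption taken from the /home directories the cleanup script deletes; extend it to match the services actually installed. The function name and the DRY_RUN switch are hypothetical, and by default nothing is deleted.

```shell
#!/bin/bash
# Assumed user list, derived from the /home directories removed above.
SERVICE_USERS="accumulo ams atlas mahout nagios spark"

remove_service_users() {
    # DRY_RUN=1 (the default) only prints what would be done.
    local dry_run="${DRY_RUN:-1}"
    local user
    for user in $SERVICE_USERS; do
        if [ "$dry_run" = "1" ]; then
            echo "would run: userdel -r $user"
        elif id "$user" >/dev/null 2>&1; then
            # -r also removes the user's home directory and mail spool.
            userdel -r "$user"
        fi
    done
}

# Usage (as root): DRY_RUN=0 remove_service_users
```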

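4. After uninstalling all the services, yum stopped working. The cause turned out to be Python components that had been removed along with them.

Run whereis python to find the installed interpreter, then edit /usr/bin/yum (vi /usr/bin/yum) so that its python path points to it.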
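Before editing the file, it can help to confirm that the interpreter named on the first line of /usr/bin/yum really is missing. The sketch below does that; both function names are hypothetical, and it only reads the file.

```shell
#!/bin/bash
# Print the interpreter path from the first line (the "#!" line) of a script.
shebang_interpreter() {
    # Drop the leading "#!" and any arguments after the interpreter path.
    head -n 1 "$1" | sed -n 's/^#![[:space:]]*\([^[:space:]]*\).*/\1/p'
}

# Report whether that interpreter exists and is executable.
check_script_interpreter() {
    local interp
    interp="$(shebang_interpreter "$1")"
    if [ -n "$interp" ] && [ -x "$interp" ]; then
        echo "OK: $1 uses $interp"
    else
        echo "BROKEN: $1 points to '$interp', which is missing or not executable"
        return 1
    fi
}

# Usage: check_script_interpreter /usr/bin/yum
```

If the check reports BROKEN, the path printed by whereis python is the one to put on that first line.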

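5. Installing the metrics monitor client failed with: require python-2.6.6-64 while installed python-2.6.6-66

   Copy the matching python, python-devel and python-libs packages out of the Packages directory of the mounted ISO image, remove the installed python-2.6.6-66 with rpm -e --nodeps python, then install python 2.6.6-64 from the copied packages. This resolved the error.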

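6. The ams service could not be stopped, its processes could not be killed, and userdel could not delete the user. Rebooting the machine fixed it.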

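7. The datanode and zookeeper died shortly after starting. The logs showed the error Address already in use. Check the log of the affected component under /var/log/..... to find the port in question, locate the process holding it with netstat -anp | grep port_name, kill that process, and start the service again. After that the services started successfully.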
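The netstat lookup from item 7 can be narrowed down to just the PIDs of the listeners on one port. The following is a sketch, assuming the usual Linux netstat -anp column layout (local address in column 4, state in column 6, PID/program in column 7); the function name is hypothetical, and 2181 in the usage line is only an example port.

```shell
#!/bin/bash
# Read `netstat -anp` style output on stdin and print the PID of every
# process that is LISTENing on the given port.
# $1 = port number
listeners_on_port() {
    awk -v port="$1" '
        $6 == "LISTEN" && $4 ~ (":" port "$") {
            split($7, parts, "/")   # column 7 looks like "1234/java"
            print parts[1]
        }'
}

# Usage (as root, so that netstat can show PIDs):
#   netstat -anp | listeners_on_port 2181
```

Matching on the end of the local address keeps a port such as 12181 from being picked up when looking for 2181, which a plain grep on the port number would not do.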

8. The hawq master could not start. After running sysctl -p it started normally.



