A collection of Ambari and Hadoop problems encountered while installing Ambari

5. Problems encountered during installation

5.1 Running ambari-server start fails with ERROR: Exiting with exit code -1.

5.1.1 REASON: Ambari Server java process died with exitcode 255. Check /var/log/ambari-server/ambari-server.out for more information

 

Solution:

Because this was a reinstall, this error appears when the database is initialized with /etc/init.d/postgresql initdb. To fix it:

First uninstall PostgreSQL with yum -y remove postgresql*

Then delete everything under the /var/lib/pgsql/data directory

Then configure the PostgreSQL database again (follow section 1.6)

Then run the installation again (follow section 3)
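
A minimal sketch consolidating the cleanup steps above (assumes the CentOS 6-style init layout used in this guide; back up /var/lib/pgsql/data first if anything in it still matters):

yum -y remove postgresql*       # uninstall all PostgreSQL packages
rm -rf /var/lib/pgsql/data/*    # clear the stale data directory
# then redo the PostgreSQL configuration (section 1.6) and the install (section 3)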

 

5.1.2 The log contains the following error: ERROR [main] AmbariServer:820 - Failed to run the Ambari Server

 

com.google.inject.ProvisionException: Guice provision errors:

 

1) Error injecting method, java.lang.NullPointerException

  at org.apache.ambari.server.api.services.AmbariMetaInfo.init(AmbariMetaInfo.java:243)

  at org.apache.ambari.server.api.services.AmbariMetaInfo.class(AmbariMetaInfo.java:125)

  while locating org.apache.ambari.server.api.services.AmbariMetaInfo

    for field at org.apache.ambari.server.controller.AmbariServer.ambariMetaInfo(AmbariServer.java:145)

  at org.apache.ambari.server.controller.AmbariServer.class(AmbariServer.java:145)

  while locating org.apache.ambari.server.controller.AmbariServer

 

1 error

        at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)

        at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1013)

        at org.apache.ambari.server.controller.AmbariServer.main(AmbariServer.java:813)

Caused by: java.lang.NullPointerException

        at org.apache.ambari.server.stack.StackModule.processRepositories(StackModule.java:665)

        at org.apache.ambari.server.stack.StackModule.resolve(StackModule.java:158)

        at org.apache.ambari.server.stack.StackManager.fullyResolveStacks(StackManager.java:201)

        at org.apache.ambari.server.stack.StackManager.<init>(StackManager.java:119)

        at org.apache.ambari.server.stack.StackManager$$FastClassByGuice$$33e4ffe0.newInstance()

        at com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)

        at com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:60)

        at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85)

        at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)

        at com.google.inject.internal.InjectorImpl$4$1.call(InjectorImpl.java:978)

        at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1031)

        at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:974)

        at com.google.inject.assistedinject.FactoryProvider2.invoke(FactoryProvider2.java:632)

        at com.sun.proxy.$Proxy26.create(Unknown Source)

        at org.apache.ambari.server.api.services.AmbariMetaInfo.init(AmbariMetaInfo.java:247)

5.2 Installing HDFS and HBase fails with /usr/hdp/current/hadoop-client/conf doesn't exist

5.2.1 The /etc/hadoop/conf symlink exists

This happens because /etc/hadoop/conf and /usr/hdp/current/hadoop-client/conf are symlinked to each other, creating a circular link, so one of the two links has to be re-pointed:

cd /etc/hadoop

rm -rf conf

ln -s /etc/hadoop/conf.backup /etc/hadoop/conf

 

HBase can hit the same problem; the fix is the same:

cd /etc/hbase

rm -rf conf

ln -s /etc/hbase/conf.backup /etc/hbase/conf

 

ZooKeeper can hit the same problem; the fix is the same:

cd /etc/zookeeper

rm -rf conf

ln -s /etc/zookeeper/conf.backup /etc/zookeeper/conf
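
The three fixes above differ only in the service name; a small sketch that applies them in one pass (it assumes each service has the conf.backup directory mentioned above):

for svc in hadoop hbase zookeeper; do
    rm -rf /etc/${svc}/conf                           # drop the circular symlink
    ln -s /etc/${svc}/conf.backup /etc/${svc}/conf    # point conf at the backup copy
done
ls -l /etc/hadoop/conf /etc/hbase/conf /etc/zookeeper/conf   # verify the new targets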

 

5.2.2 The /etc/hadoop/conf symlink does not exist

Comparing against a correct installation shows that two directories are missing, conf.backup and 2.4.0.0-169; copy those folders into /etc/hadoop.
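
If another node in the cluster still has an intact /etc/hadoop, the missing directories can be pulled from there; a sketch (using gslave1 as the healthy node is an assumption, substitute your own):

scp -r gslave1:/etc/hadoop/conf.backup /etc/hadoop/    # copy the backup conf dir
scp -r gslave1:/etc/hadoop/2.4.0.0-169 /etc/hadoop/    # copy the versioned conf dir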

 

Recreate the conf link under /etc/hadoop:

cd /etc/hadoop

rm -rf conf

ln -s /usr/hdp/current/hadoop-client/conf conf

 

This resolves the problem.

 

5.3 Host confirmation (Confirm Hosts) fails with the error Ambari agent machine hostname (localhost) does not match expected ambari server hostname

During the Confirm Hosts step of Ambari setup, a strange problem kept coming up; registration always failed with:

Ambari agent machine hostname (localhost.localdomain) does not match expected ambari server hostname (xxx).

The fix was to modify the /etc/hosts file.

 

Before:

127.0.0.1   localhost dsj-kj1
::1         localhost dsj-kj1

10.13.39.32     dsj-kj1

10.13.39.33     dsj-kj2

10.13.39.34     dsj-kj3

10.13.39.35     dsj-kj4

10.13.39.36     dsj-kj5

After:

127.0.0.1  localhost localhost.localdomain localhost4 localhost4.localdomain4
::1          localhost localhost.localdomain localhost6 localhost6.localdomain6


10.13.39.32     dsj-kj1

10.13.39.33     dsj-kj2

10.13.39.34     dsj-kj3

10.13.39.35     dsj-kj4

10.13.39.36     dsj-kj5

It felt like the lookup was going over IPv6, which is strange; more plausibly, the loopback lines that mapped dsj-kj1 to 127.0.0.1 and ::1 made the agent's lookup of its own hostname resolve to localhost.localdomain. Either way, after the change everything worked.
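
After editing /etc/hosts it is worth checking what name the agent will actually report; to my knowledge the agent's default hostname lookup is Python's socket.getfqdn(), so a quick check on each node is:

hostname -f                                           # FQDN as the OS reports it
python -c 'import socket; print(socket.getfqdn())'    # what the agent's default lookup returns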

5.4 Reinstalling ambari-server

Remove the old installation with the cleanup script.

Note that after removal, the following system packages have to be reinstalled:

yum -y install ruby*

yum -y install redhat-lsb*

yum -y install snappy*

 

For the installation itself, see section 3.

 

5.5 Configuring Ambari to connect to MySQL

On the master node, copy the MySQL JDBC connector jar into /var/lib/ambari-server/resources and rename it to mysql-jdbc-driver.jar:

cp /usr/share/java/mysql-connector-java-5.1.17.jar /var/lib/ambari-server/resources/mysql-jdbc-driver.jar
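
On Ambari 2.x the driver can also be registered through ambari-server setup instead of copying and renaming by hand; a sketch, reusing the connector path above:

ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java-5.1.17.jar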

 

Then start Hive from the Ambari web UI.

5.6 Host registration (Confirm Hosts) fails with the error Failed to start ping port listener of: [Errno 98] Address already in use

 

The agent's ping port is being held by a process that never exited.

Solution:
A df command was found to be stuck and never completing; it was holding the port:

[root@testserver1 ~]# netstat -lanp|grep 8670
tcp        0      0 0.0.0.0:8670                0.0.0.0:*                   LISTEN      2587/df

[root@testserver1 ~]# kill -9 2587
After the kill, restart ambari-agent and the problem is resolved:

[root@testserver1 ~]# service ambari-agent restart
Verifying Python version compatibility...
Using python  /usr/bin/python2.6
ambari-agent is not running. No PID found at /var/run/ambari-agent/ambari-agent.pid
Verifying Python version compatibility...
Using python  /usr/bin/python2.6
Checking for previously running Ambari Agent...
Starting ambari-agent
Verifying ambari-agent process status...
Ambari Agent successfully started
Agent PID at: /var/run/ambari-agent/ambari-agent.pid
Agent out at: /var/log/ambari-agent/ambari-agent.out
Agent log at: /var/log/ambari-agent/ambari-agent.log

5.7 Host registration (Confirm Hosts) reports the warning The following hosts have Transparent HugePages (THP) enabled. THP should be disabled to avoid potential Hadoop performance issues


Solution:
Run the following on each Linux host (whichever of the two sysfs paths exists on the kernel in use is the one that matters):

echo never >/sys/kernel/mm/redhat_transparent_hugepage/defrag

echo never >/sys/kernel/mm/redhat_transparent_hugepage/enabled

echo never >/sys/kernel/mm/transparent_hugepage/enabled

echo never >/sys/kernel/mm/transparent_hugepage/defrag
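
These echo commands do not survive a reboot. One common way to persist them on RHEL/CentOS 6 is to append the same logic to /etc/rc.local (a sketch; the loop guards both sysfs paths since only one exists per kernel):

cat >> /etc/rc.local <<'EOF'
# Disable Transparent HugePages at boot (Ambari prerequisite)
for d in /sys/kernel/mm/transparent_hugepage /sys/kernel/mm/redhat_transparent_hugepage; do
    if [ -d "$d" ]; then
        echo never > "$d"/enabled
        echo never > "$d"/defrag
    fi
done
EOF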

 

5.8 Starting Hive fails with the error UnicodeDecodeError ... ambari in position 117

 

Inspecting the /etc/sysconfig/i18n file shows the following:

LANG="zh_CN.UTF8"

The system locale had been set to Chinese. Changing it to the following resolves the problem:

LANG="en_US.UTF-8"

 

 

5.9 Installing Ambari Metrics reports the following errors; the packages cannot be found

1. failure: Updates-ambari-2.2.1.0/ambari/ambari-metrics-monitor-2.2.1.0-161.x86_64.rpm from HDP-UTILS-1.1.0.20: [Errno 256] No more mirrors to try.

 

On the server hosting the yum repository, run:

cd /var/www/html/ambari/HDP-UTILS-1.1.0.20/repos/centos6

mkdir Updates-ambari-2.2.1.0

cp -r /var/www/html/ambari/Updates-ambari-2.2.1.0/ambari /var/www/html/ambari/HDP-UTILS-1.1.0.20/repos/centos6/Updates-ambari-2.2.1.0

 

Then regenerate the repodata:

cd /var/www/html/ambari

rm -rf repodata

createrepo ./
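
After regenerating the repodata, the yum metadata cached on the cluster nodes still describes the old layout; refresh it there before retrying the install:

yum clean all    # drop the stale metadata cache
yum makecache    # pull the regenerated repodata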

 

2. failure: HDP-UTILS-1.1.0.20/repos/centos6/Updates-ambari-2.2.1.0/ambari/ambari-metrics-monitor-2.2.1.0-161.x86_64.rpm from HDP-UTILS-1.1.0.20: [Errno 256] No more mirrors to try.

 

Delete mnt.repo from the /etc/yum.repos.d directory and clear the yum cache with yum clean all:

cd /etc/yum.repos.d

rm -rf mnt.repo

yum clean all


5.11 Fix for jps reporting process information unavailable

4791 -- process information unavailable

 

Solution:

Go into the /tmp directory:

cd /tmp

and delete the directories there named hsperfdata_{username}.

Then run jps again; the output is clean.

 

Script (use plain ls rather than ls -l here, so that xargs receives only the directory names and not the permission and owner fields):

cd /tmp

ls | grep hsperf | xargs rm -rf

ls | grep hsperf

 

5.12 NameNode fails to start; the log shows ERROR namenode.NameNode (NameNode.java:main(1712)) - Failed to start namenode

The log also contains java.net.BindException: Port in use: gmaster:50070

Caused by: java.net.BindException: Address already in use

The conclusion is that port 50070 was never released by the previous run and is still occupied.
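
The process still holding the port can be confirmed the same way as in section 5.6 (a sketch; the PID shown is whatever is actually bound):

netstat -lanp | grep 50070    # identify the PID bound to the NameNode HTTP port
# kill -9 <PID>               # then kill it if it is a leftover process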

 

About TCP connections in the TIME_WAIT state (as shown by netstat):
1. This is the state a connection passes through just before it is fully closed;
2. It normally takes about 4 minutes (on Windows Server) before such a connection is fully closed;
3. Connections in this state still hold handles, ports, and other resources, and the server also spends resources maintaining their state;
4. The only remedy is to let the server reclaim and reuse TIME_WAIT resources quickly. On Windows, edit the registry key [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\Tcpip\Parameters] and add the DWORD values TcpTimedWaitDelay=30 (30 is also Microsoft's recommended value; the default is 2 minutes) and MaxUserPort=65534 (valid range 5000-65534);
5. Further TCP/IP parameter details: http://technet.microsoft.com/zh-tw/library/cc776295%28v=ws.10%29.aspx
6. On Linux:
vi /etc/sysctl.conf
and add the following:
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_syncookies=1

net.ipv4.tcp_fin_timeout=30

net.ipv4.tcp_keepalive_time=1800

net.ipv4.tcp_max_syn_backlog=8192


Apply the kernel parameters:
[root@web02 ~]# sysctl -p

Notes:
net.ipv4.tcp_syncookies=1 enables SYN cookies, which protects servers handling large numbers of connections against SYN floods.
net.ipv4.tcp_tw_recycle=1 and net.ipv4.tcp_tw_reuse=1 enable fast recycling and reuse of TIME_WAIT sockets, very effective on busy web servers.
net.ipv4.tcp_fin_timeout=30 shortens the time spent in FIN-WAIT-2 so the system can handle more connections.
net.ipv4.tcp_keepalive_time=1800 shortens the TCP keepalive probing interval.
net.ipv4.tcp_max_syn_backlog=8192 lengthens the TCP SYN queue so the system can handle more concurrent connection attempts.


5.13 Service start fails with resource_management.core.exceptions.Fail: Execution of 'ambari-sudo.sh  -H -E /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh

The log contains the following:

2016-03-31 13:55:28,090 INFO  security.ShellBasedIdMapping (ShellBasedIdMapping.java:updateStaticMapping(322)) - Not doing static UID/GID mapping because '/etc/nfs.map' does not exist.

2016-03-31 13:55:28,096 INFO  nfs3.WriteManager (WriteManager.java:<init>(92)) - Stream timeout is 600000ms.

2016-03-31 13:55:28,096 INFO  nfs3.WriteManager (WriteManager.java:<init>(100)) - Maximum open streams is 256

2016-03-31 13:55:28,096 INFO  nfs3.OpenFileCtxCache (OpenFileCtxCache.java:<init>(54)) - Maximum open streams is 256

2016-03-31 13:55:28,259 INFO  nfs3.RpcProgramNfs3 (RpcProgramNfs3.java:<init>(205)) - Configured HDFS superuser is

2016-03-31 13:55:28,261 INFO  nfs3.RpcProgramNfs3 (RpcProgramNfs3.java:clearDirectory(231)) - Delete current dump directory /tmp/.hdfs-nfs

2016-03-31 13:55:28,269 WARN  fs.FileUtil (FileUtil.java:deleteImpl(187)) - Failed to delete file or dir [/tmp/.hdfs-nfs]: it still exists.

This shows that the hdfs user has no permission on /tmp.

Grant ownership to the hdfs user:

chown hdfs:hadoop /tmp

 

Start again, and the problem is resolved.
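
Since the log shows the failure is specifically on the NFS gateway dump directory /tmp/.hdfs-nfs, a narrower alternative is to fix only that directory rather than re-owning all of /tmp (a sketch; /tmp is normally root-owned with the sticky bit):

rm -rf /tmp/.hdfs-nfs                      # let the gateway recreate its dump dir on start
# or: chown -R hdfs:hadoop /tmp/.hdfs-nfs  # re-own just the dump directory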

5.14 Installing the Ranger component fails: the rangeradmin user cannot connect to the MySQL database and cannot be granted privileges

First drop all the rangeradmin users in the database; note the use of the DROP USER command:

drop user 'rangeradmin'@'%';

drop user 'rangeradmin'@'localhost';

drop user 'rangeradmin'@'gmaster';

drop user 'rangeradmin'@'gslave1';

drop user 'rangeradmin'@'gslave2';

FLUSH PRIVILEGES;

 

Then recreate the users (note that gmaster is the hostname of the machine where Ranger is installed), for example:
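
A sketch of the recreation, following the usual Ranger-on-MySQL prerequisites (the password is a placeholder and the blanket grant is an assumption; tighten both to match your environment):

mysql -u root -p <<'SQL'
-- recreate the rangeradmin user for each host it connects from
CREATE USER 'rangeradmin'@'%' IDENTIFIED BY 'rangeradmin';
GRANT ALL PRIVILEGES ON *.* TO 'rangeradmin'@'%' WITH GRANT OPTION;
CREATE USER 'rangeradmin'@'localhost' IDENTIFIED BY 'rangeradmin';
GRANT ALL PRIVILEGES ON *.* TO 'rangeradmin'@'localhost' WITH GRANT OPTION;
CREATE USER 'rangeradmin'@'gmaster' IDENTIFIED BY 'rangeradmin';
GRANT ALL PRIVILEGES ON *.* TO 'rangeradmin'@'gmaster' WITH GRANT OPTION;
FLUSH PRIVILEGES;
SQL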
