5. Problems encountered during installation
5.1 Running ambari-server start fails with: ERROR: Exiting with exit code -1.
5.1.1 REASON: Ambari Server java process died with exitcode 255. Check /var/log/ambari-server/ambari-server.out for more information
Solution:
This happens on a reinstall: initializing the database with /etc/init.d/postgresql initdb triggers the error, so you need to:
First remove PostgreSQL with yum -y remove postgresql*
Then delete everything under /var/lib/pgsql/data
Then reconfigure the PostgreSQL database (follow section 1.6)
Then reinstall Ambari (follow section 3)
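Put together, the reset looks roughly like this (a sketch assuming the default PostgreSQL layout; adjust paths to your environment):
yum -y remove postgresql*        # remove the old PostgreSQL packages
rm -rf /var/lib/pgsql/data/*     # wipe the data directory left over from the previous install
# then reconfigure PostgreSQL as in section 1.6 and reinstall Ambari as in section 3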
5.1.2 The log contains the following error: ERROR [main] AmbariServer:820 - Failed to run the Ambari Server
com.google.inject.ProvisionException: Guice provision errors:
1) Error injecting method, java.lang.NullPointerException
at org.apache.ambari.server.api.services.AmbariMetaInfo.init(AmbariMetaInfo.java:243)
at org.apache.ambari.server.api.services.AmbariMetaInfo.class(AmbariMetaInfo.java:125)
while locating org.apache.ambari.server.api.services.AmbariMetaInfo
for field at org.apache.ambari.server.controller.AmbariServer.ambariMetaInfo(AmbariServer.java:145)
at org.apache.ambari.server.controller.AmbariServer.class(AmbariServer.java:145)
while locating org.apache.ambari.server.controller.AmbariServer
1 error
at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1013)
at org.apache.ambari.server.controller.AmbariServer.main(AmbariServer.java:813)
Caused by: java.lang.NullPointerException
at org.apache.ambari.server.stack.StackModule.processRepositories(StackModule.java:665)
at org.apache.ambari.server.stack.StackModule.resolve(StackModule.java:158)
at org.apache.ambari.server.stack.StackManager.fullyResolveStacks(StackManager.java:201)
at org.apache.ambari.server.stack.StackManager.&lt;init&gt;(StackManager.java:119)
at org.apache.ambari.server.stack.StackManager$$FastClassByGuice$$33e4ffe0.newInstance()
at com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
at com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:60)
at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85)
at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
at com.google.inject.internal.InjectorImpl$4$1.call(InjectorImpl.java:978)
at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1031)
at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:974)
at com.google.inject.assistedinject.FactoryProvider2.invoke(FactoryProvider2.java:632)
at com.sun.proxy.$Proxy26.create(Unknown Source)
at org.apache.ambari.server.api.services.AmbariMetaInfo.init(AmbariMetaInfo.java:247)
5.2 Installing HDFS and HBase fails with: /usr/hdp/current/hadoop-client/conf doesn't exist
5.2.1 The /etc/hadoop/conf symlink exists
This happens because /etc/hadoop/conf and /usr/hdp/current/hadoop-client/conf are symlinked to each other, which creates a loop, so one of the links has to be changed:
cd /etc/hadoop
rm -rf conf
ln -s /etc/hadoop/conf.backup /etc/hadoop/conf
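To confirm the loop is gone, check where each path now points (a quick readlink check, not part of the original steps):
readlink /etc/hadoop/conf                       # should now show /etc/hadoop/conf.backup
readlink /usr/hdp/current/hadoop-client/conf    # typically still points to /etc/hadoop/conf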
HBase hits the same problem; fix it the same way:
cd /etc/hbase
rm -rf conf
ln -s /etc/hbase/conf.backup /etc/hbase/conf
ZooKeeper hits the same problem as well; fix it the same way:
cd /etc/zookeeper
rm -rf conf
ln -s /etc/zookeeper/conf.backup /etc/zookeeper/conf
5.2.2 The /etc/hadoop/conf symlink does not exist
Comparing against a node with a correct configuration, two directories were missing: conf.backup and 2.4.0.0-169. Copy them into /etc/hadoop.
Then recreate the conf symlink under /etc/hadoop:
cd /etc/hadoop
rm -rf conf
ln -s /usr/hdp/current/hadoop-client/conf conf
Problem solved.
5.3 The Confirm Hosts step fails with: Ambari agent machine hostname (localhost) does not match expected ambari server hostname
During the Confirm Hosts step of the Ambari setup, this strange error kept appearing:
Ambari agent machine hostname (localhost.localdomain) does not match expected ambari server hostname (xxx).
The fix was to modify /etc/hosts.
Before:
127.0.0.1 localhost dsj-kj1
::1 localhost dsj-kj1
10.13.39.32 dsj-kj1
10.13.39.33 dsj-kj2
10.13.39.34 dsj-kj3
10.13.39.35 dsj-kj4
10.13.39.36 dsj-kj5
After:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.13.39.32 dsj-kj1
10.13.39.33 dsj-kj2
10.13.39.34 dsj-kj3
10.13.39.35 dsj-kj4
10.13.39.36 dsj-kj5
My guess is that name resolution was going through the IPv6 entry; strange, but after the change everything worked.
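To verify how a host will identify itself after editing /etc/hosts, the following checks can help (the python call mirrors what the Ambari agent uses by default, assuming python2 is on the path):
hostname -f                                          # fully qualified hostname of the machine
python -c "import socket; print(socket.getfqdn())"   # the name the Ambari agent will report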
5.4 Reinstalling ambari-server
Use the cleanup script to remove the old installation.
Note that after removal the following system components have to be reinstalled:
yum -y install ruby*
yum -y install redhat-lsb*
yum -y install snappy*
Then reinstall following section 3.
5.5 Configuring Ambari to connect to MySQL
On the master node, copy the MySQL JDBC driver jar into /var/lib/ambari-server/resources and rename it to mysql-jdbc-driver.jar:
cp /usr/share/java/mysql-connector-java-5.1.17.jar /var/lib/ambari-server/resources/mysql-jdbc-driver.jar
Then start Hive from the web UI.
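Alternatively, recent Ambari releases let you register the driver through ambari-server setup; the path below is an assumption, point it at wherever the connector jar actually lives:
ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java-5.1.17.jar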
5.6 The Confirm Hosts step fails with: Failed to start ping port listener of: [Errno 98] Address already in use
The agent's ping port (8670) is already held by another process.
Solution:
It turned out a df command had been running forever and never completed:
[root@testserver1 ~]# netstat -lanp|grep 8670
tcp 0 0 0.0.0.0:8670 0.0.0.0:* LISTEN 2587/df
[root@testserver1 ~]# kill -9 2587
After killing it, restart ambari-agent and the problem is resolved:
[root@testserver1 ~]# service ambari-agent restart
Verifying Python version compatibility...
Using python /usr/bin/python2.6
ambari-agent is not running. No PID found at /var/run/ambari-agent/ambari-agent.pid
Verifying Python version compatibility...
Using python /usr/bin/python2.6
Checking for previously running Ambari Agent...
Starting ambari-agent
Verifying ambari-agent process status...
Ambari Agent successfully started
Agent PID at: /var/run/ambari-agent/ambari-agent.pid
Agent out at: /var/log/ambari-agent/ambari-agent.out
Agent log at: /var/log/ambari-agent/ambari-agent.log
5.7 The Confirm Hosts step warns: The following hosts have Transparent HugePages (THP) enabled. THP should be disabled to avoid potential Hadoop performance issues
Solution:
Run the following on each affected host:
echo never >/sys/kernel/mm/redhat_transparent_hugepage/defrag
echo never >/sys/kernel/mm/redhat_transparent_hugepage/enabled
echo never >/sys/kernel/mm/transparent_hugepage/enabled
echo never >/sys/kernel/mm/transparent_hugepage/defrag
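These echo commands do not survive a reboot; one common way to keep THP disabled (an assumption, not part of the original steps) is to append them to /etc/rc.local:
cat >> /etc/rc.local <<'EOF'
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
EOF
chmod +x /etc/rc.local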
5.8 Starting Hive fails with: unicodedecodeerror ambari in position 117
Checking /etc/sysconfig/i18n shows:
LANG="zh_CN.UTF8"
The system locale had been set to Chinese. Change it to the following and the problem goes away:
LANG="en_US.UTF-8"
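The new setting only applies to new login sessions; for the current shell it can be exported directly (assuming a bash shell):
export LANG=en_US.UTF-8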
5.9 Installing Ambari Metrics fails because the packages cannot be found
1. failure: Updates-ambari-2.2.1.0/ambari/ambari-metrics-monitor-2.2.1.0-161.x86_64.rpm from HDP-UTILS-1.1.0.20: [Errno 256] No more mirrors to try.
Run the following on the local repository (FTP) server:
cd /var/www/html/ambari/HDP-UTILS-1.1.0.20/repos/centos6
mkdir Updates-ambari-2.2.1.0
cp -r /var/www/html/ambari/Updates-ambari-2.2.1.0/ambari /var/www/html/ambari/HDP-UTILS-1.1.0.20/repos/centos6/Updates-ambari-2.2.1.0
Then regenerate the repodata:
rm -rf repodata
createrepo ./
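To check from a client node that the package is now resolvable (a sanity check, not part of the original steps):
yum clean all
yum list available | grep ambari-metrics-monitor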
2. failure: HDP-UTILS-1.1.0.20/repos/centos6/Updates-ambari-2.2.1.0/ambari/ambari-metrics-monitor-2.2.1.0-161.x86_64.rpm from HDP-UTILS-1.1.0.20: [Errno 256] No more mirrors to try.
Delete mnt.repo from /etc/yum.repos.d and clear the yum cache with yum clean all:
cd /etc/yum.repos.d
rm -rf mnt.repo
yum clean all
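Afterwards, listing the enabled repositories confirms that the stray mnt repo is gone (just a sanity check):
yum repolist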
5.11 How to fix jps reporting "process information unavailable"
4791 -- process information unavailable
Solution:
Go into /tmp:
cd /tmp
Delete the directories named hsperfdata_{username} under it,
then run jps again and the stale entries are gone.
As a script:
ls | grep hsperf | xargs rm -rf
ls | grep hsperf
5.12 NameNode fails to start; the log file shows: ERROR namenode.NameNode (NameNode.java:main(1712)) - Failed to start namenode
The log also contains: java.net.BindException: Port in use: gmaster:50070
Caused by: java.net.BindException: Address already in use
The cause is that port 50070 was not released by the previous run and is still occupied.
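To see which process (or lingering TIME_WAIT socket) is still holding the port, something like the following helps (any equivalent netstat/ss invocation works):
netstat -anp | grep 50070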
About TCP connections in the TIME_WAIT state (as shown by netstat):
1. This is a state the connection passes through just before it is fully closed;
2. It normally takes about 4 minutes (on Windows Server) before such a connection is fully closed;
3. Connections in this state still hold handles, ports and other resources, and the server also spends resources maintaining them;
4. The only remedy is to let the server recycle and reuse TIME_WAIT resources faster. On Windows, edit the registry key [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\Tcpip\Parameters] and add the DWORD values TcpTimedWaitDelay=30 (30 is the value Microsoft recommends; the default is 2 minutes) and MaxUserPort=65534 (valid range 5000 - 65534);
5. More TCP/IP tuning parameters are described at http://technet.microsoft.com/zh-tw/library/cc776295%28v=ws.10%29.aspx
6. On Linux:
vi /etc/sysctl.conf
Add the following:
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_syncookies=1
net.ipv4.tcp_fin_timeout=30
net.ipv4.tcp_keepalive_time=1800
net.ipv4.tcp_max_syn_backlog=8192
Apply the kernel parameters:
[root@web02 ~]# sysctl -p
Notes:
net.ipv4.tcp_syncookies=1 enables SYN cookies, which protect a busy server against SYN flood attacks.
net.ipv4.tcp_tw_reuse=1 allows sockets in TIME_WAIT to be reused for new connections.
net.ipv4.tcp_tw_recycle=1 enables fast recycling of TIME_WAIT sockets (use with caution behind NAT).
net.ipv4.tcp_fin_timeout=30 shortens the time a socket stays in FIN-WAIT-2, so the system can handle more connections.
net.ipv4.tcp_keepalive_time=1800 lowers the TCP keepalive interval so dead connections are detected and cleaned up sooner.
net.ipv4.tcp_max_syn_backlog=8192 increases the SYN queue length so the system can accept more concurrent connection attempts.
5.13 Startup fails with: resource_management.core.exceptions.Fail: Execution of 'ambari-sudo.sh -H -E /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh
The log contains the following:
2016-03-31 13:55:28,090 INFO security.ShellBasedIdMapping (ShellBasedIdMapping.java:updateStaticMapping(322)) - Not doing static UID/GID mapping because '/etc/nfs.map' does not exist.
2016-03-31 13:55:28,096 INFO nfs3.WriteManager (WriteManager.java:&lt;init&gt;(92)) - Stream timeout is 600000ms.
2016-03-31 13:55:28,096 INFO nfs3.WriteManager (WriteManager.java:&lt;init&gt;(100)) - Maximum open streams is 256
2016-03-31 13:55:28,096 INFO nfs3.OpenFileCtxCache (OpenFileCtxCache.java:&lt;init&gt;(54)) - Maximum open streams is 256
2016-03-31 13:55:28,259 INFO nfs3.RpcProgramNfs3 (RpcProgramNfs3.java:&lt;init&gt;(205)) - Configured HDFS superuser is
2016-03-31 13:55:28,261 INFO nfs3.RpcProgramNfs3 (RpcProgramNfs3.java:clearDirectory(231)) - Delete current dump directory /tmp/.hdfs-nfs
2016-03-31 13:55:28,269 WARN fs.FileUtil (FileUtil.java:deleteImpl(187)) - Failed to delete file or dir [/tmp/.hdfs-nfs]: it still exists.
This indicates that the hdfs user does not have permission on /tmp.
Grant the hdfs user permission:
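The exact command is omitted above; a minimal sketch, assuming the standard sticky-bit permissions on /tmp, would be:
chmod 1777 /tmp            # restore world-writable /tmp with the sticky bit
rm -rf /tmp/.hdfs-nfs      # clear the stale dump directory the NFS gateway failed to delete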
Start it again and the problem is resolved.
5.14 Installing the Ranger component fails: the rangeradmin user cannot connect to the MySQL database and privileges cannot be granted
First drop every rangeradmin user in the database; note that the DROP USER statement must be used:
drop user 'rangeradmin'@'%';
drop user 'rangeradmin'@'localhost';
drop user 'rangeradmin'@'gmaster';
drop user 'rangeradmin'@'gslave1';
drop user 'rangeradmin'@'gslave2';
FLUSH PRIVILEGES;
Then recreate the users (note: gmaster is the hostname of the server where Ranger is installed).
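The recreation statements themselves are not shown above; a sketch of what they typically look like follows (the password 'rangeradmin' and the broad privilege grant are assumptions, substitute the values from your Ranger configuration):
CREATE USER 'rangeradmin'@'%' IDENTIFIED BY 'rangeradmin';
GRANT ALL PRIVILEGES ON *.* TO 'rangeradmin'@'%' WITH GRANT OPTION;
CREATE USER 'rangeradmin'@'gmaster' IDENTIFIED BY 'rangeradmin';
GRANT ALL PRIVILEGES ON *.* TO 'rangeradmin'@'gmaster' WITH GRANT OPTION;
FLUSH PRIVILEGES;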