1. An ETL job fails with this error:
Container [pid=31306,containerID=container_1479290736535_0004_01_000003] is running beyond physical memory limits. Current usage: 2.7 GB of 2.5 GB physical memory used; 4.4 GB of 7.5 GB virtual memory used. Killing container.
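
To read the numbers in a message like this quickly, the usage and limit figures can be pulled out with a small script. This is only a sketch: the regular expression and the function name parse_usage are made up for illustration, and it assumes the sizes are reported in GB as in the line above.

```python
import re

# The part of the error line that carries the numbers.
LOG_LINE = (
    "Current usage: 2.7 GB of 2.5 GB physical memory used; "
    "4.4 GB of 7.5 GB virtual memory used. Killing container."
)

# Matches "<used> GB of <limit> GB <kind> memory used".
USAGE_RE = re.compile(r"([\d.]+) GB of ([\d.]+) GB (physical|virtual) memory used")


def parse_usage(line):
    """Return {kind: (used_gb, limit_gb)} for each memory kind found in the line."""
    return {
        kind: (float(used), float(limit))
        for used, limit, kind in USAGE_RE.findall(line)
    }


usage = parse_usage(LOG_LINE)
for kind, (used, limit) in usage.items():
    status = "over limit" if used > limit else "ok"
    print(f"{kind}: {used} GB of {limit} GB ({status})")
```

Here only the physical memory figure exceeds its limit, which is why the container was killed.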

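An hourly job turned out to have opened too many files (more than 7,000). Each process held too much buffer space, so the container ran out of memory.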

Fix:
Change the program settings:
"orc.stripe.size":1024*1024,
"orc.block.size":16*1024*1024,
"orc.row.index.stride":500,
"orc.compress.size":8*1024,


2. NodeManager shuts down by itself and then recovers
2016-08-19 14:57:19,027 WARN org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection: Directory /opt/amos/data/hadoop/yarn-local error, used space above threshold of 90.0%, removing from list of valid directories
2016-08-19 14:57:19,027 WARN org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection: Directory /opt/amos/data/hadoop/yarn-log error, used space above threshold of 90.0%, removing from list of valid directories

Ten minutes later the NodeManager recovered. What was the problem?
[root@~]# vim /opt/amos/conf/hadoop/yarn-site.xml 
<property>
<description>Interval in between cache cleanups.</description>
<name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name>
<value>600000</value>
</property>
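
The interval is given in milliseconds, so a quick conversion shows how it lines up with the 10-minute recovery. A minimal sketch, assuming the property is read as a standalone XML snippet (read_property is a hypothetical helper, not part of Hadoop):

```python
import xml.etree.ElementTree as ET

# The property exactly as it appears in yarn-site.xml above.
PROPERTY_XML = """
<property>
<description>Interval in between cache cleanups.</description>
<name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name>
<value>600000</value>
</property>
"""


def read_property(xml_text):
    """Return (name, value) from a single Hadoop-style <property> element."""
    prop = ET.fromstring(xml_text.strip())
    return prop.findtext("name"), prop.findtext("value")


name, value = read_property(PROPERTY_XML)
interval_minutes = int(value) / 60000  # milliseconds to minutes
print(f"{name}: {interval_minutes} minutes")
```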

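The local disk is only 30 GB, but the YARN cache was set to 40 GB, so the cache could push disk usage over the 90% threshold. The 10-minute recovery matches the 600000 ms cache cleanup interval shown above.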
[root@~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 30G 25G 3.9G 87% /

Fix: set the YARN cache size to 20 GB
[root@~]# vim /opt/amos/conf/hadoop/yarn-site.xml 
<property>
<name>yarn.nodemanager.localizer.cache.target-size-mb</name>
<value>20480</value>
</property>
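
The sizing can be sanity-checked with simple arithmetic. The sketch below is a simplification: it only compares the cache target with the disk threshold and ignores everything else stored on the disk. The function name cache_fits is made up, and the old 40 GB setting is assumed to be 40*1024 MB since the notes do not show the original value.

```python
DISK_SIZE_GB = 30          # from the df output
THRESHOLD_PERCENT = 90.0   # from the NodeManager warning


def cache_fits(cache_target_mb, disk_size_gb=DISK_SIZE_GB,
               threshold_percent=THRESHOLD_PERCENT):
    """True if a fully grown cache alone stays below the disk-usage threshold."""
    limit_gb = disk_size_gb * threshold_percent / 100
    return cache_target_mb / 1024 < limit_gb


old_ok = cache_fits(40 * 1024)  # old setting: 40 GB target (assumed MB value)
new_ok = cache_fits(20480)      # new setting from yarn-site.xml
print("40 GB cache fits:", old_ok)
print("20 GB cache fits:", new_ok)
```

With a 30 GB disk the threshold sits at 27 GB, so a 40 GB target can never be satisfied, while 20 GB leaves some room for other data.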


3. Hadoop jobs are submitted repeatedly but keep failing. Cause: the memory for map and reduce needs to be set to 1.5 times the JVM heap size.

mapred.child.java.opts -Xmx2048M -Xms8M             2048*1.5=3072

mapreduce.map.memory.mb 1536 (in mapred-site.xml)     change to 3072

mapreduce.reduce.memory.mb 2048                         change to 3072
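
The 1.5x rule from these notes can be written down as a small calculation. A minimal sketch: max_heap_mb is a hypothetical helper, and it assumes the heap is given as -Xmx<N>M in megabytes, as in the setting above.

```python
import re

JAVA_OPTS = "-Xmx2048M -Xms8M"  # value of mapred.child.java.opts from the notes
FACTOR = 1.5                    # container memory = heap size * 1.5


def max_heap_mb(java_opts):
    """Extract the -Xmx value in MB from a JVM options string."""
    match = re.search(r"-Xmx(\d+)[Mm]", java_opts)
    if match is None:
        raise ValueError("no -Xmx<N>M option found")
    return int(match.group(1))


container_mb = int(max_heap_mb(JAVA_OPTS) * FACTOR)
print("mapreduce.map.memory.mb    =", container_mb)
print("mapreduce.reduce.memory.mb =", container_mb)
```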