YARN log aggregation configuration

yarn.log-aggregation-enable
Defined in share/doc/hadoop/hadoop-yarn/hadoop-yarn-common/yarn-default.xml; the default value is false:
<property>
<description>Whether to enable log aggregation. Log aggregation collects
each container's logs and moves these logs onto a file-system, for e.g.
HDFS, after the application completes. Users can configure the
"yarn.nodemanager.remote-app-log-dir" and
"yarn.nodemanager.remote-app-log-dir-suffix" properties to determine
where these logs are moved to. Users can access the logs via the
Application Timeline Server.
</description>
<name>yarn.log-aggregation-enable</name>
<value>false</value>
</property>
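To actually turn aggregation on, this value is overridden in yarn-site.xml. A minimal sketch of the override (the enclosing configuration element is omitted):

```xml
<!-- yarn-site.xml: enable log aggregation -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
```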
<property>
<description>Where to aggregate logs to.</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/tmp/logs</value>
</property>
<property>
<description>The remote log dir will be created at
{yarn.nodemanager.remote-app-log-dir}/${user}/{thisParam}
</description>
<name>yarn.nodemanager.remote-app-log-dir-suffix</name>
<value>logs</value>
</property>
Explanation (note that the aggregation target directory lives on HDFS):
- yarn.log-aggregation-enable = true: after the application finishes, each container's local logs are collected (aggregated).
- yarn.nodemanager.remote-app-log-dir (e.g. /app-logs): the HDFS directory where aggregated logs are stored.
- yarn.nodemanager.remote-app-log-dir-suffix (logs): the suffix appended to the path; the full location is ${yarn.nodemanager.remote-app-log-dir}/${user}/${suffix}.
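The path-composition rule above can be sketched in Python (a toy helper for illustration, not part of any YARN API):

```python
import posixpath

def aggregated_log_dir(remote_app_log_dir, user, suffix, app_id):
    # ${yarn.nodemanager.remote-app-log-dir}/${user}/${suffix}/${app_id}
    return posixpath.join(remote_app_log_dir, user, suffix, app_id)

# With the defaults (/tmp/logs and "logs") for user root:
print(aggregated_log_dir("/tmp/logs", "root", "logs",
                         "application_1471515078641_0002"))
# → /tmp/logs/root/logs/application_1471515078641_0002
```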
Executor logs while the application is running

While the application runs, log files are stored under ${yarn.nodemanager.log-dirs}/application_${appid}:
<property>
<description>
Where to store container logs. An application's localized log directory
will be found in ${yarn.nodemanager.log-dirs}/application_${appid}.
Individual containers' log directories will be below this, in directories
named container_{$contid}. Each container directory will contain the files
stderr, stdin, and syslog generated by that container.
</description>
<name>yarn.nodemanager.log-dirs</name>
<value>${yarn.log.dir}/userlogs</value>
</property>
With a configuration such as:
<property>
<!-- where container logs are stored while the application runs -->
<name>yarn.nodemanager.log-dirs</name>
<value>file:/mnt/ddb/2/hadoop/nm</value>
</property>
the runtime executor logs end up under:
root@xxx:/mnt/ddb/2/hadoop/nm/application_1471515078641_0007# ls
container_1471515078641_0007_01_000001 container_1471515078641_0007_01_000002 container_1471515078641_0007_01_000003
Note: container_1471515078641_0007_01_000001 is the first container the ResourceManager allocated to application_1471515078641_0007, i.e. the container running the ApplicationMaster (AM).
After the executors finish, the logs are aggregated onto HDFS, by default under /tmp/logs/${user}/logs:
drwxrwx--- - root supergroup 0 2016-08-18 18:29 /tmp/logs/root/logs/application_1471515078641_0002
drwxrwx--- - root supergroup 0 2016-08-18 19:10 /tmp/logs/root/logs/application_1471515078641_0003
drwxrwx--- - root supergroup 0 2016-08-18 19:17 /tmp/logs/root/logs/application_1471515078641_0004
For example, with the default configuration, the container logs produced by a Spark application can be fetched with:
hadoop fs -get /tmp/logs/root/logs/application_1653740407738_0020 ./
After the application finishes, the logs can also be viewed with:
yarn logs -applicationId <id>
See also: https://www.cnblogs.com/caoweixiong/p/13634188.html
Fetching the YARN AM container logs (the -am option takes a comma-separated list of AM attempt indexes, or ALL):
yarn logs -applicationId application_1480922439133_0845_02 -am 1
Default configuration values:
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>true</value>
</property>
<property>
<description>Ratio between virtual memory to physical memory when
setting memory limits for containers. Container allocations are
expressed in terms of physical memory, and virtual memory usage
is allowed to exceed this allocation by this ratio.
</description>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>
Excerpt from the NodeManager log, hadoop/logs/yarn-root-nodemanager-{hostname}.log:
2020-10-29 13:29:42,852 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1600421369116_0008_01_000003
2020-10-29 13:29:42,911 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 56539 for container-id container_1600421369116_0008_01_000002: 377.0 MB of 2 GB physical memory used; 3.1 GB of 4.2 GB virtual memory used
2020-10-29 13:29:42,951 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 56643 for container-id container_1600421369116_0008_01_000003: 371.7 MB of 2 GB physical memory used; 3.1 GB of 4.2 GB virtual memory used
2020-10-29 13:29:42,980 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 56413 for container-id container_1600421369116_0008_01_000001: 368.9 MB of 1 GB physical memory used; 2.7 GB of 2.1 GB virtual memory used
2020-10-29 13:29:46,040 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 56539 for container-id container_1600421369116_0008_01_000002: 377.0 MB of 2 GB physical memory used; 3.1 GB of 4.2 GB virtual memory used
2020-10-29 13:29:46,082 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 56643 for container-id container_1600421369116_0008_01_000003: 367.2 MB of 2 GB physical memory used; 3.1 GB of 4.2 GB virtual memory used
2020-10-29 13:29:46,111 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 56413 for container-id container_1600421369116_0008_01_000001: 368.9 MB of 1 GB physical memory used; 2.7 GB of 2.1 GB virtual memory used
2020-10-29 13:29:46,885 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: KILL_CONTAINER sent to absent container container_1600421369116_0008_01_000004
2020-10-29 13:29:49,175 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 56539 for container-id container_1600421369116_0008_01_000002: 377.0 MB of 2 GB physical memory used; 3.1 GB of 4.2 GB virtual memory used
2020-10-29 13:29:49,219 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 56643 for container-id container_1600421369116_0008_01_000003: 367.2 MB of 2 GB physical memory used; 3.1 GB of 4.2 GB virtual memory used
2020-10-29 13:29:49,250 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 56413 for container-id container_1600421369116_0008_01_000001: 369.0 MB of 1 GB physical memory used; 2.7 GB of 2.1 GB virtual memory used
A line such as
367.2 MB of 2 GB physical memory used; 3.1 GB of 4.2 GB virtual memory used
means: the container was allocated 2 GB of physical memory, of which 367.2 MB is in use; the virtual-memory limit is 2 GB × 2.1 = 4.2 GB, of which 3.1 GB is in use.
If yarn.nodemanager.vmem-check-enabled is left at its default of true, the container will be killed as soon as its virtual memory usage exceeds 4.2 GB.
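The arithmetic behind that 4.2 GB ceiling is just the allocation times the ratio; a quick sketch (the helper name is made up for illustration):

```python
def vmem_limit_gb(pmem_alloc_gb, vmem_pmem_ratio=2.1):
    # virtual-memory ceiling = physical allocation * yarn.nodemanager.vmem-pmem-ratio
    return pmem_alloc_gb * vmem_pmem_ratio

print(vmem_limit_gb(2))  # 2 GB executor container → 4.2 GB virtual limit
print(vmem_limit_gb(1))  # 1 GB AM container → 2.1 GB virtual limit (the _000001 lines above)
```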
YARN cache directories
yarn/local/usercache/hadoop/filecache]$ ll
total 0
drwxr-xr-x 3 hadoop hadoop 59 Apr 1 14:45 10
drwxr-xr-x 3 hadoop hadoop 40 Apr 1 14:45 11
drwxr-xr-x 3 hadoop hadoop 59 Apr 1 14:50 12
drwxr-xr-x 3 hadoop hadoop 59 Apr 1 19:56 14
drwxr-xr-x 3 hadoop hadoop 40 Apr 1 19:57 17
drwxr-xr-x 3 hadoop hadoop 59 Apr 1 20:00 18
drwxr-xr-x 3 hadoop hadoop 40 Apr 3 11:10 23
drwxr-xr-x 3 hadoop hadoop 40 Apr 3 11:10 25
drwxr-xr-x 3 hadoop hadoop 59 Apr 12 10:47 28
drwxr-xr-x 3 hadoop hadoop 40 Apr 12 16:37 33
yarn/local/usercache/hadoop/filecache/33]$ ll
total 0
drwx------ 3 hadoop hadoop 183 Apr 12 16:37 __spark_conf__.zip
The NodeManager distributes these three resource types round-robin across the directories listed in yarn.nodemanager.local-dirs. Within each directory, resources are laid out as follows:
- PUBLIC resources: under ${yarn.nodemanager.local-dirs}/filecache/; each resource gets its own directory named with a random integer, with permissions 0755.
- PRIVATE resources: under ${yarn.nodemanager.local-dirs}/usercache/${user}/filecache/; each resource gets its own directory named with a random integer, with permissions 0710.
- APPLICATION resources: under ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/${appid}/filecache/; each resource gets its own directory named with a random integer, with permissions 0710.
The container working directory is ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/${appid}/${containerid}; it mainly holds symlinks to the jar and dictionary files.
./nm-local-dir/
|-- filecache // PUBLIC resources
| `-- 10 // each resource lives in its own directory named with a random integer
|-- nmPrivate
| |-- application_xxxx_xxx
| | |-- container_xxx_xxx_xxx_xx_xxxx
| | |-- container_xxx_xxx_xxx_xx_xxxx // private per-container data (launch script, token file, pid file)
| | | |-- container_xxx_xxx_xxx_xx_xxxx.pid
| | | |-- container_xxx_xxx_xxx_xx_xxxx.tokens
| | | `-- launch_container.sh
| |-- application_xxxx_xxx
| `-- application_xxxx_xxx
`-- usercache
|-- userXxx
| |-- appcache // APPLICATION resources
| `-- filecache // PRIVATE resources
|-- userXxx
| |-- appcache
| `-- filecache
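The three localized-resource locations can be composed like so (a sketch: the local-dir value is an assumed example, and these helpers are illustrative, not a YARN API):

```python
import posixpath

# one entry from yarn.nodemanager.local-dirs (example value, assumed)
LOCAL_DIR = "/data/yarn/nm-local-dir"

def public_cache(rsrc_id):
    # PUBLIC: ${local-dir}/filecache/<random int>, mode 0755
    return posixpath.join(LOCAL_DIR, "filecache", str(rsrc_id))

def private_cache(user, rsrc_id):
    # PRIVATE: ${local-dir}/usercache/${user}/filecache/<random int>, mode 0710
    return posixpath.join(LOCAL_DIR, "usercache", user, "filecache", str(rsrc_id))

def app_cache(user, app_id, rsrc_id):
    # APPLICATION: ${local-dir}/usercache/${user}/appcache/${appid}/filecache/<random int>, mode 0710
    return posixpath.join(LOCAL_DIR, "usercache", user, "appcache", app_id,
                          "filecache", str(rsrc_id))

print(private_cache("hadoop", 33))
# → /data/yarn/nm-local-dir/usercache/hadoop/filecache/33
```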
See also: https://www.cnblogs.com/shuofxz/p/17383011.html