Simplifying user-logs management and access in YARN

User logs of Hadoop jobs serve multiple purposes. First and foremost, they can be used to debug issues while running a MapReduce application – correctness problems with the application itself, race conditions when running on a cluster, and task/job failures due to hardware or platform bugs. Secondly, one can do historical analyses of the logs to see how individual tasks in a job or workflow perform over time. One can even analyze the Hadoop MapReduce user-logs using Hadoop MapReduce(!) to determine any performance issues.

Handling of user-logs generated by applications has been one of the biggest pain-points for Hadoop installations in the past. In Hadoop 1.x, user-logs are left on individual nodes by the TaskTracker, and the management of the log files on local nodes is both insufficient for longer-term analyses and non-deterministic for user access. YARN tackles this log-management problem by having the NodeManagers (NMs) provide the option of moving these logs securely onto a distributed file-system (FS), e.g. HDFS, after the application completes.

Motivation

In Hadoop 1.x releases, MapReduce is the only programming model available for users. Each MapReduce job runs a bunch of mappers and reducers. Every map/reduce task generates logs directly on the local disk – syslog/stderr/stdout and optionally profile.out/debug.out etc. These task logs are accessible through the web UI, which is the only convenient way to view them – logging into individual nodes to inspect logs is both cumbersome and sometimes not possible due to access restrictions.

State of the log-management art in Hadoop 1.x

For handling the users’ task-logs, the TaskTracker (TT) makes use of a UserLogManager, which is composed of the following components.

  • UserLogCleaner: Cleans the logs of a given job after retaining them for a certain time starting from the job-finish. The configuration property mapred.userlog.retain.hours dictates this time, defaulting to one day. Every job’s user-log directory is added to a cleanup thread that deletes it once the retain-hours have elapsed; this thread wakes up periodically to delete any old logs that have hit their retention interval. On top of that, all logs in ${mapred.local.dir}/userlogs are added to the UserLogCleaner whenever a TT restarts.
  • TaskLogsTruncater: This component truncates every log file that exceeds a certain length after a task finishes. The configuration properties mapreduce.cluster.map.userlog.retain-size and mapreduce.cluster.reduce.userlog.retain-size control how much of the userlog is to be retained. The assumption here is that when something goes wrong with a task, the tail of the log should generally indicate what the problem is, so we can afford to lose the head of the log when the log grows beyond some size.
  • mapred.userlog.limit.kb is another solution that predates the above: while a task JVM is running, its stdout and stderr streams are piped through the unix tail program, which retains only the configured amount of log and writes it to the stdout/stderr files.

There were a few more efforts that didn’t make it to production:

  • Log collection: a user-invokable LogCollector that runs on client/gateway machines and collects per-job logs into a compact format.
  • Killing of running tasks whose logs exceed N GB, because otherwise a run-away task can fill up the disk holding the logs, causing downtime.

Problems with the existing log management:
  • Truncation: Users complain about truncated logs more often than not. A few users do need access to the complete logs, and no limit really satisfies all the users – 100KB works for some, but not for others.
  • Run-away tasks: TaskTrackers/DataNodes can still go out of disk space due to excessive logging as truncation only happens after tasks finish.
  • Insufficient retention: Users complain about the log-retention time. No admin-configured limit satisfies all the users – the default of one day works for many, but many others gripe that they cannot do post-hoc analysis.
  • Access: Serving logs over HTTP by the TTs is completely dependent on the job finish time and the retention time – not perfectly reliable.
  • Collection status: The same non-reliability with a log-collector tool – one needs to build more solutions to detect when the log-collection starts and finishes and when to switch over to such collected logs from the usual files managed by TTs.
  • mapred.userlog.limit.kb increases the memory usage of each task (especially with lots of logging) and doesn’t work with the more generic application containers supported by YARN.
  • Historical analysis: All of these logs are served over HTTP only as long as they exist on the local node – if users want to do post analysis, they will have to employ some log-collection and build more tooling around that.
  • Load-balancing & SPOF: All the logs are written to a single log-directory – no load balancing across disks & if that one disk is down, all jobs lose all logs.

Admitting that the existing solutions were really only stop-gaps for a more fundamental problem, we took the opportunity to do the right thing by enabling in-platform log aggregation.

Log-aggregation in YARN: Details

So, instead of truncating user-logs and leaving them on individual nodes for a certain time as the TaskTracker does, the NodeManager addresses the log-management problem by providing the option to move these logs securely onto a file-system (FS), e.g. HDFS, after the application completes.

  • Logs for all the containers belonging to a single Application and that ran on a given NM are aggregated and written out to a single (possibly compressed) log file at a configured location in the FS.
  • In the current implementation, once an application finishes, one will have
    • an application-level log-dir and
    • a per-node log-file that consists of the logs for all the containers of the application that ran on this node (see the sketch after this list).
  • Users have access to these logs via YARN command line tools, the web-UI or directly from the FS – no longer restricted to a web-only interface.
  • These logs can potentially be stored for much longer than was possible in 1.x, given that they are stored on a distributed file-system.
  • We don’t need to truncate logs to very small lengths – as long as the log sizes are reasonable, we can afford to store the entire logs.
  • In addition to that, while the containers are running, the logs are now written to multiple directories on each node for effective load balancing and improved fault tolerance.
  • AggregatedLogDeletionService is a service that periodically deletes aggregated logs. Today it is run inside the MapReduce JobHistoryServer only.
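
To make the bullets above concrete, here is a rough sketch of what the aggregated layout can look like on the FS. It assumes the default remote log directory of /tmp/logs and the default suffix “logs” (both covered in the Administration section below); the user name, application ID and node names are purely illustrative.

# Application-level directory on HDFS; user, application ID and node names are made up.
$ hdfs dfs -ls /tmp/logs/alice/logs/application_1424567890123_0001

# Expected contents: one aggregated (possibly compressed) log file per NM that ran
# containers for this application, each holding the logs of all of that node's
# containers, for example:
#   /tmp/logs/alice/logs/application_1424567890123_0001/node1.example.com_45454
#   /tmp/logs/alice/logs/application_1424567890123_0001/node2.example.com_45454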

Usage

Web UI

On the web interfaces, the fact that logs are aggregated is completely hidden from the user.

  • While a MapReduce application is running, users can see the logs from the ApplicationMaster UI, which redirects to the NodeManager UI.
  • Once an application finishes, the completed application’s information is owned by the MapReduce JobHistoryServer, which again serves the user-logs transparently.
  • For non-MapReduce applications, we are working on a generic ApplicationHistoryServer that does the same thing.

Command line access

In addition to the web-UI, we also have a command line utility to interact with logs.

$ $HADOOP_YARN_HOME/bin/yarn logs
Retrieve logs for completed YARN applications.
usage: yarn logs -applicationId <application ID> [OPTIONS]

general options are:
-appOwner <Application Owner>   AppOwner (assumed to be current user if
                                not specified)
-containerId <Container ID>     ContainerId (must be specified if node
                                address is specified)
-nodeAddress <Node Address>     NodeAddress in the format nodename:port
                                (must be specified if container id is
                                specified)

So, to print all the logs for a given application, one can simply say

yarn logs -applicationId <application ID>

On the other hand, if one wants to get the logs of only one container, the following works

yarn logs -applicationId <application ID> -containerId <Container ID> -nodeAddress <Node Address>

The obvious advantage of the command line utility is that you can now use regular shell utilities like grep, sort etc. to filter for any specific information you are looking for in the logs!
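
For example (the application, container and node identifiers below are just illustrative placeholders; substitute your own):

# Dump all aggregated logs of an application and search for exceptions
$ yarn logs -applicationId application_1424567890123_0001 | grep -i "exception"

# Fetch the logs of a single container (the node address is required as well)
$ yarn logs -applicationId application_1424567890123_0001 \
    -containerId container_1424567890123_0001_01_000002 \
    -nodeAddress node1.example.com:45454 | less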

Administration

General log related configuration properties
  • yarn.nodemanager.log-dirs: Determines where the container-logs are stored on the node when the containers are running. Default is ${yarn.log.dir}/userlogs.
    • An application’s localized log directory will be found in ${yarn.nodemanager.log-dirs}/application_${appid}.
    • Individual containers’ log directories will be below this, in directories named container_${containerId}.
    • For MapReduce applications, each container directory will contain the files stdout, stderr, and syslog generated by that container (see the sketch after this list).
    • Other frameworks can choose to write more or fewer files; YARN doesn’t dictate the file names or the number of files.
  • yarn.log-aggregation-enable: Whether to enable log aggregation or not. If disabled, NMs will keep the logs locally (like in 1.x) and not aggregate them.
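
As a purely illustrative sketch of the local layout described above, assuming yarn.nodemanager.log-dirs has been pointed at /var/log/hadoop-yarn/containers (the path, application ID and container ID are made up):

# Per-container log directory on the local node while the container is running
$ ls /var/log/hadoop-yarn/containers/application_1424567890123_0001/container_1424567890123_0001_01_000001
# For a MapReduce container one would expect files along the lines of:
#   stderr  stdout  syslog
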
Properties respected when log-aggregation is enabled
  • yarn.nodemanager.remote-app-log-dir: This is on the default file-system, usually HDFS, and indicates where the NMs should aggregate logs to. This should not be the local file-system, otherwise serving daemons like the history-server will not be able to serve the aggregated logs. Default is /tmp/logs.
  • yarn.nodemanager.remote-app-log-dir-suffix: The remote log dir will be created at ${yarn.nodemanager.remote-app-log-dir}/${user}/${thisParam}. Default value is “logs”.
  • yarn.log-aggregation.retain-seconds: How long to wait before deleting aggregated logs; -1 or any negative number disables the deletion of aggregated logs. One needs to be careful not to set this to too small a value so as not to burden the distributed file-system.
  • yarn.log-aggregation.retain-check-interval-seconds: Determines how long to wait between aggregated-log retention checks. If it is set to 0 or a negative value, then the value is computed as one-tenth of the aggregated-log retention time (see the example after this list). As with the previous configuration property, one needs to be careful not to set this too low. Defaults to -1.
  • yarn.log.server.url: Once an application is done, NMs redirect web UI users to this URL, where the aggregated logs are served. Today it points to the MapReduce-specific JobHistoryServer.
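
As a quick worked example of the retention-check rule above (the numbers are illustrative, not recommendations):

# Keep aggregated logs for 30 days:
#   yarn.log-aggregation.retain-seconds = 2592000        (30 * 24 * 3600)
# Leave yarn.log-aggregation.retain-check-interval-seconds at its default (-1),
# so the effective check interval becomes one-tenth of the retention time:
#   2592000 / 10 = 259200 seconds, i.e. the check runs every 3 days.
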
Properties respected when log-aggregation is disabled
  • yarn.nodemanager.log.retain-seconds: Time in seconds to retain user logs on the individual nodes if log aggregation is disabled. Default is 10800.
  • yarn.nodemanager.log.deletion-threads-count: Determines the number of threads used by the NodeManagers to clean up local log files once the log-retention time is hit, when aggregation is disabled.
Other setup instructions
  • The remote root log directory is expected to have permissions 1777, owned by ${NMUser} and group-owned by ${NMGroup}, the group to which the NMUser belongs (see the sketch after this list).
  • Each application-level dir will be created with permission 770, user-owned by the application-submitter and group-owned by ${NMGroup}. This lets the application-submitter access the aggregated logs of their own applications, while ${NMUser} can access or modify the files for log management.
  • ${NMGroup} should be a limited access group so that there are no access leaks.
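
A minimal sketch of that setup on HDFS, assuming ${NMUser} is yarn, ${NMGroup} is hadoop (substitute the actual accounts of your installation) and the default remote root log directory of /tmp/logs:

# Create the remote root log directory with the expected ownership and permissions.
# "yarn" and "hadoop" stand in for ${NMUser} and ${NMGroup} here.
$ hdfs dfs -mkdir -p /tmp/logs
$ hdfs dfs -chown yarn:hadoop /tmp/logs
$ hdfs dfs -chmod 1777 /tmp/logs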

Conclusion

In this post, I’ve described the motivations for implementing log-aggregation and how it looks to end users as well as administrators. Log-aggregation has proven to be a very useful feature so far. There are interesting design decisions that we made, and some unsolved challenges with the existing implementation – topics for a future post.

Ref: http://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/
