Configuring Logging
This guide helps you understand and modify the configuration of Ray’s logging system.
1 Logging directory
By default, Ray stores log files in a /tmp/ray/session_*/logs directory. See "Log files in logging directory" below to understand how the files are organized within the logs folder.
Note
Ray uses /tmp/ray (for Linux and macOS) as the default temp directory. To change the temp and the logging directory, specify it when you call ray start (the --temp-dir option) or ray.init() (the _temp_dir argument).
Each new Ray session creates a new folder in the temp directory. The latest session folder is symlinked to /tmp/ray/session_latest. Here is an example temp directory:
├── tmp/ray
│ ├── session_latest
│ │ ├── logs
│ │ ├── ...
│ ├── session_2023-05-14_21-19-58_128000_45083
│ │ ├── logs
│ │ ├── ...
│ ├── session_2023-05-15_21-54-19_361265_24281
│ ├── ...
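Because session_latest is a symlink, a script can find the current session's logs without knowing its timestamped folder name. The following is a minimal sketch assuming the default layout shown above; latest_session_logs is a hypothetical helper, not part of Ray's API. If you changed the temp directory, pass it as temp_dir.

```python
from pathlib import Path


def latest_session_logs(temp_dir: str = "/tmp/ray") -> Path:
    """Return the logs folder of the most recent Ray session under temp_dir.

    Hypothetical helper. Assumes a ``session_latest`` symlink pointing at
    the newest ``session_*`` folder, which contains a ``logs`` folder.
    """
    latest = Path(temp_dir) / "session_latest"
    # resolve(strict=True) follows the symlink to the real
    # session_<timestamp> folder and raises FileNotFoundError if it is missing.
    session_dir = latest.resolve(strict=True)
    logs_dir = session_dir / "logs"
    if not logs_dir.is_dir():
        raise FileNotFoundError(f"no logs directory in {session_dir}")
    return logs_dir
```

For example, sorted(p.name for p in latest_session_logs().iterdir()) lists the files described in the next section.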
Temp directories are usually cleared whenever a machine reboots. As a result, log files may be lost whenever your cluster, or some of its nodes, are stopped or terminated.
To inspect logs after a cluster is stopped or terminated, you must store and persist them. See the instructions for processing and exporting logs for clusters on VMs and for KubeRay clusters.
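As a stopgap before setting up a proper log pipeline, you can copy a session's logs folder to durable storage before shutting a node down. The sketch below uses only the standard library and assumes a hypothetical persistent destination passed as dest_root, such as a mounted volume. It is not Ray's export mechanism and does not replace the VM or KubeRay instructions. Files that are still being written are copied as they are at that moment.

```python
import shutil
from pathlib import Path


def archive_session_logs(logs_dir: str, dest_root: str) -> Path:
    """Copy a Ray session's logs folder to dest_root/<session_name>/logs.

    Hypothetical helper. dest_root is assumed to be persistent storage
    that survives reboots; Ray does not configure it for you.
    """
    # Resolve symlinks such as session_latest so that the real
    # session_<timestamp> folder name is used for the archive path.
    src = Path(logs_dir).resolve(strict=True)
    dest = Path(dest_root) / src.parent.name / "logs"
    # dirs_exist_ok (Python 3.8+) lets repeated runs refresh an existing copy.
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return dest
```

Naming the archive after the session folder keeps copies from different sessions apart, mirroring the session_* layout in the temp directory.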
2 Log files in logging directory
Below are the log files in the logging directory. Broadly speaking, there are two types of log files: system log files and application log files. Note that .out logs are from stdout/stderr and .err logs are from stderr. Backward compatibility of the log directory layout is not guaranteed.
Note
System logs may include information about your applications. For example, runtime_env_setup-[job_id].log may include information about your application's environment and dependencies.
Application logs
- job-driver-[submission_id].log: The stdout of a job submitted with the Ray Jobs API.
- worker-[worker_id]-[job_id]-[pid].[out|err]: The Python or Java part of Ray drivers and workers. All stdout and stderr from Tasks or Actors are streamed to these files. Note that job_id is the ID of the driver.
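A busy logs folder can hold many worker files, so it helps to pull the IDs out of the file names when you are looking for one job's output. The sketch below parses the two application log name patterns listed above. The regular expressions are our own reading of the naming scheme and are not a published Ray API; the ID formats are assumed to contain no hyphens, and parse_app_log_name is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed patterns derived from the file names listed above. IDs are
# matched loosely as hyphen-free strings because their exact format
# (hex, integers, etc.) is not specified here.
_WORKER_RE = re.compile(
    r"^worker-(?P<worker_id>[^-]+)-(?P<job_id>[^-]+)-(?P<pid>\d+)\.(?P<stream>out|err)$"
)
_DRIVER_RE = re.compile(r"^job-driver-(?P<submission_id>.+)\.log$")


def parse_app_log_name(name: str) -> Optional[dict]:
    """Return the fields encoded in an application log file name, or None."""
    m = _WORKER_RE.match(name)
    if m:
        return {"kind": "worker", **m.groupdict()}
    m = _DRIVER_RE.match(name)
    if m:
        return {"kind": "job-driver", **m.groupdict()}
    # Anything else (raylet.out, io-worker-..., etc.) is not an application log.
    return None
```

Filtering a directory listing on the returned job_id and stream fields, for example keeping only "err" files for one job, narrows the search to the worker output of that driver.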
System (component) logs
- dashboard.[log|err]: A log file of the Ray Dashboard. .log files contain logs generated from the dashboard's logger. .err files contain stdout and stderr printed from the dashboard. They are usually empty except when the dashboard crashes unexpectedly.
- dashboard_agent.log: Every Ray node has one dashboard agent. This is a log file of the agent.
- gcs_server.[out|err]: The GCS server is a stateless server that manages Ray cluster metadata. It exists only in the head node.
- io-worker-[worker_id]-[pid].[out|err]: Starting with Ray 1.3, Ray creates IO workers by default to spill objects to, and restore them from, external storage. This is a log file of IO workers.
- log_monitor.[log|err]: The log monitor is in charge of streaming logs to the driver. .log files contain logs generated from the log monitor's logger. .err files contain the stdout and stderr printed from the log monitor. They are usually empty except when the log monitor crashes unexpectedly.
- monitor.[out|err]: Stdout and stderr of the cluster launcher.
- monitor.log: Ray's Cluster Launcher operates from a monitor process. It also manages the Autoscaler.
- plasma_store.[out|err]: Deprecated.
- python-core-driver-[worker_id]_[pid].log: Ray drivers consist of a C++ core and a Python or Java frontend. The C++ code generates this log file.
- python-core-worker-[worker_id]_[pid].log: Ray workers consist of a C++ core and a Python or Java frontend. The C++ code generates this log file.
- raylet.[out|err]: A log file of raylets.
- redis-shard_[shard_index].[out|err]: Redis shard log files.
- redis.[out|err]: Redis log files.
- runtime_env_agent.log: Every Ray node has one agent that manages Runtime Environment creation, deletion, and caching. This is the log file of the agent, containing logs of create and delete requests and of cache hits and misses. For the logs of the actual installations (for example, pip install logs), see the runtime_env_setup-[job_id].log
file (se