Track and Process FATE Logs with KubeFATE Log Aggregation - FederatedAI/KubeFATE GitHub Wiki

KubeFATE's Log Aggregation

KubeFATE v1.5.1 supports log aggregation for FATE clusters. With the kubefate cluster logs command, you can collect the logs of every component, or of one specific component, of any FATE cluster.

Why Do We Need Log Aggregation?

Throughout a program's life cycle, logs give O&M engineers and users the information they need to understand and control it. Logs are usually written to log files, and the logs of a cluster application are often scattered across different hosts, which makes them hard to view. It is therefore necessary to collect and manage these logs in one place.

KubeFATE's log aggregation feature addresses this problem head-on.

The Value of Log Aggregation

The log aggregation feature helps you easily accomplish the following:

Run Status Monitoring

The run status of a cluster application is a good indicator of the cluster's health. For FATE, viewing the logs is a good way to check whether a FATE cluster is running healthily.

Job Troubleshooting

Application development is inevitably accompanied by errors, and it is essential to find problems and locate their causes promptly.

KubeFATE's log aggregation feature is a useful troubleshooting tool for FATE users.

Algorithm Debugging

Federated learning (FL) jobs usually involve more than one party, and each party's cluster instances run in different, often complex, environments. This makes designing and debugging federated learning algorithms a challenge.

KubeFATE's log aggregation can collect all the logs for a single party, providing a powerful tool for debugging analysis.

Job Monitoring

Production AI jobs usually require a large amount of computation and take a long time to run, so it is important to make sure the cluster stays healthy while they run. You can monitor this by viewing all the logs within the cluster.

Usage

KubeFATE's log aggregation feature makes FATE easier to operate. Here is how to use it.

Command

kubefate cluster logs [options] <cluster_ID> [modules_name]

[options] are the command options (optional)

<cluster_ID> is the ID of the specified FATE cluster (required)

[modules_name] is the name of a FATE module (component), such as python or rollsite (optional; if omitted, the logs of all components are returned)

Options

Option               Description
--follow, -f         Stream the log continuously. (default: false)
--previous           If true, print the log of the previous instance of the container, if any. (default: false)
--since value        Return only logs newer than the given duration (e.g., 5s, 2m, or 3h). Returns all logs if no value is specified. Use either this or --since-time, not both.
--since-time value   Return only logs after the given date (RFC3339 format). Returns all logs if no value is specified. Use either this or --since, not both.
--timestamps         Include a timestamp on each line of log output. (default: false)
--tail value         Number of most recent log lines to display. The default of -1 applies no limit, so all log lines are displayed. (default: -1)
--limit-bytes value  Maximum number of bytes of log output to return. The default is no limit.
--help               Show help. (default: false)
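
The options can be wrapped in a small shell function for everyday use. The sketch below is only an illustration: recent_logs is a hypothetical helper name, the defaults of 10m and 100 lines are arbitrary, and it assumes that --since, --tail, and --timestamps can be combined in one call.

```shell
# Hypothetical helper: show recent, timestamped log lines for one module.
# Usage: recent_logs <cluster_ID> <module_name> [duration] [line_count]
recent_logs() {
    cluster_id="$1"
    module="$2"
    since="${3:-10m}"   # look-back window; default chosen for illustration
    lines="${4:-100}"   # maximum number of trailing lines to print
    # Options come before the cluster ID, as in the command syntax.
    kubefate cluster logs --since "$since" --tail "$lines" --timestamps \
        "$cluster_id" "$module"
}
```

For example, recent_logs <cluster_ID> rollsite 2h 20 would request at most the last 20 rollsite log lines from the past two hours.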

Example

$ kubefate cluster list
UUID                                	NAME     	NAMESPACE	REVISION	STATUS 	CHART	ChartVERSION	AGE
8b980f0b-b139-40b2-a94d-d5aebd14d913	fate-9999	fate-9999	1       	Running	fate 	v1.5.1      	100s
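
Copying the UUID by hand is error-prone, so it may help to look it up by cluster name. The following is a minimal sketch, assuming the column layout shown above (UUID in the first column, NAME in the second, one header row); cluster_id_by_name is a hypothetical helper, not part of KubeFATE.

```shell
# Hypothetical helper: print the UUID of the cluster with the given NAME.
# Assumes the `kubefate cluster list` layout shown above:
# a header row, then one row per cluster with UUID first and NAME second.
cluster_id_by_name() {
    kubefate cluster list |
        awk -v name="$1" 'NR > 1 && $2 == name { print $1 }'
}

# Usage (not run here):
#   kubefate cluster logs "$(cluster_id_by_name fate-9999)" python
```

If several clusters share a name, this prints one UUID per line, so check the result before passing it on.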

View Logs to Check Whether a Specific Component Is Running

View the python component's log

kubefate cluster logs 8b980f0b-b139-40b2-a94d-d5aebd14d913 python


View the rollsite component's log

kubefate cluster logs 8b980f0b-b139-40b2-a94d-d5aebd14d913 rollsite


Continuously Monitor a Component's Log

Monitor the python component's log

kubefate cluster logs -f 8b980f0b-b139-40b2-a94d-d5aebd14d913 python


Continuously Monitor the Logs of All Components Within a Cluster

kubefate cluster logs -f 8b980f0b-b139-40b2-a94d-d5aebd14d913
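
When following a whole cluster, it is often useful to keep a copy of the stream for later analysis. One possible approach, sketched here with a hypothetical helper named follow_and_save, is to pipe the output through tee:

```shell
# Hypothetical helper: follow every component's log and append a copy to a file.
# Usage: follow_and_save <cluster_ID> <output_file>
follow_and_save() {
    # -f streams the log; --timestamps prefixes each line with its time.
    # tee -a prints the stream and appends it to the file at the same time.
    kubefate cluster logs -f --timestamps "$1" | tee -a "$2"
}
```

Press Ctrl+C to stop following; the lines received so far remain in the output file.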


Monitor the Error Log

kubefate cluster logs 8b980f0b-b139-40b2-a94d-d5aebd14d913 | grep ERROR
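
The command above filters the log once. To watch for new errors as they occur, the same filter can be applied to the streaming log. This is a sketch under two assumptions: watch_errors is a hypothetical helper, and your grep supports --line-buffered (GNU and BSD grep do), which prints each match immediately instead of waiting for a full buffer.

```shell
# Hypothetical helper: stream a cluster's logs and print only ERROR lines.
# Usage: watch_errors <cluster_ID> [module_name]
watch_errors() {
    # ${2:+"$2"} passes the module name only when one was given.
    kubefate cluster logs -f "$1" ${2:+"$2"} |
        grep --line-buffered ERROR
}
```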


View the Log of a Single Job

kubefate cluster logs 8b980f0b-b139-40b2-a94d-d5aebd14d913 | grep <Job_ID>
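
To keep a job's log lines for later inspection, the filtered output can be written to a file. A minimal sketch, where save_job_log is a hypothetical helper and the job ID is matched as a plain substring of each log line:

```shell
# Hypothetical helper: save all log lines that mention one job ID to a file.
# Usage: save_job_log <cluster_ID> <Job_ID> <output_file>
save_job_log() {
    # -F treats the job ID as a fixed string, not a regular expression;
    # -- guards against an ID that happens to start with a dash.
    kubefate cluster logs "$1" | grep -F -- "$2" > "$3"
}
```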
