Track and Process FATE Logs with KubeFATE Log Aggregation - FederatedAI/KubeFATE GitHub Wiki
KubeFATE v1.5.1 supports log aggregation for FATE clusters. You can collect the logs of every component, or of one specific component, of any FATE cluster with the kubefate cluster logs command.
Throughout a program's life cycle, logs give O&M engineers and users much of the information they need to keep the program under control. Logs are usually written to log files, and the logs of a cluster application are often scattered across different hosts, which makes them hard to view. It is therefore necessary to collect and manage these logs in one place.
KubeFATE's log aggregation feature addresses this problem head-on.
The log aggregation feature helps you easily accomplish the following:
The run status of a cluster application is a good indicator of the cluster's health. For FATE, viewing the logs is a good way to check whether a FATE cluster is running healthily.
Application development is inevitably accompanied by errors, so it is essential to find problems and locate their causes promptly.
KubeFATE's log aggregation feature is a useful troubleshooting tool for FATE users.
Federated learning (FL) compute jobs usually involve more than one party, and each party's cluster instance runs in its own, often complex, environment, which makes designing and debugging federated learning algorithms a great challenge.
KubeFATE's log aggregation can collect all the logs for a single party, providing a powerful tool for debugging and analysis.
Production AI computing jobs usually involve a huge amount of computation and take a long time to run, so it is important to keep the cluster healthy while they run. Viewing all the logs within the cluster makes this easy to monitor.
KubeFATE's log aggregation feature greatly enhances the use of FATE. Let's see how it works.
kubefate cluster logs [options] <cluster_ID> [modules_name]
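[options] are the command options listed in the table below (optional)
<cluster_ID> is the ID of the specified FATE cluster (required)
[modules_name] is the name of the FATE module (component) whose logs you want (optional; when omitted, the logs of all components are returned)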
Option | Description |
---|---|
--follow, -f | Specify whether to stream the log. (default: false) |
--previous | If true, print the log of the previous instance (if any) of the container. (default: false) |
--since value | Return only logs newer than a relative duration (e.g., 5s, 2m, or 3h). All logs are returned if no value is specified. Use either this option or --since-time, not both. |
--since-time value | Return only logs after a specific date and time (RFC3339 format). All logs are returned if no value is specified. Use either this option or --since, not both. |
--timestamps | Include a timestamp in each line of log output. (default: false) |
--tail value | Number of most recent log lines to display. The default of -1 applies no selector, so all log lines are displayed. (default: -1) |
--limit-bytes value | The maximum number of bytes of the log to be returned. The default is no limit. |
--help | Show help. (default: false) |
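The --since-time option expects an RFC3339 timestamp, which is easy to mistype by hand. The following is a minimal sketch, assuming GNU date is available; the helper name rfc3339_minutes_ago is hypothetical and not part of KubeFATE. It builds a UTC timestamp for a point N minutes in the past:

```shell
# Hypothetical helper (not part of KubeFATE): print an RFC3339 UTC timestamp
# for the moment "$1" minutes ago. Relies on GNU date's -d option.
rfc3339_minutes_ago() {
  date -u -d "$1 minutes ago" +%Y-%m-%dT%H:%M:%SZ
}

# Intended use (placeholder cluster ID, not executed here):
#   kubefate cluster logs --since-time "$(rfc3339_minutes_ago 30)" --timestamps <cluster_ID> python
```

With this helper, the commented command would request the python component's logs from roughly the last 30 minutes, each line prefixed with a timestamp. On BSD or macOS, where date has no -d option of this kind, date -u -v-30M +%Y-%m-%dT%H:%M:%SZ should give the same format.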
$ kubefate cluster list
UUID NAME NAMESPACE REVISION STATUS CHART ChartVERSION AGE
8b980f0b-b139-40b2-a94d-d5aebd14d913 fate-9999 fate-9999 1 Running fate v1.5.1 100s
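Copying the UUID by hand becomes tedious in scripts. As a rough sketch, assuming the column layout shown above (UUID in the first column, NAME in the second, one header row), a small awk filter can look up the UUID by cluster name. The function name cluster_uuid is hypothetical:

```shell
# Hypothetical helper: read `kubefate cluster list` output on stdin and print
# the UUID of every cluster whose NAME column equals "$1".
# Assumes whitespace-separated columns with UUID first and NAME second.
cluster_uuid() {
  awk -v name="$1" 'NR > 1 && $2 == name { print $1 }'
}

# Intended use (not executed here):
#   kubefate cluster list | cluster_uuid fate-9999
```

The result can then be passed to kubefate cluster logs as <cluster_ID>. If the real output format differs from the listing above, the column numbers in the awk program need adjusting.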
View the python component's log
kubefate cluster logs 8b980f0b-b139-40b2-a94d-d5aebd14d913 python
View the rollsite component's log
kubefate cluster logs 8b980f0b-b139-40b2-a94d-d5aebd14d913 rollsite
Monitor the python component's log
kubefate cluster logs -f 8b980f0b-b139-40b2-a94d-d5aebd14d913 python
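Monitor the logs of all components
kubefate cluster logs -f 8b980f0b-b139-40b2-a94d-d5aebd14d913
Find the error logs of all components
kubefate cluster logs 8b980f0b-b139-40b2-a94d-d5aebd14d913 | grep ERROR
Find the logs of a specific job
kubefate cluster logs b4db45a6-e9b5-4350-8be3-511ea72c76cf | grep <Job_ID>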
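The grep pipelines above can be wrapped into small reusable filters. The sketch below is one possible approach, not a KubeFATE feature, and the function names filter_levels and count_fixed are hypothetical. It assumes only that log lines contain plain keywords such as ERROR or the job ID, as the commands above already do:

```shell
# Hypothetical helper: keep only lines matching the extended regex in "$1".
# --line-buffered makes grep flush each match immediately, which matters when
# the input is a live stream such as `kubefate cluster logs -f`.
filter_levels() {
  grep --line-buffered -E -- "$1"
}

# Hypothetical helper: count the lines containing the fixed string "$1"
# (for example a job ID). grep -c exits non-zero when the count is 0,
# so the exit status is masked to keep scripts running.
count_fixed() {
  grep -c -F -- "$1" || true
}

# Intended use (placeholders, not executed here):
#   kubefate cluster logs -f <cluster_ID> | filter_levels 'ERROR|WARNING'
#   kubefate cluster logs <cluster_ID> | count_fixed <Job_ID>
```

Matching the job ID as a fixed string with -F avoids accidental regular-expression interpretation of the search term.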