Steps for project migration - cloudera/cmlutils GitHub Wiki
Development mode
- Clone the repo and run
python3 -m pip install --editable .
- Check whether the cmlutil command runs.
- Because the CLI is installed in editable mode, any change to the source code takes effect immediately, without re-installing.
For production
- To install from the main branch:
python3 -m pip install git+https://github.com/cloudera/cmlutils@main
- Or from a feature or release branch:
python3 -m pip install git+https://github.com/cloudera/cmlutils@<branch-name>
- Please carefully go through Legacy Engine Migration
- Check if user exists and is authorised to migrate the project
- Check that the Rsync Custom Runtime is available in the Source Runtime Catalog
- Check if the Intermediate/Bastion machine has sufficient disk space available to download the project.
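The disk space check above can be scripted on the Intermediate/Bastion machine. The following is a minimal sketch, not part of cmlutils: the helper name `has_enough_space` and the 5 GiB estimate are assumptions made for illustration, and the directory is the example output_dir used on this page.

```python
import shutil
from pathlib import Path


def has_enough_space(output_dir: str, required_bytes: int) -> bool:
    """Return True if the filesystem holding output_dir has at least required_bytes free."""
    path = Path(output_dir).expanduser()
    # output_dir may not exist yet, so walk up to the nearest existing ancestor.
    while not path.exists():
        path = path.parent
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    # Hypothetical estimate of the project size: 5 GiB.
    print(has_enough_space("~/Documents/temp_dir", 5 * 1024**3))
```

Replace the estimate with the actual size of the project on the source workspace, plus some headroom for logs and metadata.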
- Create an export-config.ini file inside the <home-dir>/.cmlutils directory. Inside the export-config.ini file, create a section for each project, where you can include project-specific configurations. For common configurations shared across projects, place them in the DEFAULT section.
- Example export-config.ini file:
[DEFAULT]
url=<Source-Workspace-url>
output_dir=~/Documents/temp_dir
ca_path=~/Documents/custom-ca-source.pem
username=user-default
apiv1_key=default-dummy-key
[Project-A]
username=user-1
apiv1_key=user1-api-key
[Project-B]
username=user-2
apiv1_key=user-2-api-key
[Project-C] ; Uses [DEFAULT] configuration as it doesn't have specific configuration
[Project-D owned by another user]
owner_username=<project-owner-username> ; optional. Required only when the workbench project is owned by someone else (user or team) other than the cmlutil user.
[Project-E with insecure connection]
skip_tls_verification=true ; optional. Set to true to skip TLS certificate verification for untrusted/self-signed certificates.
Configuration used:
- username: Username of the user who is migrating the project (the cmlutil user). (Mandatory)
- url: Source workspace URL. (Mandatory)
- apiv1_key: Source API v1/Legacy API key. (Mandatory)
- output_dir: Temporary directory on the local machine where the project data/metadata would be stored. (Mandatory)
- ca_path: Path to a CA (Certificate Authority) bundle to use when Python cannot pick up the CA from the system and SSL certificate verification fails. This issue is generally seen on macOS. (Optional)
- owner_username: Username of the actual project owner (if different from the cmlutil user). Use this when migrating projects owned by other users or teams. The cmlutil user must have write access to the project. The original owner information is preserved and will be restored during import if the user exists in the destination workspace. (Optional)
- skip_tls_verification: Set to true to skip TLS/SSL certificate verification. Useful for workspaces with self-signed or untrusted certificates. Can be set globally in the DEFAULT section or per project. (Optional)
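To illustrate how the DEFAULT section and the per-project sections combine, here is a minimal sketch using Python's standard configparser. It is not the cmlutils implementation: the helper `missing_keys`, the example URL, and the choice to allow `;` inline comments are assumptions for illustration. The mandatory keys are the ones listed above.

```python
import configparser

# Keys marked (Mandatory) in the list above.
MANDATORY_KEYS = ("username", "url", "apiv1_key", "output_dir")

SAMPLE = """
[DEFAULT]
url=https://source.example.com
output_dir=~/Documents/temp_dir
username=user-default
apiv1_key=default-dummy-key

[Project-A]
username=user-1
apiv1_key=user1-api-key

[Project-C]
"""


def missing_keys(config_text: str, project: str) -> list[str]:
    """Return the mandatory keys that are absent or empty for a project section."""
    # interpolation=None keeps '%' characters in URLs or keys literal.
    parser = configparser.ConfigParser(
        interpolation=None, inline_comment_prefixes=(";",)
    )
    parser.read_string(config_text)
    if not parser.has_section(project):
        raise KeyError(f"no section named {project!r}")
    section = parser[project]
    # Section lookups fall back to DEFAULT when the key is not set locally.
    return [key for key in MANDATORY_KEYS if not section.get(key)]


if __name__ == "__main__":
    for name in ("Project-A", "Project-C"):
        print(name, "missing:", missing_keys(SAMPLE, name))
```

Here Project-A overrides username and apiv1_key, while Project-C inherits every value from DEFAULT, so neither section reports missing keys.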
- If you wish to skip certain files or directories during export, create a .exportignore file at the root of the source project (i.e. /home/cdsw). The .exportignore file follows the same semantics as .gitignore.
- To export the project, run the following command:
cmlutil project export -p "Project-A"
or
cmlutil project export -p "Project-C"
Note: Project-name above should match one of the section names in the export-config.ini file.
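When several projects are listed in export-config.ini, the export command can be run once per section. The wrapper below is a hedged sketch, not a cmlutils feature: the function names are hypothetical, and it only relies on the `cmlutil project export -p` invocation and the config location described on this page. It records the process exit code as-is; how cmlutil reports failures through exit codes is not confirmed here.

```python
import configparser
import subprocess
from pathlib import Path

# Config location as described above: <home-dir>/.cmlutils/export-config.ini
CONFIG_PATH = Path.home() / ".cmlutils" / "export-config.ini"


def project_names(config_text: str) -> list[str]:
    """Return the section names (one per project); DEFAULT is not included."""
    parser = configparser.ConfigParser(
        interpolation=None, inline_comment_prefixes=(";",)
    )
    parser.read_string(config_text)
    return parser.sections()


def build_export_command(project: str) -> list[str]:
    """Build the argument list for exporting one project."""
    return ["cmlutil", "project", "export", "-p", project]


def export_all(config_path: Path = CONFIG_PATH) -> dict[str, int]:
    """Run the export for every section and collect the exit codes."""
    results = {}
    for name in project_names(config_path.read_text()):
        completed = subprocess.run(build_export_command(name), check=False)
        results[name] = completed.returncode
    return results


if __name__ == "__main__":
    if CONFIG_PATH.exists():
        for project, code in export_all().items():
            print(f"{project}: exit code {code}")
    else:
        print(f"{CONFIG_PATH} not found; nothing to export")
```

Passing the command as a list avoids shell quoting problems with section names that contain spaces.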
- A folder with the project name will be created inside the output directory (~/Documents/temp_dir). If the project folder already exists, the data will be overwritten.
- All the project files, artifacts and logs corresponding to the project will be downloaded into the project folder.
- An export metrics JSON file will be created with information about the exported project.
- Check if user exists and is authorised to migrate the project
- Check that the Rsync Custom Runtime is available in the Target Runtime Catalog
- Check if local output directory and project metadata file exists on the Intermediate/Bastion machine.
- Create an import-config.ini file inside the <home-dir>/.cmlutils directory. Inside the import-config.ini file, create a section for each project, where you can include project-specific configurations. For common configurations shared across projects, place them in the DEFAULT section.
- Example import-config.ini file:
[DEFAULT]
url=<Destination-Workspace-url>
output_dir=~/Documents/temp_dir
ca_path=~/Documents/custom-ca-target.pem
username=user-default
apiv1_key=user-default-dummy-key
[Project-A]
username=user-1
apiv1_key=user-1-api-key
[Project-B]
username=user-2
apiv1_key=user-2-api-key
[Project-C] ; Uses [DEFAULT] configuration as it doesn't have specific configuration
[Project-D with insecure connection]
skip_tls_verification=true ; optional. Set to true to skip TLS certificate verification for untrusted/self-signed certificates.
Configuration used:
- username: Username of the user who is migrating the project. (Mandatory)
- url: Target workspace URL. (Mandatory)
- apiv1_key: Target API v1/Legacy API key. (Mandatory)
- output_dir: Temporary directory on the local machine from where the project will be uploaded. (Mandatory)
- ca_path: Path to a CA (Certificate Authority) bundle to use when Python cannot pick up the CA from the system and SSL certificate verification fails. This issue is generally seen on macOS. (Optional)
- skip_tls_verification: Set to true to skip TLS/SSL certificate verification. Useful for workspaces with self-signed or untrusted certificates. Can be set globally in the DEFAULT section or per project. (Optional)
Note on Ownership Transfer: During import, if the project was originally owned by a different user (as recorded during export), the tool will attempt to transfer ownership to that user in the destination workspace. If the original owner doesn't exist in the destination workspace, the tool will log a message and the project will remain owned by the cmlutil user.
- If you encounter read-only file system errors during import (e.g., for .snapshot directories), create a .importignore file in the <project-name>/project-data/ directory and add the problematic paths. The .importignore file follows the same semantics as .gitignore.
- To import the project, run the following command:
cmlutil project import -p "Project-A"
or
cmlutil project import -p "Project-B"
Note: Project-name above should match one of the section names in the import-config.ini file.
- The project will be created in the destination workspace if it does not exist already.
- Import metrics JSON will be created that will have info related to the imported project
- To import a project and initiate validation, execute the following commands:
cmlutil project import -p "Project-A" -v
or
cmlutil project import -p "Project-B" --verify
This command initiates a session in the source and validates the following aspects:
- Consistency of project files between the source and local directories.
- Consistency of project files between the local directory and the destination.
- Consistency in the count of Jobs, Models, and Applications between the source and destination.
- Consistency in the metadata of Jobs, Models, and Applications between the source and destination.
These validations ensure the integrity and accuracy of the project import process.
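The file consistency checks listed above amount to comparing two file trees. The sketch below shows one way such a comparison could be done for two local directories by hashing file contents. It is an illustration only and not how cmlutils performs its validation: the function names are hypothetical, and the real tool compares against remote workspaces through a session.

```python
import hashlib
from pathlib import Path


def file_digests(root: str) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    base = Path(root)
    digests = {}
    for path in sorted(base.rglob("*")):
        if path.is_file():
            relative = path.relative_to(base).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_trees(left: str, right: str) -> dict[str, list[str]]:
    """Report files that exist on one side only or whose contents differ."""
    a, b = file_digests(left), file_digests(right)
    return {
        "only_in_left": sorted(a.keys() - b.keys()),
        "only_in_right": sorted(b.keys() - a.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

Two trees are consistent in this sense when all three lists in the result are empty. Reading whole files into memory keeps the sketch short; large files would call for chunked hashing.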