Using the SageWorks Python API - SuperCowPowers/workbench GitHub Wiki
After onboarding SageWorks to your AWS account (see AWS Onboarding), you are ready to have your Data Science team start doing science!
Set Up SSO for AWS CLI/Python Usage
Note: There's no special setup for SageWorks. If you already have an AWS profile/SSO user set up and working, you can skip this step. If you haven't yet set up your local machine/laptop with an AWS profile/SSO user, please see our guide, AWS SSO Setup.
Note: You only need to do this setup once; after that you're ready to go.
Python Setup: Virtual Environments
SageWorks requires Python 3.10 or higher. PyEnv is great but feel free to use any Python Environment (Anaconda, VirtualEnv, PyEnv) that you'd like.
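A quick way to confirm that the interpreter in your active environment meets the 3.10 requirement is a short check like the one below. This is only a minimal sketch: the helper name `meets_minimum_python` is hypothetical and is not part of SageWorks.

```python
import sys

# Minimum version stated above (SageWorks requires Python 3.10 or higher)
MIN_PYTHON = (3, 10)


def meets_minimum_python(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the given (major, minor, ...) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum_python():
        print(f"Python {found} is new enough for SageWorks")
    else:
        print(f"Python {found} is too old; 3.10 or higher is required")
```

Run it with the same interpreter you plan to `pip install sageworks` into, since different virtual environments can point at different Python versions.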
Installing SageWorks
pip install sageworks
Set the SageWorks Artifacts Bucket ENV Var
Note: You may want to put this ENV var in your ~/.bashrc or ~/.zshrc, or in your Windows environment variables.
On Linux/macOS (use whatever your bucket is actually called):
export SAGEWORKS_BUCKET=mycompany-sageworks-bucket
On Windows (Command Prompt):
set SAGEWORKS_BUCKET=mycompany-sageworks-bucket
$Env:SAGEWORKS_BUCKET = "mycompany-sageworks-bucket" # on Windows (PowerShell)
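To double-check from Python that the variable is actually visible to your process (for example, after editing a shell profile and opening a new terminal), something like the following may help. It is a sketch under assumptions: the helper `missing_env_vars` is hypothetical, not a SageWorks function, and it only checks that the variable is non-empty, not that the bucket exists.

```python
import os

# Variable name taken from this page; the helper itself is hypothetical
REQUIRED_VARS = ("SAGEWORKS_BUCKET",)


def missing_env_vars(environ=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Not set: " + ", ".join(missing))
    else:
        print("SAGEWORKS_BUCKET =", os.environ["SAGEWORKS_BUCKET"])
```

Passing `required=("AWS_PROFILE", "SAGEWORKS_BUCKET")` extends the same check to the AWS profile variable.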
Windows PowerShell Instructions (for Anaconda Installs)
On Windows, save the following script in any folder and add that folder to your system path.
$env:AWS_PROFILE = "my-aws-profile"
$env:SAGEWORKS_BUCKET = "mycompany-sageworks-bucket"
aws sso login --profile my-aws-profile
Testing out the AWS Connection
- Make sure your AWS_PROFILE is set correctly
- Renew your SSO Token
- Try it out
$ ipython
In [1]: from sageworks.views.artifacts_text_view import ArtifactsTextView
In [2]: ArtifactsTextView().summary()
===============================================================================================================
GLUE_JOBS
===============================================================================================================
Name            GlueVersion  Workers  WorkerType  Modified          LastRun           Status
NSM_Log_Loader  4.0          4        G.1X        2023-08-14 17:09  2023-09-02 02:48  SUCCEEDED
dns_load_heavy  4.0          4        G.1X        2023-06-06 16:10  2023-06-06 16:10  SUCCEEDED
===============================================================================================================
DATA_SOURCES
===============================================================================================================
Name               Ver  Size(MB)  Modified          Num Columns  DataLake  Tags            Input
abalone_data       25   0.07      2023-09-22 22:40  9            False     abalone:public  /Users/briford/work/sageworks/data/abalone.csv
abalone_data_copy  20   0.07      2023-09-24 01:54  9            False     abalone:public  abalone_data
test_data          10   0.01      2023-09-22 23:34  10           False     test:small      DataFrame
===============================================================================================================
FEATURE_SETS
===============================================================================================================
Feature Group        Size(MB)  Catalog DB              Athena Table                    Online  Created           Tags            Input
test_feature_set     0.47      sagemaker_featurestore  test_feature_set_1695520074     True    2023-09-24 01:47  test:small      test_data
abalone_feature_set  0.59      sagemaker_featurestore  abalone_feature_set_1695423652  True    2023-09-22 23:00  abalone:public  abalone_data
===============================================================================================================
MODELS
===============================================================================================================
Model Group         Ver  Status     Description               Created           Tags                Input
abalone-regression  1    Completed  Abalone Regression Model  2023-09-22 23:15  abalone:regression  abalone_feature_set
===============================================================================================================
ENDPOINTS
===============================================================================================================
Name                    Status     Created           DataCapture  Sampling(%)  Tags                Input
abalone-regression-end  InService  2023-09-22 23:15  False        -            abalone:regression  abalone-regression
Reduce logging verbosity
SageWorks is currently quite verbose in its logging. If you want to make it quieter, you can raise the logging threshold with this bit of code:
import logging
logging.getLogger("sageworks").setLevel(logging.WARNING)
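The snippet above works because of how Python's standard `logging` hierarchy behaves: a logger without its own level inherits the effective level of its nearest configured ancestor. The sketch below illustrates this, assuming SageWorks modules log under names that start with `sageworks.` (the child name `sageworks.views.example` is hypothetical and used only for illustration).

```python
import logging

# Same call as above: raise the threshold on the top-level "sageworks" logger
logging.getLogger("sageworks").setLevel(logging.WARNING)

# Hypothetical child logger name, used only to illustrate inheritance
child = logging.getLogger("sageworks.views.example")

# The child has no level of its own, so it inherits WARNING from its parent
print(logging.getLevelName(child.getEffectiveLevel()))  # WARNING
print(child.isEnabledFor(logging.INFO))                 # False
print(child.isEnabledFor(logging.WARNING))              # True
```

Use `logging.ERROR` instead of `logging.WARNING` if you also want to hide warnings.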
Errors
If you get an error like this, it means that your AWS_PROFILE needs to be set or you need to renew your SSO token:
RuntimeError: AWS Identity Check Failure: Check AWS_PROFILE and/or Renew SSO Token...
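If you would rather see a short reminder than a traceback when this happens in a script or notebook, one option is to wrap the call. The helper `run_with_aws_hint` below is a hypothetical sketch, not part of SageWorks; it matches on the message text quoted above, which is an assumption that could stop matching if the wording changes.

```python
def run_with_aws_hint(action):
    """Call `action()`; on the identity-check error, print a hint and return None."""
    try:
        return action()
    except RuntimeError as err:
        # Match on the message text quoted above; other RuntimeErrors are re-raised
        if "AWS Identity Check Failure" not in str(err):
            raise
        print(err)
        print("Hint: check AWS_PROFILE, then run: aws sso login --profile <your-profile>")
        return None


# Example usage (requires SageWorks and AWS access, so it is left commented out):
# from sageworks.views.artifacts_text_view import ArtifactsTextView
# run_with_aws_hint(lambda: ArtifactsTextView().summary())
```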
Jupyter Notebooks
You can run any of the SageWorks API in a Jupyter Notebook, so feel free to look at the tutorials below. If you want to set an ENV var in a notebook, you can do it like this:
import os
os.environ['SAGEWORKS_BUCKET'] = 'mycompany-sageworks-bucket'
Using SageWorks Tutorials
- Notebook: SageWorks Pipeline. Building an AWS® ML Pipeline from start to finish.
- Video: Coding with SageWorks. Informal coding + chatting while building a full ML pipeline.
Redis (Optional)
SageWorks uses an OPTIONAL Redis database as a temporal cache to minimize AWS service calls. If you have access to a Redis database, set this ENV var:
export REDIS_HOST=<your redis host>
You can also spin up a local Redis Docker container very easily. Again, this is optional: it will make SageWorks more responsive but is not required.
docker run --name my-redis -p 6379:6379 -d redis
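To confirm that something is listening where `REDIS_HOST` points, a plain TCP check with the standard library is often enough. The sketch below makes a few assumptions: it uses port 6379 (the port published by the docker command above), falls back to `localhost` when `REDIS_HOST` is unset, and the helper name `redis_reachable` is hypothetical. It only tests that the port accepts connections; it does not speak the Redis protocol.

```python
import os
import socket


def redis_reachable(host, port=6379, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures
        return False


if __name__ == "__main__":
    # Falling back to localhost is an assumption that fits the local docker case
    host = os.environ.get("REDIS_HOST", "localhost")
    if redis_reachable(host):
        print(f"{host}:6379 is accepting connections")
    else:
        print(f"Could not connect to {host}:6379")
```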