Common Files (backend.common) - DSGT-DLP/Deep-Learning-Playground GitHub Wiki
About
This page documents the files in the /backend/common directory of the GitHub repo. It is regularly updated as changes are made to these files.
constants.py
Constants Storage: This file stores a number of constants used throughout the backend directory for various purposes.
dataset.py
General Dataset Handler: This file handles reading in the dataset for training. The user can currently upload a dataset in the following ways:
- As a URL to the raw CSV file
- By directly uploading the dataset to the website
- By loading in a zipped file (currently for image data)
Endpoints
- read_dataset(url): Given a URL, generate a CSV file that is stored locally in order to train the deep learning model
- read_local_csv_file(file_path): Given a file path to a CSV (from the user uploading the file), read it in
- loader_from_zipped(zipped_file, train_transform, test_transform): Given a path to a zip file, read it in. The zip file structure is explained in the "Pretrained Models" section of the page. train_transform is the set of data transformations to apply to the /train folder in the zip file, while test_transform is the set of data transformations to apply to the /test folder
Usage
Usage is straightforward: invoke the function with the proper parameters and you should be all set. Note, however, that developers don't use this file directly; other endpoints in the backend call its functions on the user's behalf.
```python
train_loader, valid_loader = loader_from_zipped(
    "../tests/zip_files/double_zipped.zip",
    train_transform=[
        transforms.ToTensor(),
        transforms.RandomChoice(transforms=[transforms.ToTensor()]),
    ],  # do NOT wrap in transforms.Compose; pass a plain list of valid transformations
)
```
default_datasets.py
Default Data Handler: Sometimes the user simply wants to play around with deep learning models without needing to upload a dataset. Our application supplies some default datasets that the user can choose from as an alternative. When the user selects a default dataset name from the frontend dropdown (e.g., boston housing, california housing, wine, iris), we use sklearn.datasets
to read in the selected default dataset and return a pd.DataFrame
for the dl_trainer
endpoint to use.
Example Usage
get_default_dataset("iris")
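A minimal sketch of how such a loader could work; the dataset mapping and function body here are illustrative assumptions, not the repo's actual implementation:

```python
import pandas as pd
from sklearn import datasets

# Hypothetical name-to-loader mapping; the real dropdown may expose more datasets.
_DEFAULT_DATASETS = {
    "iris": datasets.load_iris,
    "wine": datasets.load_wine,
}

def get_default_dataset(name):
    """Load a default dataset by name and return it as a pd.DataFrame."""
    if name not in _DEFAULT_DATASETS:
        raise ValueError(f"Unknown default dataset: {name}")
    raw = _DEFAULT_DATASETS[name]()
    df = pd.DataFrame(raw.data, columns=raw.feature_names)
    df["target"] = raw.target  # label column appended for the trainer to use
    return df
```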
email_notifier.py
Email Notifier: Endpoint that takes in an email address, subject, body, and attachment, and sends an email notification to the user. This file does the actual invoking of our API Gateway endpoint, which connects to AWS Lambda + AWS SES. When the user enters an email address on the website and model training completes successfully, our driver function calls this function/route.
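A rough sketch of the shape such a call might take; the URL and JSON field names below are placeholders, not the project's actual endpoint or schema:

```python
import requests

# Placeholder URL; the real API Gateway endpoint is configured elsewhere in the backend.
EMAIL_API_URL = "https://<api-id>.execute-api.us-east-1.amazonaws.com/prod/send-email"

def build_email_payload(recipient, subject, body, attachment=None):
    """Assemble the JSON body for the API Gateway -> Lambda -> SES pipeline."""
    payload = {"recipient": recipient, "subject": subject, "body": body}
    if attachment is not None:
        payload["attachment"] = attachment  # e.g., path or encoded file contents
    return payload

def send_email(recipient, subject, body, attachment=None):
    """POST the notification request to the API Gateway endpoint."""
    resp = requests.post(
        EMAIL_API_URL,
        json=build_email_payload(recipient, subject, body, attachment),
        timeout=10,
    )
    resp.raise_for_status()
```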
loss_functions.py
Loss Functions: This file contains the compute_loss() function, which computes the loss between predicted and actual values for a given epoch. Measuring train and test loss is critical for seeing the progression of the model being trained.
LossFunctions(Enum) is an enum that contains our collection of loss functions. You can add new loss functions as shown in the implementation.
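To illustrate the enum-plus-dispatch pattern, here is a dependency-free sketch; the actual file's loss functions and names may differ (e.g., it may wrap PyTorch loss modules):

```python
from enum import Enum
from functools import partial

import numpy as np

def _mse(y_pred, y_true):
    """Mean squared error between predictions and labels."""
    return float(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2))

def _mae(y_pred, y_true):
    """Mean absolute error between predictions and labels."""
    return float(np.mean(np.abs(np.asarray(y_pred) - np.asarray(y_true))))

class LossFunctions(Enum):
    # Wrap each function in partial() so the Enum stores it as a value,
    # not a method; add a new loss by adding a member here.
    MSE = partial(_mse)
    MAE = partial(_mae)

def compute_loss(loss_name, y_pred, y_true):
    """Look up the requested loss function and apply it."""
    return LossFunctions[loss_name].value(y_pred, y_true)
```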
optimizer.py
Optimizers: Collection of optimizers that the user can access based on what they specify from the admin website.
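One common way to expose such a collection is a name-to-class registry; the function name, supported list, and signature below are assumptions for illustration, not the repo's actual API:

```python
import torch

# Hypothetical registry of user-selectable optimizers.
SUPPORTED_OPTIMIZERS = {
    "SGD": torch.optim.SGD,
    "Adam": torch.optim.Adam,
    "RMSprop": torch.optim.RMSprop,
}

def get_optimizer(name, model, learning_rate=1e-3):
    """Build the optimizer the user selected, bound to the model's parameters."""
    if name not in SUPPORTED_OPTIMIZERS:
        raise ValueError(f"Unsupported optimizer: {name}")
    return SUPPORTED_OPTIMIZERS[name](model.parameters(), lr=learning_rate)
```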
utils.py
Utility Functions: This file contains getter functions and functions used to generate data visualizations.
generate_confusion_matrix(labels_last_epoch, y_pred_last_epoch)
- Generates a confusion matrix based on the labels and prediction results returned from the last epoch of training
- Confusion matrices are only generated for classification problems
- train_deep_classification_model() from dl_trainer.py calls this function
- This function doesn't return anything; it saves the generated plot as a PNG to a designated directory, from which it is displayed in the frontend and emailed to the user
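A sketch of the save-a-PNG approach described above, assuming flat label/prediction arrays; the real function's input shapes and output directory may differ:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend: the backend server has no display
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

def generate_confusion_matrix(labels_last_epoch, y_pred_last_epoch,
                              out_path="confusion_matrix.png"):
    """Plot the confusion matrix and save it as a PNG; returns nothing."""
    cm = confusion_matrix(labels_last_epoch, y_pred_last_epoch)
    ConfusionMatrixDisplay(cm).plot()
    plt.savefig(out_path)
    plt.close("all")
```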
generate_AUC_ROC_CURVE(labels_last_epoch, y_pred_last_epoch)
- Generates AUC/ROC curves based on the labels and prediction results returned from the last epoch of training
- AUC/ROC curves are only generated for classification problems
- This works for multi-class classification as well; it uses a one-vs-all approach, so there is one curve for each class
- train_deep_classification_model() from dl_trainer.py calls this function
- This function returns raw data for the curves, which is passed to the frontend to generate an interactive graph
- The graph is also generated in the backend in addition to the frontend, to ensure it can be emailed as a PNG, which needs to be done in the backend
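The one-vs-all curve computation can be sketched as follows; the function name, input shapes (per-class score columns), and return structure are assumptions for illustration:

```python
import numpy as np
from sklearn.metrics import auc, roc_curve
from sklearn.preprocessing import label_binarize

def generate_auc_roc_data(y_true, y_score):
    """Return per-class ROC raw data (fpr, tpr, auc) using one-vs-all."""
    classes = np.unique(y_true)
    y_bin = label_binarize(y_true, classes=classes)
    if y_bin.shape[1] == 1:
        # Binary case: label_binarize yields one column, so add its complement.
        y_bin = np.hstack([1 - y_bin, y_bin])
    curves = {}
    for i, c in enumerate(classes):
        # Treat class c as "positive" and every other class as "negative".
        fpr, tpr, _ = roc_curve(y_bin[:, i], np.asarray(y_score)[:, i])
        curves[int(c)] = {
            "fpr": fpr.tolist(),
            "tpr": tpr.tolist(),
            "auc": float(auc(fpr, tpr)),
        }
    return curves  # raw data for the frontend's interactive graph
```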