Common Files (backend.common)

About

This page documents the files in the /backend/common directory of the GitHub repo. It is updated regularly as changes are made to these files.

Constants Storage (constants.py)

This file stores a number of constants used throughout the backend directory for various purposes.

General Dataset Handler (dataset.py)

This file handles reading in the dataset for training. The user can currently provide a dataset in any of the following ways:

  • As a URL to a raw CSV file
  • By uploading the dataset file directly to the website
  • As a zipped file (currently used for image data)

Endpoints

  • read_dataset(url): Given a URL, generate a locally stored CSV file that is used to train the deep learning model
  • read_local_csv_file(file_path): Given a file path to a CSV file (from the user uploading the file), read it in
  • loader_from_zipped(zipped_file, train_transform, test_transform): Given a path to a zip file, read it in. The zip file structure is explained in the "Pretrained Models" section of the page. train_transform is the set of data transformations applied to the /train folder in the zip file, while test_transform is the set applied to the /test folder

Usage

Usage is straightforward: invoke the function with the proper parameters and you should be all set. Note, however, that developers don't normally call this file directly; other endpoints in the backend call its functions on the user's behalf.

from torchvision import transforms

train_loader, valid_loader = loader_from_zipped(
    "../tests/zip_files/double_zipped.zip",
    train_transform=[
        transforms.ToTensor(),
        transforms.RandomChoice(transforms=[transforms.ToTensor()]),
    ],  # do NOT wrap in transforms.Compose; pass a plain list of valid transformations
)
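
The CSV-based entry points are called the same way. A quick sketch, assuming both return the parsed data for the trainer (the URL and file path below are hypothetical):

df = read_dataset("https://example.com/data/raw.csv")  # hypothetical URL to a raw CSV
df_local = read_local_csv_file("/tmp/uploaded.csv")    # hypothetical path from a user upload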

Default Data Handler (default_datasets.py)

Sometimes the user simply wants to play around with deep learning models without having to upload a dataset. As an alternative, our application supplies some default datasets that the user can choose from. When the user selects a default dataset name from the frontend dropdown (e.g., Boston housing, California housing, wine, iris), we use sklearn.datasets to read in the selected default dataset and return a pd.DataFrame for the dl_trainer endpoint to use.

Example Usage

get_default_dataset("iris")
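
Internally, something like the following mapping is plausible. This is only a sketch, assuming the function resolves the dataset name to an sklearn loader and appends the target column; the helper name and mapping are illustrative, not the actual implementation:

from sklearn.datasets import load_iris
import pandas as pd

def get_default_dataset_sketch(name):
    loaders = {"iris": load_iris}  # hypothetical name-to-loader mapping
    raw = loaders[name]()          # an sklearn Bunch with data, target, feature_names
    df = pd.DataFrame(raw.data, columns=raw.feature_names)
    df["target"] = raw.target      # append the labels as a target column
    return df

print(get_default_dataset_sketch("iris").head())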

Email Notifier (email_notifier.py)

Endpoint that takes in an email address, subject, body, and attachment and sends an email notification to the user. This file does the actual invoking of our API Gateway endpoint, which connects to AWS Lambda + AWS SES. When the user enters an email address on the website and model training completes successfully, our driver function calls this function/route.
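
A minimal sketch of what that invocation could look like using requests; the route URL, payload keys, and helper name here are assumptions, not the actual API:

import requests

def send_email_sketch(email_address, subject, body_text, attachment=None):
    # hypothetical payload shape for the API Gateway route backed by Lambda + SES
    payload = {
        "email_address": email_address,
        "subject": subject,
        "body_text": body_text,
        "attachment": attachment,
    }
    resp = requests.post(
        "https://<api-id>.execute-api.<region>.amazonaws.com/prod/send-email",  # placeholder URL
        json=payload,
    )
    resp.raise_for_status()
    return resp.json()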

Loss Functions (loss_functions.py)

Endpoint containing the compute_loss() function, which computes the loss between predicted and actual values for a given epoch. Measuring train and test loss is critical for tracking the progression of the model being trained.

LossFunctions(Enum) is an enum that contains our collection of loss functions. You can add new loss functions as shown in the implementation.
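
A sketch of what the enum and compute_loss() could look like using torch.nn criteria; the member names and exact signature are assumptions based on the description above:

from enum import Enum

import torch
import torch.nn as nn

class LossFunctionsSketch(Enum):
    L1LOSS = nn.L1Loss()
    MSELOSS = nn.MSELoss()
    CELOSS = nn.CrossEntropyLoss()  # new loss functions become new members

def compute_loss_sketch(loss_function, output, labels):
    # look up the torch.nn criterion stored in the enum member and apply it
    return loss_function.value(output, labels)

loss = compute_loss_sketch(LossFunctionsSketch.MSELOSS,
                           torch.tensor([0.5]), torch.tensor([1.0]))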

Optimizers (optimizer.py)

Collection of optimizers that the user can access based on what they specify from the admin website.
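
A sketch of how an optimizer could be resolved from the user's selection; the helper name and the supported set are assumptions:

import torch
import torch.nn as nn

def get_optimizer_sketch(model, optimizer_name, learning_rate):
    # map the name specified on the website to a torch.optim constructor
    if optimizer_name.upper() == "SGD":
        return torch.optim.SGD(model.parameters(), lr=learning_rate)
    if optimizer_name.upper() == "ADAM":
        return torch.optim.Adam(model.parameters(), lr=learning_rate)
    raise ValueError(f"Unsupported optimizer: {optimizer_name}")

optimizer = get_optimizer_sketch(nn.Linear(4, 1), "Adam", 1e-3)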

Utility Functions (utils.py)

This file contains getter functions and functions used to generate data visualizations.

generate_confusion_matrix(labels_last_epoch, y_pred_last_epoch)

  • This function generates a confusion matrix based on the labels and prediction results returned from the last epoch of training
  • Confusion matrices are only generated for classification problems
  • train_deep_classification_model() from dl_trainer.py calls this function
  • This function doesn't return anything; it saves the generated plot as a PNG to a designated directory, from which it is displayed in the frontend and emailed to the user (see the sketch below)
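
A sketch of how such a plot could be produced with sklearn and matplotlib; the output filename stands in for the designated directory, which is an assumption:

import matplotlib
matplotlib.use("Agg")  # render off-screen; the plot is saved, never shown
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

def generate_confusion_matrix_sketch(labels_last_epoch, y_pred_last_epoch):
    cm = confusion_matrix(labels_last_epoch, y_pred_last_epoch)
    ConfusionMatrixDisplay(confusion_matrix=cm).plot()
    plt.savefig("confusion_matrix.png")  # stand-in for the designated directory
    plt.close()

generate_confusion_matrix_sketch([0, 1, 1, 0], [0, 1, 0, 0])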

generate_AUC_ROC_CURVE(labels_last_epoch, y_pred_last_epoch)

  • This function generates AUC/ROC curves based on the labels and prediction results returned from the last epoch of training
  • AUC/ROC curves are only generated for classification problems
  • This also works for multi-class classification: it uses a one-vs-all approach, so there is one curve per class
  • train_deep_classification_model() from dl_trainer.py calls this function
  • This function returns raw data for the curves, which is passed to the frontend to generate an interactive graph (see the sketch below)
  • The graph is also generated in the backend, in addition to the one generated in the frontend; this ensures the graph can be emailed as a PNG, which has to be done in the backend
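
A sketch of the one-vs-all computation and the raw-data return; the exact return format is an assumption:

import numpy as np
from sklearn.metrics import auc, roc_curve
from sklearn.preprocessing import label_binarize

def generate_auc_roc_sketch(labels_last_epoch, y_pred_last_epoch):
    labels = np.asarray(labels_last_epoch)
    probs = np.asarray(y_pred_last_epoch)  # per-class probabilities, shape (n_samples, n_classes)
    classes = np.unique(labels)
    y_bin = label_binarize(labels, classes=classes)
    if y_bin.shape[1] == 1:                # binary case: expand to two explicit columns
        y_bin = np.hstack([1 - y_bin, y_bin])
    curves = []
    for i, cls in enumerate(classes):
        fpr, tpr, _ = roc_curve(y_bin[:, i], probs[:, i])  # one-vs-rest for each class
        curves.append({"class": cls, "fpr": fpr.tolist(),
                       "tpr": tpr.tolist(), "auc": auc(fpr, tpr)})
    return curves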