Deep Learning Files (backend.dl) - DSGT-DLP/Deep-Learning-Playground GitHub Wiki

About

This page documents the files in the /backend/dl directory of the GitHub repo. It is updated whenever changes are made to these files.

Evaluation Metrics (dl_eval.py)

This file contains functions to calculate evaluation metrics. Right now, we have support for computing accuracy for classification problems.

The evaluation functions in dl_eval.py are called from dl_trainer.py. Keep in mind that a function like compute_accuracy() takes the actual and predicted values as torch.Tensor objects. If your data is in NumPy arrays, converting them to torch.Tensor is straightforward.
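As a rough illustration of the conversion mentioned above, the sketch below turns NumPy arrays into tensors with torch.from_numpy and computes a classification accuracy. The function accuracy_from_scores is hypothetical and is not the repo's compute_accuracy(); its name, signature, and the sample values are assumptions made for this example.

```python
import numpy as np
import torch


def accuracy_from_scores(predicted: torch.Tensor, actual: torch.Tensor) -> float:
    """Fraction of rows whose highest-scoring class matches the label."""
    predicted_classes = predicted.argmax(dim=1)
    return (predicted_classes == actual).float().mean().item()


# NumPy arrays, e.g. as produced by a preprocessing step (made-up values).
scores_np = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]], dtype=np.float32)
labels_np = np.array([1, 0, 0, 0], dtype=np.int64)

# torch.from_numpy shares memory with the source array instead of copying it.
scores = torch.from_numpy(scores_np)
labels = torch.from_numpy(labels_np)

accuracy = accuracy_from_scores(scores, labels)  # 3 of 4 rows match
```

Because torch.from_numpy shares memory with the array, use torch.tensor(array) instead if you need an independent copy.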

DL Model Parser (dl_model_parser.py)

This file provides the endpoint that takes in the user-specified DL architecture (via the drag and drop endpoint) and parses the raw array of strings into actual torch.nn objects that get passed to /backend/dl/dl_model.py.

Example Usage

parse_deep_user_architecture(
    [
        "nn.Linear(in_features=50, out_features=10)",
        "nn.Conv2d(in_channels=3, out_channels=36, kernel_size=3, stride=2)",
    ]
)
# Returns: [nn.Linear(in_features=50, out_features=10), nn.Conv2d(in_channels=3, out_channels=36, kernel_size=3, stride=2)]

Look carefully at the output. The input to parse_deep_user_architecture() is a list of strings, while the output is a list of actual torch.nn objects.
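To make the string-to-object step more concrete, here is a minimal sketch of one way to pull a layer name and its keyword arguments out of such a string without calling eval(). The helper parse_layer_string is hypothetical and is not how dl_model_parser.py is necessarily implemented; it only handles keyword arguments with literal values.

```python
import ast


def parse_layer_string(layer: str) -> tuple[str, dict]:
    """Split a string like "nn.Linear(in_features=50)" into a name and kwargs."""
    call = ast.parse(layer, mode="eval").body
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Attribute):
        raise ValueError(f"Expected a call such as nn.Linear(...), got: {layer!r}")
    if call.args:
        raise ValueError("Only keyword arguments are supported in this sketch")
    # literal_eval only accepts literals (numbers, strings, tuples, ...),
    # so arbitrary code inside the string is rejected rather than executed.
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    return call.func.attr, kwargs


parsed = [
    parse_layer_string(s)
    for s in [
        "nn.Linear(in_features=50, out_features=10)",
        "nn.Conv2d(in_channels=3, out_channels=36, kernel_size=3, stride=2)",
    ]
]
```

With PyTorch installed, each (name, kwargs) pair could then be turned into a layer with getattr(torch.nn, name)(**kwargs).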

DL Model Architecture (dl_model.py)

This file contains the logic that takes in the user-specified DL architecture (via the drag and drop endpoint) and builds the PyTorch representation of the model as an nn.Sequential container. One thing that would be cool to support (but will take some work) is letting users create custom architectures (like Bottleneck in ResNet), drag and drop those, and have this file "smartly" parse the architecture and build it.
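For readers unfamiliar with nn.Sequential, the short sketch below shows how a list of layer objects can be wrapped into a single model. The layer sizes are made up for illustration, and this is not a copy of the code in dl_model.py.

```python
import torch
from torch import nn

# A list of layer objects, e.g. as returned by the model parser.
layers = [nn.Linear(4, 10), nn.ReLU(), nn.Linear(10, 3)]

# nn.Sequential expects layers as positional arguments, so unpack the list.
model = nn.Sequential(*layers)

# Layers are applied in order: a batch of 2 samples with 4 features each
# comes out with 3 values per sample.
output = model(torch.randn(2, 4))
```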

Deep Learning Trainer (dl_trainer.py)

This file orchestrates the process of training deep learning models for classification and regression problems. It provides a general implementation for training a deep learning model in PyTorch epoch by epoch. For each epoch, the following happens:

  • Start timer
  • For each batch in the train loader, do the following:
      • Set gradients to zero for optimization (gradient descent purposes)
      • Make prediction on the input
      • Evaluate loss criterion on prediction vs. actual output
      • Backpropagation
      • Update weights
  • Stop the timer
  • Update running list of epoch train time, running list of train loss, running list of test loss
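The per-epoch steps listed above can be sketched roughly as follows. This is a simplified illustration with random data, not the actual train_deep_model() implementation; the helper train_one_epoch, the data, and the hyperparameters are assumptions, and the test-loss computation is left out for brevity.

```python
import time

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, train_loader, optimizer, criterion):
    """Run one pass over train_loader; return (mean batch loss, seconds taken)."""
    start = time.time()                          # start timer
    model.train()
    total_loss = 0.0
    for inputs, targets in train_loader:
        optimizer.zero_grad()                    # set gradients to zero
        predictions = model(inputs)              # make prediction on the input
        loss = criterion(predictions, targets)   # evaluate loss criterion
        loss.backward()                          # backpropagation
        optimizer.step()                         # update weights
        total_loss += loss.item()
    return total_loss / len(train_loader), time.time() - start  # stop timer


torch.manual_seed(0)
X = torch.randn(32, 4)                 # random features, for illustration only
y = torch.randint(0, 3, (32,))         # random class labels in {0, 1, 2}
loader = DataLoader(TensorDataset(X, y), batch_size=8)
model = nn.Sequential(nn.Linear(4, 10), nn.ReLU(), nn.Linear(10, 3))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

train_losses, epoch_times = [], []
for epoch in range(3):
    loss, seconds = train_one_epoch(model, loader, optimizer, criterion)
    train_losses.append(loss)      # running list of train loss
    epoch_times.append(seconds)    # running list of epoch train time
```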

dl_trainer can export a file called dl_results.csv, a table with the epoch, train loss, test loss, and time taken for each epoch. We also store the trained model weights and architecture as ONNX and .pt files. ONNX files let a user visualize the model architecture on netron.app.
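As an illustration of what such a results table might look like, the sketch below writes per-epoch values to a CSV file with Python's csv module. The column names and the numbers are hypothetical, and the file is written to a temporary directory rather than the repo's real output location.

```python
import csv
import os
import tempfile

# Hypothetical per-epoch values; real ones would come from the training loop.
train_losses = [0.92, 0.71, 0.58]
test_losses = [0.95, 0.78, 0.66]
epoch_times = [1.20, 1.15, 1.18]

results_path = os.path.join(tempfile.mkdtemp(), "dl_results.csv")
with open(results_path, "w", newline="") as f:
    writer = csv.writer(f)
    # Column names are assumptions, not necessarily the repo's headers.
    writer.writerow(["epoch", "train_loss", "test_loss", "time_taken"])
    for epoch, row in enumerate(zip(train_losses, test_losses, epoch_times), start=1):
        writer.writerow([epoch, *row])

# Read the file back to check its contents.
with open(results_path, newline="") as f:
    rows = list(csv.reader(f))
```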

Example Usage (bit pseudocode-ish)

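import torch
from torch import nn
from sklearn.model_selection import train_test_split

# nn.Sequential takes layers as positional arguments, not a list.
# No final nn.Softmax: nn.CrossEntropyLoss expects raw logits.
model = nn.Sequential(nn.Linear(4, 10), nn.ReLU(), nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 3))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Argument order is illustrative; see get_dataloaders() for the exact signature.
train_loader, test_loader = get_dataloaders(X_train, X_test, y_train, y_test)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # lr value is illustrative
criterion = nn.CrossEntropyLoss()
epochs = 10
problem_type = "CLASSIFICATION"
train_deep_model(model, train_loader, test_loader, optimizer, criterion, epochs, problem_type)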

Pretrained Models (pretrained.py)

This file provides the functionality to use established model architectures (like resnet, vgg), together with their already obtained weights and biases, to train on the user's image dataset. The training metrics are written to backend/dl_results.csv, and a .pt file is generated at frontend/playground-frontend/src/backend_outputs/my_deep_learning_model.pt

The image dataset needs to be in a zipped folder with the following structure:

root/train/class1/xxx.png
root/train/class2/yyy.png
root/valid/class1/123.png
root/valid/class2/img.png
...

An example zipped file for this is tests/zip_files/double_zipped.zip
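Before uploading, it can help to check that an archive follows the layout described above. The sketch below is one possible check using only the standard library; the helper split_class_counts is hypothetical, assumes a single (not nested) zip, and builds a tiny in-memory archive purely for demonstration.

```python
import io
import zipfile


def split_class_counts(zip_file) -> dict:
    """Count files per (split, class) for paths like root/train/class1/x.png."""
    counts = {}
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            # Expect exactly: root / split / class / filename
            if len(parts) == 4 and parts[1] in ("train", "valid") and parts[3]:
                key = (parts[1], parts[2])
                counts[key] = counts.get(key, 0) + 1
    return counts


# Build a tiny in-memory archive with the expected layout (empty placeholder files).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for path in [
        "root/train/class1/xxx.png",
        "root/train/class2/yyy.png",
        "root/valid/class1/123.png",
        "root/valid/class2/img.png",
    ]:
        archive.writestr(path, b"")

buffer.seek(0)
counts = split_class_counts(buffer)
```

A split or class missing from the returned dictionary suggests the archive does not match the expected structure.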

A pretrained model is created by cutting an already established architecture (like resnet) into two or more parts. When the model is cut, it is divided into a head and a body. The parameters of the body are kept as they were when the model was loaded, and only the head is trained.
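The head/body idea can be sketched as follows, using a small stand-in network instead of a real pretrained model. How pretrained.py actually interprets the cut index and builds the head is not shown here; the splitting rule, layer sizes, and variable names below are assumptions for illustration only.

```python
import torch
from torch import nn

# A small stand-in for a pretrained network; a real one would be loaded with weights.
backbone = nn.Sequential(
    nn.Linear(8, 16), nn.ReLU(),
    nn.Linear(16, 16), nn.ReLU(),
    nn.Linear(16, 4),
)

cut = 2  # hypothetical cut index: children before it form the body
children = list(backbone.children())
body = nn.Sequential(*children[:cut])
head = nn.Sequential(*children[cut:])

# Freeze the body so its parameters keep their loaded values.
for param in body.parameters():
    param.requires_grad = False

model = nn.Sequential(body, head)
# Only the head's parameters are handed to the optimizer.
optimizer = torch.optim.SGD(head.parameters(), lr=3e-4)

# One training step on random input to show that only the head receives gradients.
body_before = [p.clone() for p in body.parameters()]
loss = model(torch.randn(5, 8)).sum()
loss.backward()
optimizer.step()
```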

Endpoints

  • train(): trains the model, saves train_loss and valid_loss, and outputs a .pth file
  • get_all(): returns all models supported

Example Usage

import torch
import torchvision

train(
    zipped_file="../tests/zip_files/double_zipped.zip",
    model_name="xcit_small_12_p8_224_dist",
    batch_size=2,
    loss_func=torch.nn.CrossEntropyLoss(),
    n_epochs=3,
    shuffle=False,
    optimizer=torch.optim.SGD,
    lr=3e-4,
    n_classes=2,
    # Pass a plain list; do NOT wrap it in torchvision.transforms.Compose().
    train_transform=[torchvision.transforms.Resize((224, 224)), torchvision.transforms.ToTensor()],
    # Cut the model from the second layer; the second layer will go to the head (can be a list as well).
    cut=2,
)

Possible causes of errors:

  1. train_loss or valid_loss have nan values => batch size is greater than the size of the dataset
  2. Some errors can be traced here
  3. ViT models are not fully compatible YET

If you face any other error, please tag @vidushiMaheshwari on it.