Feature implementation - UCLA-BD2K/metaprot GitHub Wiki

Implementation details

DynamoDBClient

Overview

DynamoDBClient (DDBC) provides methods for uploading large items to AWS DynamoDB. Because DynamoDB caps individual items at 400KB, DDBC splits each item into chunks of at most 400KB, which allows items of any size (theoretically unbounded) to be stored.
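
A minimal sketch of the chunking step is below (shown in JavaScript for illustration; the function name and chunk-size constant are assumptions, and a real client would reserve headroom under the 400KB limit for keys and attribute names):

```javascript
// Split a String payload into byte chunks that each fit under DynamoDB's
// 400KB item limit. MAX_CHUNK_BYTES is illustrative: it sits below 400KB to
// leave headroom for the item's keys and attribute names.
const MAX_CHUNK_BYTES = 380 * 1024;

function toChunks(content) {
  const bytes = Buffer.from(content, 'utf8');
  const chunks = [];
  for (let offset = 0; offset < bytes.length; offset += MAX_CHUNK_BYTES) {
    chunks.push(bytes.subarray(offset, offset + MAX_CHUNK_BYTES));
  }
  return chunks;
}
```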

A typical flow of events when using DDBC is as follows (a sketch of the upload path appears after the list):

  • A call is made to upload a record to DynamoDB.
  • DDBC, which is given the file size and the content as a String, iteratively breaks the data into byte arrays to send upstream to AWS. No array is ever larger than 400KB, which is DynamoDB's official per-item limit.
  • Each byte array is a slice of the String representation of the data (e.g. a serialization produced with the jackson-* libraries).
  • Two keys are in play: a partition key and a sort key. The chunks are stored in a separate chunk table with a composite primary key consisting of both. The partition key is the token, and the sort key indicates the position of the chunk within the original, non-chunked item.
  • On completion, a status is returned to the caller that includes the total number of chunks uploaded to DynamoDB.
  • One last upload is made to the main metaprot-task table; this record effectively acts as a manifest. It contains the same token used in the composite key of the chunks, and records details such as the timestamp, the number of chunks, and the original filename uploaded.
  • When data is to be retrieved, DDBC exposes a method that reassembles the String content from however many chunks are needed. The return value of this method should exactly match the original input. The rationale for keeping a String representation is that any arbitrary data type can be uploaded and retrieved, as long as the caller can reasonably marshal/unmarshal it to and from a String.
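
Below is a hedged sketch of the upload path, using the AWS SDK for JavaScript (v2) for illustration. The chunk-table name and attribute names are assumptions; metaprot-task is the manifest table named above.

```javascript
// Hedged sketch of the upload path (AWS SDK for JavaScript v2).
// 'metaprot-chunk' and the attribute names are assumptions;
// 'metaprot-task' is the manifest table described above.
const AWS = require('aws-sdk');
const db = new AWS.DynamoDB.DocumentClient();

async function uploadRecord(token, filename, content) {
  const chunks = toChunks(content); // splitter from the earlier sketch

  // Each chunk goes to the chunk table under the composite primary key:
  // partition key = token, sort key = the chunk's position in the original item.
  for (let i = 0; i < chunks.length; i++) {
    await db.put({
      TableName: 'metaprot-chunk', // assumed chunk-table name
      Item: { token: token, position: i, data: chunks[i] },
    }).promise();
  }

  // One last upload: the manifest record in the main metaprot-task table.
  await db.put({
    TableName: 'metaprot-task',
    Item: {
      token: token,
      timestamp: Date.now(),
      numChunks: chunks.length,
      filename: filename,
    },
  }).promise();

  // The status returned to the caller includes the total number of chunks.
  return { chunksUploaded: chunks.length };
}
```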

Client-side file upload

Overview

There is a file uploader JavaScript module originally written for and used by Copakb (though it has no internal dependencies on that project). The file, S3Uploader.js, exposes the S3Uploader module with functions to upload files in chunks (with optional parallelism) directly to Amazon S3. Temporary user credentials retrieved via AWS Cognito give users the right to upload files to specific locations in the appropriate bucket. When a user uploads a file to MetProt, the user is given a unique session token that can later be used to retrieve and re-analyze the file(s).

The flow is as follows (a condensed sketch appears after the list):

  • The user attempts to retrieve temporary credentials from Cognito.
  • The user then loads their file into memory using the FileController module (found in S3Uploader.js).
  • If the user does not yet have a session token (i.e. this is the first file being uploaded), then a UUID session token is retrieved from HTTP GET /analyze/token.
  • The user's file is then uploaded directly to AWS S3, in the S3 bucket associated with the MetProt user's UUID session token.
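
Below is a condensed sketch of this flow, assuming the AWS SDK for JavaScript (v2) in the browser. The identity pool ID, bucket name, and key layout are assumptions; the real module (S3Uploader.js) additionally chunks the upload and supports parallelism.

```javascript
// Condensed sketch of the client-side flow (AWS SDK for JavaScript v2 in the
// browser). The identity pool ID, bucket name, and key layout are assumptions.
AWS.config.region = 'us-east-1';
AWS.config.credentials = new AWS.CognitoIdentityCredentials({
  IdentityPoolId: 'us-east-1:00000000-0000-0000-0000-000000000000',
});

async function uploadFile(file, sessionToken) {
  // First upload of the session? Retrieve a UUID session token from the server.
  if (!sessionToken) {
    sessionToken = await fetch('/analyze/token').then(r => r.text());
  }

  // Upload directly to S3, keyed under the session token. The real module
  // uploads in chunks with optional parallelism; the managed uploader below
  // handles multipart behavior internally.
  const s3 = new AWS.S3();
  await s3.upload({
    Bucket: 'metaprot-uploads',          // assumed bucket name
    Key: sessionToken + '/' + file.name, // assumed key layout
    Body: file,
  }).promise();

  return sessionToken;
}
```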

Differential Expression Analysis (DEA)

Overview

DEA uses both REST and web controllers. REST controllers exist to begin analysis, whereas the web controllers exist to display the HTML results of the analysis. As with Temporal Pattern Recognition, a user is expected to have uploaded a file (and selected their desired levels of pre-processing) at an earlier step.

Here is a breakdown of the expected interaction with the DEA feature, along with high-level details of the business logic (a sketch of the front-end calls appears after the list):

  • On the front end, a user will select a previously uploaded file and fill out a form that contains important threshold values, among others.
  • On form submit, the server will use an R script to attempt to transform the processed input file into the format expected for metabolite analysis. The call returns quickly once the file is ready to continue. The server may return error messages if the token is invalid or if transformation fails; the UI shows only error messages, and silently moves on to the next step on success.
  • Once the front end receives the OK from the server, another REST call is made to HTTP POST /analyze/metabolites/<token>, where <token> is a UUID task token retrieved from HTTP GET /analyze/token. This starts analysis.
  • The server will run the appropriate R commands via Rserve, which leads to a certain number of files being generated to a predefined directory.
  • The server will read in these files and store the results into the database.
  • The REST call above returns some HTML to display to the user, either an error or success message with a link to the results page.
  • The user navigates to the result page, and the web controller contacts the database for the computed results.
  • The results are passed back to the front end, using Thymeleaf as the template engine.
  • The JavaScript modules and libraries (D3.js) now do their work to initialize and bind the necessary events. Notable JS classes: DataSegregator.js, which segregates the server output into significance groups, and SVGPlot.js, which handles all plotting and interactive events for the plot(s).
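
Below is a sketch of the front-end call sequence. GET /analyze/token and POST /analyze/metabolites/&lt;token&gt; come from the flow above; the request body shape and the target element are assumptions.

```javascript
// Sketch of the DEA front-end call sequence. The two endpoints come from the
// flow above; the request body shape and target element are assumptions.
async function startMetaboliteAnalysis(formValues) {
  // Retrieve a UUID task token for this analysis.
  const token = await fetch('/analyze/token').then(r => r.text());

  // Start analysis. Server-side, this runs the R commands via Rserve, reads
  // the generated files, and stores the results in the database.
  const response = await fetch('/analyze/metabolites/' + token, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(formValues), // threshold values, etc. (assumed shape)
  });

  // The call returns HTML: an error, or a success message linking to results.
  const html = await response.text();
  document.querySelector('#analysis-status').innerHTML = html; // assumed element
  return token;
}
```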

Temporal Pattern Recognition Analysis

Overview

Temporal Pattern Recognition Analysis also uses both REST and web controllers. REST controllers exist to begin analysis, whereas the web controllers exist to display HTML results of the analysis. It is assumed that a file was uploaded at an earlier step, and has undergone all pre-processing specified by the user.

Here is a breakdown of the expected interaction with the Temporal Pattern Recognition feature, along with high-level details of the business logic:

  • On the front end, a user will select a previously uploaded file.
  • On form submit, the server will attempt to transform the file into the format that pattern recognition expects. The server responds in a manner similar to that of DEA: the UI shows only error messages, treats success silently, and only then continues to the next step.
  • Once the front end receives the OK from the server, a REST call is made to HTTP POST /analyze/pattern/<token>, where <token> is a UUID task token retrieved from HTTP GET /analyze/token. This starts analysis.
  • The server will run the appropriate R commands via Rserve, which leads to a certain number of files being generated to a predefined directory.
  • The server will read in these files and store the results into the database.
  • The REST call above returns some HTML to display to the user, either an error or success message with a link to the results page.
  • The user navigates to the result page, and the web controller contacts the database for the computed results.
  • The results are passed back to the front end, using Thymeleaf as the template engine.
  • The user is also given the option of changing the input parameters and recomputing the clusters (see the sketch after this list).
  • The JavaScript modules and libraries (D3.js) now do their work to initialize and bind the necessary events. Notable JS class: PatternRecognitionPlot.js, which handles all plotting and interactive events for the plot(s).
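
Since the rest of the flow mirrors DEA, only the recompute step is sketched below. Re-using POST /analyze/pattern/&lt;token&gt; for recomputation, and the parameter name, are assumptions for illustration.

```javascript
// Sketch of recomputing clusters with changed input parameters. Re-using
// POST /analyze/pattern/<token> for recomputation, and the parameter name
// below, are assumptions for illustration.
async function recomputeClusters(token, params) {
  const response = await fetch('/analyze/pattern/' + token, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ numClusters: params.numClusters }), // hypothetical parameter
  });
  return response.text(); // HTML: an error, or success with a results link
}
```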