13. Bioinformatics

Patent search is implemented in the frontend using the Lens.org patent search tools. Lens.org aggregates and links open datasets of scholarly works, patents, and sequence listings to facilitate discovery and analysis. It is an open platform accessible to all users, registered or not, and offers free and open tools with private, secure access to diverse data sources.

Sequence Alignment Approach

Patents provide biological sequences of interest either within the patent itself or in a separate sequence listing file. Two patents pertaining to the same antigen or antibody can, and often do, provide sequence listings that start or end at different positions or contain variations. To reconcile these differences and compare the sequence listings visually, multiple sequence alignment is used to align the sequences against each other. Finding an optimal multiple sequence alignment is NP-hard under the commonly used sum-of-pairs scoring (the corresponding decision problem is NP-complete). Globally optimal solutions can be derived with dynamic programming, but the time and memory required grow exponentially with the number of sequences, so this approach becomes infeasible beyond a handful of sequences. Many heuristic algorithms approximate the globally optimal solution while avoiding the exponential cost. Clustal Omega is a multiple sequence alignment tool that uses seeded guide trees and HMM profile-profile techniques to produce good approximate solutions to this problem.
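To make the cost argument concrete, the following is a minimal sketch of the dynamic programming approach for the simplest case of two sequences (the Needleman-Wunsch global alignment score). The function name and the scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions and are not taken from OpenPSV or Clustal Omega.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Return the optimal global alignment score of two sequences.

    Scoring values are illustrative assumptions, not OpenPSV settings.
    """
    # prev[j] is the best score for aligning the processed prefix of `a`
    # with b[:j]; initially the prefix of `a` is empty, so b[:j] is all gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Aligning a[:i] with an empty prefix of `b` costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diag = prev[j - 1] + pair   # align a[i-1] with b[j-1]
            up = prev[j] + gap          # a[i-1] aligned to a gap
            left = curr[j - 1] + gap    # b[j-1] aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]
```

For two sequences of lengths n and m the table has (n + 1)(m + 1) cells. Extending the same recurrence to N sequences adds one table dimension per sequence, which is the source of the exponential growth and the reason heuristic tools are used in practice.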

Clustal Omega is available as a binary for most major platforms and accepts several input sequence formats. OpenPSV implements an AWS Lambda function, invoked by a REST API call, that passes the sequences to be aligned to Clustal Omega in FASTA format. Clustal Omega returns the aligned sequences in the same FASTA format. The Lambda function then remaps the epitope locations onto the aligned (gapped) sequences and returns the results to the caller.
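The remapping step can be sketched as follows. This is not the OpenPSV implementation: the function names are hypothetical, and the sketch assumes that epitope locations are 0-based, half-open [start, end) offsets into the ungapped sequence and that gaps in the aligned output are written as "-". The call to the Clustal Omega binary itself is omitted.

```python
from __future__ import annotations


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {sequence id: sequence}, joining wrapped lines."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the id.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def ungapped_to_aligned(aligned: str, gap: str = "-") -> list[int]:
    """For each residue of the ungapped sequence, give its aligned column."""
    return [col for col, ch in enumerate(aligned) if ch != gap]


def remap_epitope(aligned: str, start: int, end: int) -> tuple[int, int]:
    """Map a half-open [start, end) span on the ungapped sequence to the
    half-open span of columns it covers in the aligned sequence.

    The coordinate convention is an assumption made for this sketch.
    """
    columns = ungapped_to_aligned(aligned)
    if not 0 <= start < end <= len(columns):
        raise ValueError("epitope span lies outside the ungapped sequence")
    return columns[start], columns[end - 1] + 1


# Example: residues 2..4 ("TAY") of "MKTAYIA" after alignment inserts gaps.
aligned_fasta = ">patent_a listing 1\n--MKT-\nAYIA\n>patent_b\nMKMKTQAYIA\n"
aligned = parse_fasta(aligned_fasta)
span = remap_epitope(aligned["patent_a"], 2, 5)
```

Note that a remapped span can contain gap columns when the aligner inserts gaps inside the epitope, so the aligned span may be longer than the original one.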