Revised 180DA/DW Wiki Article Report
Applications of Graph Neural Networks (GNNs) in Academic Research & Silicon Valley Startups
Anthony Guerrera
[Overview]
Graph Neural Networks (GNNs) present an exciting new framework for extracting insights from multi-dimensional data. Many of the traditional neural network architectures widely used across industry and research today for speech recognition, text generation, and language & image processing perform remarkably well because their inputs are Euclidean data that can be encoded, serialized, & tokenized. What does this mean? Text, for instance, can be represented as a one-dimensional sequence of characters. This simplification of input data leads to straightforward ways to train and test. However, researchers have recently been thinking about many types of problems involving data that doesn’t fit this criterion.
Non-Euclidean data sources are everywhere: relationships between people, hierarchical information in more than two dimensions, 3D objects, and data in general whose relationships aren’t defined in a single plane. Customer sales, user sessions & product use, and biological data are a few more examples. The trouble with training and predicting in higher dimensions is that as the number of input dimensions grows, the number of samples needed to approximate a model for the data grows exponentially (classically known as the curse of dimensionality). This means it could take years to compute the result of a relatively simple task, which is why optimizing this process is a big area of interest for researchers today.
How are experienced researchers solving this problem? What are some new ideas being passed around right now? Where are young startups looking to make a difference? Could GNN architectures lead to profit? How are mathematicians exploiting symmetry and structure of information to make better decisions? This article serves to provide a broad overview regarding these types of questions.
[What’s new in Silicon Valley]
Jure Leskovec is a professor at Stanford University and the co-founder of Kumo.ai. Kumo.ai is the culmination of years of Leskovec’s research in GNN architectures and graph signal processing. Leskovec and his team built a platform that automates processing and training on relational data. Kumo’s product builds a customized GNN architecture based on the inherent structure of the data (there is no one-size-fits-all approach for GNNs due to their inherent complexity); it encodes feature subgraphs into suitable feature vectors and iteratively generates and tests different combinations of models based on the performance of each one. At a high level, this means the software guesses a good way to train on the complex, high-dimensional data and adjusts that guess based on how the initial guess performs. In the end, Kumo helps the third party deploy the specific model architecture that it generates. For more information on how Kumo is implemented, see here. The global data management software market is set to reach about $138B by 2026 (Fortune), so it wouldn’t be surprising if companies such as Kumo.ai make it big in the next few years. Companies such as Spotify, NVIDIA, & Snowflake are all currently using Kumo.ai. Kumo.ai is a Series B company, with two rounds led by Sequoia.
Fiddler AI is an observability platform that monitors the usage of enterprise systems. Essentially, companies and the US government use Fiddler to identify whether their LLMs are working as intended. This kind of quality assurance platform is in high demand: companies such as Datadog, NVIDIA, Google Cloud, & MetaAI all require highly specific monitoring platforms for their cutting-edge research. Soon, monitoring services such as Fiddler AI will begin implementing new customized frameworks for research teams that are implementing new ideas in the GNN space.
[Biomedical applications]
New methods of decoding fMRI data include those offered by GNNs and graph signal processing. Details regarding common graph transforms can be found in this IEEE article. Brain activity is commonly modeled as a complex connectivity network, and neuroscience research today seeks to uncover the mysteries behind neurological disorders, human-machine interfaces, and cognitive function in general. There are many unique ways in which graph filters are applied to match the complex structure of the human brain in order to make inferences about future activity based on past activity and processes such as blood-oxygen-level-dependent (BOLD) contrast throughout the brain. This process can identify regions that are used more often by people with ADHD (to name one specific disorder), and can even uncover new patterns across regions of the brain associated with disorders that haven’t yet been formally characterized. This technology might be useful for the future of mental health diagnosis. A small sketch of a graph filter is shown below.
Additionally, the unique statistical inference properties of GNNs have been shown to be useful for classifying microRNA interactions, and to correctly identify patients with multiple sclerosis from a very limited set of high-dimensional data points. This is promising for identifying more diseases at earlier ages. If this technology becomes readily available to younger people, identifying diseases that may not fully manifest until later in a person’s life will be extremely useful. Moreover, diagnostic biomarkers for patient populations with neurological diseases in general can be naturally analyzed via the tools of GNNs.
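To make the graph signal processing idea concrete, here is a minimal sketch (plain NumPy, with a made-up 4-region connectivity graph and arbitrary filter coefficients) of a polynomial graph filter applied to a BOLD-like signal. In practice the connectivity matrix would come from real fMRI data and the filter coefficients would be designed or learned.

```python
import numpy as np

# Hypothetical connectivity graph over 4 brain regions (symmetric adjacency),
# plus one BOLD-like signal value per region.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.array([0.8, 0.5, 0.9, 0.2])

# Combinatorial graph Laplacian L = D - A.
D = np.diag(A.sum(axis=1))
L = D - A

# Polynomial graph filter h(L) = theta_0*I + theta_1*L + theta_2*L^2.
# The coefficients below are arbitrary placeholders.
theta = [1.0, -0.3, 0.05]
H = theta[0] * np.eye(4) + theta[1] * L + theta[2] * (L @ L)

filtered = H @ x   # filtered signal: each region mixed with its neighbors
print(filtered)
```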
[Research]
Antibiotic discovery, relational databases, & general recommendation systems are at the forefront of GNN research and usage.
Companies such as Rockset (recently acquired by OpenAI), Snowflake, and Databricks all have highly developed geometric approaches to querying and organizing data. Now, all of this relational, geometric data can be optimized further (theoretically generated in real time). Querying large-scale databases and collections of data will become more informative (and quicker) than ever before in the coming years. Not only will researchers be able to create strong and useful models, but questions about data can be answered almost instantly. This is remarkable and will transform the future of how we store and access information.
Due to the rapid emergence of antibiotic-resistant bacteria, there is a newfound need to continually discover new antibiotics. The methods by which new antibiotics are discovered in this article are revolutionary. A team of researchers trained a specific GNN to predict which molecules might help combat a specific input bacterium. The abstract structure of the GNN proposes unconventional molecules that might be useful in reducing the impact of invasive bacteria. Since many of these proposed molecules are unique and have never before been synthesized, these new antibiotics must be thoroughly tested before being approved for clinical use. Nevertheless, the research is incredibly interesting and remarkable.
[Into the math: Compare & Contrast]
Why GNNs over other well-performing high-dimensional architectures like Convolutional Neural Networks (CNNs)? In one word: flexibility. CNNs are fantastic for reducing the dimensions of highly structured 2D and 3D information, but fail to handle less structured relational data the way GNNs can. CNNs employ convolutional layers that are useful for data with a well-defined spatial structure. CNNs aren’t able to capture long-range dependencies in graphs, since all of their relationships are extremely local, so for a more general array of data, GNNs may be more useful; a minimal sketch of a single GNN propagation step is shown below.
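As a rough illustration of the difference, here is a minimal sketch of one GNN propagation step in NumPy: the "receptive field" of each node is defined entirely by the (made-up) adjacency matrix rather than by a fixed pixel grid. The features and weights are random placeholders.

```python
import numpy as np

# Toy graph: 5 nodes with arbitrary edges (no grid structure), 3 features per node.
A = np.array([[0, 1, 0, 0, 1],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [1, 0, 0, 1, 0]], dtype=float)
X = np.random.randn(5, 3)     # node feature matrix
W = np.random.randn(3, 4)     # shared "learnable" weights (random here)

# One GCN-style step: average each node with its neighbors,
# then apply a shared linear map and a nonlinearity.
A_hat = A + np.eye(5)                         # add self-loops
D_inv = np.diag(1.0 / A_hat.sum(axis=1))      # row-normalize
H = np.maximum(D_inv @ A_hat @ X @ W, 0)      # ReLU(normalized aggregation)
print(H.shape)   # (5, 4): a new embedding per node
```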
Recurrent Neural Networks (RNNs) have classically been used to store memory across different layers in a neural network. However, in general, RNNs perform much better on data that has a predefined order (e.g., text generation), while GNNs are useful when there is no predefined way to order the data. Additionally, many articles on arxiv.org are exploring new ways in which GNNs can parallelize inference; the flexible and customizable nature of the architecture makes this possible. Since RNNs act on sequential data, it would be very difficult to parallelize certain aspects of their training and testing (e.g., token generation).
When the graph structure is inherently sparse (close to the minimum number of edges in a graph), GNNs perform particularly well, since there are only a few relationships and features that need to be captured. However, when the inherent graph structure of the data is dense (close to the maximum number of edges in a graph), it is difficult to enumerate and sample all of these relationships, worsening the practical time complexity of the GNN (see the sketch below).
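A minimal sketch of why sparsity helps, assuming the graph is stored as an edge list (COO format): the aggregation work per layer scales with the number of edges, so a sparse graph means far fewer operations than a dense one. The edges and features below are made up.

```python
import numpy as np

# Sparse graph as an edge list: each row is (source, target).
edges = np.array([[0, 1], [1, 2], [2, 3], [3, 0]])   # 4 edges over 4 nodes
X = np.random.randn(4, 8)                            # node features

# Sum-aggregate neighbor features: one scatter-add per edge,
# so the cost per layer grows with |E| rather than |V|^2.
agg = np.zeros_like(X)
np.add.at(agg, edges[:, 1], X[edges[:, 0]])
print(agg.shape)
```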
GNNs make use of inherent symmetries (commonly referred to as invariances under function transformations) of the underlying graph structure of the data to improve training and testing. The goal is to process the graph independently of the given node ordering – isomorphic graph structures should be treated the same way by the model & architecture. Applying permutations to the relationships between the data shouldn’t affect the output, whether those permutations are applied before or after the model’s functions. A small numerical check of this property follows.
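The snippet below checks this permutation-equivariance property numerically, using the same normalized-aggregation layer sketched earlier: relabeling the nodes before the layer gives the same result as relabeling them after. The graph, features, and weights are random, and the layer is a stand-in rather than any specific published architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n, f = 5, 3
A = rng.integers(0, 2, size=(n, n)).astype(float)
A = np.triu(A, 1); A = A + A.T                     # random symmetric adjacency
X = rng.standard_normal((n, f))
W = rng.standard_normal((f, f))

def layer(A, X, W):
    # Normalized neighborhood aggregation followed by a shared linear map.
    A_hat = A + np.eye(len(A))
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))
    return np.maximum(D_inv @ A_hat @ X @ W, 0)

P = np.eye(n)[rng.permutation(n)]                  # random permutation matrix

out_then_permute = P @ layer(A, X, W)
permute_then_out = layer(P @ A @ P.T, P @ X, W)
print(np.allclose(out_then_permute, permute_then_out))   # True
```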
[GNN Architectures & Algorithms]
The general framework for Graph Attention Networks (GATs) is shown above. Rating/ranking & prediction algorithms are performed as follows: GATs evaluate a specific attention score for each node present in the graph, weighting each node’s neighbors by how relevant they are to that node. A sketch of this computation is given below.
Often, adding more attention mechanisms to a GNN just increases the complexity of the model unnecessarily. Because attention to neighbor nodes can be unbalanced and heavily weighted, the generated embeddings for a particular node in the graph may end up very different from that node’s originally generated embeddings. Because of this, the idea of neighbor selection in these relational GNN architectures is absolutely essential. The above is a “User-Item” bipartite graph, which is used to balance the weights between nodes in an efficient and fair manner: some nodes are denoted as user nodes and the other nodes present in the graph are considered item nodes. To weight a user-item-user path, we ensure that items that are used more often by users are weighted accordingly. This corresponds to distributing attention between nodes more evenly.
A rating system is then constructed for user interactions with items. Aggregators and updaters are typically used for updating node embeddings in this type of architecture, as sketched after this section.
Shown above is the scoring function that is commonly used for weighting the item and user nodes during training.
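The sketch below follows the standard GAT attention computation (a shared linear transform, LeakyReLU-scored neighbor pairs, and a softmax over each neighborhood). The graph, dimensions, and weights are hypothetical placeholders rather than values from the framework referenced above.

```python
import numpy as np

rng = np.random.default_rng(1)
n, f_in, f_out = 4, 3, 2
X = rng.standard_normal((n, f_in))      # input node features
W = rng.standard_normal((f_in, f_out))  # shared linear transform
a = rng.standard_normal(2 * f_out)      # shared attention vector
A = np.array([[1, 1, 0, 1],             # adjacency with self-loops
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [1, 0, 1, 1]], dtype=float)

H = X @ W
# Raw attention score e_ij = LeakyReLU(a^T [h_i || h_j]) for each edge (i, j).
e = np.full((n, n), -np.inf)
for i in range(n):
    for j in range(n):
        if A[i, j]:
            z = np.concatenate([H[i], H[j]]) @ a
            e[i, j] = np.where(z > 0, z, 0.2 * z)   # LeakyReLU

# Softmax over each node's neighborhood, then weighted aggregation.
alpha = np.exp(e) / np.exp(e).sum(axis=1, keepdims=True)
H_new = alpha @ H
print(alpha.round(2))   # attention weights; each row sums to 1
```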
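As a rough illustration, here is a minimal aggregator/updater sketch on a tiny, made-up user-item bipartite graph: a mean aggregator pools the embeddings of the items a user interacted with, and a simple one-layer updater combines that message with the user's current embedding. The interaction list, dimensions, and weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
user_emb = rng.standard_normal((2, d))     # 2 users
item_emb = rng.standard_normal((3, d))     # 3 items
# Hypothetical interactions: (user index, item index)
interactions = [(0, 0), (0, 2), (1, 1), (1, 2)]

W_upd = rng.standard_normal((2 * d, d))    # updater weights (random here)

def update_user(u):
    # Aggregator: mean of the embeddings of the items user u interacted with.
    items = [item_emb[i] for (uu, i) in interactions if uu == u]
    agg = np.mean(items, axis=0)
    # Updater: combine the old user embedding with the aggregated message.
    return np.tanh(np.concatenate([user_emb[u], agg]) @ W_upd)

new_user_emb = np.stack([update_user(u) for u in range(len(user_emb))])
print(new_user_emb.shape)   # (2, 4): updated embedding per user
```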
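The exact scoring function referenced above isn't reproduced in this text, so the snippet below uses a simple inner-product score between user and item embeddings as a stand-in; many recommendation systems use either this or a small MLP over the concatenated pair.

```python
import numpy as np

def score(user_vec, item_vec):
    # Simple inner-product scoring; a common alternative is to feed the
    # concatenated pair through a small MLP.
    return float(user_vec @ item_vec)

u = np.array([0.3, -0.1, 0.8, 0.05])   # example user embedding
i = np.array([0.2,  0.4, 0.6, -0.3])   # example item embedding
print(score(u, i))   # higher score -> item ranked higher for this user
```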
[Additional Limitations & Drawbacks]
Major drawbacks and limitations of the common GNN architectures described above usually come down to over-complicating the data. Throwing an inherently complex model at data that could be restructured to perform better under simpler models is, most of the time, not a great idea. It can also be quite difficult to parallelize processing, because many architectural components require synchronous generation of embeddings. GNNs rarely support “look ahead”, a feature offered flexibly by many other typical neural network architectures. It is often too computationally expensive to calculate fundamental graph characteristics, such as the longest cycle present in the graph or its diameter. Many decisions are limited in that the only information being used comes from local node embeddings. Additionally, GNNs perform poorly on small datasets and, of course, struggle with noisy and loosely structured data. Finally, the energy costs of storing these relationships in databases could be extremely high.
Therefore, for the majority of tasks, it is usually more practical to use a simpler architecture that performs well on the specific metric the neural network is meant to predict, as opposed to trying to fit data into an unnecessarily complex structure such as a Graph Neural Network.
[Moving forward]
Overall, research in the area of predicting outcomes via graph structures is highly experimental and is difficult to implement successfully on many different classes of unstructured data. Further research done by teams at Google DeepMind, MetaAI, prestigious universities, etc. will soon transform the way in which inferences are made in graph structures. Optimization is the core of what many researchers are currently doing, and before we know it, we might be able to solve problems that we previously thought weren’t possible. The possibilities are endless, and it is exciting to see what the future in this space might hold.
[References & related articles]
https://ieeexplore.ieee.org/document/9585532
https://www.fortunebusinessinsights.com/enterprise-data-management-market-107010#
https://dataroots.io/blog/a-gentle-introduction-to-geometric
https://arxiv.org/abs/2104.13478
https://geometricdeeplearning.com/
https://kumo.ai/
https://www.assemblyai.com/blog/ai-trends-graph-neural-networks/
https://www.sciencedirect.com/science/article/pii/S0092867420301021?ref=assemblyai.com