Matroid lifting (Graph to Combinatorial) - geometric-intelligence/TopoBench GitHub Wiki

Update 7/18: Added highlights (after overview)

In this PR, I talk about a way to convert graphs to matroids, which are (almost) combinatorial complexes.

See the accompanying notebook for better details.

Overview

For a given $G = (V, E)$

Compute the graphic matroid $M(G) = (E, \mathcal{I})$, by computing the spanning trees of $G$.
Compute the dual matroid $M^*(G)$, by taking the complement of spanning trees wrt to $E$.
Compute the graph curve matroid $M_{g} = (V, \mathcal{I}_{g})$ using $\mathit{rank}$ of $M^*(G)$, where the circuits are calculated based on edge incidence.
Convert the matroid to a graph curve matroid using a simple transformation.

Important note: Matroids are general structures, and we can always perform a trivial lifting by transforming a matroid to a CC. For example, we can use the set of spanning trees to be the k-cells of a complex, where $k = |V| - 1 - 1$. However, we can go further and use graph curve matroids, where graph curves are connected, projective algebraic curves which correspond to the connectivity of the dual graphs of the original graph.

Note: the image is missing one blue edge, but the result is the same.

To accommodate with tests, smaller graphs were used. Here is the commit with the notebook that has slightly bigger graphs.

New Datasets

The datasets used were taken from houseofgraphs, which have different datasets based on connectivity, graph genus, etc. For simplicity, they are manually coded in, we can consider using the site as a whole for datasets.

Mathematical Foundation

Covered more below, but these are inspired by abstract notions of linear independence formulated by Whitney, called Matroids. In particular, we also use graph curve matroids, formulated by Geiger et al. They represent the hyperplane sections of the graph curve from the dual graph of graphs.

Original Approaches

To my current knowledge, applying matroids as combinatorial complexes has not been done yet. I also did very minor modifications to the definitions of matroids to suit their compatibility with the definition of combinatorial complexes.

Non deterministic Procedures

While not implemented, there does exist a nondeterministic procedure by way of sampling via markov chains. See [4] for more details.

What is a Matroid?

A Matroid is a very familiar geometric structure that encompasses several combinatorial structures, and I found the opportunity to apply them here! There are a wide variety amount of definitions, but I will focus on 2 of them: We say call a set system $(S, \mathcal{I})$, a matroid $\mathcal{M} = (S, \mathcal{I})$, where we understand $S$ to be some ground set and if $I \in \mathcal{I} \subseteq 2^S$, then $I$ is considered independent. Moreover, $\mathcal{I}$ follows these 3 properties, for if $A, B \subseteq S$:

\begin{align*}
    \emptyset \in \mathcal{I} \\
    A \subseteq B \land B \in \mathcal{I} \rightarrow A \in \mathcal{I} \\
    \text{if} A, B \in \mathcal{I}, |A| < |B| \rightarrow \exists b \in B - A, A + b \in \mathcal{I}
\end{align*}

Note: I will use A + b to mean the union of the set A and the singleton set B' = {b}.

In particular, the first and second conditions tell us that a matroid is a (abstract) simplicial complex. The 3rd condition is more special, and I will introduce the 2nd equivalent definition for a Matroid.

Let $\mathit{rank}: S \to Z_+$ be what we will call the matroid rank function. It is one if it follows these properties, for $A \subseteq S, a \in S$:

\begin{align*}
    \mathit{rank}(\emptyset) = 0 \\
    \mathit{rank}(A) \leq |A| \\
    a \not\in A \rightarrow \mathit{rank}(S + a) = \mathit{rank}(S) + 1 \\
    \mathit{rank}(A \cup B) + \mathit{rank}(A \cap B) \leq \mathit{rank}(A) + \mathit{rank}(B)
\end{align*}

The above is equivalent, because we can say that $I \in \mathcal{I}$ if $\mathit{rank}(A) = |A|$.

From what we can see then is that a Matroid is a combinatorial complex. Albeit with some minor modifications, like the empty set isn't allowed to be in, and the rank is off by 1 for a normal combinatorial complex.

Matroids are combinatorial complexes

So, why would we ever want to use such an object instead of the more general combinatorial complex? It is my opinion that the most important in regard to machine learning is the 3rd property, which is more well known as the submodular property for functions. Alternative definitions include:

Diminishing Returns
Four Points
Group Diminishing Returns
See [1] for more details.

Submodular (monotone) functions, like matroid functions, give characteristics of information diversity on sets, where higher diversity is weighted more. [1] There are many problems in machine learning that can be described in terms of submodular minimization/maximization.

Perspectives

We discuss other reasons to use Matroids (from other perspectives):

Graph Theory: contraction, deletion are graph theoretic operations that describe new graphs. Dual graphs sometimes be made, but with matroids, unique duals of a matroid always exist. We focus on this perspective in this PR.
Liftings: From a given graph, there are multiple ways to make a matroid that represent the graph. Matroids can be defined differently from bipartite graphs, vertex incidence, colorings, etc. -- In particular, I would like to point that we can form a partition matroid from hypergraphs to CCs, where its bases can be the induced graphs from a hypergraph.
Linear Matroids: these are matroids based on linear independence of vectors.
Oriented Matroids: these are matroids based on hyperplane arrangements.
Optimization: Computing the bases of matroids can be done through greedy algorithms like the classic Kruskal Algorithm. There have been applications of using such greedy constructions of matroids through an intermediate point cloud --> simplicial complex. [3]

Other

In this PR, the method described to create a matroid from a given $G$ is the graph curve matroid construction. [2] The tutorial notebook gives a description of this matroid. I would like to note that the implementation is very slow because it relies on computing the spanning trees of a graph. In practice however, there are new results that show we can efficiently sample them (via MCMC). [4]

The difference between matroids and CCs are that matroids include empty sets and singletons have rank 0 instead of 1. To compensate, we can easily remove the empty set of a matroid and subtract its rank by 1.

I also use the HMC model, but however, it uses only up to rank=2 part of the complex, which corresponds to specific triangles that are detailed in the tutorial notebook. In practice, matroid rank functions are much more diverse than that.

[1] Bilmes, J. (2022). Submodularity in machine learning and artificial intelligence. arXiv. https://arxiv.org/abs/2202.00132 [2] Geiger, Alheydis, Kevin Kuehn, and Raluca Vlad. "Graph Curve Matroids." arXiv preprint arXiv:2311.08332 (2023). https://arxiv.org/abs/2311.08332. [3] Sun, T., & Nelson, B. (2023). Greedy Matroid Algorithm And Computational Persistent Homology. arXiv preprint arXiv:2308.01796. Retrieved from https://arxiv.org/abs/2308.01796 [4] Anari, N., Liu, K., Oveis Gharan, S., & Vinzant, C. (2019). Log-Concave Polynomials II: High-Dimensional Walks and an FPRAS for Counting Bases of a Matroid. arXiv preprint arXiv:1811.01816. Retrieved from https://arxiv.org/abs/1811.01816

From https://github.com/pyt-team/challenge-icml-2024/pull/32