A no-brainer primer on graph neural networks and graph convolution to get a quick sense of how they work
December 4, 2020 • Pratulya Bubna
This post is intended to provide a very high-level intuition about the workings of Graph Convolutional Networks (GCNs) in the spatial domain. The goal is to give a rough overall idea of GNNs and GCNs without delving into the meaty math.
Images, text and audio data can easily be represented as a grid — a regular and structured format. Given this representation, Convolutional Neural Networks (CNNs) can be successfully applied to process this data.
However, non-Euclidean data such as social networks, 3D meshes, molecular data and knowledge graphs (which are inherently graphs) cannot be represented in a grid-like structure and instead lie in an irregular domain. Consequently, the convolution operation from the Euclidean case (CNNs) is not well-defined on non-Euclidean domains.
The learning architecture designed to handle unstructured, non-Euclidean graph data is the Graph Neural Network (GNN). GNNs do not have strict structural requirements, as opposed to regular neural networks that operate on fixed-dimension inputs (e.g. CNNs built over the MNIST dataset require all input images to be of size 28×28). This means that the number of vertices and edges can vary across input graphs.
Graph Convolutional Networks (GCNs) are a variant of GNNs that evolved from work on generalizing convolutions to the graph domain.
A graph is a simple way of encoding pairwise relationships among a set of objects.
A graph \(G\) consists of a pair of sets \((V, E)\) — a collection \(V\) of nodes and a collection \(E\) of edges. Each edge \(e \in E\) joins two nodes and is thus a two-element subset of \(V\): \(e = \{u, v\}\) for some \(u, v \in V\).
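To make this concrete, here is a minimal sketch in Python (the variable names are illustrative, not from the post) of a small undirected graph stored as a node set and an edge list, from which a neighbor list is derived:

```python
# A tiny undirected graph G = (V, E) with 4 nodes and 4 edges.
V = {0, 1, 2, 3}
E = [{0, 1}, {0, 2}, {1, 2}, {2, 3}]  # each edge is a two-element subset of V

# Build a neighbor (adjacency) list: node -> set of adjacent nodes.
neighbors = {v: set() for v in V}
for u, w in (tuple(e) for e in E):
    neighbors[u].add(w)
    neighbors[w].add(u)

print(neighbors)  # {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
```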
CNNs learn features hierarchically, building from simpler features up to more complex ones. Desirable properties of CNNs include weight sharing, local connectivity, non-linearities and pooling layers, which have helped them achieve outstanding performance in a variety of tasks.
In graphs, however, there is no regular notion of neighborhood and connectivity like that of a pixel grid, making the application of convolution and pooling operations non-trivial. There are two approaches to tackle this: spectral and non-spectral (spatial). Whereas spectral approaches deal with spectral representations of graphs (e.g. the graph Laplacian), non-spectral approaches define convolutions directly on the graph.
That is, spatial approaches assume that graphs are embedded in a Euclidean space, where nodes can have coordinates and weights on them. This assumption helps generalize pixels (in the case of CNNs) to nodes of the graph, over which we can take a similar sliding-window approach to aggregate the features of each neighborhood.
We’ll stick to GCNs in the spatial domain, in which convolution is usually implemented through non-linear functions (e.g. an MLP) over spatial neighbors, and pooling may be adopted to produce a new, coarsened graph by aggregating information from each node’s neighbors.
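As a rough sketch of such a layer (NumPy pseudocode under my own assumptions: mean aggregation over neighbors, one linear transform per branch and a ReLU non-linearity; this is not a specific published architecture):

```python
import numpy as np

def spatial_graph_conv(X, neighbors, W_self, W_neigh):
    """One spatial graph-convolution layer (illustrative sketch).

    X         : (num_nodes, in_dim) matrix of node features
    neighbors : dict mapping a node index to its neighbor indices
    W_self    : (in_dim, out_dim) weights for the node's own features
    W_neigh   : (in_dim, out_dim) weights for the aggregated neighborhood
    """
    out = np.zeros((X.shape[0], W_self.shape[1]))
    for i in range(X.shape[0]):
        nbrs = list(neighbors[i])
        # Mean-aggregate the neighborhood; works for any neighborhood size.
        agg = X[nbrs].mean(axis=0) if nbrs else np.zeros(X.shape[1])
        # Combine self and neighborhood information, then apply ReLU.
        out[i] = np.maximum(0.0, X[i] @ W_self + agg @ W_neigh)
    return out
```

Stacking \(k\) such layers lets a node's features depend on its \(k\)-hop neighborhood, much like stacked CNN layers grow their receptive field.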
We can choose to use a fixed-size neighborhood or a variable one (determined by the node degree), which requires defining an operator that can work with differently sized neighborhoods. After determining the neighborhood, an aggregation is performed over it, for instance taking the mean of all the neighbors’ feature vectors.
The aggregation operator is generally chosen to be permutation invariant. That is, the aggregated output does not depend on the order in which the neighbors are picked. The usual choices are sum, mean and max.
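A quick way to see that property (a toy NumPy check with made-up feature values): listing the same neighbors in a different order leaves sum, mean and max aggregations unchanged.

```python
import numpy as np

# Feature vectors of three neighbors, listed in two different orders.
nbr_feats = np.array([[1.0, 2.0], [3.0, 0.0], [0.5, 4.0]])
shuffled = nbr_feats[[2, 0, 1]]

# Sum, mean and max are permutation invariant, so both orders agree.
for agg in (np.sum, np.mean, np.max):
    assert np.allclose(agg(nbr_feats, axis=0), agg(shuffled, axis=0))
```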
The equation below encapsulates message passing and neighborhood aggregation in GCNs.
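One common way to write this (a sketch of the general message-passing scheme, e.g. as used in the PyTorch Geometric documentation; here \(\gamma\) is an update function, \(\phi\) a message function, \(\square\) a permutation-invariant aggregator such as sum, mean or max, and \(\mathcal{N}(i)\) the set of neighbors of node \(i\); edge features are omitted for simplicity):

\[
\mathbf{x}_i^{(k)} = \gamma^{(k)}\Big(\mathbf{x}_i^{(k-1)},\; \square_{j \in \mathcal{N}(i)}\, \phi^{(k)}\big(\mathbf{x}_i^{(k-1)}, \mathbf{x}_j^{(k-1)}\big)\Big)
\]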
The features \(\mathbf{x}_i\) of a node in the \(k^{th}\) layer are computed as a non-linear function of: