# NN Modules (MXNet)

We welcome your contribution! If you want a model to be implemented in DGL as an NN module, please create an issue whose title starts with “[Feature Request] NN Module XXXModel”.

If you want to contribute an NN module, please open a pull request whose title starts with “[NN] XXXModel in MXNet NN Modules”, and a team member will review it.

## Conv Layers

MXNet modules for graph convolutions.

### GraphConv

class dgl.nn.mxnet.conv.GraphConv(in_feats, out_feats, norm=True, bias=True, activation=None)

Bases: mxnet.gluon.block.Block

Apply graph convolution over an input signal.

Graph convolution was introduced in GCN and can be described as follows:

$h_i^{(l+1)} = \sigma(b^{(l)} + \sum_{j\in\mathcal{N}(i)}\frac{1}{c_{ij}}h_j^{(l)}W^{(l)})$

where $$\mathcal{N}(i)$$ is the neighbor set of node $$i$$, $$c_{ij}$$ is the product of the square roots of the node degrees, $$\sqrt{|\mathcal{N}(i)|}\sqrt{|\mathcal{N}(j)|}$$, and $$\sigma$$ is an activation function.

The model parameters are initialized as in the original implementation: the weight $$W^{(l)}$$ uses Glorot uniform initialization and the bias is initialized to zero.
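The update rule can be traced by hand on a tiny graph. Below is a minimal pure-Python sketch of one layer of the formula above, not DGL's implementation: the function name `gcn_layer`, the edge-list input, and the list-of-lists matrices are illustrative assumptions. The graph is assumed undirected (both edge directions stored) with a self-loop on every node, so every $$|\mathcal{N}(i)|$$ is at least 1.

```python
import math


def gcn_layer(edges, h, W, b, activation=None):
    """Hypothetical sketch of h_i' = sigma(b + sum_{j in N(i)} h_j W / c_ij).

    edges: list of (src, dst) pairs; assumed symmetric, with self-loops
    h:     node features, one list of length in_feats per node
    W:     weight matrix as in_feats rows of out_feats values
    b:     bias vector of length out_feats
    """
    n = len(h)
    # N(i): the nodes that send a message to node i
    nbrs = [[] for _ in range(n)]
    for src, dst in edges:
        nbrs[dst].append(src)
    deg = [len(ns) for ns in nbrs]
    out_feats = len(b)

    out = []
    for i in range(n):
        acc = list(b)  # start from the bias b^{(l)}
        for j in nbrs[i]:
            c_ij = math.sqrt(deg[i]) * math.sqrt(deg[j])  # normalizer c_ij
            # Row vector h_j times matrix W
            hw = [sum(h[j][k] * W[k][o] for k in range(len(W)))
                  for o in range(out_feats)]
            for o in range(out_feats):
                acc[o] += hw[o] / c_ij
        out.append([activation(x) for x in acc] if activation else acc)
    return out


# Path graph 0 - 1 - 2, both edge directions plus a self-loop on every node
edges = [(0, 0), (1, 1), (2, 2), (0, 1), (1, 0), (1, 2), (2, 1)]
out = gcn_layer(edges, h=[[1.0], [2.0], [3.0]], W=[[1.0]], b=[0.0])
```

With σ left as the identity, node 0's output is $$\tfrac{1}{2}\cdot 1 + \tfrac{1}{\sqrt{6}}\cdot 2$$: its self-loop is normalized by $$\sqrt{2}\sqrt{2}$$ and the edge from node 1 by $$\sqrt{2}\sqrt{3}$$.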

Notes

Nodes with zero in-degree can lead to an invalid normalizer. A common practice to avoid this is to add a self-loop to each node in the graph, which can be achieved by:

>>> g = ... # some DGLGraph
>>> g.add_edges(g.nodes(), g.nodes())

Parameters:
• in_feats (int) – Number of input features.
• out_feats (int) – Number of output features.
• norm (bool, optional) – If True, the normalizer $$c_{ij}$$ is applied. Default: True.
• bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.
• activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.

weight

mxnet.gluon.parameter.Parameter – The learnable weight tensor.

bias

mxnet.gluon.parameter.Parameter – The learnable bias tensor.

forward(graph, feat)

Compute graph convolution.

Notes

• Input shape: $$(N, *, \text{in_feats})$$ where * means any number of additional dimensions and $$N$$ is the number of nodes.
• Output shape: $$(N, *, \text{out_feats})$$ where all but the last dimension match the input shape.
Parameters:
• graph (DGLGraph) – The graph.
• feat (mxnet.NDArray) – The input feature.

Returns: The output feature.
Return type: mxnet.NDArray

### RelGraphConv

class dgl.nn.mxnet.conv.RelGraphConv(in_feat, out_feat, num_rels, regularizer='basis', num_bases=None, bias=True, activation=None, self_loop=False, dropout=0.0)

Bases: mxnet.gluon.block.Block

Relational graph convolution layer.

Relational graph convolution was introduced in “Modeling Relational Data with Graph Convolutional Networks” and can be described as follows:

$h_i^{(l+1)} = \sigma(\sum_{r\in\mathcal{R}} \sum_{j\in\mathcal{N}^r(i)}\frac{1}{c_{i,r}}W_r^{(l)}h_j^{(l)}+W_0^{(l)}h_i^{(l)})$

where $$\mathcal{N}^r(i)$$ is the neighbor set of node $$i$$ w.r.t. relation $$r$$. $$c_{i,r}$$ is the normalizer equal to $$|\mathcal{N}^r(i)|$$. $$\sigma$$ is an activation function. $$W_0$$ is the self-loop weight.
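A minimal pure-Python sketch of the relational update above, with σ taken as the identity unless an activation is passed. It keeps the formula's $$W_r h_j$$ orientation (each matrix is stored as output rows), always includes the $$W_0 h_i$$ term (which the module adds only when `self_loop=True`), and ignores bias, dropout, and the weight regularizers. The helpers `rgcn_layer` and `matvec` are hypothetical, not part of DGL; relation ids in the code start at 0 and are unrelated to the self-loop symbol $$W_0$$, which is passed separately as `W_self`.

```python
from collections import defaultdict


def matvec(W, x):
    """W @ x, with W stored as a list of output rows (out_dim x in_dim)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rgcn_layer(edges, h, W_rel, W_self, activation=lambda v: v):
    """Hypothetical sketch of the relational update (not DGL's implementation).

    edges:  list of (src, dst, rel) triples, one per edge j -> i of type r
    h:      node features, one list of length in_dim per node
    W_rel:  W_rel[r] is the out_dim x in_dim matrix W_r
    W_self: the self-loop matrix W_0
    """
    # nbrs[i][r] is the neighbor list N^r(i)
    nbrs = defaultdict(lambda: defaultdict(list))
    for src, dst, rel in edges:
        nbrs[dst][rel].append(src)

    out = []
    for i in range(len(h)):
        acc = matvec(W_self, h[i])  # self-loop term W_0 h_i
        for rel, srcs in nbrs.get(i, {}).items():
            c_ir = len(srcs)  # normalizer c_{i,r} = |N^r(i)|
            for j in srcs:
                msg = matvec(W_rel[rel], h[j])
                acc = [a + m / c_ir for a, m in zip(acc, msg)]
        out.append([activation(a) for a in acc])
    return out


# Node 2 gets two type-0 edges (from nodes 0 and 1) and one type-1 edge (from node 0)
edges = [(0, 2, 0), (1, 2, 0), (0, 2, 1)]
h = [[1.0], [2.0], [4.0]]
W_rel = [[[1.0]], [[10.0]]]  # 1 x 1 weights for relation types 0 and 1
out = rgcn_layer(edges, h, W_rel, W_self=[[1.0]])
print(out)  # [[1.0], [2.0], [15.5]]
```

For node 2, the two type-0 messages are averaged ($$c_{2,0} = 2$$), the single type-1 message is used as is, and the self-loop term adds its own feature: $$1.5 + 10 + 4 = 15.5$$.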

The basis regularization decomposes $$W_r$$ as:

$W_r^{(l)} = \sum_{b=1}^B a_{rb}^{(l)}V_b^{(l)}$

where $$B$$ is the number of bases.
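To see how the basis decomposition shares parameters across relations, the sketch below assembles each $$W_r$$ from $$B$$ shared bases $$V_b$$ and a per-relation coefficient table $$a_{rb}$$. It is plain Python for illustration; `compose_basis` is a hypothetical helper, not DGL's code.

```python
def compose_basis(coeffs, bases):
    """Hypothetical helper: build W_r = sum_b a_rb * V_b for every relation r.

    coeffs: |R| x B table, coeffs[r][b] = a_rb
    bases:  B matrices V_b of identical shape (lists of rows)
    """
    rows, cols = len(bases[0]), len(bases[0][0])
    weights = []
    for a_r in coeffs:
        W_r = [[0.0] * cols for _ in range(rows)]
        for a_rb, V_b in zip(a_r, bases):
            for p in range(rows):
                for q in range(cols):
                    W_r[p][q] += a_rb * V_b[p][q]
        weights.append(W_r)
    return weights


V = [[[1.0, 0.0], [0.0, 1.0]],   # V_1: identity
     [[0.0, 1.0], [1.0, 0.0]]]   # V_2: swap
a = [[1.0, 0.0],   # relation 0 uses only V_1
     [0.0, 1.0],   # relation 1 uses only V_2
     [2.0, 3.0]]   # relation 2 mixes both
W = compose_basis(a, V)
print(W[2])  # [[2.0, 3.0], [3.0, 2.0]]
```

Only the $$B$$ basis matrices and the $$|\mathcal{R}| \times B$$ coefficients are learned, rather than one full matrix per relation.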

The block-diagonal-decomposition regularization constrains each $$W_r$$ to be a block-diagonal matrix made of $$B$$ diagonal blocks. We refer to $$B$$ as the number of bases.
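Under block-diagonal decomposition, each relation learns only the $$B$$ diagonal blocks of its $$W_r$$; every entry outside those blocks is zero. A minimal sketch of assembling one such matrix, assuming the feature sizes divide evenly into $$B$$ blocks (the helper `block_diag` is hypothetical, not DGL API):

```python
def block_diag(blocks):
    """Hypothetical helper: assemble a block-diagonal matrix from blocks (lists of rows)."""
    n_rows = sum(len(blk) for blk in blocks)
    n_cols = sum(len(blk[0]) for blk in blocks)
    W = [[0.0] * n_cols for _ in range(n_rows)]
    r0 = c0 = 0
    for blk in blocks:
        for p, row in enumerate(blk):
            for q, val in enumerate(row):
                W[r0 + p][c0 + q] = val
        # Move the insertion point down and right to the next diagonal slot
        r0 += len(blk)
        c0 += len(blk[0])
    return W


# B = 2 blocks of size 2 x 2 give a 4 x 4 weight W_r for one relation
W_r = block_diag([[[1.0, 2.0], [3.0, 4.0]],
                  [[5.0, 6.0], [7.0, 8.0]]])
for row in W_r:
    print(row)
```

Each relation then stores $$B$$ blocks of size $$(d_{in}/B) \times (d_{out}/B)$$ instead of a full $$d_{in} \times d_{out}$$ matrix, which is $$B$$ times fewer parameters per relation.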

Parameters:
• in_feat (int) – Input feature size.
• out_feat (int) – Output feature size.
• num_rels (int) – Number of relations.
• regularizer (str) – Which weight regularizer to use, “basis” or “bdd”.
• num_bases (int, optional) – Number of bases. If None, the number of relations is used. Default: None.
• bias (bool, optional) – True if bias is added. Default: True.
• activation (callable, optional) – Activation function. Default: None.
• self_loop (bool, optional) – True to include the self-loop message. Default: False.
• dropout (float, optional) – Dropout rate. Default: 0.0.

forward(g, x, etypes, norm=None)

Forward computation.

Parameters:
• g (DGLGraph) – The graph.
• x (mx.ndarray.NDArray) – Input node features. Either a $$(|V|, D)$$ dense tensor, or a $$(|V|,)$$ int64 vector holding the categorical value of each node; in the latter case, the input is treated as a one-hot encoding.
• etypes (mx.ndarray.NDArray) – Edge type tensor. Shape: $$(|E|,)$$.
• norm (mx.ndarray.NDArray, optional) – Edge normalizer tensor. Shape: $$(|E|, 1)$$.

Returns: New node features.
Return type: mx.ndarray.NDArray
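Treating an int64 category as a one-hot feature is cheap because multiplying a weight matrix by a one-hot vector simply selects one column of it, so the one-hot vector never has to be built. The pure-Python check below illustrates that equivalence under the formula's $$W h$$ orientation; the helper names are hypothetical, and the sketch makes no claim about how RelGraphConv performs the lookup internally.

```python
def dense_transform(W, x):
    """W @ x, with W stored as output rows (out_dim x in_dim)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def one_hot(category, num_categories):
    """Hypothetical helper: explicit one-hot encoding of a category id."""
    return [1.0 if c == category else 0.0 for c in range(num_categories)]


def lookup_transform(W, category):
    """Shortcut: W @ one_hot(category) is just column `category` of W."""
    return [row[category] for row in W]


# 3 categories (in_dim = 3) mapped to a 2-dimensional output
W = [[0.1, 0.2, 0.3],
     [1.0, 2.0, 3.0]]
print(dense_transform(W, one_hot(2, 3)))  # [0.3, 3.0]
print(lookup_transform(W, 2))             # [0.3, 3.0]
```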

### TAGConv

class dgl.nn.mxnet.conv.TAGConv(in_feats, out_feats, k=2, bias=True, activation=None)

Bases: mxnet.gluon.block.Block

Apply a Topology Adaptive Graph Convolutional Network (TAGCN) layer.

$\mathbf{X}^{\prime} = \sum_{k=0}^K \left(\mathbf{D}^{-1/2} \mathbf{A} \mathbf{D}^{-1/2}\right)^{k} \mathbf{X} \mathbf{\Theta}_{k},$

where $$\mathbf{A}$$ denotes the adjacency matrix, $$\mathbf{D}$$ its diagonal degree matrix with $$D_{ii} = \sum_{j} A_{ij}$$, and $$\mathbf{\Theta}_{k}$$ the learnable weight for hop $$k$$.
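Writing $$\hat{\mathbf{A}} = \mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}$$, the layer sums one linear map per hop, $$\hat{\mathbf{A}}^{k}\mathbf{X}\mathbf{\Theta}_k$$. The dense pure-Python sketch below computes exactly that sum. It assumes an undirected graph with no isolated nodes (so every degree is positive), uses hypothetical helper names, and omits bias and activation; it is an illustration of the formula, not DGL's implementation.

```python
import math


def normalized_adjacency(n, edges):
    """A_hat = D^{-1/2} A D^{-1/2} with D_ii = sum_j A_ij (dense, for illustration)."""
    A = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = 1.0
    deg = [sum(row) for row in A]
    return [[A[i][j] / math.sqrt(deg[i] * deg[j]) if A[i][j] else 0.0
             for j in range(n)] for i in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def tag_conv(n, edges, X, thetas):
    """Hypothetical sketch: X' = sum_{k=0}^{K} A_hat^k X Theta_k, K = len(thetas) - 1."""
    A_hat = normalized_adjacency(n, edges)
    H = X  # A_hat^0 X
    out = [[0.0] * len(thetas[0][0]) for _ in range(n)]
    for theta in thetas:
        term = matmul(H, theta)
        out = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(out, term)]
        H = matmul(A_hat, H)  # advance to the next hop: A_hat^{k+1} X
    return out


# Two nodes joined by an undirected edge, K = 2
X = [[1.0], [2.0]]
thetas = [[[1.0]], [[10.0]], [[100.0]]]  # Theta_0, Theta_1, Theta_2
out = tag_conv(2, [(0, 1), (1, 0)], X, thetas)
print(out)  # [[121.0], [212.0]]
```

Because each hop gets its own linear map before the sum, the same result can be obtained by concatenating $$[\mathbf{X}, \hat{\mathbf{A}}\mathbf{X}, \dots, \hat{\mathbf{A}}^K\mathbf{X}]$$ along the feature axis and applying a single $$(K+1)D_{in} \times D_{out}$$ weight that stacks the $$\mathbf{\Theta}_k$$.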

Parameters:
• in_feats (int) – Number of input features.
• out_feats (int) – Number of output features.
• k (int, optional) – Number of hops $$K$$. Default: 2.
• bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.
• activation (callable activation function/layer or None, optional) – If not None, applies an activation function to the updated node features. Default: None.

lin

mxnet.gluon.parameter.Parameter – The learnable weight tensor.

bias

mxnet.gluon.parameter.Parameter – The learnable bias tensor.

forward(graph, feat)

Compute graph convolution.

Parameters:
• graph (DGLGraph) – The graph.
• feat (mxnet.NDArray) – The input feature of shape $$(N, D_{in})$$, where $$D_{in}$$ is the size of the input feature and $$N$$ is the number of nodes.

Returns: The output feature of shape $$(N, D_{out})$$, where $$D_{out}$$ is the size of the output feature.
Return type: mxnet.NDArray

## Global Pooling Layers

MXNet modules for graph global pooling.

### SumPooling

class dgl.nn.mxnet.glob.SumPooling

Bases: mxnet.gluon.block.Block

Apply sum pooling over the nodes in the graph.

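$r^{(i)} = \sum_{k=1}^{N_i} x^{(i)}_k$

where $$r^{(i)}$$ is the readout of the $$i$$-th graph, $$N_i$$ is its number of nodes, and $$x^{(i)}_k$$ is the feature of its $$k$$-th node.

forward(graph, feat)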

Compute sum pooling.

Parameters:
• graph (DGLGraph or BatchedDGLGraph) – The graph.
• feat (mxnet.NDArray) – The input feature with shape $$(N, *)$$, where $$N$$ is the number of nodes in the graph.

Returns: The output feature with shape $$(*)$$. If the input graph is a BatchedDGLGraph, the result has shape $$(B, *)$$, where $$B$$ is the number of graphs in the batch.
Return type: mxnet.NDArray

### AvgPooling

class dgl.nn.mxnet.glob.AvgPooling

Bases: mxnet.gluon.block.Block

Apply average pooling over the nodes in the graph.

$r^{(i)} = \frac{1}{N_i}\sum_{k=1}^{N_i} x^{(i)}_k$

forward(graph, feat)

Compute average pooling.

Parameters:
• graph (DGLGraph or BatchedDGLGraph) – The graph.
• feat (mxnet.NDArray) – The input feature with shape $$(N, *)$$, where $$N$$ is the number of nodes in the graph.

Returns: The output feature with shape $$(*)$$. If the input graph is a BatchedDGLGraph, the result has shape $$(B, *)$$.
Return type: mxnet.NDArray

### MaxPooling

class dgl.nn.mxnet.glob.MaxPooling

Bases: mxnet.gluon.block.Block

Apply max pooling over the nodes in the graph.

$r^{(i)} = \max_{k=1}^{N_i} \left( x^{(i)}_k \right)$

forward(graph, feat)

Compute max pooling.

Parameters:
• graph (DGLGraph or BatchedDGLGraph) – The graph.
• feat (mxnet.NDArray) – The input feature with shape $$(N, *)$$, where $$N$$ is the number of nodes in the graph.

Returns: The output feature with shape $$(*)$$. If the input graph is a BatchedDGLGraph, the result has shape $$(B, *)$$.
Return type: mxnet.NDArray
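Sum, average, and max pooling differ only in the reduction applied over each graph's nodes. The plain-Python sketch below applies all three to a batch stored the way the shapes above suggest: node features concatenated across graphs, plus a node count $$N_i$$ per graph, giving one output row per graph, i.e. shape $$(B, D)$$. `segment_readout` and its arguments are hypothetical illustrations, not DGL API, and graphs with zero nodes are not handled.

```python
def segment_readout(feats, num_nodes_per_graph, reduce):
    """Hypothetical helper: pool node features graph by graph.

    feats: node features of a batch of graphs, concatenated in graph order (N x D)
    num_nodes_per_graph: N_i for each graph, in the same order
    reduce: "sum", "mean", or "max", applied element-wise per feature dimension
    """
    out, start = [], 0
    for n_i in num_nodes_per_graph:
        nodes = feats[start:start + n_i]
        start += n_i
        cols = list(zip(*nodes))  # one tuple per feature dimension
        if reduce == "sum":
            out.append([sum(c) for c in cols])
        elif reduce == "mean":
            out.append([sum(c) / n_i for c in cols])
        elif reduce == "max":
            out.append([max(c) for c in cols])
        else:
            raise ValueError("unknown reduce: " + reduce)
    return out  # B x D


# A batch of two graphs: graph 0 has 2 nodes, graph 1 has 1 node (D = 2)
feats = [[1.0, 4.0], [3.0, 2.0], [5.0, 6.0]]
print(segment_readout(feats, [2, 1], "sum"))   # [[4.0, 6.0], [5.0, 6.0]]
print(segment_readout(feats, [2, 1], "mean"))  # [[2.0, 3.0], [5.0, 6.0]]
print(segment_readout(feats, [2, 1], "max"))   # [[3.0, 4.0], [5.0, 6.0]]
```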

### SortPooling

class dgl.nn.mxnet.glob.SortPooling(k)

Bases: mxnet.gluon.block.Block

Apply Sort Pooling, introduced in “An End-to-End Deep Learning Architecture for Graph Classification”, over the nodes in the graph.

Parameters:
• k (int) – The number of nodes to keep for each graph.

forward(graph, feat)

Compute sort pooling.

Parameters:
• graph (DGLGraph or BatchedDGLGraph) – The graph.
• feat (mxnet.NDArray) – The input feature with shape $$(N, D)$$, where $$N$$ is the number of nodes in the graph.

Returns: The output feature with shape $$(k * D)$$. If the input graph is a BatchedDGLGraph, the result has shape $$(B, k * D)$$.
Return type: mxnet.NDArray
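The $$(k * D)$$ output shape comes from keeping $$k$$ rows of a $$D$$-wide feature matrix and flattening them. The sketch below follows the sort-pooling idea from the paper: rank a graph's nodes by their last feature channel in descending order, keep the top $$k$$, zero-pad graphs with fewer than $$k$$ nodes, and flatten. It handles a single graph in plain Python and is not DGL's exact procedure (tie-breaking and any per-node preprocessing may differ); `sort_pool` is a hypothetical name.

```python
def sort_pool(node_feats, k):
    """Hypothetical sort-pooling sketch for one graph.

    node_feats: N x D node features (assumes N >= 1 so D can be read off)
    k: number of nodes to keep
    Returns a flat list of length k * D.
    """
    d = len(node_feats[0])
    # Rank nodes by their last feature channel, largest first
    ranked = sorted(node_feats, key=lambda f: f[-1], reverse=True)[:k]
    # Zero-pad graphs that have fewer than k nodes
    ranked += [[0.0] * d for _ in range(k - len(ranked))]
    return [x for row in ranked for x in row]


feats = [[1.0, 0.2], [3.0, 0.9], [2.0, 0.5]]
print(sort_pool(feats, 2))  # [3.0, 0.9, 2.0, 0.5]
print(sort_pool(feats, 4))  # [3.0, 0.9, 2.0, 0.5, 1.0, 0.2, 0.0, 0.0]
```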

### GlobalAttentionPooling

class dgl.nn.mxnet.glob.GlobalAttentionPooling(gate_nn, feat_nn=None)

Bases: mxnet.gluon.block.Block

Apply Global Attention Pooling, introduced in “Gated Graph Sequence Neural Networks”, over the nodes in the graph.

$r^{(i)} = \sum_{k=1}^{N_i}\mathrm{softmax}\left(f_{gate} \left(x^{(i)}_k\right)\right) f_{feat}\left(x^{(i)}_k\right)$

Parameters:
• gate_nn (gluon.nn.Block) – A neural network that computes attention scores for each feature.
• feat_nn (gluon.nn.Block, optional) – A neural network applied to each feature before combining them with attention scores.

forward(graph, feat)

Compute global attention pooling.

Parameters:
• graph (DGLGraph or BatchedDGLGraph) – The graph.
• feat (mxnet.NDArray) – The input feature with shape $$(N, D)$$, where $$N$$ is the number of nodes in the graph.

Returns: The output feature with shape $$(D)$$. If the input graph is a BatchedDGLGraph, the result has shape $$(B, D)$$.
Return type: mxnet.NDArray
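The pooling formula reduces to three steps per graph: score every node with the gate, softmax the scores over the graph's nodes, and take the weighted sum of the (optionally transformed) features. In the pure-Python sketch below, `gate_fn` and `feat_fn` are plain callables standing in for `gate_nn` and `feat_nn`; they and `global_attention_pool` are hypothetical, not DGL API.

```python
import math


def global_attention_pool(feats, gate_fn, feat_fn=lambda x: x):
    """Hypothetical sketch: r = sum_k softmax(gate_fn(x_k)) * feat_fn(x_k) over one graph.

    gate_fn maps a node feature to one scalar score; feat_fn maps it to a vector.
    """
    scores = [gate_fn(x) for x in feats]
    m = max(scores)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # softmax over the graph's nodes
    values = [feat_fn(x) for x in feats]
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


feats = [[1.0, 2.0], [3.0, 4.0]]
# A constant gate gives uniform weights, so the readout is the mean
print(global_attention_pool(feats, gate_fn=lambda x: 0.0))  # [2.0, 3.0]
```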

### Set2Set

class dgl.nn.mxnet.glob.Set2Set(input_dim, n_iters, n_layers)

Bases: mxnet.gluon.block.Block

Apply Set2Set, introduced in “Order Matters: Sequence to sequence for sets”, over the nodes in the graph.

For each individual graph in the batch, set2set computes

\begin{align}\begin{aligned}q_t &= \mathrm{LSTM} (q^*_{t-1})\\\alpha_{i,t} &= \mathrm{softmax}(x_i \cdot q_t)\\r_t &= \sum_{i=1}^N \alpha_{i,t} x_i\\q^*_t &= q_t \Vert r_t\end{aligned}\end{align}

for this graph.
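The attention part of one iteration (the last three equations) can be sketched in plain Python. The LSTM step is deliberately left out: `q_t` is passed in as if the LSTM had already produced it, and in the full procedure $$q^*_t$$ would be fed back to the LSTM for the next of the `n_iters` iterations. `set2set_readout` is a hypothetical helper, not DGL API. Note that $$q^*_t$$ has length $$2D$$ because it concatenates $$q_t$$ and $$r_t$$.

```python
import math


def set2set_readout(x, q_t):
    """Hypothetical sketch of one Set2Set attention readout for a single graph.

    x:   N x D node features
    q_t: length-D query, assumed to be the LSTM output for this step
    Returns q*_t = q_t || r_t, a list of length 2D.
    """
    e = [sum(a * b for a, b in zip(x_i, q_t)) for x_i in x]  # x_i . q_t
    m = max(e)
    exps = [math.exp(v - m) for v in e]
    total = sum(exps)
    alpha = [v / total for v in exps]  # softmax over the nodes
    r_t = [sum(alpha[i] * x[i][d] for i in range(len(x))) for d in range(len(q_t))]
    return q_t + r_t  # list concatenation implements q_t || r_t


x = [[1.0, 0.0], [0.0, 1.0]]
q_star = set2set_readout(x, q_t=[0.0, 0.0])
print(q_star)  # [0.0, 0.0, 0.5, 0.5]
```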

Parameters:
• input_dim (int) – Size of each input sample.
• n_iters (int) – Number of iterations.
• n_layers (int) – Number of recurrent layers.

forward(graph, feat)

Compute set2set pooling.

Parameters:
• graph (DGLGraph or BatchedDGLGraph) – The graph.
• feat (mxnet.NDArray) – The input feature with shape $$(N, D)$$, where $$N$$ is the number of nodes in the graph.

Returns: The output feature with shape $$(2D)$$, i.e. $$q^*_t = q_t \Vert r_t$$ after the last iteration. If the input graph is a BatchedDGLGraph, the result has shape $$(B, 2D)$$.
Return type: mxnet.NDArray

## Utility Modules

### Edge Softmax

Gluon layer for graph related softmax.

dgl.nn.mxnet.softmax.edge_softmax(graph, logits, eids='__ALL__')

Compute edge softmax.

For a node $$i$$, edge softmax computes

$a_{ij} = \frac{\exp(z_{ij})}{\sum_{k\in\mathcal{N}(i)}\exp(z_{ik})}$

where $$z_{ij}$$ is a signal of edge $$j\rightarrow i$$, also called logits in the context of softmax. $$\mathcal{N}(i)$$ is the set of nodes that have an edge to $$i$$.

One example of edge softmax in use is the Graph Attention Network, whose attention weights are computed with this operation.
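A dependency-free reference of the formula can help when checking results on small graphs: group the edges by destination node and normalize each group's exponentiated logits. The sketch handles one scalar logit per edge (the trailing dimension of size 1 in the notes) and is a hypothetical helper, `edge_softmax_ref`, not part of DGL. It uses the same edges as the example further down.

```python
import math
from collections import defaultdict


def edge_softmax_ref(dst, logits):
    """Hypothetical reference: normalize scalar edge logits over each node's incoming edges.

    dst:    dst[e] is the destination node i of edge e (the edge j -> i)
    logits: logits[e] is z_ij for edge e
    """
    incoming = defaultdict(list)  # node i -> ids of edges pointing to i
    for e, i in enumerate(dst):
        incoming[i].append(e)
    out = [0.0] * len(logits)
    for eids in incoming.values():
        m = max(logits[e] for e in eids)  # stabilize exp
        exps = {e: math.exp(logits[e] - m) for e in eids}
        total = sum(exps.values())
        for e in eids:
            out[e] = exps[e] / total
    return out


# Sources [0, 0, 0, 1, 1, 2]; only the destinations matter for the grouping
dst = [0, 1, 2, 1, 2, 2]
print(edge_softmax_ref(dst, [1.0] * 6))  # [1.0, 0.5, 0.333..., 0.5, 0.333..., 0.333...]
```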

Parameters:
• graph (DGLGraph) – The graph to perform edge softmax on.
• logits (mxnet.NDArray) – The input edge feature.
• eids (mxnet.NDArray or ALL, optional) – Edges on which to apply edge softmax. If ALL, apply edge softmax on all edges in the graph. Default: ALL.

Returns: Softmax value.
Return type: Tensor

Notes

• Input shape: $$(E, *, 1)$$ where * means any number of additional dimensions, $$E$$ equals the length of eids. If eids is ALL, $$E$$ equals number of edges in the graph.
• Return shape: $$(E, *, 1)$$

Examples

>>> from dgl.nn.mxnet.softmax import edge_softmax
>>> import dgl
>>> from mxnet import nd

Create a DGLGraph object and initialize its edge features.

>>> g = dgl.DGLGraph()
>>> g.add_nodes(3)
>>> g.add_edges([0, 0, 0, 1, 1, 2], [0, 1, 2, 1, 2, 2])
>>> edata = nd.ones((6, 1))
>>> edata
[[1.]
[1.]
[1.]
[1.]
[1.]
[1.]]
<NDArray 6x1 @cpu(0)>

Apply edge softmax on g:

>>> edge_softmax(g, edata)
[[1.        ]
[0.5       ]
[0.33333334]
[0.5       ]
[0.33333334]
[0.33333334]]
<NDArray 6x1 @cpu(0)>

Apply edge softmax on the first 4 edges of g:

>>> edge_softmax(g, edata[:4], nd.array([0, 1, 2, 3], dtype='int64'))
[[1. ]
[0.5]
[1. ]
[0.5]]
<NDArray 4x1 @cpu(0)>