GraphormerLayer
- class dgl.nn.pytorch.gt.GraphormerLayer(feat_size, hidden_size, num_heads, attn_bias_type='add', norm_first=False, dropout=0.1, attn_dropout=0.1, activation=ReLU())
Bases: Module
Graphormer Layer with Dense Multi-Head Attention, as introduced in Do Transformers Really Perform Bad for Graph Representation?
- Parameters:
feat_size (int) – Feature size.
hidden_size (int) – Hidden size of feedforward layers.
num_heads (int) – Number of attention heads, by which feat_size is divisible.
attn_bias_type (str, optional) – The type of attention bias used for modifying attention. Selected from 'add' or 'mul'. Default: 'add'.
'add' is for additive attention bias.
'mul' is for multiplicative attention bias.
norm_first (bool, optional) – If True, it performs layer normalization before attention and feedforward operations. Otherwise, it applies layer normalization afterwards. Default: False.
dropout (float, optional) – Dropout probability. Default: 0.1.
attn_dropout (float, optional) – Attention dropout probability. Default: 0.1.
activation (callable activation layer, optional) – Activation function. Default: nn.ReLU().
Examples
>>> import torch as th
>>> from dgl.nn import GraphormerLayer

>>> batch_size = 16
>>> num_nodes = 100
>>> feat_size = 512
>>> num_heads = 8
>>> nfeat = th.rand(batch_size, num_nodes, feat_size)
>>> bias = th.rand(batch_size, num_nodes, num_nodes, num_heads)
>>> net = GraphormerLayer(
...     feat_size=feat_size,
...     hidden_size=2048,
...     num_heads=num_heads
... )
>>> out = net(nfeat, bias)
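The difference between the 'add' and 'mul' settings of attn_bias_type can be sketched in plain PyTorch. This is a minimal illustration, not the layer's actual implementation: the helper name biased_attention_weights is hypothetical, and it assumes the bias is combined with the raw attention scores before the softmax, which this page does not state explicitly.

```python
import torch as th


def biased_attention_weights(scores, attn_bias, attn_bias_type="add"):
    """Hypothetical helper: combine raw scores with a bias, then normalize.

    scores, attn_bias: tensors of shape (batch_size, N, N, num_heads).
    Assumption: the bias modifies the scores before the softmax.
    """
    if attn_bias_type == "add":
        scores = scores + attn_bias  # additive attention bias
    elif attn_bias_type == "mul":
        scores = scores * attn_bias  # multiplicative attention bias
    else:
        raise ValueError("attn_bias_type must be 'add' or 'mul'")
    # Normalize over the key dimension (dim=2) for every query and head.
    return th.softmax(scores, dim=2)


th.manual_seed(0)
scores = th.randn(2, 4, 4, 8)  # stand-in for scaled query-key products
bias = th.rand(2, 4, 4, 8)     # same layout as attn_bias in forward()

w_add = biased_attention_weights(scores, bias, "add")
w_mul = biased_attention_weights(scores, bias, "mul")
```

Under this assumption, an all-zero bias is the neutral choice for 'add' and an all-one bias is the neutral choice for 'mul'; both leave the attention weights unchanged.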
- forward(nfeat, attn_bias=None, attn_mask=None)
Forward computation.
- Parameters:
nfeat (torch.Tensor) – A 3D input tensor. Shape: (batch_size, N, feat_size), where N is the maximum number of nodes.
attn_bias (torch.Tensor, optional) – The attention bias used for attention modification. Shape: (batch_size, N, N, num_heads).
attn_mask (torch.Tensor, optional) – The attention mask used for avoiding computation on invalid positions, where invalid positions are indicated by True values. Shape: (batch_size, N, N). Note: For rows corresponding to nonexistent (padding) nodes, make sure at least one entry is set to False to prevent obtaining NaNs with softmax.
- Returns:
y – The output tensor. Shape: (batch_size, N, feat_size).
- Return type:
torch.Tensor
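The note on attn_mask can be made concrete with a small sketch of how such a mask might be built for a padded batch. The helper build_padding_mask is hypothetical and not part of DGL, and leaving the diagonal unmasked is just one way to guarantee a False entry in every row. The sketch also assumes that masked positions are filled with negative infinity before the softmax, which is why a fully masked row would produce NaNs.

```python
import torch as th


def build_padding_mask(num_nodes_per_graph, max_nodes):
    """Hypothetical helper: True marks invalid positions.

    Returns a boolean tensor of shape (batch_size, max_nodes, max_nodes).
    """
    n = th.as_tensor(num_nodes_per_graph)
    # valid[b, i] is True when node i exists in graph b.
    valid = th.arange(max_nodes).unsqueeze(0) < n.unsqueeze(1)
    # A position (i, j) is invalid if either node is padding.
    mask = ~(valid.unsqueeze(2) & valid.unsqueeze(1))
    # Rows of padding nodes would now be all True. Unmask the diagonal so
    # that every row keeps at least one False entry.
    idx = th.arange(max_nodes)
    mask[:, idx, idx] = False
    return mask


# Two graphs with 2 and 3 real nodes, padded to N = 3.
attn_mask = build_padding_mask([2, 3], max_nodes=3)

# Assumed masking scheme: fill invalid positions with -inf, then softmax.
scores = th.zeros(2, 3, 3)
weights = th.softmax(scores.masked_fill(attn_mask, float("-inf")), dim=-1)

# A fully masked row, by contrast, normalizes to NaN.
fully_masked_row = th.softmax(th.full((3,), float("-inf")), dim=0)
```

A mask built this way has the shape (batch_size, N, N) expected by forward(). The outputs at padding positions are still meaningless and should be ignored downstream.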