Note
Click here to download the full example code
DGL Basics¶
Author: Minjie Wang, Quan Gan, Yu Gai, Zheng Zhang
The Goal of this tutorial:
- To create a graph.
- To read and write node and edge representations.
Graph Creation¶
The design of DGLGraph
was influenced by other graph libraries. Indeed,
you can create a graph from networkx, and convert it into a DGLGraph
and
vice versa:
import networkx as nx
import dgl
g_nx = nx.petersen_graph()
g_dgl = dgl.DGLGraph(g_nx)
import matplotlib.pyplot as plt
plt.subplot(121)
nx.draw(g_nx, with_labels=True)
plt.subplot(122)
nx.draw(g_dgl.to_networkx(), with_labels=True)
plt.show()
They are the same graph, except that DGLGraph
is always directional.
One can also create a graph by calling DGL’s own interface.
Now let’s build a star graph. DGLGraph
nodes are consecutive range of
integers between 0 and number_of_nodes()
and can grow by calling add_nodes
.
DGLGraph
edges are in order of their additions. Note that
edges are accessed in much the same way as nodes, with one extra feature
of edge broadcasting:
import dgl
import torch as th
g = dgl.DGLGraph()
g.add_nodes(10)
# a couple edges one-by-one
for i in range(1, 4):
g.add_edge(i, 0)
# a few more with a paired list
src = list(range(5, 8)); dst = [0]*3
g.add_edges(src, dst)
# finish with a pair of tensors
src = th.tensor([8, 9]); dst = th.tensor([0, 0])
g.add_edges(src, dst)
# edge broadcasting will do star graph in one go!
g.clear(); g.add_nodes(10)
src = th.tensor(list(range(1, 10)));
g.add_edges(src, 0)
import networkx as nx
import matplotlib.pyplot as plt
nx.draw(g.to_networkx(), with_labels=True)
plt.show()
Feature Assignment¶
One can also assign features to nodes and edges of a DGLGraph
. The
features are represented as dictionary of names (strings) and tensors,
called fields.
The following code snippet assigns each node a vector (len=3).
Note
DGL aims to be framework-agnostic, and currently it supports PyTorch and MXNet tensors. From now on, we use PyTorch as an example.
import dgl
import torch as th
x = th.randn(10, 3)
g.ndata['x'] = x
ndata
is a syntax sugar to access states of all nodes,
states are stored
in a container data
that hosts user defined dictionary.
print(g.ndata['x'] == g.nodes[:].data['x'])
# access node set with integer, list, or integer tensor
g.nodes[0].data['x'] = th.zeros(1, 3)
g.nodes[[0, 1, 2]].data['x'] = th.zeros(3, 3)
g.nodes[th.tensor([0, 1, 2])].data['x'] = th.zeros(3, 3)
Out:
tensor([[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1],
[1, 1, 1]], dtype=torch.uint8)
Assigning edge features is in a similar fashion to that of node features, except that one can also do it by specifying endpoints of the edges.
g.edata['w'] = th.randn(9, 2)
# access edge set with IDs in integer, list, or integer tensor
g.edges[1].data['w'] = th.randn(1, 2)
g.edges[[0, 1, 2]].data['w'] = th.zeros(3, 2)
g.edges[th.tensor([0, 1, 2])].data['w'] = th.zeros(3, 2)
# one can also access the edges by giving endpoints
g.edges[1, 0].data['w'] = th.ones(1, 2) # edge 1 -> 0
g.edges[[1, 2, 3], [0, 0, 0]].data['w'] = th.ones(3, 2) # edges [1, 2, 3] -> 0
After assignments, each node/edge field will be associated with a scheme containing the shape and data type (dtype) of its field value.
print(g.node_attr_schemes())
g.ndata['x'] = th.zeros((10, 4))
print(g.node_attr_schemes())
Out:
{'x': Scheme(shape=(3,), dtype=torch.float32)}
{'x': Scheme(shape=(4,), dtype=torch.float32)}
One can also remove node/edge states from the graph. This is particularly useful to save memory during inference.
g.ndata.pop('x')
g.edata.pop('w')
Multigraphs¶
Many graph applications need multi-edges. To enable this, construct DGLGraph
with multigraph=True
.
g_multi = dgl.DGLGraph(multigraph=True)
g_multi.add_nodes(10)
g_multi.ndata['x'] = th.randn(10, 2)
g_multi.add_edges(list(range(1, 10)), 0)
g_multi.add_edge(1, 0) # two edges on 1->0
g_multi.edata['w'] = th.randn(10, 2)
g_multi.edges[1].data['w'] = th.zeros(1, 2)
print(g_multi.edges())
Out:
(tensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 1]), tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]))
An edge in multi-graph cannot be uniquely identified using its incident nodes
\(u\) and \(v\); query their edge ids use edge_id
interface.
eid_10 = g_multi.edge_id(1, 0)
g_multi.edges[eid_10].data['w'] = th.ones(len(eid_10), 2)
print(g_multi.edata['w'])
Out:
tensor([[ 1.0000, 1.0000],
[ 0.0000, 0.0000],
[ 0.9346, 0.4135],
[-0.1230, -1.1479],
[-1.6210, 0.9632],
[-0.0015, -0.4406],
[ 0.3945, -0.0974],
[-0.2937, 1.3847],
[-0.2330, -1.0145],
[ 1.0000, 1.0000]])
Note
- Nodes and edges can be added but not removed; we will support removal in the future.
- Updating a feature of different schemes raise error on indivdual node (or node subset).
Next steps¶
In the next tutorial, we will go through the DGL message passing interface by implementing PageRank.
Total running time of the script: ( 0 minutes 0.100 seconds)