Performance Benchmarks
Integrated Benchmarks
DGL continuously evaluates the speed of its core APIs and kernels, as well as the training speed of state-of-the-art GNN models. The benchmark code is available in the main repository. The benchmarks run on every nightly-built version, and the results are published to https://asv.dgl.ai/.
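The suite is built on Airspeed Velocity (asv), which repeatedly times methods whose names start with `time_` after `setup` has prepared the inputs. The snippet below is a rough, hypothetical sketch of what such a benchmark case can look like; the class name, graph size, and feature size are made up and are not taken from the actual benchmark code.

```python
import torch
import dgl
import dgl.function as fn

class KernelSuite:
    """Hypothetical asv-style benchmark: asv times each ``time_*`` method."""

    def setup(self):
        # A random graph with 10k nodes, 200k edges, and 64-d node features.
        src = torch.randint(0, 10000, (200000,))
        dst = torch.randint(0, 10000, (200000,))
        self.g = dgl.graph((src, dst), num_nodes=10000)
        self.g.ndata['h'] = torch.randn(10000, 64)

    def time_copy_u_sum(self):
        # Times DGL's fused message passing (copy_u + sum reduction) kernel.
        self.g.update_all(fn.copy_u('h', 'm'), fn.sum('m', 'h_new'))
```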
v0.6 Benchmarks
To understand the performance gains in DGL v0.6, we re-ran the v0.5 benchmarks, plus new ones for graph classification tasks, against updated baselines. The results are available in a standalone repository.
v0.5 Benchmarks
Check out our paper Deep Graph Library: A Graph-Centric, Highly-Performant Package for Graph Neural Networks.
v0.4.3 Benchmarks
Microbenchmark on speed and memory usage: while DGL leaves tensor and autograd functions to the backend frameworks (e.g., PyTorch, MXNet, and TensorFlow), it aggressively optimizes storage and computation with its own kernels. Here is a comparison with another popular package, PyTorch Geometric (PyG). The short story is that raw speed is similar, but DGL has much better memory management.
Dataset | Model | Accuracy | Time (PyG) | Time (DGL) | Memory (PyG) | Memory (DGL)
---|---|---|---|---|---|---
Cora | GCN | 81.31 ± 0.88 | 0.478 | 0.666 | 1.1 | 1.1
Cora | GAT | 83.98 ± 0.52 | 1.608 | 1.399 | 1.2 | 1.1
CiteSeer | GCN | 70.98 ± 0.68 | 0.490 | 0.674 | 1.1 | 1.1
CiteSeer | GAT | 69.96 ± 0.53 | 1.606 | 1.399 | 1.3 | 1.1
PubMed | GCN | 79.00 ± 0.41 | 0.491 | 0.690 | 1.1 | 1.1
PubMed | GAT | 77.65 ± 0.32 | 1.946 | 1.393 | 1.6 | 1.1
Reddit | GCN | 93.46 ± 0.06 | OOM | 28.6 | OOM | 11.7
Reddit-S | GCN | N/A | 29.12 | 9.44 | 15.7 | 3.6
Table: training time (in seconds) for 200 epochs and memory consumption (GB)
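The numbers above are wall-clock training time over 200 epochs and peak GPU memory. A minimal sketch of how such a measurement might be reproduced with DGL's bundled Cora dataset is shown below; the two-layer GCN and its hyperparameters are plausible stand-ins rather than the exact benchmark script, and a CUDA-capable GPU is assumed.

```python
import time
import torch
import torch.nn.functional as F
import dgl
from dgl.nn import GraphConv
from dgl.data import CoraGraphDataset

# Load Cora, add self-loops, and move the graph to the GPU.
g = dgl.add_self_loop(CoraGraphDataset()[0]).to('cuda')
feat, label = g.ndata['feat'], g.ndata['label']
train_mask = g.ndata['train_mask']

class GCN(torch.nn.Module):
    """A simple two-layer GCN, roughly matching the benchmarked configuration."""
    def __init__(self, in_dim, hid_dim, n_classes):
        super().__init__()
        self.conv1 = GraphConv(in_dim, hid_dim)
        self.conv2 = GraphConv(hid_dim, n_classes)

    def forward(self, g, x):
        return self.conv2(g, F.relu(self.conv1(g, x)))

model = GCN(feat.shape[1], 16, int(label.max()) + 1).to('cuda')
opt = torch.optim.Adam(model.parameters(), lr=0.01)

# Measure wall-clock time over 200 epochs and the peak GPU memory allocated.
torch.cuda.reset_peak_memory_stats()
start = time.time()
for epoch in range(200):
    logits = model(g, feat)
    loss = F.cross_entropy(logits[train_mask], label[train_mask])
    opt.zero_grad()
    loss.backward()
    opt.step()
torch.cuda.synchronize()
print('200 epochs: %.3f s, peak memory: %.2f GB'
      % (time.time() - start, torch.cuda.max_memory_allocated() / 1e9))
```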
Here is another comparison of DGL on the TensorFlow backend with other TensorFlow-based GNN tools (training time in seconds for one epoch):
Dataset | Model | DGL | GraphNet | tf_geometric
---|---|---|---|---
Cora | GCN | 0.0148 | 0.0152 | 0.0192
Reddit | GCN | 0.1095 | OOM | OOM
PubMed | GCN | 0.0156 | 0.0553 | 0.0185
PPI | GCN | 0.09 | 0.16 | 0.21
Cora | GAT | 0.0442 | n/a | 0.058
PPI | GAT | 0.398 | n/a | 0.752
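DGL selects its backend through the DGLBACKEND environment variable (or the config file under ~/.dgl). Below is a minimal sketch of running a single GraphConv layer on the TensorFlow backend; the graph and feature sizes are arbitrary and chosen only for illustration.

```python
import os
os.environ['DGLBACKEND'] = 'tensorflow'   # must be set before dgl is imported

import tensorflow as tf
import dgl
from dgl.nn import GraphConv              # resolves to the TensorFlow implementation

# A small random graph; self-loops avoid zero-in-degree nodes for GraphConv.
g = dgl.add_self_loop(dgl.rand_graph(100, 500))
feat = tf.random.normal((100, 16))

conv = GraphConv(16, 8)
out = conv(g, feat)                       # forward pass on TF tensors
print(out.shape)                          # (100, 8)
```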
Efficient memory management allows DGL to push the limits of single-GPU performance, as shown in the figures below.
Scalability: DGL fully leverages multiple GPUs, both within a single machine and across a cluster, to increase training speed, and it outperforms the alternatives, as shown in the figures below.
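As a rough illustration of one multi-GPU pattern with DGL on the PyTorch backend, the sketch below spawns one process per GPU, wraps the model in torch.nn.parallel.DistributedDataParallel, keeps a full copy of the (small) graph on every GPU, and computes the loss on each replica's own slice of the training nodes. This only illustrates the gradient-averaging idea; the multi-GPU and cluster benchmarks referenced above rely on more elaborate, sampling-based pipelines.

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn.functional as F
import dgl
from dgl.nn import GraphConv
from dgl.data import CoraGraphDataset

class GCN(torch.nn.Module):
    def __init__(self, in_dim, hid_dim, n_classes):
        super().__init__()
        self.conv1 = GraphConv(in_dim, hid_dim)
        self.conv2 = GraphConv(hid_dim, n_classes)

    def forward(self, g, x):
        return self.conv2(g, F.relu(self.conv1(g, x)))

def run(rank, world_size):
    # One process per GPU; DDP averages gradients across processes.
    os.environ.setdefault('MASTER_ADDR', '127.0.0.1')
    os.environ.setdefault('MASTER_PORT', '29500')
    torch.cuda.set_device(rank)
    dist.init_process_group('nccl', rank=rank, world_size=world_size)

    # Every replica holds the full graph; assumes the dataset is already
    # downloaded so the processes do not race on the download.
    g = dgl.add_self_loop(CoraGraphDataset()[0]).to(f'cuda:{rank}')
    feat, label = g.ndata['feat'], g.ndata['label']
    train_nids = g.ndata['train_mask'].nonzero(as_tuple=True)[0]
    my_nids = train_nids[rank::world_size]   # this replica's share of training nodes

    model = GCN(feat.shape[1], 16, int(label.max()) + 1).to(rank)
    model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[rank])
    opt = torch.optim.Adam(model.parameters(), lr=0.01)

    for epoch in range(200):
        loss = F.cross_entropy(model(g, feat)[my_nids], label[my_nids])
        opt.zero_grad()
        loss.backward()
        opt.step()
    dist.destroy_process_group()

if __name__ == '__main__':
    n_gpus = torch.cuda.device_count()
    mp.spawn(run, args=(n_gpus,), nprocs=n_gpus)
```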
Further reading: a detailed comparison of DGL with alternative frameworks can be found [here](https://arxiv.org/abs/1909.01315).