Dataset

Utils

`utils.get_download_dir`()	Get the absolute path to the download directory.
`utils.download`(url[, path, overwrite, …])	Download a given URL.
`utils.check_sha1`(filename, sha1_hash)	Check whether the sha1 hash of the file content matches the expected hash.
`utils.extract_archive`(file, target_dir)	Extract an archive file into target_dir.
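These utilities can be chained: `download` fetches a file, `check_sha1` verifies it, and `extract_archive` unpacks it. To illustrate only the verification step, here is a minimal, stdlib-only sketch that compares a file's SHA-1 digest with an expected hex string. The name `sha1_matches` and the chunk size are hypothetical; this is not DGL's implementation of `utils.check_sha1`.

```python
import hashlib


def sha1_matches(filename, sha1_hash, chunk_size=1 << 20):
    """Return True if the SHA-1 digest of the file equals sha1_hash.

    Hypothetical stand-in for the idea behind utils.check_sha1. The file is
    read in chunks so that large archives do not need to fit in memory.
    """
    sha1 = hashlib.sha1()
    with open(filename, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            sha1.update(chunk)
    # Hex digests are compared case-insensitively.
    return sha1.hexdigest() == sha1_hash.lower()
```

Hashing in fixed-size chunks keeps memory use constant regardless of the archive size.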

Dataset Classes

Stanford sentiment treebank dataset

For more information about the dataset, see Sentiment Analysis.

class dgl.data.SST(mode='train', vocab_file=None)

Stanford Sentiment Treebank dataset.

Each sample is the constituency tree of a sentence. Leaf nodes represent words, and each word is stored as an int in the x feature field. Non-leaf nodes store the special value PAD_WORD in x. Every node also carries a sentiment annotation with one of 5 classes (very negative, negative, neutral, positive, very positive), stored as an int in the y feature field.
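To make the sample layout concrete, here is a plain-Python model of the same tree structure; it does not build a `dgl.DGLGraph`. The value `PAD_WORD = -1`, the 0 to 4 label ordering, the `Node` class and the word ids are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed sentinel for non-leaf nodes, standing in for SST's PAD_WORD.
PAD_WORD = -1

# The five sentiment classes, indexed 0..4 (assumed ordering).
SENTIMENT = ["very negative", "negative", "neutral", "positive", "very positive"]


@dataclass
class Node:
    x: int  # word id for a leaf, PAD_WORD for a non-leaf node
    y: int  # sentiment class in range(5)
    children: List["Node"] = field(default_factory=list)


def leaf_words(node):
    """Collect the word ids stored at the leaves, left to right."""
    if not node.children:
        return [node.x]
    words = []
    for child in node.children:
        words.extend(leaf_words(child))
    return words


# A made-up two-word sentence: one non-leaf root over two word leaves.
tree = Node(PAD_WORD, 3, [Node(17, 2), Node(42, 1)])
print(leaf_words(tree))   # [17, 42]
print(SENTIMENT[tree.y])  # positive
```

Only leaves contribute words; the root's x is the padding sentinel, yet it still has its own sentiment label in y.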

Note

This dataset class is compatible with PyTorch’s Dataset class.

Note

All samples are loaded and preprocessed in memory up front.

Parameters:	mode (str, optional) – One of `'train'`, `'val'` or `'test'`; selects which data file to use. Default: `'train'`.
	vocab_file (str, optional) – Optional vocabulary file. Default: None.

__getitem__(idx)

Get the tree with index idx.

Parameters:	idx (int) – Tree index.
Returns:	Tree.
Return type:	dgl.DGLGraph

__len__()

Get the number of trees in the dataset.

Returns:	Number of trees.
Return type:	int
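Because the class exposes `__getitem__` and `__len__`, any code that indexes from `0` to `len(dataset) - 1` can consume it; map-style datasets in PyTorch are defined by these same two methods. The sketch below shows the protocol with a hypothetical `ListDataset` holding arbitrary in-memory samples; it is not part of `dgl.data`.

```python
class ListDataset:
    """Hypothetical in-memory dataset using the __getitem__/__len__ protocol."""

    def __init__(self, samples):
        # All samples are held in memory from construction onward.
        self._samples = list(samples)

    def __getitem__(self, idx):
        # Return the sample at position idx.
        return self._samples[idx]

    def __len__(self):
        # Number of samples available for indexing.
        return len(self._samples)


def iterate(dataset):
    """Visit every sample by index, as a sequential sampler would."""
    for i in range(len(dataset)):
        yield dataset[i]
```

For example, `list(iterate(ListDataset(["t0", "t1"])))` yields both samples in order, using nothing beyond the two methods.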

Mini graph classification dataset

class dgl.data.MiniGCDataset(num_graphs, min_num_v, max_num_v)

A mini graph classification dataset.

The dataset contains 8 different types of graphs, one per class:

class 0 : cycle graph
class 1 : star graph
class 2 : wheel graph
class 3 : lollipop graph
class 4 : hypercube graph
class 5 : grid graph
class 6 : clique graph
class 7 : circular ladder graph
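As a rough illustration of four of the eight classes, the sketch below builds undirected edge lists with the standard library. The generator functions and the `make_sample` helper are hypothetical; `MiniGCDataset` may construct its graphs differently, and the lollipop, hypercube, grid and circular ladder classes are omitted.

```python
from itertools import combinations


def cycle(n):
    # Class 0: nodes 0..n-1 joined in a ring (n edges).
    return [(i, (i + 1) % n) for i in range(n)]


def star(n):
    # Class 1: hub node 0 connected to n - 1 leaves.
    return [(0, i) for i in range(1, n)]


def wheel(n):
    # Class 2: a star plus a ring over the n - 1 outer nodes 1..n-1.
    rim = [(1 + i, 1 + (i + 1) % (n - 1)) for i in range(n - 1)]
    return star(n) + rim


def clique(n):
    # Class 6: every pair of distinct nodes is connected.
    return list(combinations(range(n), 2))


GENERATORS = {0: cycle, 1: star, 2: wheel, 6: clique}


def make_sample(label, n):
    """Return (edge_list, label) for an n-node graph of the given class."""
    return GENERATORS[label](n), label
```

With `n = 5` these give 5, 4, 8 and 10 edges respectively, so even the edge count separates several classes.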

Note

This dataset class is compatible with PyTorch’s Dataset class.

Parameters:	num_graphs (int) – Number of graphs in this dataset.
	min_num_v (int) – Minimum number of nodes per graph.
	max_num_v (int) – Maximum number of nodes per graph.

__getitem__(idx)

Get the idx-th sample.

Parameters:	idx (int) – The sample index.

Returns:	The graph and its label.
Return type:	(dgl.DGLGraph, int)

__len__()

Return the number of graphs in the dataset.

num_classes

Number of classes.

Protein-Protein Interaction dataset

class dgl.data.PPIDataset(mode)

A toy Protein-Protein Interaction network dataset.

Adapted from https://github.com/williamleif/GraphSAGE/tree/master/example_data.

The dataset contains 24 graphs. The average number of nodes per graph is 2372. Each node has 50 features and 121 labels.

The graphs are split into 20 for training, 2 for validation and 2 for testing; mode selects which split to load.
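The 20/2/2 partition can be pictured as a split over graph indices 0 to 23. The contiguous ordering and the split names in this sketch are assumptions; they are not taken from `PPIDataset` and may not match the strings its `mode` argument accepts.

```python
# Hypothetical contiguous split of the 24 PPI graphs; ordering is assumed.
NUM_GRAPHS = 24
SPLIT_SIZES = {"train": 20, "val": 2, "test": 2}


def split_indices(num_graphs=NUM_GRAPHS, sizes=SPLIT_SIZES):
    """Map each split name to a list of graph indices."""
    if sum(sizes.values()) != num_graphs:
        raise ValueError("split sizes must cover every graph exactly once")
    splits, start = {}, 0
    for mode, size in sizes.items():
        splits[mode] = list(range(start, start + size))
        start += size
    return splits
```

The size check guarantees that the three splits are disjoint and together cover all 24 graphs.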

__getitem__(item)

Get the item-th sample.

Parameters:	item (int) – The sample index.

Returns:	The graph, its node features and its node labels.
Return type:	(dgl.DGLGraph, ndarray, ndarray)

__len__()

Return the number of samples in this dataset.