Dataset
Utils
utils.get_download_dir() | Get the absolute path to the download directory.
utils.download(url[, path, overwrite, …]) | Download a given URL.
utils.check_sha1(filename, sha1_hash) | Check whether the SHA-1 hash of the file content matches the expected hash.
utils.extract_archive(file, target_dir) | Extract archive file.
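The hash check described for utils.check_sha1 can be illustrated with the Python standard library alone. The sketch below is not DGL's implementation: the helper name sha1_matches and the chunk size are assumptions made for illustration, and it only shows what "the SHA-1 hash of the file content matches the expected hash" means in practice.

```python
import hashlib
import os
import tempfile


def sha1_matches(filename, sha1_hash, chunk_size=1 << 20):
    """Return True if the SHA-1 digest of the file content equals sha1_hash.

    Hypothetical helper for illustration; not part of dgl.data.utils.
    """
    digest = hashlib.sha1()
    with open(filename, "rb") as f:
        # Read in chunks so that large downloads need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == sha1_hash.lower()


# Usage: write a small file, then compare it against a known digest.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    tmp_path = tmp.name

expected = hashlib.sha1(b"hello").hexdigest()
ok = sha1_matches(tmp_path, expected)    # digest of the same content
bad = sha1_matches(tmp_path, "0" * 40)   # deliberately wrong digest
os.remove(tmp_path)
```

A check like this is typically run after a download, so that a corrupted or truncated file can be fetched again instead of being used.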
Dataset Classes
Stanford sentiment treebank dataset
For more information about the dataset, see Sentiment Analysis.
class dgl.data.SST(mode='train', vocab_file=None)
Stanford Sentiment Treebank dataset.
Each sample is the constituency tree of a sentence. The leaf nodes represent words; each word is an integer stored in the x feature field. Non-leaf nodes have the special value PAD_WORD in the x field. Each node also has a sentiment annotation with 5 classes (very negative, negative, neutral, positive, and very positive). The sentiment label is an integer stored in the y feature field.
Note
This dataset class is compatible with PyTorch's Dataset class.
Note
All samples are loaded and preprocessed in memory first.
Parameters: -
__getitem__(idx)
Get the tree with index idx.
Parameters: idx (int) – Tree index.
Returns: Tree.
Return type: dgl.DGLGraph
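The node layout described for SST samples can be pictured with a toy tree in plain Python. This is only a sketch of the x and y fields, not how dgl.DGLGraph stores them. The value of PAD_WORD, the word ids, and the mapping of classes to the integers 0 to 4 are all assumptions made for illustration.

```python
PAD_WORD = -1  # assumed placeholder value; the real constant may differ

# Class names in the order listed above; the 0..4 indexing is an assumption.
LABELS = ["very negative", "negative", "neutral", "positive", "very positive"]

# Toy tree for a two-word sentence: node 0 is the root, nodes 1 and 2 are leaves.
# The word ids 7 and 12 are made up.
tree = {
    "edges": [(1, 0), (2, 0)],  # (child, parent) pairs
    "x": [PAD_WORD, 7, 12],     # word id per node, PAD_WORD on non-leaf nodes
    "y": [3, 2, 3],             # sentiment class per node
}


def leaf_word_ids(tree):
    """Return the word ids of the leaves by skipping PAD_WORD entries."""
    return [word for word in tree["x"] if word != PAD_WORD]


words = leaf_word_ids(tree)
root_label = LABELS[tree["y"][0]]
```

Because every node carries a label, a model can be supervised on phrases (internal nodes) as well as on whole sentences (the root).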
Mini graph classification dataset
class dgl.data.MiniGCDataset(num_graphs, min_num_v, max_num_v)
The dataset class.
The dataset contains 8 different types of graphs:
- class 0 : cycle graph
- class 1 : star graph
- class 2 : wheel graph
- class 3 : lollipop graph
- class 4 : hypercube graph
- class 5 : grid graph
- class 6 : clique graph
- class 7 : circular ladder graph
Note
This dataset class is compatible with PyTorch's Dataset class.
Parameters: -
__getitem__(idx)
Get the i-th sample.
Parameters: idx (int) – The sample index.
Returns: The graph and its label.
Return type: (dgl.DGLGraph, int)
num_classes
Number of classes.
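The interface above (indexing that returns a graph and its label, plus a num_classes attribute) can be sketched in plain Python. The class below is a hypothetical stand-in, not MiniGCDataset. It covers only two of the eight graph types, represents graphs as edge lists rather than dgl.DGLGraph objects, and treats max_num_v as inclusive, which is an assumption.

```python
import random


def cycle_edges(n):
    """Edges of a cycle graph on n nodes: i is joined to (i + 1) mod n."""
    return [(i, (i + 1) % n) for i in range(n)]


def star_edges(n):
    """Edges of a star graph on n nodes: node 0 is the hub."""
    return [(0, i) for i in range(1, n)]


class ToyGraphDataset:
    """Hypothetical two-class dataset following the __len__/__getitem__ protocol."""

    generators = [cycle_edges, star_edges]  # class 0: cycle, class 1: star

    def __init__(self, num_graphs, min_num_v, max_num_v, seed=0):
        rng = random.Random(seed)
        self.samples = []
        for i in range(num_graphs):
            label = i % len(self.generators)       # alternate between classes
            n = rng.randint(min_num_v, max_num_v)  # inclusive on both ends
            self.samples.append((self.generators[label](n), label))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        # Return the (graph, label) pair for sample idx.
        return self.samples[idx]

    @property
    def num_classes(self):
        return len(self.generators)


dataset = ToyGraphDataset(num_graphs=4, min_num_v=5, max_num_v=8)
edges, label = dataset[1]
```

Implementing __len__ and __getitem__ is what lets a class like this be used wherever a map-style PyTorch Dataset is expected.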
Protein-Protein Interaction dataset
class dgl.data.PPIDataset(mode)
A toy Protein-Protein Interaction network dataset.
Adapted from https://github.com/williamleif/GraphSAGE/tree/master/example_data.
The dataset contains 24 graphs. The average number of nodes per graph is 2372. Each node has 50 features and 121 labels.
We use 20 graphs for training, 2 for validation and 2 for testing.
__getitem__(item)
Get the i-th sample.
Parameters: item (int) – The sample index.
Returns: The graph, its features, and its labels.
Return type: (dgl.DGLGraph, ndarray, ndarray)
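The 20/2/2 split of the 24 graphs can be written out as index ranges. This is a minimal sketch under stated assumptions: the mode names "train", "valid", and "test" and the contiguous assignment of graph indices are hypothetical and may not match how PPIDataset selects graphs internally.

```python
NUM_GRAPHS = 24
# Split sizes from the description above; the mode names are assumed.
SPLIT_SIZES = {"train": 20, "valid": 2, "test": 2}


def split_indices(num_graphs, sizes):
    """Assign consecutive graph indices to each mode, in the order given."""
    splits = {}
    start = 0
    for mode, size in sizes.items():
        splits[mode] = list(range(start, start + size))
        start += size
    if start != num_graphs:
        raise ValueError("split sizes must add up to the number of graphs")
    return splits


splits = split_indices(NUM_GRAPHS, SPLIT_SIZES)
```

Keeping whole graphs in a single split, as here, means the validation and test graphs are never seen during training.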