CornellDataset
- class dgl.data.CornellDataset(raw_dir=None, force_reload=False, verbose=True, transform=None)
Bases: GeomGCNDataset
Cornell subset of WebKB, later modified by Geom-GCN: Geometric Graph Convolutional Networks
Nodes represent web pages. Edges represent hyperlinks between them. Node features are the bag-of-words representation of the web pages. The web pages are manually classified into five categories: student, project, course, staff, and faculty.
Statistics:
Nodes: 183
Edges: 298
Number of Classes: 5
10 train/val/test splits
Train: 87
Val: 59
Test: 37
- Parameters:
raw_dir (str, optional) – Raw file directory to store the processed data. Default: ~/.dgl/
force_reload (bool, optional) – Whether to re-download the data source. Default: False
verbose (bool, optional) – Whether to print progress information. Default: True
transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access. Default: None
Notes
The graph does not come with edges for both directions.
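Because the stored edges are one-directional, models that assume an undirected graph usually need the reverse edges added first. DGL ships graph utilities for this (for example `dgl.to_bidirected`). The sketch below only illustrates the underlying operation on a plain edge list. The helper name `add_reverse_edges_to_list` and the toy edges are hypothetical and are not part of DGL or of the Cornell data.

```python
def add_reverse_edges_to_list(edges):
    """Return a sorted edge list containing each (u, v) together with (v, u).

    Hypothetical helper for illustration only; duplicates are dropped.
    """
    symmetric = set()
    for u, v in edges:
        symmetric.add((u, v))  # keep the original direction
        symmetric.add((v, u))  # add the missing reverse direction
    return sorted(symmetric)


# Toy directed edges (NOT the real Cornell edges).
toy_edges = [(0, 1), (1, 2), (2, 1)]
bidirected = add_reverse_edges_to_list(toy_edges)
```

Deduplicating through a set means an edge that already has its reverse, such as (1, 2) and (2, 1) here, is not counted twice.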
Examples
>>> from dgl.data import CornellDataset
>>> dataset = CornellDataset()
>>> g = dataset[0]
>>> num_classes = dataset.num_classes
>>> # get node features
>>> feat = g.ndata["feat"]
>>> # get data split
>>> train_mask = g.ndata["train_mask"]
>>> val_mask = g.ndata["val_mask"]
>>> test_mask = g.ndata["test_mask"]
>>> # get labels
>>> label = g.ndata["label"]
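The statistics list 10 train/val/test splits, so a training loop typically works with one split at a time. The following is a minimal sketch of selecting a single split. It assumes each mask stores one boolean column per split, which is not stated on this page and is worth checking against the mask's shape on your installation. Plain nested lists stand in for the tensor, and `select_split` is a hypothetical helper.

```python
def select_split(mask_rows, split_idx):
    """Return the per-node boolean mask for one split.

    Assumes mask_rows[i][j] says whether node i belongs to split j
    (an assumption about the layout, not confirmed by this page).
    """
    return [bool(row[split_idx]) for row in mask_rows]


# Toy stand-in: 4 nodes and 3 splits (the real dataset has 183 nodes and 10 splits).
toy_train_mask = [
    [True, False, True],
    [False, True, True],
    [True, True, False],
    [False, False, False],
]

split0 = select_split(toy_train_mask, 0)
# Indices of the nodes used for training in split 0.
train_nodes = [i for i, flag in enumerate(split0) if flag]
```

With a tensor-backed mask, the equivalent would be column indexing such as `train_mask[:, 0]`, again under the same layout assumption.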
- __getitem__(idx)
Gets the data object at index.
- __len__()
The number of examples in the dataset.