QuestionsDataset
- class dgl.data.QuestionsDataset(raw_dir=None, force_reload=False, verbose=True, transform=None)
Bases: HeterophilousGraphDataset
Questions dataset from the paper "A Critical Look at the Evaluation of GNNs under Heterophily: Are We Really Making Progress?" (https://arxiv.org/abs/2302.11640).
This dataset is based on data from the question-answering website Yandex Q. Nodes are users, and an edge connects two nodes if one user answered the other user's question. The task is to predict which users remained active on the website (were not deleted or blocked). Node features are the mean of the word embeddings for the words in the user's description. Users who do not have a description are indicated by a separate binary feature.
Statistics:
Nodes: 48921
Edges: 307080
Classes: 2
Node features: 301
10 train/val/test splits
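The feature construction described above (mean word embedding plus a binary "no description" flag) can be sketched as follows. This is only an illustration under stated assumptions: the helper `user_features`, the toy embedding table, the whitespace tokenization, and the handling of unknown words are all hypothetical, and the split of the 301 features into 300 embedding dimensions plus one flag is an inference from the description, not something this page states.

```python
import numpy as np


def user_features(description, embeddings, dim):
    """Hypothetical helper: mean word embedding plus a 'no description' flag."""
    # Assumption: simple lowercase whitespace tokenization; unknown words are skipped.
    vectors = [embeddings[w] for w in description.lower().split() if w in embeddings]
    mean_vec = np.mean(vectors, axis=0) if vectors else np.zeros(dim)
    # The extra binary feature marks users whose description is empty.
    missing = 0.0 if description.strip() else 1.0
    return np.concatenate([mean_vec, [missing]])


# Toy 3-dimensional embedding table, made up for illustration only.
toy_embeddings = {
    "graph": np.array([1.0, 0.0, 2.0]),
    "learning": np.array([3.0, 2.0, 0.0]),
}

with_desc = user_features("graph learning", toy_embeddings, dim=3)
no_desc = user_features("", toy_embeddings, dim=3)
```

With a 300-dimensional embedding table, the same construction would yield a vector of length 301 per user.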
- Parameters:
raw_dir (str, optional) – Raw file directory to store the processed data. Default: ~/.dgl/
force_reload (bool, optional) – Whether to re-download the data source. Default: False
verbose (bool, optional) – Whether to print progress information. Default: True
transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access. Default: None
Examples
>>> from dgl.data import QuestionsDataset
>>> dataset = QuestionsDataset()
>>> g = dataset[0]
>>> num_classes = dataset.num_classes

>>> # get node features
>>> feat = g.ndata["feat"]

>>> # get the first data split
>>> train_mask = g.ndata["train_mask"][:, 0]
>>> val_mask = g.ndata["val_mask"][:, 0]
>>> test_mask = g.ndata["test_mask"][:, 0]

>>> # get labels
>>> label = g.ndata["label"]
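Because the masks have one column per split, results are typically computed per split and then averaged. The sketch below shows that loop with small NumPy stand-ins rather than the real dataset: `label`, `test_mask`, and `pred` are synthetic placeholders for `g.ndata["label"]`, `g.ndata["test_mask"]`, and a model's predictions, and accuracy is used only as a simple placeholder metric, not as the benchmark's prescribed one.

```python
import numpy as np

num_nodes, num_splits = 20, 10
idx = np.arange(num_nodes)

# Synthetic stand-ins (assumptions, not real data):
# label plays the role of g.ndata["label"], test_mask of g.ndata["test_mask"].
label = idx % 2
test_mask = (idx[:, None] + np.arange(num_splits)[None, :]) % 4 == 0

# Placeholder predictions: always predict class 0.
pred = np.zeros(num_nodes, dtype=label.dtype)

accs = []
for split in range(test_mask.shape[1]):
    mask = test_mask[:, split]  # boolean mask selecting this split's test nodes
    accs.append(float((pred[mask] == label[mask]).mean()))

mean_acc = float(np.mean(accs))
```

With the real dataset the same loop would index the mask tensors stored in `g.ndata`, training a fresh model on `train_mask[:, split]` before evaluating each split.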
- __getitem__(idx)
Gets the data object at index.
- __len__()
The number of examples in the dataset.