FraudDataset
- class dgl.data.FraudDataset(name, raw_dir=None, random_seed=717, train_size=0.7, val_size=0.1, force_reload=False, verbose=True, transform=None)
Bases: DGLBuiltinDataset
Fraud node prediction dataset.
The dataset includes two multi-relational graphs extracted from Yelp and Amazon. Nodes represent reviews (Yelp) or reviewers (Amazon) and are labeled as fraudulent or benign.
It was first proposed in a CIKM'20 paper <https://arxiv.org/pdf/2008.08692.pdf> and has been used as a benchmark by a recent WWW'21 paper <https://ponderly.github.io/pub/PCGNN_WWW2021.pdf>. Another paper <https://arxiv.org/pdf/2104.01404.pdf> also uses the dataset as an example for studying non-homophilous graphs. Because the dataset is built from industrial data, it has rich relational information and distinctive properties such as class imbalance and feature inconsistency. These properties make it a good testbed for investigating how GNNs perform on real-world noisy graphs. The graphs are bidirected and contain no self-loops.
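Being bidirected and free of self-loops are two structural properties that can be checked relation by relation on a plain edge list. The stdlib-only sketch below illustrates the checks on a hypothetical toy edge list, not on actual Yelp or Amazon data. With DGL you would first extract the edges of each relation from the loaded graph, for example with `graph.edges(etype=...)`. That step is not shown here.

```python
from typing import Iterable, Tuple

Edge = Tuple[int, int]


def is_bidirected(edges: Iterable[Edge]) -> bool:
    """True if every edge (u, v) has a matching reverse edge (v, u)."""
    edge_set = set(edges)
    return all((v, u) in edge_set for (u, v) in edge_set)


def has_self_loops(edges: Iterable[Edge]) -> bool:
    """True if any edge connects a node to itself."""
    return any(u == v for (u, v) in edges)


# Hypothetical toy edge list for a single relation (not taken from the dataset).
relation_edges = [(0, 1), (1, 0), (1, 2), (2, 1)]

print(is_bidirected(relation_edges))   # True
print(has_self_loops(relation_edges))  # False
```

Each relation of a multi-relational graph would be checked separately, since the property is stated per graph and every relation shares the same node set.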
Reference: <https://github.com/YingtongDou/CARE-GNN>
- Parameters:
name (str) – Name of the dataset: 'yelp' or 'amazon'.
raw_dir (str) – Directory that will store the downloaded data, or that already stores the input data. Default: ~/.dgl/
random_seed (int) – Random seed used when splitting the dataset. Default: 717
train_size (float) – Fraction of the dataset used for training. Default: 0.7
val_size (float) – Fraction of the dataset used for validation. The remaining fraction, 1 - train_size - val_size, forms the test set. Default: 0.1
force_reload (bool) – Whether to reload the dataset. Default: False
verbose (bool) – Whether to print out progress information. Default: True
transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.
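The parameters random_seed, train_size, and val_size describe a seeded random split of the nodes into three disjoint sets. The sketch below shows one way such a split can work. It assumes a seeded shuffle cut into consecutive chunks, and the function name `random_split` is hypothetical. It is not claimed to be DGL's implementation. The actual procedure may differ in details such as which nodes are eligible for splitting or how sizes are rounded.

```python
import random
from typing import List, Tuple


def random_split(num_nodes: int, seed: int = 717, train_size: float = 0.7,
                 val_size: float = 0.1) -> Tuple[List[bool], List[bool], List[bool]]:
    """Return boolean train/val/test masks over num_nodes nodes (illustrative sketch)."""
    if not 0 <= train_size + val_size <= 1:
        raise ValueError("train_size + val_size must lie in [0, 1]")
    order = list(range(num_nodes))
    random.Random(seed).shuffle(order)  # same seed -> same permutation -> same split

    n_train = int(train_size * num_nodes)
    n_val = int(val_size * num_nodes)
    train_idx = order[:n_train]
    val_idx = order[n_train:n_train + n_val]
    test_idx = order[n_train + n_val:]  # the remaining 1 - train_size - val_size

    def to_mask(idx: List[int]) -> List[bool]:
        mask = [False] * num_nodes
        for i in idx:
            mask[i] = True
        return mask

    return to_mask(train_idx), to_mask(val_idx), to_mask(test_idx)


train_mask, val_mask, test_mask = random_split(100)
print(sum(train_mask), sum(val_mask), sum(test_mask))
```

Fixing the seed makes the split reproducible across runs. Giving the test set the remainder guarantees that every node lands in exactly one mask, even when the fractional sizes are truncated.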
- graph
The graph structure, including node features, node labels, and split masks.
- Type:
dgl.DGLGraph
Examples
>>> from dgl.data import FraudDataset
>>> dataset = FraudDataset('yelp')
>>> graph = dataset[0]
>>> num_classes = dataset.num_classes
>>> feat = graph.ndata['feature']
>>> label = graph.ndata['label']
- __getitem__(idx)
Get the graph object.
- Parameters:
idx (int) β Item index
- Returns:
graph structure, node features, node labels and masks
  - ndata['feature']: node features
  - ndata['label']: node labels
  - ndata['train_mask']: mask of the training set
  - ndata['val_mask']: mask of the validation set
  - ndata['test_mask']: mask of the test set
- Return type:
dgl.DGLGraph
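Each mask in ndata is a per-node boolean indicator, so selecting the nodes of one split amounts to filtering by that mask. Because the dataset is class-imbalanced, a common first step is counting labels within the training split. The sketch below uses plain Python lists as hypothetical stand-ins for graph.ndata['label'] and graph.ndata['train_mask']. The label encoding (1 = fraudulent, 0 = benign) and the helper name `masked_label_counts` are assumptions of the sketch. With the real graph you would index the label tensor with the mask tensor instead.

```python
from collections import Counter
from typing import List


def masked_label_counts(labels: List[int], mask: List[bool]) -> Counter:
    """Count how often each label occurs among the nodes selected by mask."""
    return Counter(label for label, keep in zip(labels, mask) if keep)


# Hypothetical toy values; this sketch assumes 1 = fraudulent, 0 = benign.
labels = [0, 0, 0, 1, 0, 0, 1, 0]
train_mask = [True, True, False, True, True, False, False, True]

counts = masked_label_counts(labels, train_mask)
print(counts[0], counts[1])  # 4 1
```

A skewed count like this is one reason fraud-detection models often reweight or resample classes during training.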