FraudYelpDataset
class dgl.data.FraudYelpDataset(raw_dir=None, random_seed=717, train_size=0.7, val_size=0.1, force_reload=False, verbose=True, transform=None)
Bases: dgl.data.fraud.FraudDataset
Fraud Yelp Dataset
The Yelp dataset contains hotel and restaurant reviews that Yelp has either filtered (spam) or recommended (legitimate). It supports spam review detection, which is a binary classification task. 32 handcrafted features from <http://dx.doi.org/10.1145/2783258.2783370> are taken as the raw node features. Reviews are the nodes of the graph, and three relations connect them:
R-U-R: connects reviews posted by the same user.
R-S-R: connects reviews of the same product that have the same star rating (1-5 stars).
R-T-R: connects two reviews of the same product posted in the same month.
Statistics:
Nodes: 45,954
Edges:
R-U-R: 98,630
R-T-R: 1,147,232
R-S-R: 6,805,486
Classes:
Positive (spam): 6,677
Negative (legitimate): 39,277
Positive-Negative ratio: 1 : 5.9
Node feature size: 32
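The class counts above can be cross-checked with a few lines of arithmetic. This is only a sanity check on the listed figures, not part of the DGL API. The variable names are illustrative, and the majority-class baseline is a derived number rather than one reported in the source.

```python
# Figures copied from the statistics listed above.
positive = 6677    # spam reviews
negative = 39277   # legitimate reviews
edges = {"R-U-R": 98630, "R-T-R": 1147232, "R-S-R": 6805486}

total_nodes = positive + negative
negatives_per_positive = negative / positive

# Accuracy of a trivial classifier that always predicts "legitimate".
majority_baseline = negative / total_nodes

# Sum of the per-relation edge counts as listed.
total_edges = sum(edges.values())

print(total_nodes, round(negatives_per_positive, 1), round(majority_baseline, 3), total_edges)
```

Because roughly six out of seven reviews are legitimate, plain accuracy is a weak metric here. A model has to beat the majority baseline before it can be said to have learned anything about spam.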
- Parameters
raw_dir (str) – Directory in which to store the downloaded data, or which already contains the input data. Default: ~/.dgl/
random_seed (int) – Random seed used when splitting the dataset. Default: 717
train_size (float) – Fraction of the dataset used for training. Default: 0.7
val_size (float) – Fraction of the dataset used for validation. The testing set takes the remainder (1 - train_size - val_size). Default: 0.1
force_reload (bool) – Whether to reload the dataset. Default: False
verbose (bool) – Whether to print progress information. Default: True
transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.
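To make the interaction of random_seed, train_size and val_size concrete, here is a minimal pure-Python sketch of a seeded split into three boolean masks. It is an illustration under stated assumptions, not DGL's actual implementation: the function name split_masks is hypothetical, and the way DGL shuffles and rounds may differ, so the exact node assignment will not match the library's masks.

```python
import random


def split_masks(num_nodes, train_size=0.7, val_size=0.1, random_seed=717):
    """Return boolean train/val/test masks built from a seeded shuffle.

    Hypothetical helper for illustration; not part of DGL.
    """
    if not 0 <= train_size + val_size <= 1:
        raise ValueError("train_size + val_size must lie in [0, 1]")

    # A seeded permutation makes the split reproducible.
    order = list(range(num_nodes))
    random.Random(random_seed).shuffle(order)

    n_train = int(train_size * num_nodes)
    n_val = int(val_size * num_nodes)
    parts = {
        "train_mask": order[:n_train],
        "val_mask": order[n_train:n_train + n_val],
        "test_mask": order[n_train + n_val:],  # whatever remains
    }

    # Convert each index list into a boolean mask over all nodes.
    masks = {}
    for name, indices in parts.items():
        mask = [False] * num_nodes
        for i in indices:
            mask[i] = True
        masks[name] = mask
    return masks


masks = split_masks(45954)
print({name: sum(mask) for name, mask in masks.items()})
```

The test set is defined as the remainder rather than by its own fraction, which guarantees that every node falls into exactly one of the three masks even when the fractions do not divide the node count evenly.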
Examples
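A minimal usage sketch follows. It assumes DGL is installed with the PyTorch backend and that the dataset can be downloaded on first use. The helper names load_fraud_yelp and mask_counts are illustrative and not part of DGL. The DGL import is kept inside the function so that the counting helper can be used on its own.

```python
def mask_counts(masks):
    """Count the True entries in each boolean mask (any iterable of bools)."""
    return {name: sum(bool(flag) for flag in mask) for name, mask in masks.items()}


def load_fraud_yelp():
    """Load the dataset and return the graph, features, labels and split sizes."""
    # Imported lazily so mask_counts works without DGL installed.
    from dgl.data import FraudYelpDataset

    dataset = FraudYelpDataset()      # stores data under ~/.dgl/ by default
    graph = dataset[0]                # __getitem__ returns the graph object
    feat = graph.ndata['feature']     # node features
    label = graph.ndata['label']      # node labels

    # .tolist() assumes the masks are PyTorch tensors.
    masks = {
        key: graph.ndata[key].tolist()
        for key in ('train_mask', 'val_mask', 'test_mask')
    }
    return graph, feat, label, mask_counts(masks)


# Example call (downloads the data on first use):
# graph, feat, label, sizes = load_fraud_yelp()
```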
__getitem__(idx)
Get graph object
- Parameters
idx (int) – Item index
- Returns
graph structure, node features, node labels and masks
ndata['feature']: node features
ndata['label']: node labels
ndata['train_mask']: mask of training set
ndata['val_mask']: mask of validation set
ndata['test_mask']: mask of testing set
- Return type
dgl.DGLGraph
__len__()
Number of data examples