FraudAmazonDataset(raw_dir=None, random_seed=717, train_size=0.7, val_size=0.1, force_reload=False, verbose=True, transform=None)
Fraud Amazon Dataset
The Amazon dataset includes product reviews under the Musical Instruments category. Users with more than 80% helpful votes are labelled as benign entities, and users with less than 20% helpful votes are labelled as fraudulent entities. The dataset supports fraudulent user detection, which is a binary classification task. The 25 handcrafted features from <https://arxiv.org/pdf/2005.10150.pdf> are used as the raw node features.
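The labelling rule above can be sketched as a small function. This is an illustration only, not the code used to build the dataset: the name `label_user` is hypothetical, and returning `None` for users between the two thresholds (or with no votes) is an assumption.

```python
from typing import Optional


def label_user(helpful_votes: int, total_votes: int) -> Optional[int]:
    """Return 0 (benign), 1 (fraudulent), or None (no label assigned)."""
    if total_votes == 0:
        return None  # assumption: users without any votes get no label
    ratio = helpful_votes / total_votes
    if ratio > 0.8:
        return 0  # more than 80% helpful votes: benign
    if ratio < 0.2:
        return 1  # less than 20% helpful votes: fraudulent
    return None  # assumption: users in between stay unlabelled


print(label_user(9, 10))  # 0
print(label_user(1, 10))  # 1
```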
Users are nodes in the graph, connected by three relations:
1. U-P-U: connects users who reviewed at least one product in common.
2. U-S-U: connects users who gave at least one identical star rating within one week.
3. U-V-U: connects users with the top 5% mutual review text similarities (measured by TF-IDF) among all users.
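As a rough illustration of how a relation such as U-P-U could be derived from raw review records, the sketch below links every pair of users who share a product. The function name `build_upu_edges` and the toy `(user, product)` records are hypothetical; the dataset ships with precomputed relations, so this is not the original preprocessing code.

```python
from collections import defaultdict
from itertools import combinations


def build_upu_edges(reviews):
    """Connect every pair of users who reviewed at least one common product."""
    users_by_product = defaultdict(set)
    for user, product in reviews:
        users_by_product[product].add(user)

    edges = set()
    for users in users_by_product.values():
        # Store each undirected edge once, with the smaller user id first.
        for u, v in combinations(sorted(users), 2):
            edges.add((u, v))
    return edges


# Toy (user_id, product) review records.
toy_reviews = [(0, "guitar"), (1, "guitar"), (1, "drum"), (2, "drum"), (3, "flute")]
upu_edges = build_upu_edges(toy_reviews)
print(sorted(upu_edges))  # [(0, 1), (1, 2)]
```

User 3 reviewed a product nobody else did, so it stays isolated under this relation.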
Positive (fraudulent): 821
Negative (benign): 7,818
Positive-Negative ratio: 1 : 9.5
Node feature size: 25
raw_dir (str) – Specifying the directory that will store the downloaded data or the directory that already stores the input data. Default: ~/.dgl/
random_seed (int) – Specifying the random seed in splitting the dataset. Default: 717
train_size (float) – Fraction of the dataset used for the training set. Default: 0.7
val_size (float) – Fraction of the dataset used for the validation set. The testing set takes the remaining fraction, (1 - train_size - val_size). Default: 0.1
force_reload (bool) – Whether to reload the dataset. Default: False
verbose (bool) – Whether to print out progress information. Default: True.
transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.
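To make the interaction of random_seed, train_size and val_size concrete, here is a minimal sketch of a seeded split into three boolean masks. It uses only the standard library; the helper `split_masks` is hypothetical and is not the library's implementation, so the exact node assignment will differ from what the dataset produces.

```python
import random


def split_masks(num_nodes, train_size=0.7, val_size=0.1, random_seed=717):
    """Return (train_mask, val_mask, test_mask) as lists of booleans."""
    indices = list(range(num_nodes))
    # A dedicated generator keeps the shuffle reproducible for a given seed.
    random.Random(random_seed).shuffle(indices)

    num_train = int(num_nodes * train_size)
    num_val = int(num_nodes * val_size)
    train_idx = set(indices[:num_train])
    val_idx = set(indices[num_train:num_train + num_val])

    train_mask = [i in train_idx for i in range(num_nodes)]
    val_mask = [i in val_idx for i in range(num_nodes)]
    # Every node that is neither in training nor validation goes to testing.
    test_mask = [not (t or v) for t, v in zip(train_mask, val_mask)]
    return train_mask, val_mask, test_mask


train_mask, val_mask, test_mask = split_masks(10)
print(sum(train_mask), sum(val_mask), sum(test_mask))  # 7 1 2
```

Calling the function again with the same seed yields the same masks, which is the purpose of exposing the seed as a parameter.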
>>> from dgl.data import FraudAmazonDataset
>>> dataset = FraudAmazonDataset()
>>> graph = dataset[0]
>>> num_classes = dataset.num_classes
>>> feat = graph.ndata['feature']
>>> label = graph.ndata['label']
__getitem__(idx)
Get the graph object.
idx (int) – Item index
Returns the graph structure, node features, node labels and masks:
ndata['feature']: node features
ndata['label']: node labels
ndata['train_mask']: mask of training set
ndata['val_mask']: mask of validation set
ndata['test_mask']: mask of testing set
Return type: dgl.DGLGraph
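The masks listed above are typically used to restrict labels (or predictions) to one split. The sketch below shows the idea with plain Python lists standing in for the `ndata` entries; the helper `fraud_rate` and the toy values are hypothetical. With the real dataset these entries are tensors, so the same selection would usually be written with boolean indexing.

```python
def fraud_rate(labels, mask):
    """Fraction of positive (fraudulent) labels among the masked nodes."""
    selected = [y for y, keep in zip(labels, mask) if keep]
    return sum(selected) / len(selected) if selected else 0.0


# Toy stand-ins for ndata['label'] and the three masks (10 nodes).
toy_labels = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
toy_train_mask = [True] * 7 + [False] * 3
toy_val_mask = [False] * 7 + [True] + [False] * 2
toy_test_mask = [False] * 8 + [True] * 2

for name, mask in [("train", toy_train_mask), ("val", toy_val_mask), ("test", toy_test_mask)]:
    print(name, round(fraud_rate(toy_labels, mask), 3))
```

Because the classes are imbalanced, checking the positive rate per split like this is a quick way to see whether a random split left enough fraudulent nodes in validation and testing.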
__len__()
Number of data examples in the dataset.