FakeNewsDataset

class dgl.data.FakeNewsDataset(name, feature_name, raw_dir=None, transform=None)

Bases: dgl.data.dgl_dataset.DGLBuiltinDataset

Fake News Graph Classification dataset.

The dataset consists of two sets of tree-structured fake and real news propagation graphs extracted from Twitter. Unlike most benchmark datasets for graph classification, the graphs in this dataset are directed trees: the root node represents the news item, and the leaf nodes are Twitter users who retweeted it. Node features are built from each user's historical tweets, encoded with different pretrained language models, and from user profile attributes:

  • bert: the 768-dimensional node feature composed of Twitter user historical tweets encoded by bert-as-service.

  • content: the 310-dimensional node feature composed of a 300-dimensional “spacy” vector plus a 10-dimensional “profile” vector.

  • profile: the 10-dimensional node feature composed of ten Twitter user profile attributes.

  • spacy: the 300-dimensional node feature composed of Twitter user historical tweets encoded by the spaCy word2vec encoder.
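
The relationship between the feature sets can be illustrated with a small sketch. It assumes that the content vector is the spacy vector followed by the profile vector; the description above gives only the dimensions, not the concatenation order, and the placeholder values are made up.

```python
# Dimensions taken from the feature descriptions above.
SPACY_DIM = 300
PROFILE_DIM = 10

# Placeholder vectors for a single node (values are made up for illustration).
spacy_vec = [0.0] * SPACY_DIM
profile_vec = [1.0] * PROFILE_DIM

# Assumption: "content" is the spacy vector followed by the profile vector.
content_vec = spacy_vec + profile_vec

print(len(content_vec))  # 310
```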

Reference: <https://github.com/safe-graph/GNN-FakeNews>

Note: this dataset is for academic use only, and commercial use is prohibited.

Statistics:

Politifact:

  • Graphs: 314

  • Nodes: 41,054

  • Edges: 40,740

  • Classes:

    • Fake: 157

    • Real: 157

  • Node feature size:

    • bert: 768

    • content: 310

    • profile: 10

    • spacy: 300

Gossipcop:

  • Graphs: 5,464

  • Nodes: 314,262

  • Edges: 308,798

  • Classes:

    • Fake: 2,732

    • Real: 2,732

  • Node feature size:

    • bert: 768

    • content: 310

    • profile: 10

    • spacy: 300
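
Because every graph is a tree, a graph with n nodes has n - 1 edges, so the total edge count should equal the total node count minus the number of graphs. The following sketch checks the figures above; the dictionary layout and the helper name are only for illustration.

```python
# Statistics copied from the lists above.
stats = {
    "politifact": {"graphs": 314, "nodes": 41_054, "edges": 40_740},
    "gossipcop": {"graphs": 5_464, "nodes": 314_262, "edges": 308_798},
}


def expected_tree_edges(num_graphs, num_nodes):
    """Edge count of a forest with `num_graphs` trees and `num_nodes` nodes."""
    return num_nodes - num_graphs


for name, s in stats.items():
    expected = expected_tree_edges(s["graphs"], s["nodes"])
    # Prints True when the reported edge count matches the tree structure.
    print(name, expected == s["edges"])
```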

Parameters
  • name (str) – Name of the dataset (gossipcop or politifact)

  • feature_name (str) – Name of the feature (bert, content, profile, or spacy)

  • raw_dir (str, optional) – Directory that will store the downloaded data, or that already stores the input data. Default: ~/.dgl/

  • transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.
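
The "transformed before every access" behaviour of the transform parameter can be sketched with a stand-in class. `TinyDataset` and `add_reverse_edges` are hypothetical names, graphs are represented as plain edge lists rather than DGLGraph objects, and none of this is DGL's actual implementation.

```python
class TinyDataset:
    """Hypothetical stand-in that applies a transform on every access."""

    def __init__(self, graphs, labels, transform=None):
        self._graphs = graphs
        self._labels = labels
        self._transform = transform

    def __getitem__(self, i):
        graph = self._graphs[i]
        # The stored graph is left untouched; the transform runs per access.
        if self._transform is not None:
            graph = self._transform(graph)
        return graph, self._labels[i]

    def __len__(self):
        return len(self._graphs)


def add_reverse_edges(edges):
    """Toy transform: return a new edge list with every edge also reversed."""
    return edges + [(dst, src) for src, dst in edges]


# One tiny tree (root 0 with two children) labelled 1.
dataset = TinyDataset([[(0, 1), (0, 2)]], [1], transform=add_reverse_edges)
graph, label = dataset[0]
print(graph)  # [(0, 1), (0, 2), (1, 0), (2, 0)]
```

Applying the transform at access time keeps the stored data unchanged, so the same raw graphs can be reused with a different transform.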

name

Name of the dataset (gossipcop, or politifact)

Type

str

num_classes

Number of label classes

Type

int

num_graphs

Number of graphs

Type

int

graphs

A list of DGLGraph objects

Type

list

labels

Graph labels

Type

Tensor

feature_name

Name of the feature (bert, content, profile, or spacy)

Type

str

feature

Node features

Type

Tensor

train_mask

Mask of training set

Type

Tensor

val_mask

Mask of validation set

Type

Tensor

test_mask

Mask of testing set

Type

Tensor
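
A common use of the three masks is to turn them into index lists for the training, validation, and test splits. The sketch below assumes each mask is boolean with one entry per graph; plain Python lists stand in for the tensors, the mask values are made up, and `indices_from_mask` is a hypothetical helper.

```python
def indices_from_mask(mask):
    """Return the positions at which a boolean mask is True."""
    return [i for i, keep in enumerate(mask) if keep]


# Made-up masks for a dataset of five graphs.
train_mask = [True, False, True, False, False]
val_mask = [False, True, False, False, False]
test_mask = [False, False, False, True, True]

train_idx = indices_from_mask(train_mask)
val_idx = indices_from_mask(val_mask)
test_idx = indices_from_mask(test_mask)

print(train_idx, val_idx, test_idx)  # [0, 2] [1] [3, 4]
```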

Examples

>>> from dgl.data import FakeNewsDataset
>>> dataset = FakeNewsDataset('gossipcop', 'bert')
>>> graph, label = dataset[0]
>>> num_classes = dataset.num_classes
>>> feat = dataset.feature
>>> labels = dataset.labels

__getitem__(i)

Get graph and label by index

Parameters

i (int) – Item index

Returns

The graph at index i and its label

Return type

(dgl.DGLGraph, Tensor)

__len__()

Number of graphs in the dataset.

Returns

Number of graphs in the dataset

Return type

int