PeptidesFunctionalDataset
class dgl.data.PeptidesFunctionalDataset(raw_dir=None, force_reload=None, verbose=None, transform=None, smiles2graph=<function smiles2graph>)
Bases: dgl.data.dgl_dataset.DGLDataset
Peptides functional dataset for the graph classification task.
DGL dataset of Peptides-func from the LRGB benchmark. It contains 15,535 peptides, each represented as a molecular graph built from its SMILES string, with 10-way multi-task binary classification of their functional classes.
The 10 classes represent the following functional classes (in order):
['antifungal', 'cell_cell_communication', 'anticancer', 'drug_delivery_vehicle', 'antimicrobial', 'antiviral', 'antihypertensive', 'antibacterial', 'antiparasitic', 'toxic']
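Because each sample carries ten independent binary targets, it can help to translate a label vector back into class names. The following is a minimal sketch in plain Python; `decode_label` is a hypothetical helper (not part of DGL), and it assumes the label has been flattened to ten 0/1 entries in the class order listed above.

```python
# Class names in the order listed in the documentation above.
CLASS_NAMES = [
    "antifungal", "cell_cell_communication", "anticancer",
    "drug_delivery_vehicle", "antimicrobial", "antiviral",
    "antihypertensive", "antibacterial", "antiparasitic", "toxic",
]


def decode_label(label, threshold=0.5):
    """Return the names of all classes whose entry exceeds `threshold`.

    Hypothetical helper, not a DGL function. `label` is assumed to be a
    flat sequence of ten 0/1 values (e.g. a list or a 1-D tensor).
    """
    values = [float(v) for v in label]
    if len(values) != len(CLASS_NAMES):
        raise ValueError(f"expected {len(CLASS_NAMES)} entries, got {len(values)}")
    return [name for name, v in zip(CLASS_NAMES, values) if v > threshold]


# A made-up label vector with two active classes (positions 4 and 7).
example = [0, 0, 0, 0, 1, 0, 0, 1, 0, 0]
print(decode_label(example))  # ['antimicrobial', 'antibacterial']
```

A peptide may belong to several classes at once or to none, which is why the helper returns a list rather than a single name.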
Reference https://arxiv.org/abs/2206.08164.pdf
Statistics:
Train examples: 10,873
Valid examples: 2,331
Test examples: 2,331
Average number of nodes: 150.94
Average number of edges: 307.30
Number of atom types: 9
Number of bond types: 3
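As a quick sanity check on the statistics above, the split sizes can be summed and expressed as fractions of the full dataset. This is only arithmetic on the numbers quoted in this page; the variable names are illustrative.

```python
# Split sizes as quoted in the statistics above.
splits = {"train": 10873, "valid": 2331, "test": 2331}

total = sum(splits.values())
print(total)  # 15535, matching the dataset size stated above

# Fraction of the dataset in each split.
fractions = {name: count / total for name, count in splits.items()}
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")

# Ratio of the quoted average edge count to the average node count.
edges_per_node = 307.30 / 150.94
print(f"edges per node: {edges_per_node:.2f}")
```

The three splits add up to the 15,535 peptides reported for the dataset, in roughly a 70/15/15 proportion.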
- Parameters
raw_dir (str) – Directory to store all the downloaded raw datasets. Default: "~/.dgl/".
force_reload (bool) – Whether to reload the dataset. Default: False.
verbose (bool) – Whether to print out progress information. Default: False.
transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.
smiles2graph (callable) – A callable function that converts a SMILES string into a graph object. The default smiles2graph requires rdkit to be installed.
Examples
>>> from dgl.data import PeptidesFunctionalDataset
>>> dataset = PeptidesFunctionalDataset()
>>> len(dataset)
15535
>>> dataset.num_classes
10
>>> graph, label = dataset[0]
>>> graph
Graph(num_nodes=119, num_edges=244,
      ndata_schemes={'feat': Scheme(shape=(9,), dtype=torch.int64)}
      edata_schemes={'feat': Scheme(shape=(3,), dtype=torch.int64)})
>>> # a tensor can be used as the index when transform is None
>>> # see details in the __getitem__ function
>>> # get train dataset
>>> split_dict = dataset.get_idx_split()
>>> trainset = dataset[split_dict["train"]]
>>> graph, label = trainset[0]
>>> graph
Graph(num_nodes=338, num_edges=682,
      ndata_schemes={'feat': Scheme(shape=(9,), dtype=torch.int64)}
      edata_schemes={'feat': Scheme(shape=(3,), dtype=torch.int64)})
>>> # get subset of dataset
>>> import torch
>>> idx = torch.tensor([0, 1, 2])
>>> dataset_subset = dataset[idx]
>>> graph, label = dataset_subset[0]
>>> graph
Graph(num_nodes=119, num_edges=244,
      ndata_schemes={'feat': Scheme(shape=(9,), dtype=torch.int64)}
      edata_schemes={'feat': Scheme(shape=(3,), dtype=torch.int64)})
__getitem__(idx)
Get the idx-th sample.
- Parameters
idx (int or tensor) – The sample index. A 1-D tensor is allowed as idx when transform is None.
- Returns
(dgl.DGLGraph, Tensor) – Graph with node feature stored in feat field and its label.
or
dgl.data.utils.Subset – Subset of the dataset at specified indices
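The two return types described above follow from the kind of index passed in: a single integer yields one sample, while a collection of indices yields a subset view. The sketch below illustrates that dispatch pattern with toy classes in plain Python. `ToyDataset` and `ToySubset` are hypothetical stand-ins written for this illustration; they are not DGL's implementation, and they ignore the transform restriction mentioned above.

```python
class ToySubset:
    """View over a dataset restricted to the given indices.

    Illustrative stand-in for the role played by dgl.data.utils.Subset.
    """

    def __init__(self, dataset, indices):
        self.dataset = dataset
        self.indices = list(indices)

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        # Map the position within the subset back to the parent dataset.
        return self.dataset[self.indices[i]]


class ToyDataset:
    """Toy dataset whose samples are (graph, label) pairs."""

    def __init__(self, samples):
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        if isinstance(idx, int):
            # Single index: return one (graph, label) sample.
            return self.samples[idx]
        # Collection of indices: return a subset view instead of a sample.
        return ToySubset(self, [int(i) for i in idx])


toy = ToyDataset([("g0", "y0"), ("g1", "y1"), ("g2", "y2")])
print(toy[1])        # ('g1', 'y1')
subset = toy[[0, 2]]
print(len(subset))   # 2
print(subset[1])     # ('g2', 'y2')
```

Returning a view keeps subset creation cheap, since no samples are copied until they are accessed.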