PeptidesStructuralDataset
class dgl.data.PeptidesStructuralDataset(raw_dir=None, force_reload=None, verbose=None, transform=None, smiles2graph=<function smiles2graph>)
Bases: dgl.data.dgl_dataset.DGLDataset
Peptides structure dataset for the graph regression task.
DGL dataset of Peptides-struct in the LRGB benchmark, which contains 15,535 small peptides represented as their molecular graph (SMILES) with 11 regression targets derived from the peptide's 3D structure.
The 11 regression targets were precomputed from the molecules' 3D structure:
Inertia_mass_[a-c]: The principal component of the inertia of the mass, with some normalizations. (Sorted)
Inertia_valence_[a-c]: The principal component of the inertia of the hydrogen atoms. This is basically a measure of the 3D distribution of hydrogens. (Sorted)
length_[a-c]: The length around the 3 main geometric axes of the 3D objects (without considering atom types). (Sorted)
Spherocity: SpherocityIndex descriptor computed by rdkit.Chem.rdMolDescriptors.CalcSpherocityIndex
Plane_best_fit: Plane of best fit (PBF) descriptor computed by rdkit.Chem.rdMolDescriptors.CalcPBF
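To make the inertia and length targets more concrete, the sketch below computes sorted principal moments of inertia and sorted extents along the principal axes of a small point cloud with NumPy. It is an illustration only: the function names, the toy coordinates, the unit masses, the descending sort order, and the absence of any normalization are all assumptions, and this is not the code that was used to precompute the dataset's targets.

```python
import numpy as np


def principal_moments(coords, masses):
    """Eigenvalues of the inertia tensor about the center of mass, sorted descending."""
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    center = (masses[:, None] * coords).sum(axis=0) / masses.sum()
    r = coords - center
    # Inertia tensor: I = sum_i m_i * (|r_i|^2 * Id - r_i r_i^T)
    sq_norms = (r ** 2).sum(axis=1)
    outer = (masses[:, None, None] * r[:, :, None] * r[:, None, :]).sum(axis=0)
    inertia = (masses * sq_norms).sum() * np.eye(3) - outer
    return np.sort(np.linalg.eigvalsh(inertia))[::-1]


def principal_lengths(coords):
    """Extents of the point cloud along its PCA axes (atom types ignored), sorted descending."""
    coords = np.asarray(coords, dtype=float)
    centered = coords - coords.mean(axis=0)
    # The principal axes are the eigenvectors of the (unnormalized) covariance matrix.
    _, axes = np.linalg.eigh(centered.T @ centered)
    projected = centered @ axes
    extents = projected.max(axis=0) - projected.min(axis=0)
    return np.sort(extents)[::-1]


# Hypothetical toy "molecule": four unit-mass points on a line along x.
toy_coords = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
toy_masses = [1.0, 1.0, 1.0, 1.0]
moments = principal_moments(toy_coords, toy_masses)
lengths = principal_lengths(toy_coords)
```

For this linear toy input, two moments are equal and the third vanishes, while only one extent is non-zero, which is the kind of shape information the sorted targets are meant to capture.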
Reference: https://arxiv.org/abs/2206.08164.pdf
Statistics:
Train examples: 10,873
Valid examples: 2,331
Test examples: 2,331
Average number of nodes: 150.94
Average number of edges: 307.30
Number of atom types: 9
Number of bond types: 3
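As a quick sanity check on the statistics, the following sketch confirms that the three split sizes add up to the 15,535 peptides mentioned earlier and prints the fraction each split represents. The variable names are illustrative and not part of the DGL API.

```python
# Split sizes and averages copied from the statistics listed above.
split_sizes = {"train": 10_873, "valid": 2_331, "test": 2_331}
avg_nodes, avg_edges = 150.94, 307.30

total = sum(split_sizes.values())
fractions = {name: size / total for name, size in split_sizes.items()}
edges_per_node = avg_edges / avg_nodes

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of {total} graphs")
print(f"average edges per node: {edges_per_node:.2f}")
```

The splits come out close to 70% / 15% / 15%, and the average graph has roughly two stored edges per node.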
- Parameters
raw_dir (str) – Directory to store all the downloaded raw datasets. Default: "~/.dgl/".
force_reload (bool) – Whether to reload the dataset. Default: False.
verbose (bool) – Whether to print out progress information. Default: False.
transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.
smiles2graph (callable) – A callable function that converts a SMILES string into a graph object. The default smiles2graph requires rdkit to be installed.
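A minimal sketch of how a custom transform could be passed in, assuming the graphs carry a 'feat' field on both nodes and edges as shown in the Examples. The names cast_features_to_float and load_with_transform are hypothetical helpers, not part of DGL, and the dataset import is deferred so that the transform can be defined without DGL installed.

```python
def cast_features_to_float(graph):
    """Hypothetical transform: cast the integer 'feat' tensors to floating point."""
    # Assumes graph.ndata / graph.edata behave like dicts of torch tensors.
    graph.ndata["feat"] = graph.ndata["feat"].float()
    graph.edata["feat"] = graph.edata["feat"].float()
    return graph


def load_with_transform(raw_dir=None):
    """Build the dataset with the transform above (downloads data on first use)."""
    # Lazy import: requires dgl, plus rdkit for the default smiles2graph.
    from dgl.data import PeptidesStructuralDataset

    return PeptidesStructuralDataset(raw_dir=raw_dir, transform=cast_features_to_float)
```

Because the transform runs on every access, it should be cheap and safe to apply repeatedly; the cast above is idempotent, so applying it twice to the same graph gives the same result. Note that tensor indexing is documented as allowed only when transform is None.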
Examples
>>> from dgl.data import PeptidesStructuralDataset
>>> dataset = PeptidesStructuralDataset()
>>> len(dataset)
15535
>>> dataset.num_atom_types
9
>>> graph, label = dataset[0]
>>> graph
Graph(num_nodes=119, num_edges=244,
      ndata_schemes={'feat': Scheme(shape=(9,), dtype=torch.int64)}
      edata_schemes={'feat': Scheme(shape=(3,), dtype=torch.int64)})

>>> # a tensor can be used as the index when transform is None
>>> # see details in the __getitem__ function
>>> # get the training set
>>> split_dict = dataset.get_idx_split()
>>> trainset = dataset[split_dict["train"]]
>>> graph, label = trainset[0]
>>> graph
Graph(num_nodes=338, num_edges=682,
      ndata_schemes={'feat': Scheme(shape=(9,), dtype=torch.int64)}
      edata_schemes={'feat': Scheme(shape=(3,), dtype=torch.int64)})

>>> # get a subset of the dataset
>>> import torch
>>> idx = torch.tensor([0, 1, 2])
>>> dataset_subset = dataset[idx]
>>> graph, label = dataset_subset[0]
>>> graph
Graph(num_nodes=119, num_edges=244,
      ndata_schemes={'feat': Scheme(shape=(9,), dtype=torch.int64)}
      edata_schemes={'feat': Scheme(shape=(3,), dtype=torch.int64)})
__getitem__(idx)
Get the idx-th sample.
- Parameters
idx (int or tensor) – The sample index. 1-D tensor as idx is allowed when transform is None.
- Returns
(dgl.DGLGraph, Tensor) – Graph with node feature stored in feat field and its label.
or dgl.data.utils.Subset – Subset of the dataset at specified indices.
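The two return shapes described above can be pictured with a small, dependency-free toy. This is a sketch under stated assumptions and not DGL's implementation: ToyDataset and ToySubset are hypothetical classes, plain lists stand in for tensors, strings stand in for graphs, and the restriction that tensor indexing requires transform to be None is not modeled.

```python
class ToySubset:
    """Stand-in for a subset view: maps local positions to indices of the parent."""

    def __init__(self, dataset, indices):
        self.dataset = dataset
        self.indices = list(indices)

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        # Delegate to the parent dataset using the stored index.
        return self.dataset[self.indices[i]]


class ToyDataset:
    """Returns (graph, label) for an int index, or a ToySubset for a sequence of indices."""

    def __init__(self, graphs, labels):
        self.graphs = graphs
        self.labels = labels

    def __len__(self):
        return len(self.graphs)

    def __getitem__(self, idx):
        if isinstance(idx, int):
            return self.graphs[idx], self.labels[idx]
        return ToySubset(self, idx)


toy = ToyDataset(["g0", "g1", "g2", "g3"], [0.0, 1.0, 2.0, 3.0])
single = toy[2]            # one (graph, label) pair
subset = toy[[3, 1]]       # a subset view over two samples
first_of_subset = subset[0]
```

Indexing the subset goes back through the parent dataset, which is why a subset built from the training indices still yields (graph, label) pairs.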