If a dataset is already on the local disk, make sure it is in the directory raw_dir. To run the code anywhere without manually downloading the data and moving it to the right directory, implement the download() function so that this happens automatically.
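
The check-then-fetch pattern described above can be sketched in plain Python. This is not DGL code: ensure_raw_data and its fetch callback are hypothetical names, and fetch stands in for a real downloader such as dgl.data.utils.download.

```python
import os
from typing import Callable


def ensure_raw_data(raw_dir: str, filename: str, url: str,
                    fetch: Callable[[str, str], None]) -> str:
    """Return the path of `filename` under `raw_dir`, fetching it only if missing.

    Hypothetical helper: `fetch(url, path)` is assumed to write the file at `path`.
    """
    path = os.path.join(raw_dir, filename)
    if os.path.exists(path):
        # the data is already on the local disk in raw_dir: nothing to download
        return path
    # create raw_dir if needed, then let the callback download the file into it
    os.makedirs(raw_dir, exist_ok=True)
    fetch(url, path)
    return path
```

Passing the downloader in as a parameter keeps the sketch testable without network access; a second call with the same arguments finds the file and skips the fetch.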

If the dataset is a zip file, make MyDataset inherit from the dgl.data.DGLBuiltinDataset class, which handles zip file extraction for us. Otherwise, one needs to implement download() as in QM7bDataset:

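import os
from dgl.data import DGLDataset
from dgl.data.utils import download

class MyDataset(DGLDataset):
    def download(self):
        # path to store the file
        file_path = os.path.join(self.raw_dir, self.name + '.mat')
        # inside the method, `download` refers to the imported dgl.data.utils.download
        download(self.url, path=file_path)
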
The above code downloads a .mat file to the directory self.raw_dir. If the file is a .gz, .tar, .tar.gz or .tgz file, use the extract_archive() function from dgl.data.utils to extract it. The following code, modeled on BitcoinOTCDataset, shows how to download a .gz file:

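import os
from dgl.data import DGLDataset
from dgl.data.utils import download, check_sha1, extract_archive
class MyDataset(DGLDataset):
    def download(self):
        # path to store the file, keeping the original file name's suffix
        gz_file_path = os.path.join(self.raw_dir, self.name + '.csv.gz')
        download(self.url, path=gz_file_path)
        if not check_sha1(gz_file_path, self._sha1_str):  # _sha1_str: expected digest (assumed)
            raise UserWarning('SHA-1 mismatch for {}'.format(gz_file_path))
        # extract file to directory self.name under self.raw_dir, i.e. self.raw_path
        os.makedirs(self.raw_path, exist_ok=True)
        extract_archive(gz_file_path, self.raw_path, overwrite=True)
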
The above code extracts the file into the directory self.name under self.raw_dir. A class that inherits from dgl.data.DGLBuiltinDataset to handle a zip file likewise extracts it into the directory self.name.
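
For readers who want to see what happens after the download, here is a standard-library sketch of the two post-download steps used in this section: verifying a SHA-1 checksum (the job check_sha1 does above) and extracting a .gz, .tar, .tar.gz, .tgz or .zip file into a directory named after the dataset under raw_dir. It is not DGL's implementation of extract_archive or DGLBuiltinDataset; the function names sha1_matches and extract_to_named_dir are hypothetical.

```python
import gzip
import hashlib
import os
import shutil
import tarfile
import zipfile


def sha1_matches(path: str, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Compare the SHA-1 digest of the file at `path` with `expected_hex`."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        # hash the file in chunks so large downloads are not read into memory at once
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == expected_hex.lower()


def extract_to_named_dir(archive_path: str, raw_dir: str, name: str) -> str:
    """Extract `archive_path` into `raw_dir/name` and return that directory (hypothetical helper)."""
    target_dir = os.path.join(raw_dir, name)
    os.makedirs(target_dir, exist_ok=True)
    if archive_path.endswith((".tar.gz", ".tgz", ".tar")):
        # use tarfile's "data" filter when this Python provides it, to reject unsafe member paths
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        with tarfile.open(archive_path, "r:*") as archive:
            archive.extractall(path=target_dir, **kwargs)
    elif archive_path.endswith(".gz"):
        # a bare .gz holds a single file: drop the ".gz" suffix for the output name
        out_path = os.path.join(target_dir, os.path.basename(archive_path)[:-3])
        with gzip.open(archive_path, "rb") as f_in, open(out_path, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)
    elif archive_path.endswith(".zip"):
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(path=target_dir)
    else:
        raise ValueError("unsupported archive type: " + archive_path)
    return target_dir
```

The .tar.gz and .tgz cases are checked before the bare .gz case because a .tar.gz file name also ends in .gz.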