molprop.pyg_molgraph.PyGMolgraphDataset
- class molprop.pyg_molgraph.PyGMolgraphDataset(root, transform=None, pre_filter=None, args=None, mol_features=None, without_target=False)
Bases: InMemoryDataset

PyG dataset with attributed molecular graphs.
- Parameters:
root – Root directory where the dataset should be saved.
- static collate(data_list: Sequence[BaseData]) Tuple[BaseData, Dict[str, Tensor] | None]
Collates a list of Data or HeteroData objects to the internal storage format of InMemoryDataset.
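The idea behind the collated storage format can be illustrated with a small, framework-free sketch (plain Python lists stand in for tensors, and this collate is a hypothetical toy, not PyG's implementation): each attribute is concatenated across all examples, and a companion slices dict records where each example starts and ends, so one large storage plus slice pointers replaces a list of many small objects.

```python
# Hypothetical sketch of collating a list of per-example dicts into one
# concatenated storage plus per-key slice boundaries (plain lists, no torch).
def collate(data_list):
    batched, slices = {}, {}
    for key in data_list[0]:
        batched[key] = []
        slices[key] = [0]
        for data in data_list:
            batched[key].extend(data[key])
            slices[key].append(len(batched[key]))
    return batched, slices

graphs = [{"x": [1.0, 2.0], "y": [0]}, {"x": [3.0], "y": [1]}]
storage, slices = collate(graphs)
# storage["x"] is the concatenation [1.0, 2.0, 3.0];
# slices["x"] == [0, 2, 3] marks example boundaries.
```

Retrieving example i then amounts to slicing `storage[key][slices[key][i]:slices[key][i + 1]]`, which is conceptually what `get()` does with the real tensor storage.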
- copy(idx: slice | Tensor | ndarray | Sequence | None = None) InMemoryDataset
Performs a deep-copy of the dataset. If idx is not given, will clone the full dataset. Otherwise, will only clone a subset of the dataset from indices idx. Indices can be slices, lists, tuples, or a torch.Tensor or np.ndarray of type long or bool.
- cpu(*args: str) InMemoryDataset
Moves the dataset to CPU memory.
- cuda(device: int | str | None = None) InMemoryDataset
Moves the dataset to CUDA memory.
- download() None
Downloads the dataset to the self.raw_dir folder.
- get(idx: int) BaseData
Gets the data object at index idx.
- get_summary() Any
Collects summary statistics for the dataset.
- property has_download: bool
Checks whether the dataset defines a download() method.
- index_select(idx: slice | Tensor | ndarray | Sequence) Dataset
Creates a subset of the dataset from specified indices idx. Indices idx can be a slicing object, e.g., [2:5], a list, a tuple, or a torch.Tensor or np.ndarray of type long or bool.
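The accepted index types can be sketched without torch at all; the following hypothetical helper (normalize_indices is not a PyG function) shows how a slicing object, a boolean mask, and a sequence of long indices each resolve to concrete positions:

```python
# Hypothetical sketch of the index types index_select accepts, using plain
# Python sequences in place of torch.Tensor / np.ndarray:
def normalize_indices(idx, n):
    if isinstance(idx, slice):                         # slicing object, e.g. [2:5]
        return list(range(*idx.indices(n)))
    if idx and all(isinstance(i, bool) for i in idx):  # boolean mask of length n
        return [i for i, keep in enumerate(idx) if keep]
    return [int(i) for i in idx]                       # list/tuple of long indices

print(normalize_indices(slice(2, 5), n=6))                            # [2, 3, 4]
print(normalize_indices([True, False, True, False, False, True], 6))  # [0, 2, 5]
print(normalize_indices((1, 4), 6))                                   # [1, 4]
```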
- len() int
Returns the number of data objects stored in the dataset.
- load(path: str, data_cls: ~typing.Type[~torch_geometric.data.data.BaseData] = <class 'torch_geometric.data.data.Data'>) None
Loads the dataset from the file path path.
- property num_classes: int
Returns the number of classes in the dataset.
- property num_edge_features: int
Returns the number of features per edge in the dataset.
- property num_features: int
Returns the number of features per node in the dataset. Alias for
num_node_features.
- property num_node_features: int
Returns the number of features per node in the dataset.
- print_summary(fmt: str = 'psql') None
Prints summary statistics of the dataset to the console.
- Parameters:
fmt (str, optional) – Summary tables format; any table format supported by the tabulate package can be used. (default: "psql")
- process() None
Processes the dataset to the self.processed_dir folder.
- property processed_file_names: str
The name of the files in the self.processed_dir folder that must be present in order to skip processing.
- property processed_paths: List[str]
The absolute filepaths that must be present in order to skip processing.
- property raw_file_names: str
The name of the files in the self.raw_dir folder that must be present in order to skip downloading.
- property raw_paths: List[str]
The absolute filepaths that must be present in order to skip downloading.
- classmethod save(data_list: Sequence[BaseData], path: str) None
Saves a list of data objects to the file path path.
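The save/load pair performs a simple round-trip: collated data objects go to a single file and come back unchanged. A minimal sketch of that contract, using pickle and hypothetical free functions rather than PyG's actual classmethods (the real implementation serializes collated tensors, e.g. via torch.save):

```python
import os
import pickle
import tempfile

# Hypothetical sketch of the save/load round-trip contract; plain dicts and
# pickle stand in for PyG data objects and torch serialization.
def save(data_list, path):
    with open(path, "wb") as f:
        pickle.dump(data_list, f)

def load(path):
    with open(path, "rb") as f:
        return pickle.load(f)

path = os.path.join(tempfile.mkdtemp(), "data.pkl")
original = [{"x": [1.0]}, {"x": [2.0, 3.0]}]
save(original, path)
restored = load(path)
# restored compares equal to original
```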
- shuffle(return_perm: bool = False) Dataset | Tuple[Dataset, Tensor]
Randomly shuffles the examples in the dataset.
- Parameters:
return_perm (bool, optional) – If set to True, will also return the random permutation used to shuffle the dataset. (default: False)
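What return_perm buys you can be shown with a framework-free sketch (this shuffle and its seed parameter are hypothetical, not the PyG signature): keeping the permutation lets you reorder anything stored outside the dataset, such as labels or sample weights, consistently with the shuffled examples.

```python
import random

# Hypothetical sketch of shuffle(return_perm=True): build a permutation of
# indices, reorder the examples with it, and optionally return the permutation.
def shuffle(examples, return_perm=False, seed=None):
    rng = random.Random(seed)
    perm = list(range(len(examples)))
    rng.shuffle(perm)
    shuffled = [examples[i] for i in perm]
    return (shuffled, perm) if return_perm else shuffled

data = ["g0", "g1", "g2", "g3"]
shuffled, perm = shuffle(data, return_perm=True, seed=0)
# perm records where each shuffled element came from, so the original
# order can be recovered: shuffled[perm.index(i)] == data[i]
```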
- to(device: int | str) InMemoryDataset
Performs device conversion of the whole dataset.
- to_datapipe() Any
Converts the dataset into a torch.utils.data.DataPipe. The returned instance can then be used with PyG's built-in DataPipes for batching graphs as follows:

```python
from torch_geometric.datasets import QM9

dp = QM9(root='./data/QM9/').to_datapipe()
dp = dp.batch_graphs(batch_size=2, drop_last=True)

for batch in dp:
    pass
```
See the PyTorch tutorial for further background on DataPipes.
- to_on_disk_dataset(root: str | None = None, backend: str = 'sqlite', log: bool = True) OnDiskDataset
Converts the InMemoryDataset to an OnDiskDataset variant. Useful for distributed training and hardware instances with a limited amount of shared memory.
- Parameters:
root (str, optional) – Root directory where the dataset should be saved. If set to None, will save the dataset in root/on_disk. Note that it is important to specify root to account for different dataset splits. (default: None)
backend (str) – The Database backend to use. (default: "sqlite")
log (bool, optional) – Whether to print any console output while processing the dataset. (default: True)