molprop.utils.analyze_mol_data

Functions

cross_check_data

Checks for overlaps between data sets. If overlaps are found, returns a pd.Series of the entries shared by the two data sets; otherwise, returns None.
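A minimal sketch of this kind of overlap check, useful for spotting leakage between training and test splits. The function name find_overlaps and its parameters (df_a, df_b, cols) are hypothetical and not the actual signature of cross_check_data; the sketch only assumes that each data set is a pandas DataFrame and that overlap is defined by identical values in the chosen input columns.

```python
from __future__ import annotations

import pandas as pd


def find_overlaps(df_a: pd.DataFrame, df_b: pd.DataFrame, cols: list[str]) -> pd.Series | None:
    """Hypothetical sketch: input combinations present in both data sets.

    Returns a pd.Series of overlapping entries, or None when there is no overlap.
    """
    # One hashable key (a plain tuple) per row, built from the chosen input columns.
    keys_a = set(df_a[cols].itertuples(index=False, name=None))
    keys_b = set(df_b[cols].itertuples(index=False, name=None))
    shared = sorted(keys_a & keys_b)
    if not shared:
        return None
    return pd.Series(shared, name="overlap")


train = pd.DataFrame({"smiles": ["CCO", "c1ccccc1", "CC"]})
test = pd.DataFrame({"smiles": ["CC", "CCN"]})
print(find_overlaps(train, test, ["smiles"]))
```

Keying rows as tuples keeps the check valid when overlap is defined by several input columns at once (for example a compound plus an assay condition).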

extract_mol_features

Extracts all mol features for training, validation and test datasets.

get_mol_features

Creates dictionary containing atom and bond features.

get_mol_features_from_cfg

sanity_check_compounds

sanity_check_data

Checks for duplicates. If duplicates are found, returns a pd.Series of the duplicated entries (input combinations that occur more than once); otherwise, returns None. check_conflicts checks for conflicting target values among duplicate input combinations (the same input with different labels). Note: this can take some time if df is large.
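A minimal sketch of duplicate and conflict detection with pandas. The name check_duplicates and its parameters (input_cols, target_col, check_conflicts) are hypothetical and only mirror the behaviour described above; they are not the real signature of sanity_check_data.

```python
from __future__ import annotations

import pandas as pd


def check_duplicates(df, input_cols, target_col=None, check_conflicts=False):
    """Hypothetical sketch of duplicate / conflict detection.

    Returns a pd.Series of input combinations that occur more than once, or None.
    With check_conflicts=True, only combinations whose duplicates carry
    different target values are reported.
    """
    cols = list(input_cols)
    if check_conflicts:
        if target_col is None:
            raise ValueError("target_col is required when check_conflicts=True")
        # Collapse rows that repeat both the inputs and the label; any input
        # combination still repeated afterwards must have 2+ different labels.
        df = df.drop_duplicates(subset=cols + [target_col])
    # keep=False marks every member of a duplicated group, not only the repeats.
    mask = df.duplicated(subset=cols, keep=False)
    if not mask.any():
        return None
    combos = df.loc[mask, cols].drop_duplicates().itertuples(index=False, name=None)
    return pd.Series(list(combos), name="conflicts" if check_conflicts else "duplicates")


data = pd.DataFrame({
    "smiles": ["CCO", "CCO", "CC", "CN", "CN"],
    "y": [1.0, 1.0, 2.0, 0.5, 0.7],
})
print(check_duplicates(data, ["smiles"]))                   # CCO and CN are duplicated
print(check_duplicates(data, ["smiles"], "y", check_conflicts=True))  # only CN conflicts
```

Dropping exact (input, label) repeats before the second duplicated() call avoids a groupby and keeps both modes returning the same tuple-keyed Series format.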

sanity_check_features

Checks for presence and variation of molecular features.
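One plausible reading of "presence and variation" is: every expected feature column exists, and no feature is constant across the data set (a constant feature carries no information for training). The sketch below assumes features are stored as DataFrame columns; the name check_features and its return format are hypothetical, not the behaviour of sanity_check_features itself.

```python
import pandas as pd


def check_features(df, feature_cols):
    """Hypothetical sketch: report missing feature columns and constant ones."""
    missing = [c for c in feature_cols if c not in df.columns]
    present = [c for c in feature_cols if c in df.columns]
    # At most one distinct non-NaN value means the feature shows no variation.
    n_unique = df[present].nunique(dropna=True)
    constant = [c for c in present if n_unique[c] <= 1]
    return {"missing": missing, "constant": constant}


feats = pd.DataFrame({"mw": [46.1, 78.1, 30.1], "charge": [0, 0, 0]})
print(check_features(feats, ["mw", "charge", "logp"]))
```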

sanity_check_pipeline

Performs sanity checks for inputs and compound list.