catenets.datasets.dataset_ihdp module

IHDP (Infant Health and Development Program) dataset

get_one_data_set(D: dict, i_exp: int, get_po: bool = True) → dict

Helper for getting the IHDP data for one experiment. Adapted from https://github.com/clinicalml/cfrnet

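Parameters

D (dict) – Data for all experiments
i_exp (int) – Experiment number
get_po (bool, default True) – Return potential outcomes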

Returns

data – dict with the data for the selected experiment

Return type

dict or pd.DataFrame
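
The idea of picking one experiment out of a dict that holds all of them can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name select_experiment, the keys "X", "w" and "y", the array layout (experiments along the last axis) and the 1-based experiment index are all assumptions made for the example.

```python
import numpy as np


def select_experiment(stacked: dict, i_exp: int) -> dict:
    """Return the arrays of one experiment from a dict of stacked arrays.

    Hypothetical helper: assumes experiments are numbered from 1 and
    stored along the last axis of every array.
    """
    j = i_exp - 1  # convert the 1-based experiment number to an array index
    return {
        "X": stacked["X"][:, :, j],      # (n, d, n_exp) -> (n, d)
        "w": stacked["w"][:, j:j + 1],   # (n, n_exp)    -> (n, 1)
        "y": stacked["y"][:, j:j + 1],   # (n, n_exp)    -> (n, 1)
    }


# Synthetic stand-in for the stacked data (not real IHDP values).
rng = np.random.default_rng(0)
n, d, n_exp = 5, 3, 4
stacked = {
    "X": rng.normal(size=(n, d, n_exp)),
    "w": rng.integers(0, 2, size=(n, n_exp)),
    "y": rng.normal(size=(n, n_exp)),
}

one = select_experiment(stacked, i_exp=2)
print({name: arr.shape for name, arr in one.items()})
```

Slicing with j:j + 1 rather than j keeps the treatment and outcome arrays two-dimensional, which is often convenient when they are later concatenated with the features.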

load(data_path: pathlib.Path, exp: int = 1, rescale: bool = False, **kwargs: Any) → Tuple

Get IHDP train/test datasets with treatments and labels.

Parameters

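data_path (Path) – Path to the dataset csv. If the data is missing, it will be downloaded.
exp (int, default 1) – Experiment number
rescale (bool, default False) – Rescale the outcomes to a similar scale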

Returns

X (pd.DataFrame or array) – The training feature set.
w (pd.DataFrame or array) – Training treatment assignments.
y (pd.DataFrame or array) – The training labels.
training potential outcomes (pd.DataFrame or array) – Potential outcomes for the training set.
X_t (pd.DataFrame or array) – The testing feature set.
testing potential outcomes (pd.DataFrame or array) – Potential outcomes for the testing set.
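
To illustrate how the six returned values are typically consumed, the sketch below fits a simple two-model (T-learner) baseline on the training arrays and scores it against the test potential outcomes. It does not call load: the arrays are synthetic stand-ins arranged in the documented return order. The two-column [mu0, mu1] layout of the potential outcomes and the helper names fit_linear, predict_linear and pehe are assumptions for the example.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns the coefficients."""
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Predict with coefficients produced by fit_linear."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def pehe(cate_true, cate_pred):
    """Root mean squared error between true and estimated treatment effects."""
    return float(np.sqrt(np.mean((cate_true - cate_pred) ** 2)))


rng = np.random.default_rng(0)


def make_split(n):
    """Synthetic features and potential outcomes (assumed columns: mu0, mu1)."""
    X = rng.normal(size=(n, 3))
    mu0 = X @ np.array([1.0, -1.0, 0.5])
    mu1 = mu0 + 2.0 + X[:, 0]
    return X, np.column_stack([mu0, mu1])


# Stand-ins for: X, w, y, train potential outcomes, X_t, test potential outcomes
X, po_train = make_split(200)
w = rng.integers(0, 2, size=200)
y = np.where(w == 1, po_train[:, 1], po_train[:, 0]) + rng.normal(scale=0.1, size=200)
X_t, po_test = make_split(100)

# T-learner: one outcome model per treatment arm, effect = difference of predictions.
coef0 = fit_linear(X[w == 0], y[w == 0])
coef1 = fit_linear(X[w == 1], y[w == 1])
cate_pred = predict_linear(coef1, X_t) - predict_linear(coef0, X_t)

cate_true = po_test[:, 1] - po_test[:, 0]
score = pehe(cate_true, cate_pred)
print(f"test PEHE: {score:.3f}")
```

The test potential outcomes are used only for evaluation; the models see nothing but X, w and y.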

load_data_npz(fname: pathlib.Path, get_po: bool = True) → dict

Helper function for loading the IHDP data set (adapted from https://github.com/clinicalml/cfrnet)
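
Reading an .npz archive into a plain dict can be sketched with NumPy alone. The example below is an illustration under stated assumptions, not the library's code: the helper name npz_to_dict and the keys "x", "t" and "yf" are hypothetical, and a tiny archive is written to a temporary directory so the block runs without the real data.

```python
import tempfile
from pathlib import Path

import numpy as np


def npz_to_dict(fname: Path, keys=("x", "t", "yf")) -> dict:
    """Copy the requested arrays from an .npz archive into a plain dict.

    Hypothetical helper; keys that are absent from the archive are skipped.
    """
    with np.load(fname) as archive:
        return {key: archive[key] for key in keys if key in archive.files}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "toy.npz"
    # Toy arrays only; shapes and values are placeholders, not IHDP data.
    np.savez(path, x=np.zeros((4, 2, 3)), t=np.ones((4, 3)), yf=np.zeros((4, 3)))
    data = npz_to_dict(path)

print(sorted(data))
print(data["x"].shape)
```

Copying the arrays out inside the with block means the archive file can be closed (and here deleted) before the dict is used.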

load_raw(data_path: pathlib.Path) → Tuple

Get IHDP raw train/test sets.

Parameters

data_path (Path) – Path to the dataset csv. If the data is missing, it will be downloaded.

Returns

Tuple with the raw training set and the raw test set.
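
The "download if missing" behaviour described for data_path follows a common check-then-fetch pattern, sketched below. The names ensure_file and fake_fetch are hypothetical, and the fetch step is a stand-in that writes placeholder bytes instead of contacting a server, so nothing here reflects the actual download location or file names used by the library.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ensure_file(path: Path, fetch: Callable[[Path], None]) -> Path:
    """Call fetch(path) only when the file does not exist yet."""
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        fetch(path)
    return path


calls = []


def fake_fetch(dest: Path) -> None:
    """Stand-in for a real download: records the call and writes dummy bytes."""
    calls.append(dest.name)
    dest.write_bytes(b"placeholder")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data" / "ihdp_example.bin"
    ensure_file(target, fake_fetch)  # file is missing, so fetch runs
    ensure_file(target, fake_fetch)  # file now exists, so fetch is skipped
    existed = target.exists()

print(calls)
```

Passing the fetch step in as a function keeps the existence check testable offline.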

prepare_ihdp_data(data_train: dict, data_test: dict, rescale: bool = False, setting: str = 'C', return_pos: bool = False) → Tuple

Helper for preprocessing the IHDP dataset.

Parameters

data_train (pd.DataFrame or dict) – Train dataset
data_test (pd.DataFrame or dict) – Test dataset
rescale (bool, default False) – Rescale the outcomes to a similar scale
setting (str, default 'C') – Experiment setting
return_pos (bool, default False) – Return potential outcomes

Returns

X (dict or pd.DataFrame) – Training feature set
y (pd.DataFrame or list) – Outcome list
t (pd.DataFrame or list) – Treatment list
cate_true_in (pd.DataFrame or list) – True conditional average treatment effects (CATE) for the training set
X_t (pd.DataFrame or list) – Test feature set
cate_true_out (pd.DataFrame or list) – True conditional average treatment effects (CATE) for the testing set
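
The relationship between potential outcomes, the true treatment effects and outcome rescaling can be sketched as below. This is one plausible reading, not the library's preprocessing: the helper names true_cate and rescale_outcomes are hypothetical, and the rule used here (divide all outcomes by the standard deviation of the true effects when it exceeds 1) is an assumption chosen for illustration.

```python
import numpy as np


def true_cate(mu0, mu1):
    """Per-unit treatment effect as the difference of the potential outcomes."""
    return mu1 - mu0


def rescale_outcomes(y, mu0, mu1):
    """Divide all outcomes by the std of the true effects if it exceeds 1.

    Hypothetical rule for illustration; outcomes are returned unchanged otherwise.
    """
    scale = np.std(true_cate(mu0, mu1))
    if scale > 1:
        return y / scale, mu0 / scale, mu1 / scale
    return y, mu0, mu1


# Synthetic potential outcomes with widely spread treatment effects.
rng = np.random.default_rng(0)
n = 50
mu0 = rng.normal(size=n)
mu1 = mu0 + np.linspace(-5.0, 5.0, n)
w = rng.integers(0, 2, size=n)
y = np.where(w == 1, mu1, mu0)  # observed outcome is the factual potential outcome

y_s, mu0_s, mu1_s = rescale_outcomes(y, mu0, mu1)
print("std of effects before:", round(float(np.std(true_cate(mu0, mu1))), 3))
print("std of effects after: ", round(float(np.std(true_cate(mu0_s, mu1_s))), 3))
```

Dividing the observed and potential outcomes by the same constant keeps them consistent with each other, so effects computed after rescaling are the original effects divided by that constant.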