catenets.datasets.dataset_ihdp module

IHDP (Infant Health and Development Program) dataset

get_one_data_set(D: dict, i_exp: int, get_po: bool = True) dict

Helper for getting the IHDP data for one experiment. Adapted from https://github.com/clinicalml/cfrnet

Parameters
  • D (dict or pd.DataFrame) – Data for all experiments

  • i_exp (int) – Experiment number

  • get_po (bool, default True) – Whether to include the potential outcomes

Returns

data – Data for the selected experiment

Return type

dict or pd.DataFrame
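As a rough illustration of what selecting one experiment involves, the sketch below slices a single replication out of arrays that stack all experiments along their last axis. The array layout (X of shape (n, d, n_exp), w and y of shape (n, n_exp)), the 1-based experiment number, and the helper name select_experiment are assumptions modelled on the cfrnet file format, not the library's own code.

```python
import numpy as np


def select_experiment(D: dict, i_exp: int) -> dict:
    """Hypothetical helper: pick experiment `i_exp` (1-based) from stacked arrays."""
    idx = i_exp - 1
    return {
        "X": D["X"][:, :, idx],         # features, shape (n, d)
        "w": D["w"][:, idx : idx + 1],  # treatments, shape (n, 1)
        "y": D["y"][:, idx : idx + 1],  # outcomes, shape (n, 1)
    }


# Synthetic stand-in data: 5 units, 3 covariates, 4 experiments.
rng = np.random.default_rng(0)
D = {
    "X": rng.normal(size=(5, 3, 4)),
    "w": rng.integers(0, 2, size=(5, 4)),
    "y": rng.normal(size=(5, 4)),
}
one = select_experiment(D, i_exp=2)
print(one["X"].shape, one["w"].shape, one["y"].shape)
```

Slicing with idx : idx + 1 rather than a bare index keeps w and y as column vectors, which is convenient when they are later concatenated with the features.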

load(data_path: pathlib.Path, exp: int = 1, rescale: bool = False, **kwargs: Any) Tuple

Get IHDP train/test datasets with treatments and labels.

Parameters

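  • data_path (Path) – Path to the dataset csv. If the data is missing, it will be downloaded.

  • exp (int, default 1) – Experiment number

  • rescale (bool, default False) – Whether to rescale the outcomes to a similar scale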

Returns

  • X (pd.DataFrame or array) – The training feature set.

  • w (pd.DataFrame or array) – The training treatment assignments.

  • y (pd.DataFrame or array) – The training labels.

  • training potential outcomes (pd.DataFrame or array) – Potential outcomes for the training set.

  • X_t (pd.DataFrame or array) – The testing feature set.

  • testing potential outcomes (pd.DataFrame or array) – Potential outcomes for the testing set.
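The potential outcomes returned with the features are what make IHDP usable as a benchmark, because they give a ground-truth treatment effect for every unit. The sketch below shows one way to use them, assuming the potential-outcome array has shape (n, 2) with columns ordered (untreated, treated). That column order and the helper names true_cate and sqrt_pehe are assumptions to check against the installed version.

```python
import numpy as np


def true_cate(potential_outcomes: np.ndarray) -> np.ndarray:
    """Per-unit effect, assuming columns are (untreated, treated)."""
    return potential_outcomes[:, 1] - potential_outcomes[:, 0]


def sqrt_pehe(cate_true: np.ndarray, cate_pred: np.ndarray) -> float:
    """Root of the mean squared error between true and predicted effects."""
    return float(np.sqrt(np.mean((cate_true - cate_pred) ** 2)))


# Synthetic stand-in for the test-set potential outcomes.
po_test = np.array([[1.0, 5.0], [2.0, 6.0], [0.0, 3.0]])
cate = true_cate(po_test)  # [4., 4., 3.]

# Baseline that predicts the same (average) effect for every unit.
constant_guess = np.full_like(cate, cate.mean())
print(sqrt_pehe(cate, constant_guess))
```

A model's predicted effects on X_t would take the place of constant_guess in a real evaluation.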

load_data_npz(fname: pathlib.Path, get_po: bool = True) dict

Helper function for loading the IHDP data set (adapted from https://github.com/clinicalml/cfrnet)

Parameters

  • fname (Path) – Dataset path

  • get_po (bool, default True) – Whether to include the potential outcomes

Returns

data – Raw IHDP dict, with X, w, y and yf keys.

Return type

dict
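For orientation, reading an .npz archive into a plain dict can be sketched as follows. The archive key names ("x", "t", "yf") are assumed from the cfrnet file format, and read_npz_as_dict is a hypothetical stand-in rather than the function documented above. The example writes a tiny synthetic archive so it runs without the real data.

```python
import tempfile
from pathlib import Path

import numpy as np


def read_npz_as_dict(fname: Path) -> dict:
    """Hypothetical reader: copy arrays out of an .npz archive into a dict."""
    with np.load(fname) as archive:
        # Key names "x", "t", "yf" are assumed from the cfrnet file format.
        return {"X": archive["x"], "w": archive["t"], "y": archive["yf"]}


# Write a tiny synthetic archive: 4 units, 2 covariates, 3 experiments.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "toy.npz"
    np.savez(path, x=np.zeros((4, 2, 3)), t=np.ones((4, 3)), yf=np.zeros((4, 3)))
    data = read_npz_as_dict(path)

print(sorted(data), data["X"].shape)
```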

load_raw(data_path: pathlib.Path) Tuple

Get IHDP raw train/test sets.

Parameters

data_path (Path) – Path to the dataset csv. If the data is missing, it will be downloaded.

Returns

  • data_train (dict or pd.DataFrame) – Training data

  • data_test (dict or pd.DataFrame) – Testing data
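The "download if missing" behaviour described for data_path follows a common pattern, sketched below with the download step injected as a callable so that the example needs no network access. The helper ensure_file and the stand-in fake_fetch are hypothetical names. The real source URL and file names are not given here, so none are assumed.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ensure_file(path: Path, fetch: Callable[[Path], None]) -> Path:
    """Hypothetical helper: call `fetch(path)` only when `path` does not exist."""
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        fetch(path)
    return path


calls = []


def fake_fetch(target: Path) -> None:
    """Stand-in for a real download: records the call and writes a placeholder."""
    calls.append(target.name)
    target.write_bytes(b"placeholder")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data" / "train.npz"
    ensure_file(target, fake_fetch)  # missing, so fetch runs
    ensure_file(target, fake_fetch)  # present, so fetch is skipped

print(calls)
```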

prepare_ihdp_data(data_train: dict, data_test: dict, rescale: bool = False, setting: str = 'C', return_pos: bool = False) Tuple

Helper for preprocessing the IHDP dataset.

Parameters
  • data_train (pd.DataFrame or dict) – Train dataset

  • data_test (pd.DataFrame or dict) – Test dataset

  • rescale (bool, default False) – Whether to rescale the outcomes to a similar scale

  • setting (str, default 'C') – Experiment setting

  • return_pos (bool, default False) – Whether to return the potential outcomes

Returns

  • X (dict or pd.DataFrame) – Training feature set

  • y (pd.DataFrame or list) – Training outcomes

  • t (pd.DataFrame or list) – Training treatment assignments

  • cate_true_in (pd.DataFrame or list) – True conditional average treatment effects (CATE) for the training set

  • X_t (pd.DataFrame or list) – Test feature set

  • cate_true_out (pd.DataFrame or list) – True conditional average treatment effects (CATE) for the testing set
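To make the rescale option and the cate_true_* outputs more concrete, here is a sketch of one possible preprocessing step: compute the per-unit effect as the difference of the potential outcomes, and divide the potential outcomes by the effect's standard deviation when that exceeds 1. The rescaling rule, the threshold, and the helper name rescale_potential_outcomes are assumptions for illustration, not the documented behaviour. The observed outcomes would need the same scaling in a full pipeline.

```python
import numpy as np


def rescale_potential_outcomes(mu0: np.ndarray, mu1: np.ndarray):
    """One possible rescaling (an assumption): divide by the effect's std if it exceeds 1."""
    scale = np.std(mu1 - mu0)
    if scale > 1:
        return mu0 / scale, mu1 / scale
    return mu0, mu1


# Synthetic potential outcomes for four units.
mu0 = np.array([0.0, 1.0, 2.0, 3.0])
mu1 = np.array([4.0, 1.0, 8.0, 3.0])

cate_raw = mu1 - mu0  # [4., 0., 6., 0.]
mu0_s, mu1_s = rescale_potential_outcomes(mu0, mu1)
cate_scaled = mu1_s - mu0_s

print(np.std(cate_raw), np.std(cate_scaled))
```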