catenets.datasets.dataset_acic2016 module

ACIC2016 dataset

get_acic_covariates(fn_csv: pathlib.Path, keep_categorical: bool = False, preprocessed: bool = True) → numpy.ndarray

get_acic_orig_filenames(data_path: pathlib.Path, simu_num: int) → list

get_acic_orig_outcomes(data_path: pathlib.Path, simu_num: int, i_exp: int) → Tuple

load(data_path: pathlib.Path, preprocessed: bool = True, original_acic_outcomes: bool = False, **kwargs: Any) → Tuple

Data loader for the ACIC2016 dataset. It performs the following steps in order:

Download the dataset if needed.
Load the dataset.
Preprocess the data.
Return train/test split.
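
The four steps above can be sketched roughly as follows. This is only an illustration of the download-if-missing, load, preprocess, split pattern, not the code catenets runs. The names load_sketch and fetch, the mean-centring step, and the fixed-size split are all assumptions made for the example.

```python
import csv
from pathlib import Path
from typing import Callable, List, Tuple

Rows = List[List[float]]


def load_sketch(
    csv_path: Path, fetch: Callable[[Path], None], train_size: int
) -> Tuple[Rows, Rows]:
    """Hypothetical stand-in for the loader's four steps (not the catenets API)."""
    # 1. Download only when the file is not already cached on disk.
    if not csv_path.exists():
        csv_path.parent.mkdir(parents=True, exist_ok=True)
        fetch(csv_path)  # caller-supplied downloader (assumption)

    # 2. Load the CSV, skipping the header row.
    with csv_path.open(newline="") as handle:
        reader = csv.reader(handle)
        next(reader)
        rows = [[float(value) for value in row] for row in reader]

    # 3. Preprocess: placeholder step that centres each column on its mean.
    n_cols = len(rows[0])
    means = [sum(row[j] for row in rows) / len(rows) for j in range(n_cols)]
    rows = [[row[j] - means[j] for j in range(n_cols)] for row in rows]

    # 4. Return a train/test split (first train_size rows go to training).
    return rows[:train_size], rows[train_size:]
```

Passing the downloader in as a callable keeps the sketch free of any real URL and makes the caching behaviour easy to check: calling load_sketch twice on the same path should invoke fetch only once.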

Parameters

data_path (Path) – Path to the CSV. If it is missing, it will be downloaded.
preprocessed (bool) – Switch between the raw and preprocessed versions of the dataset.
original_acic_outcomes (bool) – Switch between the new simulations (Inductive bias paper) and the original ACIC outcomes.

Returns

train_x (array or pd.DataFrame) – Features in training data.
train_t (array or pd.DataFrame) – Treatments in training data.
train_y (array or pd.DataFrame) – Observed outcomes in training data.
train_potential_y (array or pd.DataFrame) – Potential outcomes in training data.
test_x (array or pd.DataFrame) – Features in testing data.
test_potential_y (array or pd.DataFrame) – Potential outcomes in testing data.
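
One common use of the returned potential outcomes is to score a treatment-effect estimate against the ground truth. The sketch below assumes that test_potential_y has shape (n, 2) with the control outcome in column 0 and the treated outcome in column 1. That layout is an assumption, not something stated above, so check it against the arrays you actually get back. The helper name pehe_from_potential_outcomes is hypothetical, and the data is synthetic.

```python
import numpy as np

# In real use the six values would come from the loader, in the order listed
# above (assumes catenets is installed; data_path is a path you choose):
#   from catenets.datasets.dataset_acic2016 import load
#   (train_x, train_t, train_y, train_potential_y,
#    test_x, test_potential_y) = load(data_path)


def pehe_from_potential_outcomes(potential_y, cate_pred) -> float:
    """Root PEHE, assuming potential_y has columns (y0, y1). Hypothetical helper."""
    potential_y = np.asarray(potential_y, dtype=float)
    # True individual effect = treated outcome minus control outcome.
    cate_true = potential_y[:, 1] - potential_y[:, 0]
    cate_pred = np.asarray(cate_pred, dtype=float).reshape(-1)
    return float(np.sqrt(np.mean((cate_true - cate_pred) ** 2)))


# Synthetic stand-in for test_potential_y (not real ACIC2016 data).
example_potential_y = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 2.0]])
oracle_cate = example_potential_y[:, 1] - example_potential_y[:, 0]
print(pehe_from_potential_outcomes(example_potential_y, oracle_cate))  # 0.0
```

np.asarray is used so the helper accepts either a NumPy array or a pd.DataFrame, matching the two return types listed above.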

preprocess(fn_csv: pathlib.Path, data_path: pathlib.Path, preprocessed: bool = True, original_acic_outcomes: bool = False, **kwargs: Any) → Tuple

preprocess_acic_orig(fn_csv: pathlib.Path, data_path: pathlib.Path, preprocessed: bool = False, keep_categorical: bool = True, simu_num: int = 1, i_exp: int = 0, train_size: int = 4000, random_split: bool = False) → Tuple

preprocess_simu(fn_csv: pathlib.Path, n_0: int = 2000, n_1: int = 200, n_test: int = 500, error_sd: float = 1, sp_lin: float = 0.6, sp_nonlin: float = 0.3, prop_gamma: float = 0, prop_omega: float = 0, ate_goal: float = 0, inter: bool = True, i_exp: int = 0, keep_categorical: bool = False, preprocessed: bool = True) → Tuple