catenets.datasets.dataset_acic2016 module
ACIC2016 dataset
- get_acic_covariates(fn_csv: pathlib.Path, keep_categorical: bool = False, preprocessed: bool = True) → numpy.ndarray
- get_acic_orig_filenames(data_path: pathlib.Path, simu_num: int) → list
- get_acic_orig_outcomes(data_path: pathlib.Path, simu_num: int, i_exp: int) → Tuple
- load(data_path: pathlib.Path, preprocessed: bool = True, original_acic_outcomes: bool = False, **kwargs: Any) → Tuple
- ACIC2016 dataset dataloader. It downloads the dataset if needed, loads it, preprocesses the data, and returns a train/test split.
- Parameters
data_path (Path) – Path to the CSV. If it is missing, it will be downloaded.
preprocessed (bool) – Switch between the raw and preprocessed versions of the dataset.
original_acic_outcomes (bool) – Switch between the new simulations (Inductive bias paper) and the original ACIC outcomes.
- Returns
train_x (array or pd.DataFrame) – Features in training data.
train_t (array or pd.DataFrame) – Treatments in training data.
train_y (array or pd.DataFrame) – Observed outcomes in training data.
train_potential_y (array or pd.DataFrame) – Potential outcomes in training data.
test_x (array or pd.DataFrame) – Features in testing data.
test_potential_y (array or pd.DataFrame) – Potential outcomes in testing data.
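As a minimal sketch of consuming the six values listed under Returns, the wrapper below names each position so that call sites do not depend on tuple order. `ACICSplit` and `load_split` are hypothetical helpers, not part of catenets. The loader is passed in as an argument, so the sketch itself does not assume the package is installed.

```python
from pathlib import Path
from typing import Any, Callable, NamedTuple


class ACICSplit(NamedTuple):
    """Hypothetical container naming the six documented return values, in order."""

    train_x: Any
    train_t: Any
    train_y: Any
    train_potential_y: Any
    test_x: Any
    test_potential_y: Any


def load_split(
    loader: Callable[..., tuple], data_path: Path, **kwargs: Any
) -> ACICSplit:
    """Call a loader with the documented `load` signature and name its outputs."""
    values = loader(data_path, **kwargs)
    # Fail early if the loader does not return the six documented values.
    if len(values) != len(ACICSplit._fields):
        raise ValueError(
            f"expected {len(ACICSplit._fields)} values, got {len(values)}"
        )
    return ACICSplit(*values)
```

Assuming catenets is installed, one might import `load` from `catenets.datasets.dataset_acic2016` and call `load_split(load, Path("data"), preprocessed=True)`, then read `split.train_x` or `split.test_potential_y` by name. The directory name `data` is a placeholder.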
- preprocess(fn_csv: pathlib.Path, data_path: pathlib.Path, preprocessed: bool = True, original_acic_outcomes: bool = False, **kwargs: Any) → Tuple
- preprocess_acic_orig(fn_csv: pathlib.Path, data_path: pathlib.Path, preprocessed: bool = False, keep_categorical: bool = True, simu_num: int = 1, i_exp: int = 0, train_size: int = 4000, random_split: bool = False) → Tuple
- preprocess_simu(fn_csv: pathlib.Path, n_0: int = 2000, n_1: int = 200, n_test: int = 500, error_sd: float = 1, sp_lin: float = 0.6, sp_nonlin: float = 0.3, prop_gamma: float = 0, prop_omega: float = 0, ate_goal: float = 0, inter: bool = True, i_exp: int = 0, keep_categorical: bool = False, preprocessed: bool = True) → Tuple
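Because preprocess_simu takes many keyword arguments, a small helper for enumerating settings can make experiment sweeps easier to read. The sketch below is a hypothetical helper, not part of catenets. The names and defaults in `SIMU_DEFAULTS` are copied from the signature above, while `simu_grid` and the example override values are assumptions made for illustration.

```python
from itertools import product
from typing import Any, Dict, Iterator, Sequence

# Keyword names and defaults copied from the documented preprocess_simu signature.
SIMU_DEFAULTS: Dict[str, Any] = {
    "n_0": 2000,
    "n_1": 200,
    "n_test": 500,
    "error_sd": 1,
    "sp_lin": 0.6,
    "sp_nonlin": 0.3,
    "prop_gamma": 0,
    "prop_omega": 0,
    "ate_goal": 0,
    "inter": True,
    "i_exp": 0,
    "keep_categorical": False,
    "preprocessed": True,
}


def simu_grid(**overrides: Sequence[Any]) -> Iterator[Dict[str, Any]]:
    """Yield one keyword dict per combination of the overridden values."""
    unknown = set(overrides) - set(SIMU_DEFAULTS)
    if unknown:
        # Reject names that do not appear in the documented signature.
        raise KeyError(f"not a documented preprocess_simu argument: {sorted(unknown)}")
    names = list(overrides)
    # The last override varies fastest, as with nested for-loops.
    for combo in product(*(overrides[name] for name in names)):
        yield {**SIMU_DEFAULTS, **dict(zip(names, combo))}
```

Each yielded dict could then be passed as `preprocess_simu(fn_csv, **settings)`, where `fn_csv` is the path to the covariate CSV. For instance, `simu_grid(n_1=[200, 500], i_exp=[0, 1, 2])` yields six settings and leaves every other argument at its documented default.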