catenets.datasets.dataset_acic2016 module

ACIC2016 dataset

get_acic_covariates(fn_csv: pathlib.Path, keep_categorical: bool = False, preprocessed: bool = True) → numpy.ndarray
get_acic_orig_filenames(data_path: pathlib.Path, simu_num: int) → list
get_acic_orig_outcomes(data_path: pathlib.Path, simu_num: int, i_exp: int) → Tuple
load(data_path: pathlib.Path, preprocessed: bool = True, original_acic_outcomes: bool = False, **kwargs: Any) → Tuple
ACIC2016 dataset dataloader.
  • Download the dataset if needed.

  • Load the dataset.

  • Preprocess the data.

  • Return the train/test split.

Parameters
  • data_path (Path) – Path to the CSV. If it is missing, it will be downloaded.

  • preprocessed (bool) – Switch between the raw and preprocessed versions of the dataset.

  • original_acic_outcomes (bool) – Switch between the new simulations (Inductive bias paper) and the original ACIC outcomes.

Returns

  • train_x (array or pd.DataFrame) – Features in training data.

  • train_t (array or pd.DataFrame) – Treatments in training data.

  • train_y (array or pd.DataFrame) – Observed outcomes in training data.

  • train_potential_y (array or pd.DataFrame) – Potential outcomes in training data.

  • test_x (array or pd.DataFrame) – Features in testing data.

  • test_potential_y (array or pd.DataFrame) – Potential outcomes in testing data.
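
Example (illustrative sketch, not part of the module). The six return values listed above can be given names when the tuple is unpacked. The sketch below assumes that load returns exactly those six items in the documented order; ACICSplit and load_acic_split are hypothetical names introduced here for illustration only.

```python
from pathlib import Path
from typing import Any, NamedTuple


class ACICSplit(NamedTuple):
    """Hypothetical container naming the six documented return values of load()."""

    train_x: Any            # features in training data
    train_t: Any            # treatments in training data
    train_y: Any            # observed outcomes in training data
    train_potential_y: Any  # potential outcomes in training data
    test_x: Any             # features in testing data
    test_potential_y: Any   # potential outcomes in testing data


def load_acic_split(data_path: Path, **kwargs: Any) -> ACICSplit:
    """Hypothetical helper: call load() and wrap its result in ACICSplit.

    Assumes load() returns the six documented items in the documented order.
    """
    # Imported lazily so that this sketch can be imported without catenets installed.
    from catenets.datasets.dataset_acic2016 import load

    return ACICSplit(*load(data_path, **kwargs))
```

A call such as load_acic_split(Path("data"), preprocessed=True) would then expose the split as split.train_x, split.test_potential_y, and so on. The directory name "data" is a placeholder, and the call may trigger a download if the dataset is missing.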

preprocess(fn_csv: pathlib.Path, data_path: pathlib.Path, preprocessed: bool = True, original_acic_outcomes: bool = False, **kwargs: Any) → Tuple
preprocess_acic_orig(fn_csv: pathlib.Path, data_path: pathlib.Path, preprocessed: bool = False, keep_categorical: bool = True, simu_num: int = 1, i_exp: int = 0, train_size: int = 4000, random_split: bool = False) → Tuple
preprocess_simu(fn_csv: pathlib.Path, n_0: int = 2000, n_1: int = 200, n_test: int = 500, error_sd: float = 1, sp_lin: float = 0.6, sp_nonlin: float = 0.3, prop_gamma: float = 0, prop_omega: float = 0, ate_goal: float = 0, inter: bool = True, i_exp: int = 0, keep_categorical: bool = False, preprocessed: bool = True) → Tuple