catenets.datasets.dataset_twins module

Twins dataset. Loads the real-world Twins data for individualized treatment effect estimation.

load(data_path: pathlib.Path, train_ratio: float = 0.8, treatment_type: str = 'rand', seed: int = 42, treat_prop: float = 0.5) → Tuple
Twins dataset dataloader.
  • Download the dataset if needed.

  • Load the dataset.

  • Preprocess the data.

  • Return train/test split.
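The "download the dataset if needed" step is a cache-on-disk pattern: fetch only when the file at data_path is missing. The sketch below illustrates that pattern under stated assumptions. ensure_downloaded and its url argument are hypothetical names, not part of this module, and the real source URL is not given here. The fetch callable defaults to urllib.request.urlretrieve, so it can be swapped out for testing.

```python
import urllib.request
from pathlib import Path
from typing import Callable


def ensure_downloaded(
    data_path: Path,
    url: str,  # hypothetical: the actual Twins source URL is not documented here
    fetch: Callable[[str, str], object] = urllib.request.urlretrieve,
) -> Path:
    """Fetch `url` into `data_path` only if the file is not already on disk."""
    data_path = Path(data_path)
    if not data_path.exists():
        # Create the parent directory so the download has somewhere to land.
        data_path.parent.mkdir(parents=True, exist_ok=True)
        fetch(url, str(data_path))
    # On later calls the cached file is reused and nothing is fetched.
    return data_path
```

Calling the function twice with the same path triggers at most one fetch.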

Parameters
  • data_path (Path) – Path to the CSV. If it is missing, it will be downloaded.

  • train_ratio (float) – Fraction of the samples used for training; the remainder forms the test set.

  • treatment_type (str) – Treatment generation strategy (default 'rand').

  • seed (int) – Random seed.

  • treat_prop (float) – Treatment proportion.
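The interaction between treatment_type='rand', treat_prop and seed can be sketched as follows. This assumes, without confirmation from this page, that the 'rand' strategy draws each unit's treatment independently of its covariates with probability treat_prop. It also assumes potential outcomes are stored as an (n, 2) array with columns [y(0), y(1)]. assign_random_treatment is a hypothetical helper, not this module's implementation.

```python
import numpy as np


def assign_random_treatment(
    potential_y: np.ndarray, treat_prop: float = 0.5, seed: int = 42
):
    """Draw t_i ~ Bernoulli(treat_prop) independently of covariates and
    reveal the matching potential outcome.

    Assumes potential_y has shape (n, 2) with columns [y(0), y(1)].
    """
    rng = np.random.default_rng(seed)
    n = potential_y.shape[0]
    t = rng.binomial(1, treat_prop, size=n)
    # Observed (factual) outcome: y(1) where treated, y(0) otherwise.
    y = np.where(t == 1, potential_y[:, 1], potential_y[:, 0])
    return t, y
```

Fixing seed makes the assignment reproducible. treat_prop controls the expected share of treated units.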

Returns

  • train_x (array or pd.DataFrame) – Features in training data.

  • train_t (array or pd.DataFrame) – Treatments in training data.

  • train_y (array or pd.DataFrame) – Observed outcomes in training data.

  • train_potential_y (array or pd.DataFrame) – Potential outcomes in training data.

  • test_x (array or pd.DataFrame) – Features in testing data.

  • test_potential_y (array or pd.DataFrame) – Potential outcomes in testing data.
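Because both potential outcomes are returned, the true individual effect y(1) - y(0) is known for every unit. That makes it possible to score effect estimates on the test split, for example with PEHE, the root mean squared error between estimated and true effects. The sketch below assumes test_potential_y is an (n, 2) array with columns [y(0), y(1)]. That column order is an assumption, and pehe is an illustrative helper.

```python
import numpy as np


def pehe(potential_y: np.ndarray, tau_hat) -> float:
    """Root precision in estimation of heterogeneous effects (PEHE).

    Assumes potential_y has shape (n, 2) with columns [y(0), y(1)].
    """
    tau_true = potential_y[:, 1] - potential_y[:, 0]
    return float(np.sqrt(np.mean((np.asarray(tau_hat) - tau_true) ** 2)))
```

A typical call would be pehe(test_potential_y, model_effect_predictions), where the second argument comes from whatever estimator is being evaluated.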

preprocess(fn_csv: pathlib.Path, train_ratio: float = 0.8, treatment_type: str = 'rand', seed: int = 42, treat_prop: float = 0.5) → Tuple

Helper for preprocessing the Twins dataset.

Parameters
  • fn_csv (Path) – Dataset CSV file path.

  • train_ratio (float) – The ratio of training data.

  • treatment_type (str) – The treatment selection strategy.

  • seed (int) – Random seed.

  • treat_prop (float) – Treatment proportion.
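One plausible reading of train_ratio and seed is a seeded shuffle of row indices, cut at floor(train_ratio * n). The sketch below illustrates that reading only. train_test_indices is a hypothetical helper, and the module may split differently, for example with a library splitter or without shuffling.

```python
import numpy as np


def train_test_indices(n: int, train_ratio: float = 0.8, seed: int = 42):
    """Shuffle row indices and cut them at floor(train_ratio * n)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(train_ratio * n)
    # The first n_train shuffled indices go to training, the rest to testing.
    return idx[:n_train], idx[n_train:]
```

The two index arrays are disjoint and together cover every row, and the same seed always reproduces the same split.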

Returns

  • train_x (array or pd.DataFrame) – Features in training data.

  • train_t (array or pd.DataFrame) – Treatments in training data.

  • train_y (array or pd.DataFrame) – Observed outcomes in training data.

  • train_potential_y (array or pd.DataFrame) – Potential outcomes in training data.

  • test_x (array or pd.DataFrame) – Features in testing data.

  • test_potential_y (array or pd.DataFrame) – Potential outcomes in testing data.
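The returned tuple has six entries in the order listed above. The test split carries only features and potential outcomes, with no treatments or observed outcomes. The sketch below shows how such a tuple could be assembled from full-sample arrays and split indices. package_split is a hypothetical helper that illustrates the documented order, not the module's code.

```python
import numpy as np


def package_split(x, t, y, potential_y, train_idx, test_idx):
    """Return arrays in the documented order. Only features and potential
    outcomes are kept for the test split."""
    return (
        x[train_idx],            # train_x
        t[train_idx],            # train_t
        y[train_idx],            # train_y
        potential_y[train_idx],  # train_potential_y
        x[test_idx],             # test_x
        potential_y[test_idx],   # test_potential_y
    )
```

Unpacking follows the same order: train_x, train_t, train_y, train_potential_y, test_x, test_potential_y = package_split(...).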