catenets.datasets.dataset_twins module
Twins dataset
Load real-world datasets for individualized treatment effect estimation.
- load(data_path: pathlib.Path, train_ratio: float = 0.8, treatment_type: str = 'rand', seed: int = 42, treat_prop: float = 0.5) → Tuple
- Twins dataset dataloader.
Download the dataset if needed.
Load the dataset.
Preprocess the data.
Return train/test split.
- Parameters
data_path (Path) – Path to the CSV. If it is missing, it will be downloaded.
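train_ratio (float) – Fraction of the data assigned to the training set; the remainder forms the test set.
treatment_type (str) – Treatment generation strategy (default 'rand').
seed (int) – Random seed.
treat_prop (float) – Treatment proportion (default 0.5).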
- Returns
train_x (array or pd.DataFrame) – Features in training data.
train_t (array or pd.DataFrame) – Treatments in training data.
train_y (array or pd.DataFrame) – Observed outcomes in training data.
train_potential_y (array or pd.DataFrame) – Potential outcomes in training data.
test_x (array or pd.DataFrame) – Features in testing data.
test_potential_y (array or pd.DataFrame) – Potential outcomes in testing data.
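A minimal usage sketch for load, based only on the signature and return list documented above. The helper names load_twins_split and describe_split are hypothetical and not part of catenets; the import of catenets is deferred into the function so the snippet can be read and imported without the package installed. Calling load_twins_split requires catenets and, if the CSV is missing, network access.

```python
from pathlib import Path
from typing import Any, Dict, Sequence

# Names of the six return values, in the order documented for load().
RETURN_NAMES = (
    "train_x",
    "train_t",
    "train_y",
    "train_potential_y",
    "test_x",
    "test_potential_y",
)


def load_twins_split(data_path: Path, seed: int = 42) -> Sequence[Any]:
    """Hypothetical wrapper: call the documented loader with its defaults."""
    # Deferred import: only needed when the function is actually called.
    from catenets.datasets import dataset_twins

    return dataset_twins.load(
        data_path,
        train_ratio=0.8,
        treatment_type="rand",
        seed=seed,
        treat_prop=0.5,
    )


def describe_split(split: Sequence[Any]) -> Dict[str, Any]:
    """Hypothetical helper: label the returned tuple with the documented names."""
    if len(split) != len(RETURN_NAMES):
        raise ValueError(
            f"expected {len(RETURN_NAMES)} values, got {len(split)}"
        )
    return dict(zip(RETURN_NAMES, split))
```

Labeling the tuple makes it harder to mix up positions: the test portion contains only features and potential outcomes, so there is no test_t or test_y to unpack.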
- preprocess(fn_csv: pathlib.Path, train_ratio: float = 0.8, treatment_type: str = 'rand', seed: int = 42, treat_prop: float = 0.5) → Tuple
Helper for preprocessing the Twins dataset.
- Parameters
fn_csv (Path) – Dataset CSV file path.
train_ratio (float) – Fraction of the data assigned to the training set.
treatment_type (str) – The treatment selection strategy.
seed (int) – Random seed.
treat_prop (float) – Treatment proportion.
- Returns
train_x (array or pd.DataFrame) – Features in training data.
train_t (array or pd.DataFrame) – Treatments in training data.
train_y (array or pd.DataFrame) – Observed outcomes in training data.
train_potential_y (array or pd.DataFrame) – Potential outcomes in training data.
test_x (array or pd.DataFrame) – Features in testing data.
test_potential_y (array or pd.DataFrame) – Potential outcomes in testing data.
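To illustrate how the parameters and return values relate, here is a conceptual sketch in plain Python. It is not the catenets implementation: it assumes that treatment is drawn independently for each unit with probability treat_prop, that the observed outcome is the potential outcome under the assigned treatment, and that the split is a seeded random shuffle. The function name split_with_random_treatment and the toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_with_random_treatment(
    x: Sequence[Sequence[float]],
    potential_y: Sequence[Tuple[float, float]],
    train_ratio: float = 0.8,
    seed: int = 42,
    treat_prop: float = 0.5,
) -> Tuple[list, list, list, list, list, list]:
    """Assumed behaviour: random treatment, observed outcome, train/test split."""
    n = len(x)
    if len(potential_y) != n:
        raise ValueError("x and potential_y must have the same length")

    rng = random.Random(seed)

    # Assumption: each unit is treated independently with probability treat_prop.
    t: List[int] = [1 if rng.random() < treat_prop else 0 for _ in range(n)]

    # Observed outcome: the potential outcome under the assigned treatment,
    # where potential_y[i] = (outcome if untreated, outcome if treated).
    y = [potential_y[i][t[i]] for i in range(n)]

    # Seeded shuffle, then cut at the training fraction.
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = int(train_ratio * n)
    train_idx, test_idx = idx[:n_train], idx[n_train:]

    def take(seq, ids):
        return [seq[i] for i in ids]

    # Same six-value layout as the documented return list.
    return (
        take(x, train_idx),
        take(t, train_idx),
        take(y, train_idx),
        take(potential_y, train_idx),
        take(x, test_idx),
        take(potential_y, test_idx),
    )
```

Because both potential outcomes are kept for every unit, the true individual effect can be computed on the test portion as the difference between the two entries of each test_potential_y row.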