My dataset has 7 columns (patient-level data columns such as age, gender, etc., and a target flag). It has around 12 million rows, with around 16K targets among them. I'm trying to match similar patients to my targets at a 1:9 ratio. Currently my code takes about 4 days to execute. I've also tried bigger AWS instances to run the same code, but there was no change in execution time. Is there anything I could do to make it run faster?
Please find my code below:
library(MatchIt)
library(dplyr)
library(optmatch)
# Recipe inputs
input <- read.csv('input.csv',header =TRUE, sep=",")
set.seed(1234)
match.itzs <- matchit(cohort_flag ~ pat_age + pat_gender + pt_hist_in_months + pt_visit_count,
                      data = input, ratio = 9)
# Keep only the original input columns from the matched data
df.matchzs <- match.data(match.itzs)[1:ncol(input)]
prp_cohort_psm_zs_test <- df.matchzs
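For what it's worth, here is a rough sketch of one alternative I'm considering (not yet validated against my real data): fit the propensity model directly with glm(), then match each target to its 9 nearest controls by propensity score using a sorted lookup. Note this matches with replacement, so it is not statistically identical to matchit()'s default nearest-neighbor matching without replacement. The data frame below is synthetic; only the column names are taken from my real input.

```r
set.seed(1234)

# Synthetic stand-in for the real 12M-row input (column names from my data)
n <- 100000
input <- data.frame(
  pat_age           = sample(18:90, n, replace = TRUE),
  pat_gender        = sample(0:1, n, replace = TRUE),
  pt_hist_in_months = rpois(n, 24),
  pt_visit_count    = rpois(n, 5),
  cohort_flag       = rbinom(n, 1, 0.001)
)

# 1. Fit the propensity model directly -- glm() is fast even at scale
ps_model <- glm(cohort_flag ~ pat_age + pat_gender + pt_hist_in_months + pt_visit_count,
                data = input, family = binomial())
input$ps <- fitted(ps_model)   # propensity scores on the probability scale

# 2. Sort the control units by propensity score once, so each lookup is cheap
treated     <- which(input$cohort_flag == 1)
controls    <- which(input$cohort_flag == 0)
ctrl_sorted <- controls[order(input$ps[controls])]
ctrl_ps     <- input$ps[ctrl_sorted]

# 3. For one target score, return the k controls with the closest scores,
#    found by binary search (findInterval) plus a small local window
match_k_nearest <- function(score, k = 9) {
  i  <- findInterval(score, ctrl_ps)
  lo <- max(1, i - k)
  hi <- min(length(ctrl_ps), i + k)
  window <- lo:hi
  window[order(abs(ctrl_ps[window] - score))][1:k]
}

# Matching with replacement: each target independently gets its 9 nearest
matches          <- lapply(input$ps[treated], match_k_nearest)
matched_controls <- ctrl_sorted[unique(unlist(matches))]
```

The idea is that the expensive part of matchit() on 12M rows seems to be the pairwise nearest-neighbor search, while a sort plus binary search per target scales roughly as O(n log n). I don't know yet how much balance I'd give up relative to matching without replacement.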