Is there a way to sample from different data sources at each iteration of training using caret?
I have an imbalanced problem with X true positives, Y true negatives, and Z samples with unknown labels, where Z >> X and Z >> Y.
The probability of a positive is such that:
p(true positive) << p(true negative)
I'm hoping that I can sample my training data so that at each iteration I have:
A true positive samples, B true negative samples, B unknown samples; where A = 2B
The true negatives and the unknown samples would be given the same label, resulting in a 2-class problem with A positives and 2B negatives, which is balanced when A = 2B.
I can write a loop (or apply a function) that builds a training dataset and then calls caret::train() on that dataset, but I was wondering whether caret has this functionality built in.
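For concreteness, here is a minimal sketch of the manual approach I have in mind. The helper name make_training_set, the toy data frames pos, neg and unk, the value B = 10, and the choice of method = "glm" are all placeholders for illustration; none of them come from caret itself.

```r
# Hypothetical helper (not part of caret): draw A = 2B true positives,
# B true negatives and B unknown rows, then give the unknown rows the
# negative label so the result is a 2-class dataset.
make_training_set <- function(pos, neg, unk, B) {
  A <- 2 * B
  stopifnot(nrow(pos) >= A, nrow(neg) >= B, nrow(unk) >= B)
  p <- pos[sample(nrow(pos), A), , drop = FALSE]
  n <- neg[sample(nrow(neg), B), , drop = FALSE]
  u <- unk[sample(nrow(unk), B), , drop = FALSE]
  out <- rbind(p, n, u)
  # First A rows are positives; the remaining 2B rows are negatives.
  out$label <- factor(rep(c("pos", "neg"), times = c(A, 2 * B)),
                      levels = c("pos", "neg"))
  out
}

# Toy data standing in for the three sources (assumed to share columns).
set.seed(1)
toy <- function(n, shift) data.frame(x1 = rnorm(n, shift), x2 = rnorm(n))
pos <- toy(40, 1)
neg <- toy(60, 0)
unk <- toy(1000, 0)

# A fresh sample from each source on every iteration.
train_sets <- lapply(1:3, function(i) make_training_set(pos, neg, unk, B = 10))

# One caret fit per resampled dataset (skipped if caret is not installed).
if (requireNamespace("caret", quietly = TRUE)) {
  fits <- lapply(train_sets, function(d) {
    caret::train(label ~ ., data = d, method = "glm")
  })
}
```

This works, but the resampling happens outside caret, so caret's own resampling and tuning only ever see one fixed dataset per fit. That is the part I'd like to know whether caret can handle itself.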
Cheers