My dataset looks like this (the first column in the printout is the row index, so SubjectID is actually the 2nd column and Conditie, the condition, is the 3rd). The condition takes the values 0 and 1; this is only part of the data frame.
SubjectID Conditie V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17
1 1 0 -20.44 -1.40 -10.93 86.06 -23.92 -0.359 6.1 75.216 44.4 83.0 4.18 24.01 -13.29 80.84 -5.85
2 2 0 49.20 -35.96 7.96 -6.47 46.89 -22.181 27.8 -16.873 -61.5 3.8 32.54 -15.80 17.19 2.32 -10.71
3 3 0 -38.00 19.68 31.21 -114.47 -43.31 66.569 -45.3 -86.840 102.4 253.7 -3.12 2.81 -21.25 -69.42 -40.64
4 4 0 51.88 52.34 -83.92 157.60 98.02 99.576 20.7 157.324 135.0 104.9 113.69 18.17 -18.41 137.90 -52.75
5 5 0 -25.46 30.87 -30.35 61.04 -1.29 175.212 -80.7 101.502 46.4 183.0 -35.98 9.91 -35.62 79.46 -66.25
6 6 0 2.85 18.49 41.63 99.97 22.35 64.988 35.4 122.737 113.8 89.8 -5.36 70.16 -32.12 140.91 1.45
7 7 0 21.13 -29.08 -34.97 -74.16 13.41 -63.383 15.3 -58.425 -59.0 58.9 -71.47 -105.64 -118.21 -64.33 -48.09
8 8 0 113.27 543.65 615.94 14.38 145.73 854.745 -140.9 851.710 725.8 -722.2 -221.21 652.29 -378.17 824.00 -54.44
9 9 0 -150.88 101.24 -199.41 -7.63 -130.06 117.425 -162.4 179.808 55.9 -200.9 1.12 -114.71 -231.54 17.47 -253.13
10 10 0 -179.69 272.76 174.75 -15.12 -162.90 207.947 186.1 94.898 268.6 634.0 29.91 -62.72 192.38 252.75 -92.70
11 11 0 417.49 101.05 11.69 -70.23 147.65 -19.403 536.7 285.809 283.5 -284.3 116.10 -68.84 214.01 181.62 56.99
12 12 0 -12.03 2.69 22.07 -39.80 -14.13 0.240 28.0 -24.242 20.2 123.7 14.48 -12.79 17.38 58.10 -38.29
13 13 0 -48.99 51.37 -48.54 82.99 -77.09 56.406 -39.6 113.484 -34.4 51.0 -39.91 -6.11 -7.92 32.38 25.54
14 14 0 -27.10 71.17 -32.10 102.32 6.53 216.710 75.1 138.506 159.0 -52.0 40.55 47.02 -28.68 164.09 43.74
15 15 0 85.85 124.12 85.09 -49.86 88.62 151.829 95.2 -54.738 34.9 -36.7 157.22 -147.66 102.82 -40.71 134.96
16 16 0 -56.60 -8.96 -111.23 16.75 -26.90 -46.913 -102.7 0.403 26.8 10.5 -26.29 44.60 -129.68 13.74 -83.49
17 17 0 -26.80 107.27 128.63 130.91 -31.72 105.698 173.4 82.380 55.7 19.5 299.54 66.69 -5.14 216.11 15.88
18 18 0 -38.43 25.52 26.88 -5.98 -21.63 11.358 42.1 -30.672 248.2 234.6 -62.65 -17.48 -76.02 7.84 -4.32
19 19 0 16.06 2.43 72.27 49.17 4.53 28.257 33.6 76.263 59.5 -46.1 31.77 54.86 60.13 51.81 36.70
20 20 0 18.79 91.35 231.11 64.18 -37.53 6.920 165.2 56.826 76.5 58.5 204.40 181.60 181.48 -85.98 55.63
I am trying to create training and validation datasets for a glmnet model and its predictions. Sampling SubjectID at random is fine, but I want both datasets to have a 50/50 ratio of the two conditions (0 and 1). The code I used (see below) does not guarantee this.
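set.seed(123)
# Draw 75% of the row indices at random, ignoring the condition column
train <- sample(nrow(classData), 0.75*nrow(classData), replace = FALSE)
# Sampled rows go to the training set; the remaining rows form the validation set
TrainSet <- classData[train,]
ValidSet <- classData[-train,]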
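To make the goal concrete: what I am after is a stratified split, where 75% of the rows are sampled within each condition separately. Below is a minimal base-R sketch of that idea, not a tested solution for my real data. It assumes the condition column is named Conditie, and it uses a made-up toy data frame (toyData, 20 subjects per condition) in place of classData. A stratified split only preserves the ratio already present in the full data, so both sets come out 50/50 only if the full data is balanced.

```r
set.seed(123)

# Hypothetical stand-in for classData: 40 subjects, 20 per condition
toyData <- data.frame(
  SubjectID = 1:40,
  Conditie  = rep(c(0, 1), each = 20),
  V3        = rnorm(40)
)

# Group the row indices by condition
rowsByCondition <- split(seq_len(nrow(toyData)), toyData$Conditie)

# Within each condition, draw 75% of that group's rows (rounded down)
train <- unlist(lapply(rowsByCondition, function(idx) {
  idx[sample.int(length(idx), size = floor(0.75 * length(idx)))]
}), use.names = FALSE)

TrainSet <- toyData[train, ]
ValidSet <- toyData[-train, ]

# Check the balance of the two conditions in each set
table(TrainSet$Conditie)
table(ValidSet$Conditie)
```

With the toy data this gives 15 rows per condition in TrainSet and 5 per condition in ValidSet. Indexing with sample.int() rather than calling sample(idx, ...) directly avoids the surprise that sample() treats a single number as the range 1:n. As far as I know, createDataPartition() from the caret package does a similar stratified draw on a factor outcome, but I have not checked it on this data.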
I would like to hear your recommendations on this! Thanks in advance for your time.