The R package growthcurver is great for efficient analysis and visualization of organism growth, except when there are missing values. Because my data is in wide format (each column is a variable) and the measurement times differ from variable to variable, there are a ton of NAs. Unfortunately, the growthcurver package does not handle NAs, so now I'm stuck with the following options.
Option A
- Impute the missing data via a logistic regression or machine learning approach. I don't like this option: I've tried mice and Hmisc for regression imputation, which failed because there are more variables (columns) than observations in each column, and caret for random forest imputation, which did not produce any meaningful imputed values. Imputation would also leave my dataframe consisting mostly of imputed values, which I can't justify.
Option B
- Somehow adapt the growthcurver function to handle NAs better than it currently does. I tried poking around with the function but couldn't find a spot where a simple na.omit() could be plopped in.
Option C
- Hope the SO community has a quick fix!
Below is the code that worked with the single-use function SummarizeGrowth() (when I manually removed NAs). I should note that this function is useful when one only has a few observations to analyze/visualize, but ideally I would use SummarizeGrowthByPlate(), the package's apply()-style function that loops through each column (variable), automatically producing visualizations and results.
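For what it's worth, Option B might not require changing the package at all. The following is only a minimal sketch, assuming that fitting each column on its own non-missing time points is acceptable. fit_by_column and its fit_fun argument are hypothetical names, not part of growthcurver; the only package call assumed is SummarizeGrowth(time, values), the same one used in the manual single-column example.

```r
# Hypothetical helper (not part of growthcurver): fit every sample column
# using only the rows where that column was actually measured.
fit_by_column <- function(data, time_col = "time",
                          fit_fun = growthcurver::SummarizeGrowth) {
  samples <- setdiff(names(data), time_col)
  fits <- lapply(samples, function(s) {
    # Keep rows where both the time and this sample's value are present.
    keep <- !is.na(data[[s]]) & !is.na(data[[time_col]])
    # Same call as the manual single-column version, repeated per column.
    fit_fun(data[[time_col]][keep], data[[s]][keep])
  })
  names(fits) <- samples
  fits
}

# Usage (requires growthcurver to be installed):
# fits <- fit_by_column(df)
# fits$a        # fit for column a, using only its measured time points
# plot(fits$a)  # growthcurver provides a plot method for its fit objects
```

This returns a named list of fits rather than the single summary table that SummarizeGrowthByPlate() produces, so the per-sample values would still need to be collected into a data frame if a table is wanted.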
Example Dataframe
time a b c d e f g
1 0.00002 NA NA NA NA NA NA NA
2 0.00003 NA NA NA NA NA NA 0.0000
3 22.00000 NA NA NA NA NA NA NA
4 24.01000 0.1443 0.1554 0.0999 0.1110 0.0999 0.0666 NA
5 24.03000 NA NA NA NA NA NA 0.0666
6 28.00000 NA NA NA NA NA NA NA
7 36.00000 0.2220 0.2775 0.2775 0.1776 0.1221 0.1221 NA
8 39.00000 NA NA NA NA NA NA 0.2442
9 40.00000 NA NA NA NA NA NA NA
10 44.00000 0.3330 0.3885 0.3552 0.3108 0.2664 0.1998 NA
11 46.00000 NA NA NA NA NA NA NA
12 64.00000 NA NA NA NA NA NA 0.7881
13 67.00000 0.9435 1.2210 1.1655 0.9990 1.5984 0.5217 NA
14 88.00000 1.8093 1.8093 1.8093 1.8870 1.6872 1.5096 NA
15 108.00000 NA NA NA NA NA NA 1.6983
Reproducible Data
df <- structure(list(time = c(2e-05, 3e-05, 22, 24.01, 24.03, 28, 36,
39, 40, 44, 46, 64, 67, 88, 108), a = c(NA, NA, NA, 0.1443, NA,
NA, 0.222, NA, NA, 0.333, NA, NA, 0.9435, 1.8093, NA), b = c(NA,
NA, NA, 0.1554, NA, NA, 0.2775, NA, NA, 0.3885, NA, NA, 1.221,
1.8093, NA), c = c(NA, NA, NA, 0.0999, NA, NA, 0.2775, NA, NA,
0.3552, NA, NA, 1.1655, 1.8093, NA), d = c(NA, NA, NA, 0.111,
NA, NA, 0.1776, NA, NA, 0.3108, NA, NA, 0.999, 1.887, NA), e = c(NA,
NA, NA, 0.0999, NA, NA, 0.1221, NA, NA, 0.2664, NA, NA, 1.5984,
1.6872, NA), f = c(NA, NA, NA, 0.0666, NA, NA, 0.1221, NA, NA,
0.1998, NA, NA, 0.5217, 1.5096, NA), g = c(NA, 0, NA, NA, 0.0666,
NA, NA, 0.2442, NA, NA, NA, 0.7881, NA, NA, 1.6983)), class = "data.frame", row.names = c(NA,
-15L))
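To make the sparsity concrete, here is a small sketch that counts the measured (non-NA) values per sample. count_observations is a hypothetical helper name; it uses base R only.

```r
# Hypothetical helper: number of non-missing measurements per sample column.
count_observations <- function(data, time_col = "time") {
  samples <- setdiff(names(data), time_col)
  colSums(!is.na(data[samples]))
}

# Usage with the data frame defined above:
# count_observations(df)
```

For the df above, every column a through g has 5 measured values out of 15 rows, so a fit with three parameters (K, N0, r) has very few points to work with.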
Success, but required manual removal of NAs from a single column with SummarizeGrowth()
library(growthcurver)
SummarizeGrowth(df$time[!is.na(df$a)], df$a[!is.na(df$a)])
Fit data to K / (1 + ((K - N0) / N0) * exp(-r * t)):
K N0 r
val: 2.121 0.004 0.085
Residual standard error: 0.02857429 on 2 degrees of freedom
Other useful metrics:
DT 1 / DT auc_l auc_e
8.13 1.2e-01 38.16 38.77
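As a sanity check on what the summary reports, the fitted curve can be evaluated directly. This is a sketch in base R that copies the formula printed in the output; logistic_growth and doubling_time are hypothetical helper names, and it assumes that DT in the printout is the doubling time log(2) / r.

```r
# Logistic curve in the form printed in the fit summary.
logistic_growth <- function(t, K, N0, r) {
  K / (1 + ((K - N0) / N0) * exp(-r * t))
}

# Doubling time implied by the growth rate r (assumed meaning of DT).
doubling_time <- function(r) log(2) / r

# Rounded values copied from the summary for column a.
K <- 2.121; N0 <- 0.004; r <- 0.085
logistic_growth(0, K, N0, r)    # equals N0 at t = 0
logistic_growth(1e6, K, N0, r)  # approaches K for large t
doubling_time(r)                # about 8.15 with the rounded r
```

The small gap between this doubling time and the printed 8.13 is consistent with r being rounded to three decimals in the summary.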
Failure when not manually removing NAs with SummarizeGrowth()
SummarizeGrowth(df$time, df$a)
Fit data to K / (1 + ((K - N0) / N0) * exp(-r * t)):
K N0 r
val: 0 0 0
Residual standard error: 0 on 0 degrees of freedom
Other useful metrics:
DT 1 / DT auc_l auc_e
0 0 0 0
Note: cannot fit data
Failure when trying to use automated SummarizeGrowthByPlate()
SummarizeGrowthByPlate(df)
sample k n0 r t_mid t_gen auc_l auc_e sigma note
1 a 0 0 0 0 0 0 0 0 cannot fit data
2 b 0 0 0 0 0 0 0 0 cannot fit data
3 c 0 0 0 0 0 0 0 0 cannot fit data
4 d 0 0 0 0 0 0 0 0 cannot fit data
5 e 0 0 0 0 0 0 0 0 cannot fit data
6 f 0 0 0 0 0 0 0 0 cannot fit data
7 g 0 0 0 0 0 0 0 0 cannot fit data