Quantcast
Channel: Active questions tagged r - Stack Overflow
Viewing all articles
Browse latest Browse all 201894

Handling missing values in growthcurver

$
0
0

The R package growthcurver is great for efficient analysis and visualization of organism growth except when there are missing values. Because I have data in wide format (each column is a variable) and the times were random for each variable, there are a ton of NAs. Unfortunately, the growthcurver package does not like NAs, so now I'm stuck with 2 options.

  • Option A

    • Impute the missing data via a logistic regression or machine learning approach (I don't like this option because I've tried mice, Hmisc, for regression imputation but failed because there are more variables (columns) than observations in each column and caret for random forest imputation, which did not produce any meaningful imputed values). Imputation also then creates my dataframe to be mostly imputed values which I can't justify.
  • Option B

    • Somehow adapt the growthcurver function to handle NAs better than it currently does. I tried poking around with the function but couldn't find a spot where a simple na.omit() could be plopped in.

Here's the code that worked with the single-use function SummarizeGrowth() (when I manually removed NAs). I should note that this function is useful when one only has a few observations to analyze/visualize but ideally, I would use the function SummarizeGrowthByPlate() which is a package-derived apply() function that loops through each column (variable) automatically producing visualizations and results.

  • Option C
    • Hope the SO community has a quick-fix!

Example Dataframe

        time      a      b      c      d      e      f      g
1    0.00002     NA     NA     NA     NA     NA     NA     NA
2    0.00003     NA     NA     NA     NA     NA     NA 0.0000
3   22.00000     NA     NA     NA     NA     NA     NA     NA
4   24.01000 0.1443 0.1554 0.0999 0.1110 0.0999 0.0666     NA
5   24.03000     NA     NA     NA     NA     NA     NA 0.0666
6   28.00000     NA     NA     NA     NA     NA     NA     NA
7   36.00000 0.2220 0.2775 0.2775 0.1776 0.1221 0.1221     NA
8   39.00000     NA     NA     NA     NA     NA     NA 0.2442
9   40.00000     NA     NA     NA     NA     NA     NA     NA
10  44.00000 0.3330 0.3885 0.3552 0.3108 0.2664 0.1998     NA
11  46.00000     NA     NA     NA     NA     NA     NA     NA
12  64.00000     NA     NA     NA     NA     NA     NA 0.7881
13  67.00000 0.9435 1.2210 1.1655 0.9990 1.5984 0.5217     NA
14  88.00000 1.8093 1.8093 1.8093 1.8870 1.6872 1.5096     NA
15 108.00000     NA     NA     NA     NA     NA     NA 1.6983

Reproducible Data

df <- structure(list(time = c(2e-05, 3e-05, 22, 24.01, 24.03, 28, 36, 
39, 40, 44, 46, 64, 67, 88, 108), a = c(NA, NA, NA, 0.1443, NA, 
NA, 0.222, NA, NA, 0.333, NA, NA, 0.9435, 1.8093, NA), b = c(NA, 
NA, NA, 0.1554, NA, NA, 0.2775, NA, NA, 0.3885, NA, NA, 1.221, 
1.8093, NA), c = c(NA, NA, NA, 0.0999, NA, NA, 0.2775, NA, NA, 
0.3552, NA, NA, 1.1655, 1.8093, NA), d = c(NA, NA, NA, 0.111, 
NA, NA, 0.1776, NA, NA, 0.3108, NA, NA, 0.999, 1.887, NA), e = c(NA, 
NA, NA, 0.0999, NA, NA, 0.1221, NA, NA, 0.2664, NA, NA, 1.5984, 
1.6872, NA), f = c(NA, NA, NA, 0.0666, NA, NA, 0.1221, NA, NA, 
0.1998, NA, NA, 0.5217, 1.5096, NA), g = c(NA, 0, NA, NA, 0.0666, 
NA, NA, 0.2442, NA, NA, NA, 0.7881, NA, NA, 1.6983)), class = "data.frame", row.names = c(NA, 
-15L))

Success, but required manual removal of NAs from of a single column with SummarizeGrowth()

library(growthcurver)

SummarizeGrowth(df$time[!is.na(df$a)], df$a[!is.na(df$a)])

Fit data to K / (1 + ((K - N0) / N0) * exp(-r * t)): 
    K   N0  r
  val:  2.121   0.004   0.085
  Residual standard error: 0.02857429 on 2 degrees of freedom

Other useful metrics:
  DT    1 / DT  auc_l   auc_e
  8.13  1.2e-01 38.16   38.77

Failure when not manually removing NAs with SummarizeGrowth()

SummarizeGrowth(df$time, dfb$a)

Fit data to K / (1 + ((K - N0) / N0) * exp(-r * t)): 
    K   N0  r
  val:  0   0   0
  Residual standard error: 0 on 0 degrees of freedom

Other useful metrics:
  DT    1 / DT  auc_l   auc_e
  0 0   0   0

Note: cannot fit data

Failure when trying to use automated SummarizeGrowthByPlate()

SummarizeGrowthByPlate(df)

sample k n0 r t_mid t_gen auc_l auc_e sigma            note
1      a 0  0 0     0     0     0     0     0 cannot fit data
2      b 0  0 0     0     0     0     0     0 cannot fit data
3      c 0  0 0     0     0     0     0     0 cannot fit data
4      d 0  0 0     0     0     0     0     0 cannot fit data
5      e 0  0 0     0     0     0     0     0 cannot fit data
6      f 0  0 0     0     0     0     0     0 cannot fit data
7      g 0  0 0     0     0     0     0     0 cannot fit data

Viewing all articles
Browse latest Browse all 201894

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>