I am trying to cache a big data.table and then make a plot from it. The code is as follows:
```{r gen-data, tidy=TRUE, warning=FALSE, tidy.opts=list(width.cutoff=60), cache = TRUE, cache.lazy=FALSE}
DT = fread("reference.txt.gz", header = FALSE)
vc = c("chromosome_1", "chromosome_2", "chromosome_3",
       "chromosome_4", "chromosome_5", "chromosome_6")
colnames(DT) = c("chrom", "position", "score",
                 "corrected base", "score of the corrected base")
DT = setDT(DT, key = "chrom")[J(vc), nomatch = 0]
```
```{r, cache=TRUE, tidy=TRUE, warning=FALSE, tidy.opts=list(width.cutoff=60), dependson='gen-data'}
plot = ggplot(data = DT) +
  geom_line(aes(x = position, y = score, group = 1),
            stat = "summary_bin", fun.y = "mean", binwidth = 100000,
            color = ghibli_palette("MononokeMedium")[2])
ttle = paste0("coverage of the 6 longest scaffolds of Shasta + instagraal assembly")
plot = plot + labs(title = ttle) +
  theme(plot.title = element_markdown(lineheight = 1.5, size = 12),
        legend.text = element_markdown(size = 14))
plot = plot + theme(axis.title = element_markdown(size = 12)) +
  theme(axis.text.x = element_text(size = 5)) +
  theme(axis.text.y = element_text(size = 3))
plot = plot + theme(legend.title = element_markdown(size = 12))
p = plot + facet_wrap(~chrom, scales = "free_x") +
  xlab("position") + ylab("mean score per 100 Kb windows")
v = ggplotly(p) %>%
  layout(
    xaxis = list(automargin = TRUE),
    yaxis = list(automargin = TRUE)
  )
v
```
My expectation was that the first chunk reads the data into a data.table, applies the relevant selection, and finally caches the resulting DT object.
However, the first chunk is evaluated every time, no matter what. I must be doing something wrong, but I can't see what.
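One way to check whether the cache is being written at all might be to look at the files in the cache directory after each knit. The sketch below is only an illustration: `cache_files_for` is a hypothetical helper (not a knitr function), the directory name is a placeholder, and it assumes that knitr names cache files `<label>_<hash>.<extension>` inside the directory given by the `cache.path` chunk option.

```r
# Hypothetical helper (not part of knitr): list the cache files that belong
# to one chunk label inside a given cache directory.
cache_files_for <- function(label, cache_dir) {
  files <- list.files(cache_dir)
  # Assumption: cache files are named "<label>_<hash>.<extension>"
  files[startsWith(files, paste0(label, "_"))]
}

# Usage: run this after each knit and compare the results.
# "report_cache/html" is a placeholder; substitute the real cache directory.
cache_files_for("gen-data", "report_cache/html")
```

If no files show up for `gen-data`, the cache is presumably not being saved. If the hash part of the file name changes between two knits, then, as far as I understand, knitr considered the chunk's code or options to have changed.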
Thanks for any help.