I would like to store a GenomicRanges::GRanges object from Bioconductor as a single column in a base R data.frame. The reason is that I'd like to write some ggplot2 functions that work exclusively with data.frames under the hood. However, none of my attempts so far have worked. Basically, this is what I want to do:
library(GenomicRanges)
x <- GRanges(c("chr1:100-200", "chr1:200-300"))
df <- data.frame(x = x, y = 1:2)
But the column is automatically expanded into its components, whereas I would like to keep it as a valid GRanges object in a single column:
> df
x.seqnames x.start x.end x.width x.strand y
1 chr1 100 200 101 * 1
2 chr1 200 300 101 * 2
With S4Vectors::DataFrame it works as I want, but I'd like a base R data.frame to do the same thing:
> S4Vectors::DataFrame(x = x, y = 1:2)
DataFrame with 2 rows and 2 columns
x y
<GRanges> <integer>
1 chr1:100-200 1
2 chr1:200-300 2
I also tried the following, without success:
> df <- data.frame(y = 1:2)
> df[["x"]] <- x
> df
y x
1 1 <S4 class ‘GRanges’ [package “GenomicRanges”] with 7 slots>
2 2 <NA>
Warning message:
In format.data.frame(if (omit) x[seq_len(n0), , drop = FALSE] else x, :
  corrupt data frame: columns will be truncated or padded with NAs
> df[["x"]] <- I(x)
Error in rep(value, length.out = nrows) :
  attempt to replicate an object of type 'S4'
I had some minor success implementing an S3 variant of the GRanges class using vctrs::new_rcrd, but that seems like a very roundabout way to get a single column representing a genomic range.
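For reference, here is a minimal sketch of what the vctrs route looks like. The class name granges_rcrd, the constructor new_granges_rcrd and the choice of fields are hypothetical names for illustration, and the sketch assumes only that the vctrs package is installed. It is a stripped-down stand-in rather than a real GRanges: there is no seqinfo, no metadata columns and no validity checking.

```r
library(vctrs)

# Hypothetical constructor: one record per range, stored as parallel fields.
new_granges_rcrd <- function(seqnames, start, end, strand = "*") {
  # Recycle all fields to a common length so every record is complete.
  fields <- vec_recycle_common(
    seqnames = as.character(seqnames),
    start    = as.integer(start),
    end      = as.integer(end),
    strand   = as.character(strand)
  )
  new_rcrd(fields, class = "granges_rcrd")
}

# Record classes need their own format() method; this one mimics "chr1:100-200".
format.granges_rcrd <- function(x, ...) {
  paste0(field(x, "seqnames"), ":", field(x, "start"), "-", field(x, "end"))
}

x <- new_granges_rcrd(c("chr1", "chr1"), c(100, 200), c(200, 300))

# Assigning the record vector keeps it as one column next to y.
df <- data.frame(y = 1:2)
df$x <- x
ncol(df)
```

This does give one column per genomic range, but every GRanges accessor I need would have to be reimplemented for the S3 class, which is why it feels roundabout.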