Channel: Active questions tagged r - Stack Overflow

R: aggregate data while adding new count column using base R

I would like to aggregate a data frame while also adding in a new column (N) that counts the number of rows per value of the grouping variable, in base R.

This is trivial in dplyr:

library(dplyr)
data(iris)

combined_summary <- iris %>% group_by(Species) %>% group_by(N=n(), add=TRUE) %>% summarize_all(mean)

> combined_summary
# A tibble: 3 x 6
# Groups:   Species [3]
  Species        N Sepal.Length Sepal.Width Petal.Length Petal.Width
  <fct>      <int>        <dbl>       <dbl>        <dbl>       <dbl>
1 setosa        50         5.01        3.43         1.46       0.246
2 versicolor    50         5.94        2.77         4.26       1.33 
3 virginica     50         6.59        2.97         5.55       2.03 

I am however in the unfortunate position of having to write this code in an environment that doesn't allow for packages to be used (don't ask; it's not my decision). So I need a way to do this in base R.

I can do it in base R in a long-winded way as follows:

# First create the aggregated tables separately
summary_means <- aggregate(. ~ Species, data=iris, FUN=mean)
summary_count <- aggregate(Sepal.Length ~ Species, data=iris[, c("Species", "Sepal.Length")], FUN=length)

> summary_means
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa        5.006       3.428        1.462       0.246
2 versicolor        5.936       2.770        4.260       1.326
3  virginica        6.588       2.974        5.552       2.026

> summary_count
     Species Sepal.Length
1     setosa           50
2 versicolor           50
3  virginica           50

# Then rename the count column
colnames(summary_count)[2] <- "N"

> summary_count
     Species  N
1     setosa 50
2 versicolor 50
3  virginica 50

# Finally merge the two dataframes
combined_summary_baseR <- merge(x=summary_count, y=summary_means, by="Species", all.x=TRUE)

> combined_summary_baseR
     Species  N Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa 50        5.006       3.428        1.462       0.246
2 versicolor 50        5.936       2.770        4.260       1.326
3  virginica 50        6.588       2.974        5.552       2.026

Is there a more efficient way to do this in base R?
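For reference, one shorter base R route is sketched below. It is only a sketch, not necessarily the most efficient option: it keeps the same `aggregate()` call for the means, takes the group sizes from `table()`, and binds them in as `N`, which avoids the second `aggregate()`, the rename, and the `merge()`. The names `group_sizes` and `combined_summary_sketch` are made up for illustration. One assumption to watch: the formula interface of `aggregate()` drops rows containing `NA` by default, while `table()` counts every row, so the two only agree when the data has no missing values (true for `iris`).

```r
data(iris)

# Per-group means of every non-grouping column (same call as in the question).
summary_means <- aggregate(. ~ Species, data = iris, FUN = mean)

# Per-group row counts. Index the table by group name instead of relying on
# the two results happening to share the same row order.
group_sizes <- table(iris$Species)
N <- as.vector(group_sizes[as.character(summary_means$Species)])

# Put N directly after the grouping column, then the means.
combined_summary_sketch <- cbind(summary_means[1], N = N, summary_means[-1])
```

Printing `combined_summary_sketch` should give the same columns, in the same order, as `combined_summary_baseR` above.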

