Channel: Active questions tagged r - Stack Overflow

What is the best way to count values within columns to create a summary table?


I have a tbl_df with several columns, each of which holds multiple distinct values. I want to use those values to create several indicator columns, and then summarize the indicator columns.

One approach is to write several ifelse() calls inside a mutate(), but that seems inefficient. Is there a better way to do this? I suspect there is a dplyr and/or tidyr based solution.

An example of what I'm trying to do is below. It uses only a sample of the data and does not include all of the columns I want to create. The final summary table will have columns based on both sums and means.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df <- tibble::tribble(
  ~type,      ~bb_type,           ~description,
  "B",            NA,                 "ball",
  "S",            NA,                 "foul",
  "X",  "line_drive", "hit_into_play_no_out",
  "S",            NA,      "swinging_strike",
  "S",            NA,                 "foul",
  "X", "ground_ball",        "hit_into_play",
  "S",            NA,      "swinging_strike",
  "X",    "fly_ball",  "hit_into_play_score",
  "B",            NA,                 "ball",
  "S",            NA,                 "foul"
)


df <- df %>% 
  mutate(ground_ball = ifelse(bb_type == "ground_ball", 1, 0),
         fly_ball = if_else(bb_type == "fly_ball", 1, 0),
         X = if_else(type == "X", 1, 0),
         # not sure if this is the best way to flag descriptions that start with "swinging" so they can be summed later
         swinging_strike = grepl("^swinging", description))

df
#> # A tibble: 10 x 7
#>    type  bb_type    description       ground_ball fly_ball     X swinging_strike
#>    <chr> <chr>      <chr>                   <dbl>    <dbl> <dbl> <lgl>          
#>  1 B     <NA>       ball                       NA       NA     0 FALSE          
#>  2 S     <NA>       foul                       NA       NA     0 FALSE          
#>  3 X     line_drive hit_into_play_no…           0        0     1 FALSE          
#>  4 S     <NA>       swinging_strike            NA       NA     0 TRUE           
#>  5 S     <NA>       foul                       NA       NA     0 FALSE          
#>  6 X     ground_ba… hit_into_play               1        0     1 FALSE          
#>  7 S     <NA>       swinging_strike            NA       NA     0 TRUE           
#>  8 X     fly_ball   hit_into_play_sc…           0        1     1 FALSE          
#>  9 B     <NA>       ball                       NA       NA     0 FALSE          
#> 10 S     <NA>       foul                       NA       NA     0 FALSE

summary_df <- df %>% 
  summarize(n = n(),
            fly_ball = sum(fly_ball, na.rm = TRUE),
            ground_ball = sum(ground_ball, na.rm = TRUE))

summary_df
#> # A tibble: 1 x 3
#>       n fly_ball ground_ball
#>   <int>    <dbl>       <dbl>
#> 1    10        1           1

Created on 2020-02-08 by the reprex package (v0.3.0)
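
One direction that might avoid the intermediate indicator columns is to sum or average the logical comparisons directly inside summarize(). The following is only a minimal sketch on the same sample rows, not a confirmed answer; the names summary_alt, in_play, and swinging_rate are made up for illustration, and startsWith() is the base R function.

```r
library(dplyr)

# Same sample rows as in the example above.
df <- tibble::tribble(
  ~type,      ~bb_type,           ~description,
  "B",            NA,                 "ball",
  "S",            NA,                 "foul",
  "X",  "line_drive", "hit_into_play_no_out",
  "S",            NA,      "swinging_strike",
  "S",            NA,                 "foul",
  "X", "ground_ball",        "hit_into_play",
  "S",            NA,      "swinging_strike",
  "X",    "fly_ball",  "hit_into_play_score",
  "B",            NA,                 "ball",
  "S",            NA,                 "foul"
)

# Summing a logical vector counts its TRUE values; taking the mean gives
# the proportion. na.rm = TRUE drops the rows where bb_type is NA.
summary_alt <- df %>%
  summarize(
    n               = n(),
    fly_ball        = sum(bb_type == "fly_ball", na.rm = TRUE),
    ground_ball     = sum(bb_type == "ground_ball", na.rm = TRUE),
    in_play         = sum(type == "X"),                           # hypothetical name
    swinging_strike = sum(startsWith(description, "swinging")),
    swinging_rate   = mean(startsWith(description, "swinging"))   # hypothetical name
  )

summary_alt
```

If a per-value tally is enough, count(df, bb_type) would give one row per bb_type value (including NA) without naming each value by hand.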

