Channel: Active questions tagged r - Stack Overflow

Unwanted values being added when using complete()

Happy New Year, everyone!

I'm having an issue with transforming implicit missing data into explicit missing data. I am summarizing the number of observations of birds at specific survey sites. These sites were surveyed once a month for 12 months. Unfortunately, the collected data only contains the actual observations of birds; it does not record the surveys where no birds were observed at a site. When I attempt to add in the missing data, extra observations are added to the data.

My solution is to use complete() to fill in the missing data (i.e., the site/month surveys where no birds were observed). I am able to fill in the missing sites with no issues. However, when I try to fill in the missing months, extra observations are added to sites that did, in fact, record an observation of a bird. Specifically, the additional observations are added to station 9 for March (1 -> 2 birds) and April (1 -> 2 birds), resulting in 32 total birds observed rather than 30.
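For illustration, here is a minimal sketch of the kind of result described above: implicit missing site/month combinations turned into explicit zeros without changing the total. It runs on hypothetical toy data (the names `obs`, `all_months`, `filled`, and `birds` are made up for this example), and it assumes that summarising first and then calling complete() on the ungrouped result is acceptable. It is not necessarily the fix for the pipeline in this question.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: only surveys that recorded a bird appear.
obs <- tibble(
  site   = factor(c("9", "9", "10", "10"), levels = c("9", "10")),
  month  = as.Date(c("2013-03-01", "2013-04-01", "2013-04-01", "2013-04-01")),
  indivs = c(1, 1, 2, 2)
)

# Every month that was surveyed (assumed range for this toy example).
all_months <- seq(as.Date("2013-03-01"), as.Date("2013-05-01"), by = "month")

filled <- obs %>%
  # One row per site/month, summing the individuals seen.
  count(site, month, wt = indivs, name = "birds") %>%
  # The data is ungrouped here, so complete() builds every site x month
  # combination and fills the ones that were missing with 0.
  complete(site, month = all_months, fill = list(birds = 0))

# The total should be unchanged by the filling step.
sum(filled$birds) == sum(obs$indivs)
```

Because `site` is a factor, complete() uses all of its levels, so sites with no rows at all would also appear as long as they are listed in the factor levels.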

Below is an example dataset, and the code that I'm using. I have marked where in the code I am having the issue. I'm brand new to the tidyverse, so if you have any general pieces of advice on how to improve my code, I'm all ears. Thanks in advance for your help. I've also included a photo of the correct number of observations just in case.

library(tidyverse)
library(lubridate)
library(janitor)

# Create tibble
ea <- tibble(site = c(9,15,9,10,2,8,8,8,8,8,8,8,8,8,8,8,8,7),
             date = c("3/26/2013","3/26/2013","4/10/2013","4/20/2013","5/31/2013","6/29/2013","6/29/2013","6/29/2013","6/29/2013","6/29/2013","6/29/2013","6/29/2013","6/29/2013","6/29/2013","6/29/2013","6/29/2013","6/29/2013","1/9/2014"),
             indivs = c(1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,1),
             within_800 = c(TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE))

# Create variable that contains all site names
levels_site <- as.character(1:16)

ea %>%
  mutate_at(vars(site), factor) %>% # Convert site into a factor
  mutate_at(vars(date), mdy) %>% # Convert date into a Date
  mutate(year = year(date)) %>% # Pull out year
  mutate(month = month(date, label = TRUE)) %>% # Pull out month
  # Since ym() is not available in lubridate yet, make a new date that puts
  # all observations from a single month on the same day.
  mutate(date_ym = make_date(year, month)) %>%
  group_by(date_ym, site = site) %>% # Group by site and month

  # Issue here: Removing this code results in the accurate number of observations but only lists the months with an observation.
  complete(date_ym = seq(make_date(2013, 3), make_date(2014, 3), by = "month"), fill = list(indivs = 0)) %>% # Add in months where no observation was made

  summarise(minutes = sum(indivs)) %>% # Count the number of birds observed
  complete(site = levels_site) %>% # Add in the stations where no observations were made
  arrange(fct_relevel(site, levels_site), .by_group = TRUE) %>% # Place in ascending numeric order
  pivot_wider(names_from = date_ym, values_from = minutes) %>% # Pivot table
  adorn_totals(where = c("row", "col")) # Sum each row and column
