I have a data.frame with dates spread across several columns in messy formats: the year column contains years and NAs; the date_old column contains either Month DD, DD alone, a date range (e.g. October 02-24), or NA; and the hidden_date column contains free text with a date embedded in it, either in the format .... YYYY .... or in the format .... DD Month YYYY .... (with .... representing general text of variable length).
An example data.frame looks like this:
df <- data.frame(
  year        = c("1992", "1993", "1995", NA),
  date_old    = c("February 15", "October 02-24", "15", NA),
  hidden_date = c(NA, NA, "The hidden date is 15 July 1995", "The hidden date is 2005")
)
I want to get the dates in the format YYYY-MM-DD (taking the first day of a date range) and fill unknown month or day values with zeroes. Using parse_date_time hasn't helped me so far. The expected output would be:
  year      date_old                     hidden_date       date
1 1992   February 15                            <NA> 1992-02-15
2 1993 October 02-24                            <NA> 1993-10-02
3 1995            15 The hidden date is 15 July 1995 1995-07-15
4 <NA>          <NA>         The hidden date is 2005 2005-00-00
How do I best go about this?