I have a large dataset of scraped reports. The date information is in the text of the reports, and I have converted it to a character vector of the following format:
date_vec <- c("2001-4-31", "2000-12-31", "2003-6-31")
However, as the example shows, some of the reports contain human errors, so converting the vector to "Date" format with as.Date(date_vec)
doesn't work, because "2001-4-31" and "2003-6-31" are not real dates (April and June have only 30 days).
I want to convert the data to "Date" format by snapping each invalid date to the nearest valid date, so that I get something like the following:
date_vec
[1] "2001-4-30" "2000-12-31" "2003-6-30"
Other than a brute-force way of creating a list of common mistakes and checking for them, is there a good way to do that?
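
To make the intended behavior concrete, here is a rough base-R sketch of one non-brute-force possibility: split each string into year, month, and day, then clamp the day to the last day of that month. The helper name clamp_to_month_end is hypothetical, and the sketch assumes every entry has the form "year-month-day" and that only the day can be out of range (an invalid month yields NA rather than a guess).

```r
# Hypothetical helper: clamp an out-of-range day to the last valid day
# of its month. Assumes input strings look like "YYYY-M-D".
clamp_to_month_end <- function(x) {
  parts <- strsplit(x, "-", fixed = TRUE)
  year  <- as.integer(vapply(parts, `[`, character(1), 1))
  month <- as.integer(vapply(parts, `[`, character(1), 2))
  day   <- as.integer(vapply(parts, `[`, character(1), 3))

  # The first day of a month is valid whenever year and month are valid.
  # An explicit format makes unparseable entries NA instead of an error.
  first_day <- as.Date(sprintf("%04d-%02d-01", year, month),
                       format = "%Y-%m-%d")

  # Last day of the month = first day of the following month minus one day.
  next_year  <- year + (month == 12)
  next_month <- month %% 12 + 1
  last_day   <- as.Date(sprintf("%04d-%02d-01", next_year, next_month),
                        format = "%Y-%m-%d") - 1
  days_in_month <- as.integer(format(last_day, "%d"))

  # Clamp the day, then offset from the first of the month.
  first_day + (pmin(day, days_in_month) - 1L)
}

date_vec <- c("2001-4-31", "2000-12-31", "2003-6-31")
fixed <- clamp_to_month_end(date_vec)
format(fixed)
# [1] "2001-04-30" "2000-12-31" "2003-06-30"
```

This only handles days that overshoot the end of a month (including 29 February in non-leap years); other kinds of typos, such as a month of 13 or a day of 0, would need their own rule.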