I have this dataframe, "Data", containing one full year of data collected about every half-hour, but for some days only a few hours of data were collected.
Dates are in the format: 31.01.2010 00:30 (date and time together in one cell).
Variables are: Temperature, humidity, PM10, windspeed, etc.
First question: How can I calculate the daily mean, median, maximum, and minimum of these variables, so that I can use the daily values instead of the hourly/half-hourly data in further analysis (such as survival analysis with a GAM)?
Obviously, the calculated daily average/median should be assigned to its corresponding date.
Second question: the DATES column contains both date and time together, separated by one space in the same cell. In R, its type is 'Factor', and I cannot do any calculations because the error "dates" is missing appears.
My guess is that I need to convert it first from Factor into date/time so that it can be recognized, and then calculate the means/medians. But how do I do this?
Can you please indicate what would be the arguments/functions to use?
I think that I have solved the conversion of the dates from 'Factor' to POSIXlt: I used the function strptime(Data$DATES, format="%d.%m.%Y %H:%M") and now Data$DATES is recognized as POSIXlt, in the format "2010-01-01 00:00:00" ....
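For reference, a minimal sketch of that conversion on a few made-up rows. The sample values, the extra Day column, and the choice of tz = "UTC" are assumptions for illustration, not part of my real data. It uses as.POSIXct() rather than strptime(), since a POSIXct vector is usually easier to store in a data frame column than POSIXlt, and it assigns the result back to the column.

```r
# Hypothetical sample standing in for the real "Data" frame
Data <- data.frame(
  DATES = factor(c("31.01.2010 00:30", "31.01.2010 01:00", "01.02.2010 00:30")),
  Temperature = c(1.5, 2.5, 4.0)
)

# Convert the factor labels to text first, then parse them as date-times.
# tz = "UTC" is an assumption; use the time zone the data were recorded in.
Data$DATES <- as.POSIXct(as.character(Data$DATES),
                         format = "%d.%m.%Y %H:%M", tz = "UTC")

# Calendar day without the time of day, usable later as a grouping key
Data$Day <- as.Date(Data$DATES, tz = "UTC")

str(Data)
```

Any row whose text does not match the format string comes out as NA, so checking sum(is.na(Data$DATES)) after the conversion seems worthwhile.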
But I still need to find the function that calculates the daily means, medians, maxima, and minima.
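One possible approach, sketched on made-up rows, is base R's aggregate() grouped by calendar day. The sample values and the names Day, vars, daily_mean, daily_median, daily_max, daily_min, and n_obs are hypothetical, and tz = "UTC" is an assumption. Whether this is the best route for the later GAM analysis is a separate question.

```r
# Hypothetical sample: three readings on one day, two on the next
Data <- data.frame(
  DATES = c("31.01.2010 00:30", "31.01.2010 01:00", "31.01.2010 01:30",
            "01.02.2010 00:30", "01.02.2010 01:00"),
  Temperature = c(1, 2, 6, 4, 8),
  humidity = c(80, 82, 84, 70, 74)
)

# Parse the timestamps and derive the calendar day used for grouping
Data$DATES <- as.POSIXct(Data$DATES, format = "%d.%m.%Y %H:%M", tz = "UTC")
Data$Day <- as.Date(Data$DATES, tz = "UTC")

# Columns to summarise (extend with PM10, windspeed, ... as needed)
vars <- c("Temperature", "humidity")

# One row per day for each statistic; na.rm = TRUE skips missing readings
daily_mean   <- aggregate(Data[vars], by = list(Day = Data$Day), FUN = mean,   na.rm = TRUE)
daily_median <- aggregate(Data[vars], by = list(Day = Data$Day), FUN = median, na.rm = TRUE)
daily_max    <- aggregate(Data[vars], by = list(Day = Data$Day), FUN = max,    na.rm = TRUE)
daily_min    <- aggregate(Data[vars], by = list(Day = Data$Day), FUN = min,    na.rm = TRUE)

# Number of non-missing readings per day, to spot days with only a few hours of data
n_obs <- aggregate(Data[vars], by = list(Day = Data$Day),
                   FUN = function(x) sum(!is.na(x)))

print(daily_mean)
```

Each result has a Day column, so every daily value stays attached to its date. The n_obs table could be used to drop or flag days with too few readings before the daily values go into further analysis.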