Channel: Active questions tagged r - Stack Overflow

Subsetting a list of dataframes by time


I have GPS data in multiple .csv files that I have imported with the following code:

library(readr)
library(tidyverse)

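# Data import from target folder

# list.files() expects a regular expression, not a shell glob
filelist <- list.files("data", pattern = "\\.csv$")

  # Build list names by stripping underscores, the "samples.csv" suffix and digits
  filenames <- mgsub::mgsub(filelist,
                            c("_", "samples.csv", "[[:digit:]]+"),
                            c("", "", ""))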

  setwd("data")

  data <- sapply(filelist, 
                 read_csv,
                 col_types = cols(Uhrzeit = col_time(format = "%H:%M:%OS"), 
                                  Uhrzeit_1 = col_time(format = "%H:%M:%OS")),
                 simplify = FALSE)



  names(data) <- filenames


  colnames <- c("Aufnahmezeit", 
                "Uhrzeit", 
                "Herzfrequenz [S/min]", 
                "Geschwindigkeit [km/h]", 
                "Distanz [m]", 
                "Beschleunigung [m/s²]", 
                "Schrittfrequenz")

  data <- lapply(data, setNames, colnames)

This returns a list of data frames (currently 5) with roughly 70,000 rows each; the structure of one of them is shown below:

str(data)

List of 5
 $ MaxBauer     :Classes ‘spec_tbl_df’, ‘tbl_df’, ‘tbl’ and 'data.frame':   69012 obs. of  7 variables:
  ..$ Aufnahmezeit          : 'hms' num [1:69012] 00:00:00.0 00:00:00.1 00:00:00.2 00:00:00.3 ...
  .. ..- attr(*, "units")= chr "secs"
  ..$ Uhrzeit               : 'hms' num [1:69012] 12:54:13.0 12:54:13.1 12:54:13.2 12:54:13.3 ...
  .. ..- attr(*, "units")= chr "secs"
  ..$ Herzfrequenz [S/min]  : num [1:69012] NA NA NA NA NA NA NA NA NA NA ...
  ..$ Geschwindigkeit [km/h]: num [1:69012] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ Distanz [m]           : num [1:69012] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ Beschleunigung [m/s²] : num [1:69012] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ Schrittfrequenz       : num [1:69012] NA NA NA NA NA NA NA NA NA NA ...
  ..- attr(*, "spec")=
  .. .. cols(
  .. ..   Uhrzeit = col_time(format = "%H:%M:%OS"),
  .. ..   Uhrzeit_1 = col_time(format = "%H:%M:%OS"),
  .. ..   `HF [S/min]` = col_double(),
  .. ..   `Geschwindigkeit [km/h]` = col_double(),
  .. ..   `Distanz [m]` = col_double(),
  .. ..   `Beschleunigung [m/s²]` = col_double(),
  .. ..   Schrittfrequenz = col_double()
  .. .. )

I would now like to subset the data using the renamed column "Uhrzeit" as the reference point. I tried the following:

lapply(data, subset(Uhrzeit >= 46000))

That returned this error:

Error in subset.default(data, Uhrzeit >= 46000) : 
  object 'Uhrzeit' not found

I gather that lapply needs a list to work with, so I also tried converting the data first, e.g. with as.list(data), but couldn't get that to work either.

Any help would be greatly appreciated!
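
For reference, here is a minimal sketch of one way the subsetting step could be written. It rests on a few assumptions: the toy list below stands in for the imported data, "Uhrzeit" is held as plain seconds since midnight (an 'hms' column is numeric underneath, so the same comparison should apply), and the names toy_data and late_data are hypothetical. The key point is that lapply() expects a function as its second argument, whereas subset(Uhrzeit >= 46000) is a call that R tries to evaluate on its own, outside any data frame.

```r
# Toy stand-in for the imported list (hypothetical values).
# Uhrzeit is in seconds since midnight; 46000 s corresponds to 12:46:40.
toy_data <- list(
  MaxBauer = data.frame(Uhrzeit = c(45990, 46000, 46010), Distanz = c(0, 1, 2)),
  Other    = data.frame(Uhrzeit = c(45000, 47000),        Distanz = c(5, 6))
)

# Wrap subset() in an anonymous function so that the condition is
# evaluated inside each data frame of the list in turn.
late_data <- lapply(toy_data, function(df) subset(df, Uhrzeit >= 46000))

# Equivalent base-R indexing that avoids subset()'s non-standard evaluation.
late_data2 <- lapply(toy_data, function(df) df[df$Uhrzeit >= 46000, , drop = FALSE])

# Number of rows kept per list element
sapply(late_data, nrow)
```

The result is again a named list with one data frame per file, so the element names set earlier are preserved.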

