I am an intermediate R user with a data set of ~850,000 rows that was edited in Stata and saved as a csv. In about .01% of the rows, everything after column 4 was pushed onto the following row. I am trying to get the file back to its original form, with no split rows. I was using typeof() on column 4 as the condition for detecting split rows, but someone below pointed out that this won't work. I tested this, and typeof() does indeed report "integer" for every column in the data frame. Maybe this would work if I converted the column to a different type first, but here is what I tried:
wages <- for (i in wages) {
  if(typeof(wages[i,4]) == "integer") {
    cat(i-1, i)
  }
}
All I get is NAs.
When trying:
for (i in wages) {
  if(typeof(i[ ,4]) == "integer") {
    append(i-1, i, after = length(i-1))
  }
}
I get this error:
Error in `[.default`(i, , 4) : incorrect number of dimensions
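For reference, the error and the "integer" result can both be reproduced on a small scale. The sketch below uses a hypothetical toy data frame `df` (not the wage data) and assumes the strings were read in as factors, as read.csv() did by default before R 4.0.0. It suggests that a for loop over a data frame visits columns rather than rows, and that typeof() on a factor reports the underlying integer codes.

```r
# Hypothetical toy data frame; strings are stored as factors here,
# which is an assumption about how the real file was read in.
df <- data.frame(WD = c("CO1", "CO2", "CO3"),
                 key = c("1", "NA", "1"),
                 stringsAsFactors = TRUE)

# A for loop over a data frame visits its columns, so each element is a
# vector with no dim attribute and two-index subsetting like i[, 4] fails.
has_dims <- logical(0)
for (col in df) {
  has_dims <- c(has_dims, !is.null(dim(col)))
}

# Factors are stored as integer codes, so typeof() is "integer" for each column.
col_types <- vapply(df, typeof, character(1))

# Looping over row indices reaches individual rows instead.
first_fields <- character(0)
for (i in seq_len(nrow(df))) {
  first_fields <- c(first_fields, as.character(df[i, 1]))
}
```

If that reading is right, typeof() cannot separate intact rows from split ones, and any row-wise test needs to look at the values themselves.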
I have spent hours searching for solutions and trying different methods with no success. Thanks in advance for any help.
Snippet of data:
WD County_Name State_Name Cons_Code constructiondescription wagegroup Rate_Effective_Date hourly
113352 CO20190006 Adams Colorado Highway SUCO2011-001 9/15/2011 22.67
113353 CO20190004 Adams Colorado Residential PLUM0058-011 7/1/2018 32.75
113354 (pipefitters exclude hvac pipe) SOUTHWEST CO 8001 METRO 1352 100335 plumber
113355 CO20190004 Adams Colorado Residential PLUM0145-005 8/1/2016 24.58
fringe Rate_Type Craft_Title region st_abbr stcnty_fips mr supergrp
113352 8.73 Open power equipment operator: broom/sweeper arapahoe SOUTHWEST CO 8001 METRO 1352
113353 14.85 CBA plumber/pipefitter (plumbers include hvac pipe) NA NA
113354 1 NA NA
113355 10.47 CBA plumber (plumbers include hvac pipe) & pipefitters (exclude hvac pipe) SOUTHWEST CO 8001 METRO 1352
group key_craft key
113352 100335 operator 1
113353 NA NA
113354 NA NA
113355 100335 plumber 1
Reproducible data:
data <- data.frame(c("CO20190006","CO20190004","(pipefitters exclude hvac pipe)","CO20190004"), #1
c("Adams","Adams","SOUTHWEST","Adams"), #2
c("Colorado","Colorado","CO","Colorado"), #3
c("Highway","Residential","8001","Residential"), #4
c("","","METRO",""), #5
c("SUCO2011-001","PLUM0058-011","1352","PLUM0145-005"), #6
c("9/15/2011","7/1/2018","100335","8/1/2016"), #7
c("22.67","32.75","plumber","24.58"), #8
c("8.73","14.85","1","10.47"), #9
c("Open","CBA","","CBA"), #10
c("power equipment operator: broom/sweeper arapahoe","plumber/pipefitter (plumbers include hvac pipe)","",
"plumber (plumbers include hvac pipe) & pipefitters (exclude hvac pipe)"), #11
c("SOUTHWEST","","","SOUTHWEST"), #12
c("CO","","","CO"), #13
c("8001","NA","NA","8001"), #14
c("METRO","","","METRO"), #15
c("1352","NA","NA","1352"), #16
c("100335","NA","NA","100335"), #17
c("operator","","","plumber"), #18
c("1","NA","NA","1")) #19
colnames(data) <- c("WD","County_Name","State_Name","Cons_Code","constructiondescription","wagegroup","Rate_Effective_Date",
"hourly","fringe","Rate_Type","Craft_Title","region","st_abbr","stcnty_fips","mr","supergrp","group",
"key_craft","key")
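To make the desired result concrete, here is a minimal sketch of the kind of repair being asked for. It rests on assumptions read off the sample above, not on anything confirmed about the full file: intact rows have a WD value shaped like "CO20190006" (two capital letters followed by digits), the break falls inside Craft_Title (column 11), and a continuation row carries the rest of Craft_Title in column 1 and the values for columns 12 to 19 in columns 2 to 9. The function name `repair_split_rows`, the `id_pattern` argument, and the single-space separator used to rejoin Craft_Title are all hypothetical choices.

```r
# Sketch: fold each continuation row back into the intact row directly above it.
# The column positions (11, 12:19, 2:9) and id_pattern are assumptions taken
# from the sample data, not verified against the full file.
repair_split_rows <- function(df, id_pattern = "^[A-Z]{2}[0-9]+$") {
  # Work on character columns so factor levels do not get in the way.
  df[] <- lapply(df, as.character)

  # A row counts as a continuation when its first field is not a WD id.
  is_cont <- !grepl(id_pattern, df[[1]])

  # Only merge continuation rows that sit directly below an intact row.
  prev_intact <- c(FALSE, !is_cont[-length(is_cont)])
  mergeable <- is_cont & prev_intact

  for (i in which(mergeable)) {
    # Rejoin the two halves of Craft_Title (separator is a guess).
    df[i - 1, 11] <- paste(df[i - 1, 11], df[i, 1])
    # Move the displaced trailing fields back to columns 12 to 19.
    df[i - 1, 12:19] <- unlist(df[i, 2:9], use.names = FALSE)
  }

  # Drop the rows that were merged; anything unexpected is left in place.
  df[!mergeable, , drop = FALSE]
}

# Intended use with the reproducible data above:
# fixed <- repair_split_rows(data)
```

Looping only over the flagged rows keeps this cheap even at ~850,000 rows, since only a small fraction of rows are split. Continuation rows that do not follow an intact row are left untouched so they can be inspected by hand.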