Channel: Active questions tagged r - Stack Overflow

Different results from grep() and "=="


I have the following dummy data.frame:

set.seed(666)  # fixed seed so the random columns are reproducible
df <- data.frame(ID = rep(c("A", "B", "C"), each = 11), Year = rep(2010:2020, times = 3),
                 x1 = floor(runif(33, 0, 10)), x2 = floor(runif(33, 0, 2)),
                 x3 = floor(runif(33, 1, 100)), x4 = floor(runif(33, 1, 100)), x5 = floor(runif(33, 1, 100)))

I'd like to know how many NAs the data.frame contains, either as the string "NA" or as the missing value NA. To test this, I run the following lines:

print(length(grep("\\<NA\\>", df)))
print(length(which(is.na(df))))
print(length(which(df=="NA")))

Introducing NAs, as the string "NA" in x1 and as missing values in x2 to x5:

df$x1[rbinom(33,1,0.1)==1]<-"NA"  # string "NA"; this turns x1 into a character column
df$x2[rbinom(33,1,0.1)==1]<-NA
df$x3[rbinom(33,1,0.1)==1]<-NA
df$x4[rbinom(33,1,0.1)==1]<-NA
df$x5[rbinom(33,1,0.1)==1]<-NA

The same lines as above now return different results, ranging from 2 to 5. is.na() works fine for the missing values, but the string match seems to be off, as you can see below:

print(length(grep("\\<NA\\>", df)))
print(length(which(is.na(df))))
print(length(which(df=="NA")))

I'd expect grep() and "==" to return the same answer when looking for the string "NA", but they differ greatly and I don't know why. Which one is better? I also noticed that for larger data.frames (2,000,000 x 30), grep() takes very long. Are there any faster options?

Thanks.A.Lot!
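
A minimal sketch of where the two counts can diverge, using a small hypothetical data.frame named toy (not the df from the question). It relies on the documented behaviour that grep() coerces a non-character x with as.character(), which for a data.frame yields one string per column, whereas "==" and is.na() work cell by cell. The sum()-based counts are offered as a possible vectorised alternative; they have not been benchmarked on data of the size mentioned above.

```r
# Hypothetical toy data: column a holds the string "NA" twice,
# column b holds one real missing value, column c holds neither.
toy <- data.frame(a = c("x", "NA", "NA", "y"),
                  b = c(1, NA, 3, 4),
                  c = c(5, 6, 7, 8),
                  stringsAsFactors = FALSE)

# grep() sees one deparsed string per column, so it returns column
# indices (at most ncol(toy) of them), not the number of matching cells.
col_strings  <- as.character(toy)
cols_with_na <- grep("\\<NA\\>", toy)

# Cell-wise counts over the whole data.frame.
n_missing <- sum(is.na(toy))                 # real missing values
n_string  <- sum(toy == "NA", na.rm = TRUE)  # literal "NA" strings

print(col_strings)
print(cols_with_na)
print(n_missing)
print(n_string)
```

Note that the regex also matches a real NA inside the deparsed column string, so grep() here cannot tell the string "NA" apart from a missing value. In the comparison, NA == "NA" evaluates to NA rather than TRUE, which is why na.rm = TRUE (or which()) is needed to count only the strings.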
