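    for (i in seq_len(nrow(cdata))) {        # 99,653 rows
      # Default to "Customer"; overwritten below if any saved phone matches.
      cdata$category[i] <- "Customer"
      cdata$su_name[i] <- "Null"
      for (j in seq_len(nrow(sdata))) {      # 3,226 rows
        # Literal substring test: does SavedPhone[j] contain LegDigitsDialed[i]?
        if (grepl(cdata$LegDigitsDialed[i], sdata$SavedPhone[j], fixed = TRUE)) {
          cdata$category[i] <- "Supplier"
          cdata$su_name[i] <- sdata$sushortname[j]
          break  # stop at the first match so later rows cannot overwrite it
        }
      }
    }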
I have two data frames, and I want to categorize each element of a column in the first one based on whether it appears (as a partial match) in the second one.
My data looks like this:

    > cdata
    LegDigitsDialed
    "a"
    "b"
    "c"

    > sdata
    SavedPhone
    "aa"
    "c"
What I want is:

    LegDigitsDialed  category
    "a"              "Supplier"
    "b"              "Customer"
    "c"              "Supplier"
So basically my pseudocode is:

    for (i = 1; i < 100000; i++)
        for (j = 1; j < 3500; j++)
        {
            if (j contains i)  // partial string matching
                populate i (different column) with some value
            else
                populate i (different column) with some other value
        }
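To pin down the expected result, here is a minimal sketch of that logic applied to the toy data from the example above. It assumes both columns are character vectors and that "contains" means a literal substring match; `is_supplier` is only an illustrative name, and this is meant to show the intended output rather than to be a tuned solution for the full data.

```r
# Toy data copied from the example above (character columns assumed).
cdata <- data.frame(LegDigitsDialed = c("a", "b", "c"), stringsAsFactors = FALSE)
sdata <- data.frame(SavedPhone = c("aa", "c"), stringsAsFactors = FALSE)

# For each dialed value, check whether it is a literal substring of
# any saved phone. grepl() is vectorised over its second argument,
# so the inner loop over sdata is not needed.
is_supplier <- vapply(
  cdata$LegDigitsDialed,
  function(d) any(grepl(d, sdata$SavedPhone, fixed = TRUE)),
  logical(1),
  USE.NAMES = FALSE
)

# "Supplier" where at least one saved phone matched, "Customer" otherwise.
cdata$category <- ifelse(is_supplier, "Supplier", "Customer")
print(cdata)
```

On the toy data this gives "Supplier", "Customer", "Supplier" for "a", "b", "c", which is the output I am after.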
This R script has been running for over 24 hours now, and only one-third of the records have been processed. Is there any way to optimize this code?