Quantcast
Channel: Active questions tagged r - Stack Overflow
Viewing all articles
Browse latest Browse all 201974

(Speed Challenge) Any faster method to calculate distance matrix between rows of two matrices, in terms of Euclidean distance?

$
0
0

First of all, this is NOT the problem of calculating Euclidean distance between two matrices.

Assuming I have two matrices x and y, e.g.,

set.seed(1)
x <- matrix(rnorm(15), ncol=5)
y <- matrix(rnorm(20), ncol=5)

where

> x
           [,1]       [,2]      [,3]       [,4]       [,5]
[1,] -0.6264538  1.5952808 0.4874291 -0.3053884 -0.6212406
[2,]  0.1836433  0.3295078 0.7383247  1.5117812 -2.2146999
[3,] -0.8356286 -0.8204684 0.5757814  0.3898432  1.1249309

> y
            [,1]       [,2]        [,3]       [,4]        [,5]
[1,] -0.04493361 0.59390132 -1.98935170 -1.4707524 -0.10278773
[2,] -0.01619026 0.91897737  0.61982575 -0.4781501  0.38767161
[3,]  0.94383621 0.78213630 -0.05612874  0.4179416 -0.05380504
[4,]  0.82122120 0.07456498 -0.15579551  1.3586796 -1.37705956

Then I want to get distance matrix distmat of dimension 3-by-4, where the element distmat[i,j] is the value from norm(x[1,]-y[2,],"2") or dist(rbind(x[1,],y[2,])).

  • My code
distmat <- as.matrix(unname(unstack(within(idx<-expand.grid(seq(nrow(x)),seq(nrow(y))), d <-sqrt(rowSums((x[Var1,]-y[Var2,])**2))), d~Var2)))

which gives

> distmat
         [,1]     [,2]     [,3]     [,4]
[1,] 3.016991 1.376622 2.065831 2.857002
[2,] 4.573625 3.336707 2.698124 1.412811
[3,] 3.764925 2.235186 2.743056 3.358577

but I don't think my code is elegant or efficient enough when with x and y of large number of rows.

  • Objective

I am looking forward to a much faster and more elegant code with base R for this goal. Appreciated in advance!

  • Benchmark Template (in updating)

For your convenience, you can use the following for benchmark to see if your code is faster:

set.seed(1)
x <- matrix(rnorm(15000), ncol=5)
y <- matrix(rnorm(20000), ncol=5)
# my customized approach
method_ThomasIsCoding_v1 <- function() {
  as.matrix(unname(unstack(within(idx<-expand.grid(seq(nrow(x)),seq(nrow(y))), d <-sqrt(rowSums((x[Var1,]-y[Var2,])**2))), d~Var2)))
}
method_ThomasIsCoding_v2 <- function() {
  `dim<-`(with(idx<-expand.grid(seq(nrow(x)),seq(nrow(y))), sqrt(rowSums((x[Var1,]-y[Var2,])**2))),c(nrow(x),nrow(y)))
}
method_ThomasIsCoding_v3 <- function() {
  `dim<-`(with(idx1<-list(Var1 = rep(1:nrow(x), nrow(y)), Var2 = rep(1:nrow(y), each = nrow(x))), sqrt(rowSums((x[Var1,]-y[Var2,])**2))),c(nrow(x),nrow(y)))
}
# approach by AllanCameron
method_AllanCameron <- function()
{
  `dim<-`(sqrt(rowSums((x[rep(1:nrow(x), nrow(y)),] - y[rep(1:nrow(y), each = nrow(x)),])^2)), c(nrow(x), nrow(y)))
}
# an existing approach by A. Webb from https://stackoverflow.com/a/35107198/12158757
method_A.Webb <- function() {
  euclidean_distance <- function(p,q) sqrt(sum((p - q)**2))
  outer(
    data.frame(t(x)),
    data.frame(t(y)),
    Vectorize(euclidean_distance)
  )
}

# your approach
method_XXX <- function() {
  # fill with your approach
}

bm <- microbenchmark::microbenchmark(
  method_ThomasIsCoding_v1(),
  method_ThomasIsCoding_v2(),
  method_ThomasIsCoding_v3(),
  method_AllanCameron(),
  # method_A.Webb(),
  #method_XXX(),
  unit = "relative",
  check = "equivalent",
  times = 10
)
bm

such that

Unit: relative
                       expr      min       lq     mean   median       uq      max neval
 method_ThomasIsCoding_v1() 1.684104 1.584569 1.611193 1.586154 1.638271 1.622165    10
 method_ThomasIsCoding_v2() 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000    10
 method_ThomasIsCoding_v3() 1.011889 1.017362 1.012064 1.015376 1.026982 1.004885    10
      method_AllanCameron() 1.077066 1.014809 1.029347 1.027979 1.035933 1.029244    10

Viewing all articles
Browse latest Browse all 201974

Trending Articles