Channel: Active questions tagged r - Stack Overflow

Creating the merge tree for a weighted igraph network in R

For a graph G with weighted edge distances d, I want to create an hclust-like tree object that (using cutree) tells me which nodes in the graph will be connected to each other for a given maximum distance x. I have seen that building cluster trees manually is a nightmare, so I want an out-of-the-box method.

I don't want the nodes to be agglomerated, and I am not sure how the method argument of hclust works. I have written the following function to hierarchically cluster a graph by distance.

hclust_graph <- function(g, weight = "weight", method = "complete"){
  #Hierarchically clusters a graph using the shortest-path distances between its nodes
  #g, an igraph object
  #weight, the name of the edge attribute to use as the edge distance
  #method, the hierarchical clustering method to use, see hclust for details

  #edge_attr replaces the deprecated get.edge.attribute
  weight_vect <- edge_attr(g, weight)

  if(is.null(weight_vect)){
    message("Edge attribute '", weight, "' is NULL, continuing with the edge attribute 'weight' if available")
  }

  #long table of the shortest-path distance between every pair of nodes
  distancedf <- distances(g, weights = weight_vect) %>% as_tibble %>% mutate(from = names(.)) %>%
    gather(key = "to", value = "distance", -from)

  #the graph is clustered using the edge metric as a distance. Be sure the metric is appropriate!
  distgraph <- distancedf %>% spread(to, distance) %>% select(-from) %>% as.matrix
  rownames(distgraph) <- colnames(distgraph)
  dist_hclust <- distgraph %>% as.dist(., diag = FALSE) %>% hclust(., method = method)

  return(dist_hclust)

}

I think that, when the method is set to "single", cutree simply uses the edge distance between nodes, but I am not sure.
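As a small illustration of what the method argument changes, here is a sketch on a made-up three-point distance matrix (the points a, b, c and their distances are hypothetical, not taken from the graph in this question). With "single", the distance between two clusters is the smallest pairwise distance between their members; with "complete", it is the largest.

```r
# Hypothetical toy distances: a-b = 1, b-c = 2, a-c = 4
m <- matrix(c(0, 1, 4,
              1, 0, 2,
              4, 2, 0),
            nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
d <- as.dist(m)

single_tree <- hclust(d, method = "single")
complete_tree <- hclust(d, method = "complete")

# Heights at which the two merges happen.
# Both trees first merge a and b at height 1; they differ in when c joins {a, b}:
# single uses min(4, 2), complete uses max(4, 2).
single_tree$height
complete_tree$height
```

So with "single", cutting at height h keeps two nodes together whenever a chain of pairwise distances no larger than h links them, which seems to be the behaviour wanted here.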

The example below works as expected, but since I don't understand the hclust method, this could just be a coincidence.

library(igraph)
library(magrittr)
library(dplyr) #mutate() and select() are used inside hclust_graph
library(tidyr)

g_df <- tibble(from = c("A", "A", "A", "B", "C"),
               to = c("E", "C", "B", "C", "D"),
               distance = c(1, 5, 4, 3, 1))

g <-  graph_from_data_frame(g_df, directed = FALSE)

plot(g)

graph_tree <- hclust_graph(g, weight = "distance", method = "single")
#tree is as expected
plot(graph_tree)
#cutree returns the expected number of clusters and cluster allegiance
cutree(graph_tree, h = 3)
plot(g)
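
One way to check whether the result is a coincidence, sketched here under the assumption that all edge distances are non-negative: delete every edge longer than x and compare the connected components with the clusters from cutree on a single-linkage tree. The block rebuilds the same toy graph so that it runs on its own; the names x, g_cut, from_tree and from_graph are only illustrative.

```r
library(igraph)

# Same toy graph as above; the edge attribute "distance" is the edge length
g_df <- data.frame(from = c("A", "A", "A", "B", "C"),
                   to = c("E", "C", "B", "C", "D"),
                   distance = c(1, 5, 4, 3, 1))
g <- graph_from_data_frame(g_df, directed = FALSE)

x <- 3 # maximum distance allowed

# Single-linkage tree on the shortest-path distances, cut at height x
d <- distances(g, weights = E(g)$distance)
tree <- hclust(as.dist(d), method = "single")
from_tree <- cutree(tree, h = x)

# Direct check: drop every edge longer than x and read off the connected components
g_cut <- delete_edges(g, which(E(g)$distance > x))
from_graph <- components(g_cut)$membership

# Cluster ids may be numbered differently, so compare which pairs of nodes
# share a cluster rather than the ids themselves (both vectors are in vertex order)
same_tree <- outer(unname(from_tree), unname(from_tree), "==")
same_graph <- outer(unname(from_graph), unname(from_graph), "==")
agree <- identical(same_tree, same_graph)
print(agree)
```

If the two groupings agree for several values of x, that suggests the single-linkage tree encodes the "connected within distance x" relation, at least for this graph.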

How do I make a dendrogram structure or hierarchical clustering object where the distance between the nodes in the graph is all that is used? Can hclust be used, or should I use something else?

