For a graph G with weighted edge distances d, I want to create an hclust-like tree object that (using cutree) tells me which nodes in the graph will be connected to each other for a given maximum distance x. I have seen that building cluster trees manually is a nightmare, so I want an out-of-the-box method.
I don't want the nodes to be agglomerated, and I am not sure how the method argument of hclust works. I have written the following function to create the distance-clustered graphs.
hclust_graph <- function(g, weight = "weight", method = "complete") {
  # Hierarchically clusters a graph using the matrix of shortest-path distances
  # g: an igraph object
  # weight: name of the edge attribute to use as the distance
  # method: the hierarchical clustering method to use; see ?hclust for details
  weight_vect <- edge_attr(g, weight)
  if (is.null(weight_vect)) {
    message("Weight vector is NULL; continuing with the edge attribute 'weight' if available")
  }
  # distances() returns the shortest-path distance between every pair of nodes
  distancedf <- distances(g, weights = weight_vect) %>%
    as_tibble() %>%
    mutate(from = names(.)) %>%
    gather(key = "to", value = "distance", -from)
  # Graph clustered using the edge metric as a distance. Be sure the metric is appropriate!
  distgraph <- distancedf %>%
    spread(to, distance) %>%
    select(-from) %>%
    as.matrix()
  rownames(distgraph) <- colnames(distgraph)
  dist_hclust <- distgraph %>%
    as.dist(diag = FALSE) %>%
    hclust(method = method)
  return(dist_hclust)
}
I think that, when method is set to "single", the function simply uses the edge distance between nodes for cutree, but I am not sure.
The example below works as expected, but as I don't understand the hclust methods, this could just be a coincidence.
library(igraph)
library(magrittr)
library(dplyr)
library(tidyr)
g_df <- tibble(from = c("A", "A", "A", "B", "C"),
               to = c("E", "C", "B", "C", "D"),
               distance = c(1, 5, 4, 3, 1))
g <- graph_from_data_frame(g_df, directed = FALSE)
plot(g)
graph_tree <- hclust_graph(g, weight = "distance", method = "single")
#tree is as expected
plot(graph_tree)
#cutree returns the expected number of clusters and cluster membership
cutree(graph_tree, h = 3)
plot(g)
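One way to test whether the result is a coincidence is to compare cutree against a direct calculation. The sketch below assumes non-negative edge distances and rebuilds the same toy graph; for each cut height h it removes every edge longer than h and checks whether the connected components of what remains match the single-linkage clusters. The helper same_partition is a hypothetical name introduced only for this check, and the tree is built directly from distances() rather than through hclust_graph so that the block runs on its own.

```r
library(igraph)

# Same toy graph as in the example above
g_df <- data.frame(from = c("A", "A", "A", "B", "C"),
                   to = c("E", "C", "B", "C", "D"),
                   distance = c(1, 5, 4, 3, 1))
g <- graph_from_data_frame(g_df, directed = FALSE)

# Single-linkage tree built on the shortest-path distance matrix
path_dist <- distances(g, weights = E(g)$distance)
tree <- hclust(as.dist(path_dist), method = "single")

# Hypothetical helper: TRUE if two membership vectors describe the same
# partition, regardless of how the clusters are numbered
same_partition <- function(a, b) {
  all(outer(a, a, "==") == outer(b, b, "=="))
}

# For each cut height h, compare cutree() with the connected components
# of the graph that keeps only the edges with distance <= h
heights <- sort(unique(E(g)$distance))
agree <- sapply(heights, function(h) {
  kept <- delete_edges(g, which(E(g)$distance > h))
  comp <- components(kept)$membership
  names(comp) <- V(kept)$name
  cut <- cutree(tree, h = h)
  same_partition(cut[names(comp)], comp)
})
names(agree) <- heights
agree
```

If every entry of agree is TRUE, the single-linkage tree groups nodes exactly as the thresholded graph does for those heights. This only checks one small graph, so it is evidence rather than proof.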
How do I make a dendrogram structure or hierarchical clustering object where the distance between the nodes in the graph is all that is used? Can hclust be used, or should I use something else?