The following diagram is close to what I am looking for; however, I would like to know whether the following are possible:
- Left-align the nodes rather than justifying them along the x-axis, so that, e.g., a flow with only 2 nodes would finish half-way across the x-axis and not at x-max. (In my non-toy Sankey diagram the nodes are left-aligned; however, I can't work out what the difference is.)
- Remove the hovertext on nodes only (not on links). I tried various combinations of "label", "text", "value", "percent" and "name" joined with "+", as well as "all", "none" and "skip", but none of these seemed to make a difference.
- Handle drop-off using NA. For example, I don't want to see the link from SA to DROP (blue node), but I do want to see the green bar at x=-1 to show that one person went to SA on their first holiday and hasn't had another holiday. (If I left source=SA and target=NA, the chart was blank.) Otherwise, my proposed work-around is to colour the DROP node and the SA-DROP links white...
I have annotated the image with the desired changes in blue.
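On the second point, one thing that may be worth trying (a sketch, not verified against this chart): plotly.js documents a `hoverinfo` attribute on the Sankey `node` and `link` objects themselves, separate from the trace-level one, taking "all", "none" or "skip". Assuming the installed plotly R package bundles a plotly.js release that supports it, the two lists could be built as below. The labels, colours and link indices are placeholders, not the real data.

```r
# Sketch only: assumes the bundled plotly.js supports node-level hoverinfo
# for Sankey traces. All values below are placeholders.
node_spec <- list(
  label = c("SA", "SA", "AB", "DROP"),
  color = c("green", "green", "red", "blue"),
  pad = 15,
  thickness = 15,
  hoverinfo = "none"  # intended to suppress hover text on nodes only
)

link_spec <- list(
  source = c(0, 1, 0),
  target = c(1, 2, 3),
  value  = c(3, 1, 1)
  # hoverinfo deliberately not set, so links keep their default hover text
)

# These would then be passed on as:
# plot_ly(type = "sankey", node = node_spec, link = link_spec)
```

Setting the attribute inside `node` rather than at the top level of `plot_ly()` is the whole point of the sketch; a trace-level `hoverinfo` would apply to nodes and links alike.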
# Load packages
library(dplyr); library(plotly); library(RColorBrewer); library(stringr)
# Summarise flow data
dat <- data.frame(customer = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 5),
                  holiday_loc = c("SA", "SA", "AB", "SA", "SA", "SA", "SA", "AB", "AB", "SA", "SA", "SA")) %>%
  group_by(customer) %>%
  mutate(holiday_num = seq_along(customer),
         source = paste0(holiday_loc, '_', holiday_num),
         target = lead(source),
         last_hol = ifelse(holiday_num == n(), 'Y', 'N')) %>%
  filter(last_hol == 'N' | holiday_num == 1) %>%
  select(-last_hol)
sank_links <- dat %>%
  group_by(source, target) %>%
  summarise(n = n()) %>%
  mutate(target = ifelse(is.na(target), "DROP", target)) # is there another option here?
# obtain colours for nodes
f <- function(pal) brewer.pal(brewer.pal.info[pal, "maxcolors"], pal)
cols <- f("Set1")
# set up nodes
sank_nodes <- data.frame(
  name = factor(sort(unique(c(as.character(sank_links$source),
                              as.character(sank_links$target)))))
) %>%
  mutate(label = sub("_[0-9]$", "", name),
         # for some unknown reason, plotly allows only three labels to be the same
         label_pad = sub("_[1-3]$", "", name),
         label_pad = sub("_[4-6]$", "", label_pad)) %>%
  arrange(label) %>%
  mutate(color = cols[cumsum(1 - duplicated(label))])
# update links to get index of node and name (without holiday_num)
sank_links <- sank_links %>%
  mutate(source_num = match(source, sank_nodes$name) - 1,
         source_name = str_replace(source, "_[0-9]$", ""),
         target_num = match(target, sank_nodes$name) - 1,
         target_name = str_replace(target, "_[0-9]$", ""))
# diagram
p <- plot_ly(
  type = "sankey",
  # domain needs to be a list; c() would flatten x and y into one named vector
  domain = list(
    x = c(0, 1),
    y = c(0, 1)
  ),
  orientation = "h",
  valueformat = ".0f",
  valuesuffix = "Customers",
  arrangement = "fixed",
  node = list(
    label = sank_nodes$label_pad,
    color = sank_nodes$color,
    pad = 15,
    thickness = 15,
    line = list(
      color = "black",
      width = 0.5
    )
  ),
  link = list(
    source = sank_links$source_num,
    target = sank_links$target_num,
    value = sank_links$n
  )
) %>%
  layout(
    title = "",
    font = list(
      size = 10
    ),
    xaxis = list(showgrid = FALSE, zeroline = FALSE),
    yaxis = list(showgrid = FALSE, zeroline = FALSE)
  )
p
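The white-colouring work-around mentioned in the third bullet could be sketched roughly as follows, using a fully transparent colour instead of white so that it does not depend on the background. This is only a sketch under assumptions: `node_names`, `node_color`, `link_target`, `link_color` and `node_line` are illustrative stand-ins for the corresponding `sank_nodes` and `sank_links` columns, and it assumes plotly accepts `rgba()` strings and per-element vectors for `node$color`, `node$line$color` and `link$color`.

```r
# Illustrative stand-ins for sank_nodes$name, sank_nodes$color and sank_links$target
node_names  <- c("AB_1", "AB_2", "AB_3", "DROP", "SA_1", "SA_2", "SA_3", "SA_4")
node_color  <- c(rep("#E41A1C", 3), "#377EB8", rep("#4DAF4A", 4))
link_target <- c("AB_2", "DROP", "SA_2", "AB_3", "SA_3", "SA_4")

transparent <- "rgba(0,0,0,0)"

# Hide the DROP node itself
node_color[node_names == "DROP"] <- transparent

# Hide the node outline for DROP as well (assumes line colour can be a vector)
node_line <- ifelse(node_names == "DROP", transparent, "black")

# Hide any link that ends in DROP; other links get a light grey
link_color <- ifelse(link_target == "DROP", transparent, "rgba(0,0,0,0.2)")

# These would then be supplied as, for example:
# node = list(color = node_color, line = list(color = node_line, width = 0.5), ...)
# link = list(color = link_color, ...)
```

The SA bar still keeps its full height this way, because the link to DROP continues to exist and contribute its value; it is just not drawn. Hover text on the hidden link would probably still appear unless it is suppressed separately.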
EDIT: I initially didn't know how to label the x-axis with breaks corresponding to the nodes and give the x-axis a title; the code is as follows:
%>%
  layout(
    title = "",
    font = list(
      size = 10
    ),
    xaxis = list(showgrid = FALSE, zeroline = FALSE, title = "Holiday Number", tickvals = -1:4, ticktext = 1:6),
    yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE)
  )
Source: https://plot.ly/r/reference/#layout-xaxis-tickformat
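Since the tick values above tie each holiday number to an x position, the same idea might be usable to pin nodes to fixed columns for left-alignment. The sketch below only computes candidate x positions from the node names. It assumes the Sankey `node` list accepts an `x` vector of fractions between 0 and 1 (an attribute in newer plotly.js releases, which may also need a matching `y` vector), and `max_cols` is a hypothetical fixed number of columns. Clamping away from exactly 0 and 1 is a precaution on my part, not documented behaviour.

```r
# Illustrative stand-in for sank_nodes$name
node_names <- c("AB_1", "AB_2", "AB_3", "DROP", "SA_1", "SA_2", "SA_3", "SA_4")
max_cols <- 6  # hypothetical: number of columns the x-axis should span

# Holiday number is the numeric suffix after the underscore; DROP has none
suffix <- sub("^.*_", "", node_names)
suffix[!grepl("_[0-9]+$", node_names)] <- NA
step <- as.integer(suffix)

# Toy data only: the single drop-off follows holiday 1, so place DROP in column 2
step[is.na(step)] <- 2L

# Map column k to a fraction of the plot width, so short flows stop early
x <- (step - 1) / (max_cols - 1)

# Keep positions strictly inside (0, 1) as a precaution
node_x <- pmin(pmax(x, 0.001), 0.999)

# Would be supplied as node = list(x = node_x, ...) if the attribute is supported
```

With `max_cols` fixed at 6, a flow that only reaches holiday 2 ends at x = 0.2 instead of being stretched to the right-hand edge.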