As marine biologists, we need to determine whether the abundance of 4 fish species, counted three times over a year, differs between artificial reefs (reef A, B, and C) and between months (June, September, November). For each reef and month, 3 replicates were taken (1, 2, 3). The data (with the factors included for clarity) are as follows:
data <- as.data.frame(matrix(NA, 27, 4, dimnames =
list(1:27, c("Diplodus sargus", "Chelon labrosus", "Oblada melanura", "Seriola dumerii"))))
#fish counts
data$`Diplodus sargus` <- as.numeric(c(0,0,0,0,0,0,0,0,0,5,0,0,3,0,0,0,0,1,0,0,0,0,0,0,4,0,0))
data$`Oblada melanura` <- as.numeric(c(0,0,0,10,0,0,0,0,0,0,0,0,10,5,0,0,0,0,1,0,2,3,0,2,0,0,0))
data$`Chelon labrosus` <- as.numeric(c(0,0,0,0,2,0,6,0,0,0,0,0,3,0,0,2,0,0,0,0,0,3,0,0,0,0,1))
data$`Seriola dumerii` <- as.numeric(c(4,0,2,0,1,1,0,0,9,0,0,0,0,0,3,0,0,7,0,0,0,8,0,0,0,1,0))
#factors
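#27 samples = 3 months x 3 reefs x 3 replicates; reefs cycle within each month
data$reef <- rep(rep(c("A", "B", "C"), each = 3), times = 3)
#each month covers 9 consecutive rows, so that month and reef agree with data$combined
data$month <- rep(c("June", "September", "November"), each = 9)
data$combined <- c(rep("JuneA", 3), rep("JuneB", 3), rep("JuneC", 3), rep("SepA", 3), rep("SepB", 3), rep("SepC", 3), rep("NovA", 3), rep("NovB", 3), rep("NovC", 3))
#replicates 1, 2, 3 within each month x reef combination
data$Replicate <- rep(c("1", "2", "3"), times = 9)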
#square-root data
comp <- sqrt(data[, 1:4])
library(vegan)
mydist <- vegdist(comp, method = "bray")
pl.clust <- hclust(mydist, method = "complete")
Error in hclust(mydist, method = "complete") :
NA/NaN/Inf in foreign function call (arg 11)
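The NaN values behind this error can be reproduced without vegan. The sketch below is only an illustration under stated assumptions: `bray_pair` is a hypothetical helper that implements the textbook Bray-Curtis formula (it is not vegan's own code), and `toy` is made-up data in which two samples contain no fish at all.

```r
# Hypothetical helper (not part of vegan): textbook Bray-Curtis dissimilarity
# between two samples, sum(|x - y|) / sum(x + y)
bray_pair <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Made-up square-root-transformed counts; samples s2 and s3 are completely empty
toy <- rbind(s1 = sqrt(c(0, 0, 0, 4)),
             s2 = c(0, 0, 0, 0),
             s3 = c(0, 0, 0, 0),
             s4 = sqrt(c(0, 2, 0, 1)))

# Flag the samples in which every species count is zero
empty <- rowSums(toy) == 0
print(names(which(empty)))

# Two empty samples: numerator and denominator are both 0, so the result is NaN
d_empty_pair <- bray_pair(toy["s2", ], toy["s3", ])
print(is.nan(d_empty_pair))

# One empty and one non-empty sample: the result is defined and equals 1
d_mixed_pair <- bray_pair(toy["s1", ], toy["s2", ])
print(d_mixed_pair)
```

Running `rowSums(comp) == 0` on the real data in the same way would show which of the 27 samples are empty.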
The aim is to run a permutational ANOVA (PERMANOVA) on the Bray-Curtis dissimilarities of the square-root-transformed data, in order to determine whether the samples (assemblages of counted species) differ significantly depending on the factors, alone or combined. However, vegdist cannot handle samples in which every species count is 0: the dissimilarity between two such empty samples is undefined (0/0), so the resulting dist object contains NaN, which in turn cannot be handled by hclust (see the error above) or by the adonis function. I thought of simply adding 1 to every count, on the assumption that it is the differences between samples that matter and not the absolute values. However, mydist <- ecodist::bcdist(squared_data, rmzero = FALSE) gives a very different result from that first solution. Is anybody familiar with this issue, and how should it be handled correctly?
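To make the concern about adding 1 concrete, here is a minimal sketch in base R. It assumes the textbook Bray-Curtis formula, implemented in a hypothetical helper `bray_pair` (not vegan's or ecodist's code), and uses made-up counts. It suggests that adding a constant to every count leaves the numerator unchanged but inflates the denominator, so the dissimilarities themselves change.

```r
# Hypothetical helper (not from vegan or ecodist): textbook Bray-Curtis
# dissimilarity, sum(|x - y|) / sum(x + y)
bray_pair <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Made-up counts for two samples that share no species
x <- c(4, 0, 0, 0)
y <- c(0, 1, 0, 0)

# Raw counts: no shared species, so the dissimilarity is 5 / 5
d_raw <- bray_pair(x, y)

# After adding 1 to every count the numerator is still 5,
# but the denominator grows from 5 to 13
d_plus1 <- bray_pair(x + 1, y + 1)

# Two empty samples become identical after adding 1: 0 / 8 instead of 0 / 0
z <- c(0, 0, 0, 0)
d_empty_plus1 <- bray_pair(z + 1, z + 1)

print(c(raw = d_raw, plus1 = d_plus1, empty_plus1 = d_empty_plus1))
```

So the shift removes the NaN values, but it also changes every other dissimilarity in the matrix, which may be part of why the two approaches disagree.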
Thank you, and I look forward to your replies.