As a newbie to network analysis, I am struggling to transform an event-level dataset I want to plot into the correct shape. I am grateful for any hints, leads, etc. What I have done so far broadly follows this introduction.
The dataset in question contains events organized by the political party Jobbik. Each event, identified by a unique id (`id`), has associated organizational sponsors (`org_names`) and their type (`org`). The numbering implies no hierarchy: `org_1` is not ranked above `org_2`, nor `org_names1` above `org_names2`.
The dataset originally comes in wide format. Although I am not sure this is the right approach, my first step is to reshape the data into long format and clean up the strings a bit. This is the code for reading in the data and reshaping it:
```r
library(tidyverse)
library(stringr)
library(igraph)

jobbik <- read.csv("http://eborbath.github.io/stackoverflow/jobbik.csv")

# long format: columns 3-13 hold the organizer names, 14-24 their types
jobbik <- reshape(as.data.frame(jobbik), direction = "long",
                  varying = list(3:13, 14:24),
                  v.names = c("org_names", "org"),
                  times = as.character(1:11))
jobbik$org <- str_trim(jobbik$org, side = "both")
jobbik$org_names <- str_trim(jobbik$org_names, side = "both")

jobbik <- jobbik %>%
  # drop unused organizer slots
  filter(!(org == "no other organizer" & org_names == "")) %>%
  # drop rows where Jobbik is listed as its own co-organizer
  filter(!(org == "JOBBIK" & org_names %in% c("Jobbik",
                                              "Jobbik Magyarországért Mozgalom",
                                              "",
                                              "JObbik",
                                              "jobbik",
                                              "aktivisté Jobbiku",
                                              "a Jobbik"))) %>%
  # fall back on the organizer type when the name is empty
  mutate(org_names = ifelse(org_names == "", org, org_names)) %>%
  distinct()
```
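For comparison, the same wide-to-long step can be sketched with tidyr's `pivot_longer()`, a swapped-in alternative to base `reshape()`. This assumes the sponsor columns are named like `org_names1` through `org_names11` and `org_1` through `org_11`; the toy data below is hypothetical, not taken from the real file.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: two events, two organizer slots each
wide <- tibble(
  id         = c(1, 2),
  org_names1 = c("A", "A"),
  org_names2 = c("B", ""),
  org_1      = c("party", "party"),
  org_2      = c("NGO", "no other organizer")
)

# ".value" turns the name prefix (org_names / org) into output columns,
# while the trailing digits become the slot number in `time`
long <- wide %>%
  pivot_longer(
    cols = starts_with("org"),
    names_to = c(".value", "time"),
    names_pattern = "^(org_names|org)_?(\\d+)$"
  )

long
```

The result has one row per event and organizer slot, with `org_names` and `org` side by side, so the same trimming and filtering steps can follow unchanged.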
Next, I create the network dataset. To do so, I count the number of events each unique organization co-organized with Jobbik, add Jobbik as one end of every edge, and plot the result with igraph:
```r
network <- jobbik %>%
  select(id, org_names) %>%
  group_by(org_names) %>%
  summarise(weight = n()) %>%
  ungroup() %>%
  # Jobbik gets id 0 so it cannot collide with the organization ids 1..n
  mutate(from = 0,
         org_id = as.numeric(factor(org_names)))

edges <- network %>% select(from, org_id, weight)
nodes <- network %>%
  select(org_id, org_names) %>%
  mutate(org_names = as.character(org_names)) %>%
  # add Jobbik itself as a vertex
  add_row(org_id = 0, org_names = "Jobbik", .before = 1)

routes_igraph <- graph_from_data_frame(d = edges, vertices = nodes, directed = FALSE)
plot(routes_igraph, layout = layout_with_graphopt)
```
While this runs and creates the network, it only captures the relationship between each organization and Jobbik, not the relationships among the organizations themselves. I realize the problem is in my data transformation: I should use the event-level information to count how many times each pair of organizations co-organized an event, and then plot that. Unfortunately, I don't know how to get there. I am grateful for any help.
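A minimal sketch of that pair counting, assuming the cleaned long data has one row per event `id` and organization `org_names`. The toy data and the names `events`, `pairs`, `edges`, `nodes` and `org_graph` are hypothetical. The idea is to join the data to itself on `id`, keep each unordered pair once, and count the shared events.

```r
library(dplyr)
library(igraph)

# Hypothetical toy data in the long shape: one row per (event, organization).
# "A", "B", "C" are placeholder organization names.
events <- tibble(
  id        = c(1, 1, 1, 2, 2, 3, 3),
  org_names = c("A", "B", "C", "A", "B", "C", "C")
)

# Each organization counts at most once per event
events <- distinct(events, id, org_names)

# Self-join on the event id: one row for every ordered pair of
# organizations that appear in the same event
pairs <- merge(events, events, by = "id", suffixes = c("_from", "_to"))

edges <- pairs %>%
  # drop self-pairs and keep each unordered pair only once
  filter(org_names_from < org_names_to) %>%
  # weight = number of events the two organizations co-organized
  count(org_names_from, org_names_to, name = "weight")

# Passing the full vertex list keeps organizations without co-organizers
nodes <- distinct(events, org_names)
org_graph <- graph_from_data_frame(edges, vertices = nodes, directed = FALSE)

plot(org_graph, edge.width = E(org_graph)$weight,
     layout = layout_with_graphopt)
```

With the real data, `events` would be `distinct(jobbik, id, org_names)`. Since every event is a Jobbik event, one option for keeping Jobbik in the graph is to append one row per event with `org_names = "Jobbik"` before the self-join; its edges then count each organization's events with Jobbik, as in the first plot. An equivalent route is to build an event-by-organization incidence matrix and take its `crossprod()`, whose off-diagonal entries are the same pair counts.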