I'm a student doing exploratory analysis and data visualization with this hate crime dataset. I am trying to create a matrix of incident counts for the different categories (e.g. race, religion) in my dataset (hate_crime) for 2009 and 2017. The full dataset can be found here.
I extracted the necessary data (incidents during 2009 or 2017) from the existing data.
library(dplyr)
SecondYear_OTYear <- hate_crime %>% filter(DATA_YEAR %in% c("2009", "2017"))
Then I made a separate subset for each subcategory within a category. For example, to create subsets of the bias descriptions I wrote the following:
antiWhiteSubset <- SecondYear_OTYear[grep("Anti-White", SecondYear_OTYear$BIAS_DESC), ]
antiWhite17 <- nrow(antiWhiteSubset[antiWhiteSubset$DATA_YEAR == "2017", ])
antiWhite09 <- nrow(antiWhiteSubset[antiWhiteSubset$DATA_YEAR == "2009", ])
antiBlackSubset <- SecondYear_OTYear[grep("Anti-Black", SecondYear_OTYear$BIAS_DESC), ]
antiBlack17 <- nrow(antiBlackSubset[antiBlackSubset$DATA_YEAR == "2017", ])
antiBlack09 <- nrow(antiBlackSubset[antiBlackSubset$DATA_YEAR == "2009", ])
antiLatinoSubset <- SecondYear_OTYear[grep("Anti-Hispanic", SecondYear_OTYear$BIAS_DESC), ]
antiLatino17 <- nrow(antiLatinoSubset[antiLatinoSubset$DATA_YEAR == "2017", ])
antiLatino09 <- nrow(antiLatinoSubset[antiLatinoSubset$DATA_YEAR == "2009", ])
I repeated the same structure for all of the other bias descriptions. Then I built a matrix of the totals to use for bar plots, mosaic plots, and chi-square tests, such as the following:
Bar plot of Hate Crime Incidents by Bias Descriptions:
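For context, here is a minimal sketch of how a matrix of totals like this can be assembled and plotted from the per-year counts. The data frame `toy` and its rows are invented purely for illustration (they are not from the real dataset), and the name `bias_totals` and the plot title are assumptions; only the counting pattern mirrors the code above.

```r
# Toy stand-in for SecondYear_OTYear: these rows are made up for illustration only.
toy <- data.frame(
  DATA_YEAR = c("2009", "2009", "2009", "2017", "2017", "2017", "2017"),
  BIAS_DESC = c("Anti-White", "Anti-Black or African American",
                "Anti-Hispanic or Latino", "Anti-White",
                "Anti-Black or African American",
                "Anti-Black or African American", "Anti-Hispanic or Latino"),
  stringsAsFactors = FALSE
)

# Same pattern as in the question: subset by bias description, then count rows per year.
antiWhiteSubset  <- toy[grep("Anti-White", toy$BIAS_DESC), ]
antiWhite09      <- nrow(antiWhiteSubset[antiWhiteSubset$DATA_YEAR == "2009", ])
antiWhite17      <- nrow(antiWhiteSubset[antiWhiteSubset$DATA_YEAR == "2017", ])
antiBlackSubset  <- toy[grep("Anti-Black", toy$BIAS_DESC), ]
antiBlack09      <- nrow(antiBlackSubset[antiBlackSubset$DATA_YEAR == "2009", ])
antiBlack17      <- nrow(antiBlackSubset[antiBlackSubset$DATA_YEAR == "2017", ])
antiLatinoSubset <- toy[grep("Anti-Hispanic", toy$BIAS_DESC), ]
antiLatino09     <- nrow(antiLatinoSubset[antiLatinoSubset$DATA_YEAR == "2009", ])
antiLatino17     <- nrow(antiLatinoSubset[antiLatinoSubset$DATA_YEAR == "2017", ])

# Collect the scalar counts into a 2 x 3 matrix: one row per year, one column per bias.
bias_totals <- matrix(
  c(antiWhite09, antiBlack09, antiLatino09,
    antiWhite17, antiBlack17, antiLatino17),
  nrow = 2, byrow = TRUE,
  dimnames = list(Year = c("2009", "2017"),
                  Bias = c("Anti-White", "Anti-Black", "Anti-Hispanic"))
)

# Grouped bar plot: one group per bias description, one bar per year.
barplot(bias_totals, beside = TRUE, legend.text = TRUE,
        main = "Hate Crime Incidents by Bias Description")
```

The same `bias_totals` matrix can be passed to `mosaicplot()` or `chisq.test()`, which is why I kept everything in one matrix.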
However, I feel like there must be a more efficient way to code the different subsets than writing three lines per bias description. I'm open to any suggestions! Thank you so much.