---
title: "United Nations Voting Correlations"
author: "David Robinson"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{United Nations Voting Correlations}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, echo = FALSE}
library(knitr)
options(width = 102)
knitr::opts_chunk$set(message = FALSE, warning = FALSE)

library(ggplot2)
theme_set(theme_bw())
```

Here we'll examine an example application of the widyr package, particularly the `pairwise_cor` and `pairwise_dist` functions. We'll use the data on United Nations General Assembly voting from the `unvotes` package:

```{r echo = FALSE}
if (!requireNamespace("unvotes", quietly = TRUE)) {
  print("This vignette requires the unvotes package to be installed. Exiting...")
  knitr::knit_exit()
}
```

```{r}
library(dplyr)
library(unvotes)

un_votes
```

This dataset has one row for each country for each roll call vote. We're interested in finding pairs of countries that tended to vote similarly.

### Pairwise correlations

Notice that the `vote` column is a factor, with levels (in order) "yes", "abstain", and "no":

```{r}
levels(un_votes$vote)
```

Because the levels are ordered, converting the factor to a number (1 = yes, 2 = abstain, 3 = no) lets us measure country-to-country agreement across roll call votes using the `pairwise_cor` function.

```{r cors}
library(widyr)

cors <- un_votes %>%
  mutate(vote = as.numeric(vote)) %>%
  pairwise_cor(country, rcid, vote, use = "pairwise.complete.obs", sort = TRUE)

cors
```

We could, for example, find the countries that the US is most and least in agreement with:

```{r US_cors}
US_cors <- cors %>%
  filter(item1 == "United States")

# Most in agreement
US_cors

# Least in agreement
US_cors %>%
  arrange(correlation)
```

These correlations can be particularly informative when visualized on a map.

```{r US_cors_map, fig.width = 10, fig.height = 6}
if (require("maps", quietly = TRUE) &&
    require("fuzzyjoin", quietly = TRUE) &&
    require("countrycode", quietly = TRUE) &&
    require("ggplot2", quietly = TRUE)) {
  world_data <- map_data("world") %>%
    regex_full_join(iso3166, by = c("region" = "mapname")) %>%
    filter(region != "Antarctica")

  US_cors %>%
    mutate(a2 = countrycode(item2, "country.name", "iso2c")) %>%
    full_join(world_data, by = "a2") %>%
    ggplot(aes(long, lat, group = group, fill = correlation)) +
    geom_polygon(color = "gray", size = .1) +
    scale_fill_gradient2() +
    coord_quickmap() +
    theme_void() +
    labs(title = "Correlation of each country's UN votes with the United States",
         subtitle = "Blue indicates agreement, red indicates disagreement",
         fill = "Correlation w/ US")
}
```

### Visualizing clusters in a network

Another useful kind of visualization is a network plot, which can be created with Thomas Pedersen's [ggraph package](https://github.com/thomasp85/ggraph). We can filter for pairs of countries with correlations above a particular threshold.

```{r country_network, fig.width = 10, fig.height = 6}
if (require("ggraph", quietly = TRUE) &&
    require("igraph", quietly = TRUE) &&
    require("countrycode", quietly = TRUE)) {
  cors_filtered <- cors %>%
    filter(correlation > .6)

  continents <- tibble(country = unique(un_votes$country)) %>%
    filter(country %in% cors_filtered$item1 |
             country %in% cors_filtered$item2) %>%
    mutate(continent = countrycode(country, "country.name", "continent"))

  set.seed(2017)

  cors_filtered %>%
    graph_from_data_frame(vertices = continents) %>%
    ggraph() +
    geom_edge_link(aes(edge_alpha = correlation)) +
    geom_node_point(aes(color = continent), size = 3) +
    geom_node_text(aes(label = name), check_overlap = TRUE, vjust = 1, hjust = 1) +
    theme_void() +
    labs(title = "Network of countries with correlated United Nations votes")
}
```

Choosing the threshold for filtering correlations (or other measures of similarity) typically requires some trial and error.
Setting too high a threshold will make a graph too sparse, while too low a threshold will make a graph too crowded.
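
One way to make that trial and error a little more systematic is to count how many edges and nodes survive at several candidate thresholds before drawing anything. The chunk below is only a rough sketch: `summarize_thresholds()` is a hypothetical helper (it is not part of widyr), and `toy_cors` is a made-up stand-in with the same `item1`, `item2`, and `correlation` columns as `cors`, so that the chunk runs on its own.

```{r threshold_summary}
# Hypothetical helper (not part of widyr): for each candidate threshold,
# count the edges and distinct nodes that would remain in the network.
summarize_thresholds <- function(pairs, thresholds) {
  rows <- lapply(thresholds, function(th) {
    kept <- pairs[!is.na(pairs$correlation) & pairs$correlation > th, ]
    data.frame(
      threshold = th,
      edges = nrow(kept),
      nodes = length(unique(c(as.character(kept$item1),
                              as.character(kept$item2))))
    )
  })
  do.call(rbind, rows)
}

# Made-up correlations standing in for `cors`; each pair is listed once
toy_cors <- data.frame(
  item1 = c("A", "A", "A", "B", "B", "C"),
  item2 = c("B", "C", "D", "C", "D", "D"),
  correlation = c(0.9, 0.7, 0.2, 0.65, 0.1, -0.3)
)

threshold_summary <- summarize_thresholds(toy_cors, c(0.5, 0.6, 0.8))
threshold_summary
```

Applied to the real data, the call might look like `summarize_thresholds(cors, seq(.4, .8, by = .1))`. If `cors` lists each pair in both directions, the edge counts will be doubled, but they remain comparable across thresholds.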