---
title: "United Nations Voting Correlations"
author: "David Robinson"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{United Nations Voting Correlations}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
  
```{r setup, echo = FALSE}
library(knitr)

options(width = 102)
knitr::opts_chunk$set(message = FALSE, warning = FALSE)

library(ggplot2)
theme_set(theme_bw())
```

Here we'll work through an example application of the widyr package, focusing on the `pairwise_cor()` function. We'll use the data on United Nations General Assembly voting from the `unvotes` package:

```{r echo = FALSE}
if (!requireNamespace("unvotes", quietly = TRUE)) {
  print("This vignette requires the unvotes package to be installed. Exiting...")
  knitr::knit_exit()
}
```

```{r}
library(dplyr)
library(unvotes)

un_votes
```

This dataset has one row for each country for each roll call vote. We're interested in finding pairs of countries that tended to vote similarly.

### Pairwise correlations

Notice that the `vote` column is a factor, with levels (in order) "yes", "abstain", and "no":

```{r}
levels(un_votes$vote)
```
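
The level order matters because converting a factor with `as.numeric()` returns the integer position of each level, not its label. A minimal sketch, using a hypothetical toy factor (`toy_votes` is not part of the dataset) built with the level order listed above:

```{r}
# Hypothetical toy factor; the levels are set explicitly to match the order described above
toy_votes <- factor(c("yes", "no", "abstain", "yes"),
                    levels = c("yes", "abstain", "no"))

# as.numeric() returns each value's position among the levels
as.numeric(toy_votes)
```

Under that ordering, larger codes mean stronger opposition, so the codes can be treated as a rough ordinal scale.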

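After converting the votes to numbers, we can measure how closely each pair of countries agrees across all roll call votes using the `pairwise_cor()` function. Here `country` is the item being compared, `rcid` (the roll call ID) is the feature that links observations, and `vote` is the value being correlated.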

```{r cors}
library(widyr)

cors <- un_votes %>%
  mutate(vote = as.numeric(vote)) %>%
  pairwise_cor(country, rcid, vote, use = "pairwise.complete.obs", sort = TRUE)

cors
```
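
To make the computation concrete, here is a small sketch on made-up data. The `toy` table and its values are hypothetical; the call mirrors the one above, and the result for one pair is compared against a direct call to base R's `cor()`.

```{r}
library(dplyr)
library(widyr)

# Hypothetical toy data: three countries (A, B, C) voting on four roll calls
toy <- tibble(
  country = rep(c("A", "B", "C"), each = 4),
  rcid    = rep(1:4, times = 3),
  vote    = c(1, 1, 3, 3,   # country A
              1, 2, 3, 3,   # country B
              3, 3, 1, 1)   # country C
)

# One row per ordered pair of countries, with their correlation across roll calls
toy_cors <- toy %>%
  pairwise_cor(country, rcid, vote, use = "pairwise.complete.obs")

toy_cors

# The same quantity for the A-B pair, computed directly on the two vote vectors
cor(toy$vote[toy$country == "A"], toy$vote[toy$country == "B"])
```

Countries A and B vote almost identically and so correlate strongly, while C votes the opposite way from A on every roll call.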

We could, for example, find the countries that the US is most and least in agreement with:

```{r US_cors}
US_cors <- cors %>%
  filter(item1 == "United States")

# Most in agreement
US_cors

# Least in agreement
US_cors %>%
  arrange(correlation)
```

These correlations are easier to interpret when visualized on a map.

```{r US_cors_map, fig.width = 10, fig.height = 6}
if (require("maps", quietly = TRUE) &&
    require("fuzzyjoin", quietly = TRUE) &&
    require("countrycode", quietly = TRUE) &&
    require("ggplot2", quietly = TRUE)) {
  world_data <- map_data("world") %>%
    regex_full_join(iso3166, by = c("region" = "mapname")) %>%
    filter(region != "Antarctica")
  
  US_cors %>%
    mutate(a2 = countrycode(item2, "country.name", "iso2c")) %>%
    full_join(world_data, by = "a2") %>%
    ggplot(aes(long, lat, group = group, fill = correlation)) +
    geom_polygon(color = "gray", linewidth = .1) +
    scale_fill_gradient2() +
    coord_quickmap() +
    theme_void() +
    labs(title = "Correlation of each country's UN votes with the United States",
         subtitle = "Blue indicates agreement, red indicates disagreement",
         fill = "Correlation w/ US")
}
```

### Visualizing clusters in a network

Another useful kind of visualization is a network plot, which can be created with Thomas Lin Pedersen's [ggraph package](https://github.com/thomasp85/ggraph). We first filter for pairs of countries with correlations above a particular threshold (here, 0.6), so that only strongly correlated pairs are drawn as edges.

```{r country_network, fig.width = 10, fig.height = 6}
if (require("ggraph", quietly = TRUE) &&
    require("igraph", quietly = TRUE) &&
    require("countrycode", quietly = TRUE)) {
  cors_filtered <- cors %>%
    filter(correlation > .6)
  
  continents <- tibble(country = unique(un_votes$country)) %>%
    filter(country %in% cors_filtered$item1 |
             country %in% cors_filtered$item2) %>%
    mutate(continent = countrycode(country, "country.name", "continent"))
  
  set.seed(2017)
  
  cors_filtered %>%
    graph_from_data_frame(vertices = continents) %>%
    ggraph() +
    geom_edge_link(aes(edge_alpha = correlation)) +
    geom_node_point(aes(color = continent), size = 3) +
    geom_node_text(aes(label = name), check_overlap = TRUE, vjust = 1, hjust = 1) +
    theme_void() +
    labs(title = "Network of countries with correlated United Nations votes")
}
```
  
Choosing the threshold for filtering correlations (or other measures of similarity) typically requires some trial and error. Setting the threshold too high makes the graph too sparse, while setting it too low makes it too crowded.
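
One way to make that trial and error more systematic is to count how many pairs and countries survive each candidate threshold before plotting. The sketch below is only an illustration: `threshold_summary()` is a hypothetical helper, not part of widyr, and `toy_pairs` is made-up data in the same shape as the output of `pairwise_cor()`.

```{r}
library(dplyr)

# Hypothetical helper: for each threshold, count the remaining pairs and distinct countries
threshold_summary <- function(cor_tbl, thresholds) {
  bind_rows(lapply(thresholds, function(th) {
    kept <- filter(cor_tbl, correlation > th)
    tibble(threshold = th,
           pairs     = nrow(kept),
           countries = n_distinct(c(kept$item1, kept$item2)))
  }))
}

# Made-up table with item1, item2 and correlation columns;
# each pair is listed once per direction, so `pairs` counts both
toy_pairs <- tibble(
  item1       = c("A", "B", "A", "C", "B", "C"),
  item2       = c("B", "A", "C", "A", "C", "B"),
  correlation = c(0.9, 0.9, 0.5, 0.5, 0.7, 0.7)
)

threshold_summary(toy_pairs, c(0.4, 0.6, 0.8))
```

The same helper could be applied to the real correlations, for example `threshold_summary(cors, c(.5, .6, .7))`, to see how quickly the network thins out as the threshold rises.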