Population visualization in R - BGIGPD/BestPractices4Pathogenomics GitHub Wiki

Data preparation

paste <(cut -f 2  A_alb.fam) <(cat A_alb.5.Q | tr ' ' '\t') > A_alb.5.Q.tab
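
The command above places the sample IDs (column 2 of the .fam file) next to the ancestry proportions in the .Q file. The same step can be sketched in R. The helper name `combine_fam_q` and the toy files are hypothetical and only serve as illustration. Because `read.table()` splits on any whitespace, this variant may also help when the .fam file is space-delimited, whereas `cut -f` splits on tabs by default.

```r
# Hypothetical helper (not part of the original workflow): combine sample IDs
# from a PLINK .fam file with ADMIXTURE .Q proportions into one tab-separated file.
combine_fam_q <- function(fam_file, q_file, out_file) {
  fam <- read.table(fam_file, header = FALSE, stringsAsFactors = FALSE)
  q   <- read.table(q_file, header = FALSE)
  # Both files must list the same samples in the same order
  stopifnot(nrow(fam) == nrow(q))
  combined <- cbind(Ind = fam[[2]], q)
  write.table(combined, out_file, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  invisible(combined)
}

# Toy demonstration with made-up files in a temporary directory
toy_fam <- file.path(tempdir(), "toy.fam")
toy_q   <- file.path(tempdir(), "toy.2.Q")
toy_out <- file.path(tempdir(), "toy.2.Q.tab")
writeLines(c("F1 S1 0 0 0 -9", "F2 S2 0 0 0 -9"), toy_fam)
writeLines(c("0.7 0.3", "0.1 0.9"), toy_q)
toy_tab <- combine_fam_q(toy_fam, toy_q, toy_out)
print(toy_tab)
```

For the tutorial data, the call would be `combine_fam_q("A_alb.fam", "A_alb.5.Q", "A_alb.5.Q.tab")`.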

Download data into your working directory.

Click "data" to download the files.

Set the working directory in RStudio.

Use setwd() in R to change the working directory (note: use "/" as the path separator, not "\"). In RStudio, you can also press Ctrl + Shift + H to choose the working directory quickly.
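
After setting the working directory, a quick check can confirm that the input files are where R will look for them. This is a minimal sketch; the file names are the ones read later in this tutorial.

```r
# Input files read later in this tutorial
input_files <- c("A_alb.5.Q.tab", "sample_province.csv",
                 "china_provinces_lat_lon.csv")

# TRUE/FALSE for each file, relative to the current working directory
found <- file.exists(input_files)
names(found) <- input_files

print(getwd())
print(found)

if (!all(found)) {
  message("Missing from the working directory: ",
          paste(input_files[!found], collapse = ", "))
}
```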

Code Explanation for Admixture Analysis in R

This code provides a step-by-step guide to visualize admixture analysis results on a map of China with pie charts indicating population structure by province, and a structure bar plot for sample clusters.

# Packages required for this tutorial
required_packages <- c("sf", "tidyverse", "ggforce", "mapmixture", "gridExtra")

# Check and install the missing packages
for (package in required_packages) {
  if (!require(package, character.only = TRUE)) {
    install.packages(package, dependencies = TRUE)
    library(package, character.only = TRUE)
  }
}

1. Load Required Libraries

# Load libraries for geographic and data manipulation
library(sf)           # Spatial data handling
library(tidyverse)    # Data manipulation and visualization
library(ggforce)      # Advanced visualizations (used here for pie charts)
library(mapmixture)   # Plotting admixture data
library(gridExtra)    # Arranging multiple plots on one page

2. Load GeoJSON Map Data of China

# Load China's provincial boundaries from a GeoJSON file
china_map_geojson <- st_read("https://geo.datav.aliyun.com/areas_v3/bound/100000_full.json")

This loads the China map data from a GeoJSON source, which is used as the base layer for plotting.

3. Load Admixture Data and Sample-Province Mapping Data

# Load admixture results and province mapping data
A_alb_Q <- read.table("A_alb.5.Q.tab", header = FALSE)
sample_province <- read.csv("sample_province.csv", sep = ",", header = TRUE)
coordinates <- read.csv("china_provinces_lat_lon.csv")

Here, A_alb.5.Q.tab contains admixture cluster results, sample_province.csv links samples to provinces, and china_provinces_lat_lon.csv includes geographic coordinates for each province.
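
The joins and plot aesthetics in the following steps refer to the columns Ind, Province, Site, Lat, and Lon, so the two CSV files need to provide them. The sketch below builds toy tables with that layout and checks the column names. All values (sample IDs, approximate coordinates) are made up, and `check_columns` is a hypothetical helper.

```r
# Toy stand-in for sample_province.csv: one row per sample
toy_sample_province <- data.frame(
  Ind      = c("S1", "S2", "S3"),
  Province = c("Yunnan", "Yunnan", "Hainan"),
  stringsAsFactors = FALSE
)

# Toy stand-in for china_provinces_lat_lon.csv: one row per province
# (coordinates are rough illustrative values)
toy_coordinates <- data.frame(
  Site = c("Yunnan", "Hainan"),
  Lat  = c(25.0, 19.2),
  Lon  = c(101.5, 109.7),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stop with a readable message if columns are missing
check_columns <- function(df, required) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

check_columns(toy_sample_province, c("Ind", "Province"))
check_columns(toy_coordinates, c("Site", "Lat", "Lon"))
```

The same checks could be run on `sample_province` and `coordinates` right after loading them.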

4. Define Column Names for Admixture Data

# Set column names for admixture data
colnames(A_alb_Q) <- c("Ind", paste("Cluster", 1:(ncol(A_alb_Q) - 1), sep = ""))

This assigns column names to the admixture data, naming each cluster as "Cluster1," "Cluster2," etc.
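
As a small illustration of the naming rule, here is the same call applied to a made-up table with one ID column and two ancestry columns.

```r
# Toy table as read.table() would return it: default names V1, V2, V3
toy_Q <- data.frame(
  V1 = c("S1", "S2"),
  V2 = c(0.7, 0.1),
  V3 = c(0.3, 0.9),
  stringsAsFactors = FALSE
)

# First column is the sample ID; the rest are numbered clusters
colnames(toy_Q) <- c("Ind", paste("Cluster", 1:(ncol(toy_Q) - 1), sep = ""))

print(colnames(toy_Q))
# "Ind" "Cluster1" "Cluster2"
```

Because the number of clusters is derived from `ncol()`, the same line works for any value of K.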

5. Merge Province Information into Admixture Data

# Merge province information to admixture data
admixture1 <- A_alb_Q %>%
  left_join(sample_province, by = "Ind") %>%
  rename(Site = Province) %>%
  select(Site, Ind, everything())

We join the sample and province mapping to the admixture data, renaming the "Province" column as "Site."
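
A left join keeps every sample, so a sample whose ID is absent from the province table ends up with a missing Site. The sketch below, using made-up data, shows one way to list such samples before plotting.

```r
library(dplyr)

# Toy admixture table with three samples
toy_Q <- data.frame(
  Ind      = c("S1", "S2", "S3"),
  Cluster1 = c(0.7, 0.1, 0.5),
  Cluster2 = c(0.3, 0.9, 0.5),
  stringsAsFactors = FALSE
)

# Toy province table that lacks sample S3
toy_sample_province <- data.frame(
  Ind      = c("S1", "S2"),
  Province = c("Yunnan", "Hainan"),
  stringsAsFactors = FALSE
)

toy_admixture <- toy_Q %>%
  left_join(toy_sample_province, by = "Ind") %>%
  rename(Site = Province) %>%
  select(Site, Ind, everything())

# Samples without a province have NA in Site
unmatched <- toy_admixture %>% filter(is.na(Site))
print(unmatched$Ind)
# "S3"
```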

6. Calculate Sample Counts per Province and Adjust Size Scale

# Calculate sample count per province and scale the size
site_sample_count <- admixture1 %>%
  group_by(Site) %>%
  summarise(sample_count = n()) %>%
  mutate(pie_size = log1p(sample_count))

For visual scaling, we transform the sample count with log1p(), that is log(1 + n), so that provinces with many samples do not produce pies that dwarf the others.
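
To see how the transform compresses differences, here is a small sketch with made-up sample counts. The radius column mirrors the `r = pie_size/2` used in the map step; since the map is drawn in longitude/latitude, that radius is effectively measured in degrees.

```r
# Made-up sample counts spanning two orders of magnitude
counts <- c(1, 10, 100)

toy_sizes <- data.frame(
  sample_count = counts,
  pie_size     = log1p(counts),      # log(1 + n)
  radius       = log1p(counts) / 2   # radius as used in the map step
)

print(toy_sizes)

# Ratio between the largest and smallest pie size
print(max(toy_sizes$pie_size) / min(toy_sizes$pie_size))
```

With these counts, a 100-fold difference in sample number shrinks to a difference of roughly 6.7-fold in pie size.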

7. Calculate Mean Population Structure per Province

# Calculate average population structure values by province
admixture_avg <- admixture1 %>%
  group_by(Site) %>%
  summarise(across(starts_with("Cluster"), mean))

This calculates the mean admixture proportions for each cluster across samples within a province.
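
Since each sample's proportions sum to 1, the per-province means should as well. The following sketch runs the same summary on made-up data and checks this as a simple sanity test.

```r
library(dplyr)

# Toy data: two samples from site A, one from site B
toy_admixture <- data.frame(
  Site     = c("A", "A", "B"),
  Ind      = c("S1", "S2", "S3"),
  Cluster1 = c(0.8, 0.6, 0.1),
  Cluster2 = c(0.2, 0.4, 0.9),
  stringsAsFactors = FALSE
)

toy_avg <- toy_admixture %>%
  group_by(Site) %>%
  summarise(across(starts_with("Cluster"), mean))

print(toy_avg)

# Each row of averaged proportions should sum to 1 (within rounding error)
row_totals <- rowSums(select(toy_avg, starts_with("Cluster")))
stopifnot(all(abs(row_totals - 1) < 1e-6))
```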

8. Merge Sample Count Data with Coordinates

# Merge coordinates with sample counts for each province
coordinates_filtered <- coordinates %>%
  filter(Site %in% unique(admixture1$Site)) %>%
  left_join(site_sample_count, by = "Site")

Only provinces that appear in the admixture data are kept, and each is joined with its sample count so that its pie chart can be positioned and sized.
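
The `filter()` call keeps only matching provinces, so a province whose name is spelled differently in the two files is dropped without any message and later ends up with missing Lat/Lon values in the plotting table. A minimal sketch of a check, using made-up province lists:

```r
# Provinces present in the (toy) admixture data
toy_sites <- c("Yunnan", "Hainan", "Guangxi")

# Toy coordinate table that lacks Guangxi (coordinates are rough values)
toy_coordinates <- data.frame(
  Site = c("Yunnan", "Hainan"),
  Lat  = c(25.0, 19.2),
  Lon  = c(101.5, 109.7),
  stringsAsFactors = FALSE
)

# Provinces with samples but without coordinates
missing_sites <- setdiff(toy_sites, toy_coordinates$Site)
print(missing_sites)
# "Guangxi"
```

On the tutorial data, the equivalent check would be `setdiff(unique(admixture1$Site), coordinates$Site)`.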

9. Convert Population Structure to Long Format for Plotting

# Convert population structure to long format for plotting
admixture_long <- admixture_avg %>%
  pivot_longer(cols = starts_with("Cluster"), names_to = "Cluster", values_to = "value") %>%
  left_join(coordinates_filtered, by = "Site")

To plot pie charts, we convert the population structure to long format, with "Cluster" and "value" columns.
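
The reshaping is easier to picture on a tiny table. In this sketch with made-up values, two sites with two clusters become four rows, one per site and cluster.

```r
library(tidyr)

# Toy wide table: one row per site, one column per cluster
toy_avg <- data.frame(
  Site     = c("A", "B"),
  Cluster1 = c(0.7, 0.1),
  Cluster2 = c(0.3, 0.9),
  stringsAsFactors = FALSE
)

# Long table: one row per site/cluster combination
toy_long <- pivot_longer(
  toy_avg,
  cols      = starts_with("Cluster"),
  names_to  = "Cluster",
  values_to = "value"
)

print(toy_long)
```

Each row of the long table corresponds to one slice of one pie chart.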

10. Plot Map with Pie Charts Showing Admixture Proportions

# Plot map with pie charts indicating population structure
map <- ggplot() +
  geom_sf(data = china_map_geojson, fill = "white", color = "gray") +  # Map of China
  geom_arc_bar(
    data = admixture_long,
    aes(
      x0 = Lon, y0 = Lat, r0 = 0, r = pie_size/2,
      amount = value, fill = Cluster
    ),
    stat = "pie", inherit.aes = FALSE, color = "black"
  ) +
  scale_fill_manual(
    values = c("#FF6B6B", "#4ECDC4", "#556270", "#f1a340", "#998ec3"),
    labels = paste0("Ancestry", seq_along(unique(admixture_long$Cluster)))
  ) +
  coord_sf(xlim = c(73, 135), ylim = c(18, 54), crs = 4326) +
  labs(fill = "Ancestry") +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 16),
    legend.position = "right"
  )

This block plots China’s map with pie charts representing the population structure proportions per province. Colors indicate different clusters.
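
To experiment with the pie layer on its own, the sketch below draws two pies from made-up data without the map underneath. It uses the same `geom_arc_bar()` aesthetics as the block above; `coord_fixed()` is an assumption added here so the pies stay circular without `coord_sf()`.

```r
library(ggplot2)
library(ggforce)

# Toy long-format table: two sites, two clusters each (all values made up)
toy_long <- data.frame(
  Site     = rep(c("A", "B"), each = 2),
  Lon      = rep(c(100, 110), each = 2),
  Lat      = rep(c(25, 30), each = 2),
  pie_size = rep(c(log1p(3), log1p(20)), each = 2),
  Cluster  = rep(c("Cluster1", "Cluster2"), times = 2),
  value    = c(0.7, 0.3, 0.1, 0.9),
  stringsAsFactors = FALSE
)

# stat = "pie" turns the 'amount' values at each (x0, y0) into slices
toy_pies <- ggplot() +
  geom_arc_bar(
    data = toy_long,
    aes(x0 = Lon, y0 = Lat, r0 = 0, r = pie_size / 2,
        amount = value, fill = Cluster),
    stat = "pie", color = "black"
  ) +
  coord_fixed() +
  theme_minimal()

print(toy_pies)
```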

11. Structure Plot for Admixture Data by Site

# Plot structure plot for admixture data
structure_barplot <- structure_plot(
  admixture_df = admixture1,
  type = "structure",
  cluster_cols = c("#FF6B6B", "#4ECDC4", "#556270","#f1a340","#998ec3"),
  site_dividers = TRUE,
  divider_width = 0.4,
  labels = "site",
  flip_axis = FALSE,
  site_ticks_size = -0.05,
  site_labels_size = 2.2
) +
  theme(
    axis.title.y = element_text(size = 8,hjust = 1),
    axis.text.y = element_text(size = 5)
  )

The structure plot visualizes individual sample admixture proportions by site, using colors to denote clusters.

12. Arrange Map and Structure Plot

# Combine map and structure plot in a grid layout
grid.arrange(map, structure_barplot, nrow = 2, heights = c(3,1))

This final step combines the map and structure plots, placing them in a 2-row layout with proportions set to emphasize the map.
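
`grid.arrange()` draws to the current graphics device. To write the combined figure to a file instead, one option is `arrangeGrob()` together with `ggsave()`. The sketch below uses two placeholder plots built from the built-in `mtcars` data; the output file name and dimensions are arbitrary choices, not values from the original tutorial.

```r
library(ggplot2)
library(gridExtra)

# Placeholder plots standing in for the map and the structure plot
p_top    <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p_bottom <- ggplot(mtcars, aes(factor(cyl))) + geom_bar()

# Same 2-row layout, but returned as an object instead of being drawn
combined <- arrangeGrob(p_top, p_bottom, nrow = 2, heights = c(3, 1))

# Save to a PDF in a temporary directory (path and size are illustrative)
out_file <- file.path(tempdir(), "toy_combined.pdf")
ggsave(out_file, combined, width = 8, height = 10)
```

With the tutorial objects, `p_top` and `p_bottom` would be replaced by `map` and `structure_barplot`.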

Reference

This tutorial is adapted from the "mapmixture" package tutorial: mapmixture