library(tidyverse)
library(patchwork)
# This is from exercise 7
wnba <- read_csv("https://github.com/andrewheiss/dataviz-projects_07-relationships/raw/refs/heads/main/data/wnba.csv")
Visualizing WNBA ages
Distributions
We can plot the distribution of ages with dots, where each dot is a player:
plot_dots <- ggplot(wnba, aes(x = age)) +
  geom_dotplot(
    binwidth = 1,
    dotsize = 0.4,
    fill = "darkorange"
  )
plot_dots
↑ That means there’s one 19-year-old player, two 20-year-old players, fifteen 25-year-old players, and so on. The dots show the overall shape of the age distribution of WNBA players.
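The counts behind the dots can also be checked numerically. Here is a minimal sketch that uses a small made-up `toy_players` tibble (hypothetical, not the real data) in place of `wnba`; swapping in `wnba` would give the actual number of players at each age:

```r
library(tidyverse)

# Hypothetical stand-in for the wnba data: one row per player
toy_players <- tibble(age = c(19, 20, 20, 25, 25, 25))

# One row per age, with the number of players of that age in column n
age_counts <- count(toy_players, age)
age_counts
```

Each row of `age_counts` corresponds to one stack of dots in the dot plot.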
Instead of using dots, we can use a histogram. Here the bin width is 1, so the bars show the count of players in each one-year range (19-20, 20-21, 21-22, and so on). It matches the dot plot:
plot_hist <- ggplot(wnba, aes(x = age)) +
  geom_histogram(
    binwidth = 1,
    boundary = 0,
    color = "white",
    fill = "darkred"
  )
plot_hist
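To see exactly which one-year ranges the bars cover, the computed bins can be read back out of a plot. This is a sketch on made-up data (`toy_players` and `toy_hist` are hypothetical names) that uses ggplot2's `layer_data()` to list each bin's edges and count:

```r
library(tidyverse)

# Hypothetical stand-in for the wnba data
toy_players <- tibble(age = c(19, 20, 20, 22, 25, 25, 25, 27))

toy_hist <- ggplot(toy_players, aes(x = age)) +
  geom_histogram(binwidth = 1, boundary = 0)

# layer_data() returns the values ggplot2 computed for a layer;
# for a histogram that includes each bin's edges and its count
toy_bins <- layer_data(toy_hist) |>
  select(xmin, xmax, count)
toy_bins
```

Each row is one bar. With `boundary = 0` and `binwidth = 1`, the `xmin` and `xmax` values should land on whole numbers, and the counts should add up to the number of rows in the data.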
A density plot is basically a histogram, but smoothed out:
plot_density <- ggplot(wnba, aes(x = age)) +
  geom_density(fill = "dodgerblue", color = NA)
plot_density
It shows that most players are around 25 years old and still matches the general shape of the points and the histogram.
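One difference from the histogram is that the y-axis of a density plot is a density rather than a count, so the area under the curve is roughly 1. Here is a rough check using base R's `density()` on made-up ages (an illustration only; `geom_density()` may not use identical settings):

```r
# Hypothetical ages standing in for wnba$age
toy_ages <- c(19, 20, 20, 22, 24, 25, 25, 25, 26, 27, 29, 31)

# Kernel density estimate evaluated on an evenly spaced grid
toy_density <- density(toy_ages)

# Approximate the area under the curve with a Riemann sum
grid_step <- diff(toy_density$x)[1]
area <- sum(toy_density$y) * grid_step
area

# The age where the smoothed curve peaks
peak_age <- toy_density$x[which.max(toy_density$y)]
peak_age
```

The peak of the curve is what the plot is showing when it suggests that most players are around a particular age.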
Here are all three combined:
plot_dots / plot_hist / plot_density
Or in one plot:
ggplot(wnba, aes(x = age)) +
  geom_histogram(
    aes(y = after_stat(count)),
    binwidth = 1,
    color = "white",
    fill = "darkred"
  ) +
  geom_dotplot(
    aes(y = after_stat(count)),
    binwidth = 1,
    dotsize = 0.5,
    fill = "darkorange"
  ) +
  geom_density(
    # Scale density to counts: density * number of players * bin width (1)
    aes(y = after_stat(density) * nrow(wnba) * 1),
    color = "dodgerblue",
    linewidth = 2
  )
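The `after_stat(density) * nrow(wnba) * 1` expression puts the density curve on the same axis as the counts: multiplying a density by the number of observations and by the bin width (1 here) gives an expected count per bin. Here is a small sketch of that relationship on made-up data (`toy_players` and `bin_width` are hypothetical names), again reading the computed values back with `layer_data()`:

```r
library(tidyverse)

# Hypothetical stand-in for the wnba data
toy_players <- tibble(age = c(19, 20, 20, 22, 24, 25, 25, 25, 26, 27, 29, 31))
bin_width <- 1

toy_plot <- ggplot(toy_players, aes(x = age)) +
  geom_density()

computed <- layer_data(toy_plot)

# Density rescaled to the count axis, as in the combined plot
scaled <- computed$density * nrow(toy_players) * bin_width

# The density stat also reports density * n as its own `count` variable
all.equal(scaled, computed$count)
```

With a bin width of 1 the two agree, so `aes(y = after_stat(count))` would likely work as a shortcut in that case. For other bin widths the explicit multiplication is still needed.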
Violin plots
Violin plots are density plots that are mirrored.
Like, here’s a regular density plot:
ggplot(wnba, aes(x = age)) +
  geom_density(fill = "dodgerblue", color = NA)
And here’s a violin plot, which is the same thing, but with a second density flipped down at the bottom (I added a horizontal line to show where it’s reflected):
ggplot(wnba, aes(x = age, y = 0)) +
  geom_violin(fill = "dodgerblue", color = NA) +
  geom_hline(yintercept = 0)
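To make the mirroring concrete, a violin-like shape can be built by hand from a density estimate and its negative. This is only a sketch on made-up ages (`toy_ages` and `mirrored` are hypothetical names); `geom_violin()` also rescales the width, so its output will not match this exactly:

```r
library(tidyverse)

# Hypothetical ages standing in for wnba$age
toy_ages <- c(19, 20, 20, 22, 24, 25, 25, 25, 26, 27, 29, 31)
toy_density <- density(toy_ages)

# The lower edge is the upper edge reflected across zero
mirrored <- tibble(
  age = toy_density$x,
  upper = toy_density$y,
  lower = -toy_density$y
)

ggplot(mirrored, aes(x = age)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), fill = "dodgerblue") +
  geom_hline(yintercept = 0)
```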
Half distributions
The different half distributions from {gghalves} let you show different types of graphs simultaneously.
Like here’s a half violin (or just a density) with stacked dots:
library(gghalves)
ggplot(wnba, aes(x = 0, y = age)) +
  geom_half_violin(side = "r", fill = "dodgerblue", color = NA) +
  geom_half_dotplot(
    stackdir = "down",
    binwidth = 1,
    dotsize = 0.3,
    fill = "darkorange"
  ) +
  coord_flip()
Or a histogram with stacked dots:
ggplot(wnba, aes(x = 0, y = age)) +
  geom_histogram(
    aes(y = age),
    inherit.aes = FALSE,
    binwidth = 1,
    color = "white",
    fill = "darkred"
  ) +
  geom_half_dotplot(
    stackdir = "down",
    binwidth = 1,
    dotsize = 0.3,
    fill = "darkorange"
  ) +
  coord_flip(xlim = c(-35, 25))
Or a half violin with jittered points:
ggplot(wnba, aes(x = 0, y = age)) +
  geom_half_violin(side = "r", fill = "dodgerblue", color = NA) +
  geom_half_point(
    side = "l",
    position = position_jitter(height = 0, width = 0, seed = 1234)
  ) +
  coord_flip()