8  Lines, Bars, and Annotation

8.1 Line charts

Line charts are designed for ordered data, especially time. They work best when the x-axis has a meaningful sequence and the line connects observations that should be read as a path.

We will begin with GDP per capita over time in Gapminder.

ggplot(data = gapminder, mapping = aes(x = year, y = gdpPercap, group = country)) +
  geom_line(alpha = 0.25) +
  labs(
    title = "GDP per Capita over Time",
    subtitle = "One line per country",
    x = "Year",
    y = "GDP per capita",
    caption = "Source: Gapminder."
  ) +
  theme_minimal()
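The group = country mapping is what gives each country its own line. As a minimal sketch of the same idea on data that ships with ggplot2 (so it runs without gapminder), economics_long stacks several series in one column. The object names p_grouped and n_paths are illustrative.

```r
library(ggplot2)

# economics_long has one row per date and series (the `variable` column).
# Without a group mapping, geom_line() would join every series into one zigzag path.
p_grouped <- ggplot(economics_long, aes(x = date, y = value01, group = variable)) +
  geom_line(alpha = 0.6) +
  theme_minimal()

# Count how many separate paths ggplot2 will draw.
n_paths <- length(unique(layer_data(p_grouped)$group))
n_paths
```

Dropping group = variable from the mapping should collapse the plot into a single path, which is the usual sign that a grouping variable is missing.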

This plot has too many lines in one panel. We can separate continents with facets.

ggplot(data = gapminder, mapping = aes(x = year, y = gdpPercap, group = country)) +
  geom_line(alpha = 0.25) +
  facet_wrap(~ continent) +
  labs(
    title = "GDP per Capita over Time",
    subtitle = "One line per country, faceted by continent",
    x = "Year",
    y = "GDP per capita",
    caption = "Source: Gapminder."
  ) +
  theme_minimal()

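Kuwait is an extreme case in this dataset: its GDP per capita in the early decades is far above that of any other country, which stretches the shared y-axis and flattens most of the other lines. A log scale makes the vertical comparison easier. The next plot also adds a smoothed trend line for each continent.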

ggplot(data = gapminder, mapping = aes(x = year, y = gdpPercap, group = country)) +
  geom_line(color = "gray70", alpha = 0.7) +
  geom_smooth(
    mapping = aes(group = continent),
    method = "loess",
    formula = y ~ x,
    se = FALSE,
    linewidth = 1
  ) +
  scale_y_log10(labels = dollar_format(accuracy = 1)) +
  facet_wrap(~ continent, ncol = 5) +
  labs(
    title = "GDP per Capita on Five Continents",
    subtitle = "Gray lines are countries; smooth lines summarize continent trends",
    x = "Year",
    y = "GDP per capita",
    caption = "Source: Gapminder."
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

8.2 Working with dates

Some datasets store time as actual dates. Others store years as numbers. The lubridate package helps convert text or numeric values into date objects.

make_date(2026)
[1] "2026-01-01"
ymd(c("2009-01-02", "2009 01 03", "Created on 2009 1 6"))
[1] "2009-01-02" "2009-01-03" "2009-01-06"
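When a dataset stores years as plain numbers, one option is to build a date column so that scale_x_date() can be used later. A minimal sketch, assuming a hypothetical data frame called yearly and placing each year on January 1:

```r
library(lubridate)

# Hypothetical data frame: years stored as numbers.
yearly <- data.frame(year = c(1990, 2000, 2010))

# make_date() fills in month = 1 and day = 1 by default.
yearly$date <- make_date(year = yearly$year)

class(yearly$date)
yearly$date
```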

The built-in economics dataset has monthly dates.

economics |>
  select(date, unemploy, psavert) |>
  slice_head(n = 10)

ggplot(data = economics, mapping = aes(x = date, y = unemploy)) +
  geom_line(color = "steelblue") +
  scale_x_date(date_breaks = "5 years", date_labels = "%Y") +
  labs(
    title = "U.S. Unemployment",
    subtitle = "Monthly observations, 1967-2015",
    x = NULL,
    y = "Unemployed workers (thousands)",
    caption = "Source: ggplot2 economics dataset."
  ) +
  theme_minimal()

For a shorter period, use denser date labels.

economics |>
  filter(date >= "2007-01-01", date <= "2010-12-31") |>
  ggplot(mapping = aes(x = date, y = unemploy)) +
  geom_line(color = "firebrick") +
  scale_x_date(date_breaks = "3 months", date_labels = "%b %Y") +
  labs(
    title = "U.S. Unemployment during the Financial Crisis",
    x = NULL,
    y = "Unemployed workers (thousands)",
    caption = "Source: ggplot2 economics dataset."
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
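The date_labels argument takes strftime-style codes, the same ones base R's format() understands, so a label format can be previewed before plotting. A small sketch with an arbitrary example date (month names from %b depend on the system locale):

```r
d <- as.Date("2008-09-15")

year_only  <- format(d, "%Y")     # four-digit year
year_month <- format(d, "%Y-%m")  # year and two-digit month
month_name <- format(d, "%b %Y")  # abbreviated month name, locale-dependent

year_only
year_month
month_name
```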

8.3 Exercise: dates and lines

Make a line chart from the built-in economics dataset.

  1. Filter to dates from 2000 through 2015.
  2. Put date on the x-axis and psavert on the y-axis.
  3. Use geom_line().
  4. Use scale_x_date() to show labels every two years.
  5. Add a title and axis labels with labs().

# Write your plot here.

8.5 Bar charts

Bar charts are useful for comparing quantities across categories. When the data is already summarized, use geom_col().

The midwest dataset has county-level observations. To compare average college education by state, we first summarize.

college_by_state <- midwest |>
  filter(!is.na(percollege)) |>
  group_by(state) |>
  summarize(
    avg_college = round(mean(percollege), 1)
  )

college_by_state

ggplot(data = college_by_state, mapping = aes(x = state, y = avg_college)) +
  geom_col(fill = "steelblue") +
  labs(
    title = "Average College Education by Midwest State",
    x = "State",
    y = "Percent with college degree",
    caption = "Source: ggplot2 midwest dataset."
  ) +
  theme_minimal()

Ordering the bars by value makes comparison easier.

ggplot(data = college_by_state, mapping = aes(x = reorder(state, avg_college), y = avg_college)) +
  geom_col(fill = "steelblue") +
  scale_y_continuous(labels = percent_format(scale = 1)) +
  labs(
    title = "Average College Education by Midwest State",
    subtitle = "Bars ordered by average",
    x = "State",
    y = "Percent with college degree",
    caption = "Source: ggplot2 midwest dataset."
  ) +
  theme_minimal()
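reorder() works by rearranging factor levels, and ggplot2 draws a discrete axis in level order. A minimal base R sketch with made-up labels and values:

```r
labels <- c("a", "b", "c")
values <- c(3, 1, 2)

# Levels are sorted by the second argument, smallest first.
ordered_labels <- reorder(labels, values)
levels(ordered_labels)

# Negate the values to sort from largest to smallest instead.
reversed_labels <- reorder(labels, -values)
levels(reversed_labels)
```

In a bar chart, the second form, reorder(state, -avg_college), would put the tallest bar first.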

If category labels are long, coord_flip() often helps by turning the bars horizontal. The version below also prints each value at the end of its bar.

ggplot(data = college_by_state, mapping = aes(x = reorder(state, avg_college), y = avg_college)) +
  geom_col(aes(fill = state), show.legend = FALSE) +
  geom_text(aes(label = str_c(avg_college, "%")), hjust = -0.15, size = 3.5) +
  coord_flip() +
  scale_y_continuous(labels = percent_format(scale = 1), expand = expansion(mult = c(0, 0.12))) +
  labs(
    title = "Average College Education by Midwest State",
    x = NULL,
    y = "Percent with college degree",
    caption = "Source: ggplot2 midwest dataset."
  ) +
  theme_minimal()

8.6 geom_col() versus geom_bar()

Use geom_col() when you already have the heights of the bars.

Use geom_bar() when you want ggplot2 to count rows.

ggplot(data = midwest, mapping = aes(x = state)) +
  geom_bar(fill = "gray45") +
  labs(
    title = "Number of Counties by State",
    x = "State",
    y = "Number of counties",
    caption = "Source: ggplot2 midwest dataset."
  ) +
  theme_minimal()

Here, no y variable is mapped. geom_bar() counts the number of rows in each state.
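The two geoms can produce the same chart. As a sketch of the equivalence, the block below counts the rows with dplyr first and then compares the bar heights that each plot computes. The names county_counts, p_col, and p_bar are illustrative.

```r
library(ggplot2)
library(dplyr)

# Count rows per state ourselves...
county_counts <- midwest |>
  count(state, name = "n_counties")

# ...then geom_col() draws the precomputed heights.
p_col <- ggplot(county_counts, aes(x = state, y = n_counties)) +
  geom_col(fill = "gray45")

# geom_bar() does the counting internally.
p_bar <- ggplot(midwest, aes(x = state)) +
  geom_bar(fill = "gray45")

# Both plots end up with the same bar heights.
same_heights <- all.equal(
  as.numeric(layer_data(p_col)$y),
  as.numeric(layer_data(p_bar)$y)
)
same_heights
```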

8.7 Bar label placement

Text labels on bars need small position adjustments. For vertical bars, vjust moves labels up and down. For horizontal bars made with coord_flip(), hjust moves labels left and right after the coordinates are flipped.

In the ordered geom_col() chart above, hjust = -0.15 places the label just outside the end of each bar after the coordinates are flipped. The expand argument in scale_y_continuous() adds extra room so the labels are not cut off. Use outside labels when exact values matter. If the bars are short or the panel is crowded, labels inside bars or a table may be cleaner.
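For vertical bars the same idea uses vjust instead of hjust. A minimal sketch that rebuilds the state summary so it runs on its own; the particular vjust and expansion values are illustrative choices, not fixed rules:

```r
library(ggplot2)
library(dplyr)

college_by_state <- midwest |>
  group_by(state) |>
  summarize(avg_college = round(mean(percollege), 1))

p_labels <- ggplot(college_by_state, aes(x = state, y = avg_college)) +
  geom_col(fill = "steelblue") +
  # Negative vjust pushes the text above the top of each bar.
  geom_text(aes(label = avg_college), vjust = -0.5, size = 3.5) +
  # Extra room at the top so the labels are not clipped.
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  theme_minimal()

p_labels
```

Setting vjust to a value above 1, such as 1.5, would place the labels inside the bars instead, which can work when the bars are tall enough.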

8.8 Stacked and proportional bars

When a second categorical variable is mapped to fill, bars are stacked by default. Stacked bars show both the total height and the composition inside each category.

ggplot(data = midwest, mapping = aes(x = state, fill = factor(inmetro))) +
  geom_bar() +
  scale_fill_manual(
    values = c("0" = "coral", "1" = "navy"),
    labels = c("0" = "Non-metro", "1" = "Metro")
  ) +
  labs(
    title = "County Type by State",
    subtitle = "Stacked bars show counts and composition",
    x = "State",
    y = "Number of counties",
    fill = "County type",
    caption = "Source: ggplot2 midwest dataset."
  ) +
  theme_minimal()

If the comparison is about composition rather than totals, position = "fill" makes each bar the same height and shows proportions.

ggplot(data = midwest, mapping = aes(x = state, fill = factor(inmetro))) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = percent_format()) +
  scale_fill_manual(
    values = c("0" = "coral", "1" = "navy"),
    labels = c("0" = "Non-metro", "1" = "Metro")
  ) +
  labs(
    title = "Metro Share by State",
    subtitle = "Proportional bars compare composition",
    x = "State",
    y = "Share of counties",
    fill = "County type",
    caption = "Source: ggplot2 midwest dataset."
  ) +
  theme_minimal()
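What position = "fill" shows can also be computed by hand, which is useful when the shares are needed as numbers. A sketch with dplyr; metro_share and share are illustrative names:

```r
library(ggplot2)
library(dplyr)

metro_share <- midwest |>
  count(state, inmetro) |>
  group_by(state) |>
  # Within each state, divide each count by the state total.
  mutate(share = n / sum(n)) |>
  ungroup()

metro_share

# The shares within each state add up to 1, like the full-height bars.
state_totals <- tapply(metro_share$share, metro_share$state, sum)
state_totals
```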

8.9 Dodged bars

Dodged bars place bars side by side within each category. Here we compare metro and non-metro counties within states.

ggplot(data = midwest, mapping = aes(x = state, fill = factor(inmetro))) +
  geom_bar(position = "dodge") +
  scale_fill_manual(
    values = c("0" = "coral", "1" = "navy"),
    labels = c("0" = "Non-metro", "1" = "Metro")
  ) +
  labs(
    title = "Metro and Non-Metro Counties by State",
    x = "State",
    y = "Number of counties",
    fill = "County type",
    caption = "Source: ggplot2 midwest dataset."
  ) +
  theme_minimal()

Adding labels to dodged bars requires the text to use the same dodging.

The important detail is that the bars and the text both use position_dodge(width = 0.9). If the widths differ, the labels will not line up with the bars. The group = factor(inmetro) mapping tells the text layer which bars belong side by side.

ggplot(data = midwest, mapping = aes(x = state, fill = factor(inmetro))) +
  geom_bar(position = position_dodge(width = 0.9)) +
  geom_text(
    mapping = aes(label = after_stat(count), group = factor(inmetro)),
    stat = "count",
    position = position_dodge(width = 0.9),
    vjust = -0.4,
    size = 3
  ) +
  scale_fill_manual(
    values = c("0" = "coral", "1" = "navy"),
    labels = c("0" = "Non-metro", "1" = "Metro")
  ) +
  labs(
    title = "Metro and Non-Metro Counties by State",
    x = "State",
    y = "Number of counties",
    fill = "County type",
    caption = "Source: ggplot2 midwest dataset."
  ) +
  theme_minimal()

8.10 Bar chart example: OECD life expectancy gap

A local copy of the oecd_sum summary lives at Data/oecd/oecd_sum.csv. Each row is one year, with columns for U.S. life expectancy, the OECD non-U.S. average, the difference, and a hi_lo indicator for whether the U.S. is above or below the average.

oecd_sum <- read_csv("Data/oecd/oecd_sum.csv", show_col_types = FALSE)

ggplot(data = oecd_sum, mapping = aes(x = year, y = diff, fill = hi_lo)) +
  geom_col() +
  guides(fill = "none") +
  labs(
    title = "The U.S. Life Expectancy Gap",
    subtitle = "Difference between U.S. and OECD average life expectancy",
    x = "Year",
    y = "Difference in years",
    caption = "Data: OECD, summarized in Kieran Healy's socviz package."
  ) +
  theme_minimal()

This is a bar chart because the main comparison is the size and direction of a yearly difference.
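The hi_lo column is what drives the two fill colors. As a sketch of how such an indicator could be derived from a difference column, the block below uses a small made-up table. The numbers are placeholders, not OECD values, and the "Above" and "Below" labels are an assumption about how the real column is coded.

```r
library(dplyr)

# Made-up differences, for illustration only.
toy_gap <- tibble(
  year = c(1960, 1990, 2015),
  diff = c(0.8, -0.3, -1.5)
)

# Flag each year by the sign of the difference.
toy_gap <- toy_gap |>
  mutate(hi_lo = if_else(diff > 0, "Above", "Below"))

toy_gap
```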

8.11 Exercise

Create an ordered bar chart from a summarized dataset.

  1. Use midwest.
  2. Group by state.
  3. Calculate the median of percollege.
  4. Make an ordered geom_col() chart.
  5. Add value labels.
  6. Remove any redundant legend.

# Write your plot here.

8.12 Extra: slopegraphs

Slopegraphs compare the same units at a small number of time points. They are especially useful when the start and end values matter more than the intermediate path. The form was popularized by Edward Tufte in The Visual Display of Quantitative Information, but the design predates him by a century.

A generic Tufte-style slopegraph:

[Figure: Tufte slopegraph illustration]

Ben Fry’s slopegraph of baseball salary per win:

[Figure: Baseball slopegraph by Ben Fry]

And one from Scribner’s Statistical Atlas of the United States in 1883, well before the modern term existed:

[Figure: Scribner’s Statistical Atlas, 1883]

Unlike a traditional line chart, a slopegraph emphasizes the start and end values of each unit, so change is read as the slope of a line rather than as a path through many points.

The states table is a local snapshot of Correlates of State Policy data, originally fetched with cspp::get_cspp_data(). polconserv is policy conservatism (the negation of pollib_median), so larger values mean a more conservative policy environment.

states <- read_csv("Data/state_policy/cspp_states.csv", show_col_types = FALSE)

library(CGPfunctions)

newggslopegraph(
  dataframe = states |>
    mutate(year = as.character(year)) |>
    filter(year %in% c("1980", "1990", "2000", "2010"), region.name == "midwest"),
  Times = year,
  Measurement = polconserv,
  Grouping = st,
  Title = "Tufte-style Slopegraph of Policy Conservatism",
  SubTitle = "Midwest, 1980-2010",
  Caption = "Source: Correlates of State Policy via the cspp package."
)

newggslopegraph(), from the CGPfunctions package, requires the Times column to be a character or factor, which is why year is converted before filtering. If CGPfunctions is not available, the same plot can be built with plain geom_line() and geom_text().
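As a sketch of the plain geom_line() and geom_text() route, the block below uses txhousing, which ships with ggplot2, as a stand-in because the states file is a local snapshot. The choice of cities, years, and spacing values is illustrative.

```r
library(ggplot2)
library(dplyr)

# Average median sale price per city in two years.
slope_data <- txhousing |>
  filter(
    city %in% c("Austin", "Dallas", "Houston", "San Antonio"),
    year %in% c(2000, 2010)
  ) |>
  group_by(city, year) |>
  summarize(price = mean(median, na.rm = TRUE), .groups = "drop") |>
  # A discrete x-axis keeps the two time points evenly spaced.
  mutate(year = factor(year))

p_slope <- ggplot(slope_data, aes(x = year, y = price, group = city)) +
  geom_line(color = "gray40") +
  geom_point() +
  # Label only the right-hand end of each line.
  geom_text(
    data = filter(slope_data, year == "2010"),
    aes(label = city),
    hjust = -0.1,
    size = 3
  ) +
  # Extra space on the right so the labels fit.
  scale_x_discrete(expand = expansion(mult = c(0.1, 0.4))) +
  labs(x = NULL, y = "Median sale price (USD)") +
  theme_minimal()

p_slope
```

With more units, the labels at one end tend to overlap; a repelling label layer or labels on both ends are common refinements.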