Random effects and {marginaleffects}

Author

Andrew Heiss

Published

November 9, 2025

Note: tl;dr
  • Use re.form = NA for conditional effects (typical country)
  • Use re.form = NULL for marginal effects (countries on average)
  • Bootstrap if you need better uncertainty estimates for marginal effects

In this little example, we’ll look at the effect of (1) legislative gender quotas (binary) and (2) corruption (continuous) on the proportion of women in a country’s legislature. The outcome is measured as a percentage in the raw data, so we divide it by 100 to put it on a 0–1 proportion scale.

library(tidyverse)
library(lme4)
library(parameters)
library(marginaleffects)
library(modelsummary)

# {vdemdata} isn't on CRAN; follow these instructions to install it: 
#   https://github.com/vdeminstitute/vdemdata
# devtools::install_github("vdeminstitute/vdemdata")
library(vdemdata)

vdem_clean <- vdem |>
  select(
    country_name,
    country_text_id,
    year,
    polyarchy = v2x_polyarchy,
    corruption = v2x_corr,
    civil_liberties = v2x_civlib,
    prop_fem = v2lgfemleg,
    quota = v2lgqugen
  ) |>
  filter(year >= 2000, year < 2024) |>
  drop_na(quota, prop_fem) |>
  mutate(quota = quota > 0, prop_fem = prop_fem / 100)

glimpse(vdem_clean)
Rows: 4,012
Columns: 8
$ country_name    <chr> "Mexico", "Mexico", "Mexico", "Mexico", "Mexico", "Mexico", "Mexico", "Mexico", "Mexico", "Mexico", "Mexico", "Mexico", "Mexico", "Mexico", "Mexico", "Mexico", "Mexico", "Mexico", "Mexico", "Mexico", "Mexico", "Mexico", "Mexico", "Mexico", "Suriname", "Suriname", "Suriname"…
$ country_text_id <chr> "MEX", "MEX", "MEX", "MEX", "MEX", "MEX", "MEX", "MEX", "MEX", "MEX", "MEX", "MEX", "MEX", "MEX", "MEX", "MEX", "MEX", "MEX", "MEX", "MEX", "MEX", "MEX", "MEX", "MEX", "SUR", "SUR", "SUR", "SUR", "SUR", "SUR", "SUR", "SUR", "SUR", "SUR", "SUR", "SUR", "SUR", "SUR", "SUR", "…
$ year            <dbl> 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021…
$ polyarchy       <dbl> 0.664, 0.677, 0.688, 0.699, 0.710, 0.710, 0.671, 0.637, 0.642, 0.648, 0.661, 0.660, 0.649, 0.623, 0.620, 0.639, 0.640, 0.635, 0.675, 0.685, 0.663, 0.618, 0.586, 0.549, 0.784, 0.779, 0.779, 0.779, 0.779, 0.776, 0.774, 0.771, 0.772, 0.772, 0.762, 0.753, 0.753, 0.755, 0.755, 0…
$ corruption      <dbl> 0.550, 0.548, 0.573, 0.608, 0.608, 0.608, 0.607, 0.607, 0.607, 0.607, 0.632, 0.632, 0.661, 0.759, 0.759, 0.726, 0.726, 0.686, 0.655, 0.611, 0.598, 0.510, 0.549, 0.533, 0.205, 0.205, 0.205, 0.205, 0.205, 0.205, 0.205, 0.205, 0.205, 0.205, 0.214, 0.214, 0.214, 0.235, 0.235, 0…
$ civil_liberties <dbl> 0.712, 0.713, 0.713, 0.713, 0.713, 0.713, 0.685, 0.696, 0.693, 0.695, 0.684, 0.686, 0.690, 0.681, 0.674, 0.692, 0.689, 0.702, 0.724, 0.712, 0.698, 0.677, 0.666, 0.673, 0.885, 0.883, 0.883, 0.883, 0.883, 0.883, 0.883, 0.883, 0.884, 0.884, 0.882, 0.880, 0.880, 0.877, 0.877, 0…
$ prop_fem        <dbl> 0.1600, 0.1600, 0.1600, 0.2260, 0.2260, 0.2260, 0.2260, 0.2320, 0.2260, 0.2820, 0.2620, 0.2820, 0.3680, 0.3680, 0.3740, 0.4237, 0.4237, 0.4260, 0.4820, 0.4820, 0.4820, 0.5010, 0.5000, 0.5000, 0.1765, 0.1765, 0.1765, 0.1765, 0.1961, 0.2549, 0.2549, 0.2549, 0.2549, 0.2549, 0.…
$ quota           <lgl> FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F…

The models

Here we’ll make two models. Both control for civil liberties and polyarchy (for fun) and include a time trend, and both interact quota and corruption (also for fun; this isn’t necessary and it’s just so that the plots and results have different slopes for different combinations of quota and corruption).

The difference here is that model_basic only has country-specific intercepts while model_fancier has country-specific intercepts and country-specific coefficients for quota. Generally you’d want to have random effects on your main variable(s) of interest.

model_basic <- lmer(
  prop_fem ~ quota * corruption + civil_liberties + 
    polyarchy + year + (1 | country_name),
  data = vdem_clean
)

model_fancier <- lmer(
  prop_fem ~ quota * corruption + civil_liberties + 
    polyarchy + year + (1 + quota | country_name),
  data = vdem_clean
)

Here are the results of the models. Don’t try to read these! Raw coefficients are the worst, especially with interaction terms, and especially with random effects. (The values here are fairly straightforward though: in model 2, the quotaTRUE coefficient of 0.123 means that the proportion of women in the legislature is 12.3 percentage points higher, on average, all else equal, when corruption is 0, since quota is interacted with corruption.)

modelsummary(list(model_basic, model_fancier))
                                             (1)        (2)
(Intercept)                              -10.288    -10.241
                                          (0.242)    (0.221)
quotaTRUE                                  0.069      0.123
                                          (0.006)    (0.014)
corruption                                -0.027     -0.015
                                          (0.013)    (0.014)
civil_liberties                           -0.028      0.016
                                          (0.016)    (0.016)
polyarchy                                 -0.003     -0.046
                                          (0.016)    (0.016)
year                                       0.005      0.005
                                          (0.000)    (0.000)
quotaTRUE × corruption                    -0.026     -0.119
                                          (0.009)    (0.020)
SD (Intercept country_name)                0.097      0.097
SD (quotaTRUE country_name)                           0.077
Cor (Intercept~quotaTRUE country_name)               -0.263
SD (Observations)                          0.045      0.040
Num.Obs.                                    4007       4007
R2 Marg.                                   0.180      0.195
R2 Cond.                                   0.856      0.890
AIC                                     -12645.7   -13312.7
BIC                                     -12589.0   -13243.4
ICC                                          0.8        0.9
RMSE                                        0.04       0.04

We can plot the predicted outcome across different variables of interest:
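One way to make that kind of plot (a sketch, since the original figure isn’t shown here) is with plot_predictions() from {marginaleffects}:

```r
# Sketch: predicted prop_fem across corruption, separately by quota status,
# for a typical country (re.form = NA zeroes out the country offsets)
plot_predictions(
  model_basic,
  condition = c("corruption", "quota"),
  re.form = NA
)
```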

Neato.

Random effect offsets

In those models, we used two different things for random effects grouping:

  • (1 | country_name)
  • (1 + quota | country_name)

The first uses country-specific intercept terms; the second uses country-specific intercept terms and country-specific coefficients for quota. But what does that mean?! This random effects approach is different from good old econometrics-style “fixed effects” where you do lm(y ~ x + country).

Random intercepts only

Imagine a little model like this:

lmer(y ~ x + (1 | country), ...)

In formal mathy terms, for a normal OLS model, we’d write the first part of that model like this:

\[ y = \beta_0 + \beta_1 x_1 \]

In the random effects world, model terms can be thought of as a combination of a global average and a group-specific offset from that average. If we add country-specific intercepts, each country intercept is a combination of a global average (\(\beta_0\)) and a country-specific offset from that average (\(b_{0_j}\)), like this:

\[ \beta_{0_j} = (\beta_0 + b_{0_j}) \]

That offset is assumed to be normally distributed with a mean of 0 (\(\mathcal{N}(0, \sigma_0)\)):

\[ \begin{aligned} Y_{i_j} &\sim \mathcal{N}(\mu_{i_j}, \sigma_y) & \text{Outcome for row}_i \text{ within country}_j \\ \mu_{i_j} &= (\beta_0 + b_{0_j}) + \beta_1 X_{i_j} & \text{Linear model of within-country variation } \\ b_{0_j} &\sim \mathcal{N}(0, \sigma_0) & \text{Random country-specific offsets from global intercept} \end{aligned} \]

You can actually extract those offset values with ranef(). In this case, China, the UAE, Lesotho, and France all basically have the global intercept with no offset; Rwanda has a much higher intercept, while Oman has a much lower intercept.

ranef(model_basic)$country_name |> as_tibble(rownames = "country_name")
# A tibble: 174 × 2
   country_name `(Intercept)`
   <chr>                <dbl>
 1 Afghanistan         0.0556
 2 Albania            -0.0178
 3 Algeria            -0.0528
 4 Angola              0.0776
 5 Argentina           0.147 
 6 Armenia            -0.0778
 7 Australia           0.0960
 8 Austria             0.150 
 9 Azerbaijan         -0.0282
10 Bahrain            -0.107 
# ℹ 164 more rows

And for fun, we can plot the offsets for each country:

all_offsets <- model_basic |> 
  model_parameters(effects = "random", group_level = TRUE)

ggplot(all_offsets, aes(x = Coefficient, y = fct_reorder(Level, Coefficient))) + 
  geom_vline(xintercept = 0) +
  geom_pointrange(aes(xmin = CI_low, xmax = CI_high), size = 0.15) +
  theme_minimal(base_size = 7)

Random slopes and intercepts

All of that ↑ is just for country-specific intercepts, but the same principle applies if we want to have country-specific slopes too. We can think of group-specific offsets (\(b_{n_j}\)) from a global average slope (\(\beta_n\)).

That looks like this in code:

lmer(y ~ x + (1 + x | country), ...)

When working with random slopes, the math notation gets a little fancier because the random intercept and slope offsets are correlated and move together across groups. The \(b\) offset terms come from a multivariate (or joint) normal distribution with shared covariance.

With this offset approach, we’re estimating the joint distribution of the offsets from the global intercept and slope, or \(\begin{pmatrix} b_{0_j} \\ b_{1_j} \end{pmatrix}\):

\[ \begin{aligned} Y_{i_j} &\sim \mathcal{N}(\mu_{i_j}, \sigma_y) & \text{Outcome for row}_i \text{ within country}_j \\ \mu_{i_j} &= (\beta_0 + b_{0_j}) + (\beta_1 + b_{1_j}) X_{i_j} & \text{Linear model of within-country variation } \\ \left( \begin{array}{c} b_{0_j} \\ b_{1_j} \end{array} \right) &\sim \operatorname{MV}\, \mathcal{N} \left( \left( \begin{array}{c} 0 \\ 0 \\ \end{array} \right) , \, \left( \begin{array}{cc} \sigma^2_{\beta_0} & \rho_{\beta_0, \beta_1}\, \sigma_{\beta_0} \sigma_{\beta_1} \\ \rho_{\beta_0, \beta_1}\, \sigma_{\beta_0} \sigma_{\beta_1} & \sigma^2_{\beta_1} \end{array} \right) \right) & \text{Random country-specific offsets from global intercept and slope} \end{aligned} \]

We can see the country-specific intercept and quota offsets for the fancier model:

ranef(model_fancier)$country_name |> as_tibble(rownames = "country_name")
# A tibble: 174 × 3
   country_name `(Intercept)` quotaTRUE
   <chr>                <dbl>     <dbl>
 1 Afghanistan         0.0557   0.0301 
 2 Albania            -0.0676   0.0795 
 3 Algeria            -0.0865   0.0697 
 4 Angola              0.0180   0.101  
 5 Argentina           0.0944   0.0510 
 6 Armenia            -0.0828   0.00880
 7 Australia           0.109   -0.0229 
 8 Austria             0.162   -0.0339 
 9 Azerbaijan         -0.0323   0.00677
10 Bahrain            -0.103    0.0215 
# ℹ 164 more rows

Effects you can calculate

The reason all this is important is because you can deal with those country-specific terms in different ways, depending on the estimand you care about. There are two general categories of estimands you can get from these multilevel, random effects models:

  • Conditional effect = average or typical country
    • Effect of different covariates in an average or typical country (where the random country offset is set to 0)
  • Marginal effect = countries on average, or global effect
    • Average effect of different covariates across all countries where the random country offset is dealt with in various ways

These are different estimands for different questions!

  • Conditional effect: What’s the effect for a typical country?
  • Marginal effect: What’s the effect across countries on average?

Conditional effects

Tip: Conditional effects

Conditional effects are the effect of a variable in a typical group—country, cluster, school, subject, or whatever is in the (1 | group) term in the model.

Mathematically, “typical” means that the random offset \(b_{0_j}\) is set to 0, or that there are no random effects involved.

With lmer()-based objects, use re.form = NA in predict() and in {marginaleffects} functions like predictions(), comparisons(), and slopes().

Average or typical country (conditional effect)

We can look at predicted values of the proportion of women in parliament across different covariates for a typical country:
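For example, here’s a sketch of those predictions at both quota values, holding all other covariates at their means or modes:

```r
# Predictions for a typical country: re.form = NA sets the random country
# offsets to 0, so no specific country is involved
predictions(
  model_basic,
  newdata = datagrid(quota = c(FALSE, TRUE)),
  re.form = NA
)
```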

We can get raw coefficients with summary(model_basic) or summary(model_fancier), but those are harder to interpret, especially with the interaction term since we’d need to manually add the interaction coefficient. We can use avg_comparisons() instead to build those estimates automatically:
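A minimal sketch of that call:

```r
# Average difference in predicted prop_fem (quota TRUE vs. FALSE) in a typical
# country; the quota × corruption interaction is handled automatically
avg_comparisons(
  model_basic,
  variables = "quota",
  re.form = NA
)
```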

We can plot these in a coefficient plot too:
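A sketch of one way to build that plot, feeding the comparisons into ggplot:

```r
# Conditional comparisons for the main variables of interest
comps <- avg_comparisons(
  model_fancier,
  variables = c("quota", "corruption"),
  re.form = NA
)

# avg_comparisons() returns a data frame, so it plots like any other
ggplot(comps, aes(x = estimate, y = term)) +
  geom_vline(xintercept = 0) +
  geom_pointrange(aes(xmin = conf.low, xmax = conf.high))
```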

Hypothetical country (conditional effect)

In all those examples ↑ {marginaleffects} automatically holds all covariates at their means or modes. You can also specify your own values if you have reasons for that, like seeing the effect of corruption and quotas in countries with low civil liberties and high civil liberties in a hypothetical country. Anything you don’t specify in datagrid() will be held at its mean or mode, like normal.

We could specify an actual country here, like “Egypt”, but since we’re finding conditional effects and disregarding any country-level differences, the predictions will be the same regardless of which country we use. So to make life easier, we can invent a hypothetical country.
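Something like this (the country label is invented, and the low/high civil liberties values are arbitrary choices for illustration):

```r
# With re.form = NA the random offsets are ignored, so a made-up country
# label works fine; anything not in datagrid() is held at its mean or mode
predictions(
  model_fancier,
  newdata = datagrid(
    country_name = "Hypothetical country",
    quota = c(FALSE, TRUE),
    civil_liberties = c(0.2, 0.9)
  ),
  re.form = NA
)
```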

Marginal effects

Tip: Marginal effects

Marginal effects are the global- or population-level effect, or the effect across group clusters on average.

Mathematically, the random offsets \(b_{0_j}\) are incorporated into the estimates somehow, generally by averaging across all the different country-level predictions.

With lmer()-based objects, use re.form = NULL in predict() and in {marginaleffects} functions like predictions(), comparisons(), and slopes().

(re.form = NULL is already the default so technically you don’t have to specify it.)

We can look at predicted values of the proportion of women in parliament across different covariates across all countries, on average, or the global effect:

Here are the predicted values for quota, marginalized across all countries:

And here’s the difference: there’s a 5–6 percentage point increase in the proportion of women in parliament when there are quotas, depending on the model specification.
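A sketch of one way to get that difference, averaging over the observed countries:

```r
# Marginal quota effect: keep the country offsets in the predictions
# (re.form = NULL, the default) and average the TRUE vs. FALSE difference
# across countries
avg_comparisons(
  model_fancier,
  variables = "quota",
  re.form = NULL
)
```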

Uncertainty

In all those marginal effects examples, I actually hid a fairly ominous warning:

Warning message:
For this model type, `marginaleffects` only takes into account the uncertainty 
in fixed-effect parameters. This is often appropriate when `re.form=NA`, but may 
be surprising to users who set `re.form=NULL` (default) or to some other value. 
Call `options(marginaleffects_safe = FALSE)` to silence this warning. 

That’s because dealing with the country-specific standard errors of predictions from lmer() models is tricky. All the point estimates that {marginaleffects} provides are correct, but the confidence intervals will likely be too narrow (this isn’t the case with Bayesian models with {brms}).

One way around this is to calculate bootstrapped standard errors. We can make a bunch of new datasets with a randomly sampled set of countries, run the model on each of those new datasets, and then find the confidence interval for the resulting estimates. We can ignore the standard errors and confidence intervals for each of the bootstrapped models since we’ll aggregate the estimates and base our confidence interval on those values.

This is computationally intensive though :(

Note

To speed things up, the code below uses the {furrr} package, which allows for automatic parallel processing. Here I’m running these 500 bootstrapped models across 8 CPU cores.

If you don’t have {furrr} installed or prefer not to use parallel processing, you can replace future_map() with map() from {purrr}—it will just take longer.

library(rsample)
library(furrr)

# Use 8 CPUs
plan(multisession, workers = 8)

# Make random collections of countries 500 times
bootstrapped_vdem <- vdem_clean |>
  group_by(country_name) |> 
  nest() |> 
  ungroup() |> 
  bootstraps(
    times = 500
  )

# Function for fitting a model on each of the new bootstrapped datasets
fit_predict <- function(.split, ...) {
  .df <- analysis(.split) |> 
    # Assign new unique country IDs (since some will be repeated through the
    # bootstrapping process)
    mutate(country_name = paste0("boot_country_", row_number())) |> 
    # Unnest the country-specific data
    unnest(data)

  # Fit the same model specification as model_fancier
  model <- lmer(
    prop_fem ~ quota * corruption + civil_liberties + 
      polyarchy + year + (1 + quota | country_name),
    data = .df
  )

  # Make predictions
  predictions <- predictions(
    model,
    newdata = datagrid(quota = unique, country_name = unique),
    by = "quota",
    re.form = NULL,
    allow.new.levels = TRUE
  )

  predictions
}

# Run a model on each bootstrapped dataset. This takes a while!
bootstrapped_results <- bootstrapped_vdem |>
  mutate(boot_fits = future_map(splits, fit_predict))

Phew. All those bootstrapped predictions live as separate datasets inside the boot_fits column:

bootstrapped_results
# A tibble: 500 × 3
   splits     id           boot_fits         
   <list>     <chr>        <list>            
 1 <bot_splt> Bootstrap001 <predctns [2 × 9]>
 2 <bot_splt> Bootstrap002 <predctns [2 × 9]>
 3 <bot_splt> Bootstrap003 <predctns [2 × 9]>
 4 <bot_splt> Bootstrap004 <predctns [2 × 9]>
 5 <bot_splt> Bootstrap005 <predctns [2 × 9]>
 6 <bot_splt> Bootstrap006 <predctns [2 × 9]>
 7 <bot_splt> Bootstrap007 <predctns [2 × 9]>
 8 <bot_splt> Bootstrap008 <predctns [2 × 9]>
 9 <bot_splt> Bootstrap009 <predctns [2 × 9]>
10 <bot_splt> Bootstrap010 <predctns [2 × 9]>
# ℹ 490 more rows

We can unnest that column to see all the internal columns:

bootstrapped_results |> 
  unnest(boot_fits)
# A tibble: 1,000 × 11
   splits     id           quota estimate std.error statistic   p.value s.value conf.low conf.high    df
   <list>     <chr>        <lgl>    <dbl>     <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl> <dbl>
 1 <bot_splt> Bootstrap001 FALSE    0.168   0.00652      25.8 3.17e-147    487.    0.156     0.181   Inf
 2 <bot_splt> Bootstrap001 TRUE     0.227   0.00897      25.3 3.53e-141    467.    0.209     0.245   Inf
 3 <bot_splt> Bootstrap002 FALSE    0.175   0.00738      23.6 1.26e-123    408.    0.160     0.189   Inf
 4 <bot_splt> Bootstrap002 TRUE     0.251   0.0111       22.6 1.42e-113    375.    0.229     0.272   Inf
 5 <bot_splt> Bootstrap003 FALSE    0.176   0.00686      25.7 1.28e-145    481.    0.163     0.190   Inf
 6 <bot_splt> Bootstrap003 TRUE     0.230   0.00912      25.2 5.49e-140    463.    0.212     0.247   Inf
 7 <bot_splt> Bootstrap004 FALSE    0.173   0.00777      22.2 1.22e-109    362.    0.158     0.188   Inf
 8 <bot_splt> Bootstrap004 TRUE     0.239   0.0107       22.4 1.40e-111    368.    0.218     0.260   Inf
 9 <bot_splt> Bootstrap005 FALSE    0.175   0.00729      24.0 5.43e-127    419.    0.160     0.189   Inf
10 <bot_splt> Bootstrap005 TRUE     0.236   0.00795      29.7 2.19e-193    640.    0.220     0.251   Inf
# ℹ 990 more rows

That estimate column is what we care about! It’s the predicted value for when quota is true or false in each bootstrapped model. We can find its average and its 2.5% and 97.5% quantiles to get a confidence interval:

boot_summary <- bootstrapped_results |> 
  unnest(boot_fits) |> 
  group_by(quota) |> 
  summarize(
    mean_estimate = mean(estimate),
    ci_lower = quantile(estimate, 0.025),
    ci_upper = quantile(estimate, 0.975)
  )
boot_summary
# A tibble: 2 × 4
  quota mean_estimate ci_lower ci_upper
  <lgl>         <dbl>    <dbl>    <dbl>
1 FALSE         0.174    0.158    0.190
2 TRUE          0.237    0.218    0.260

And here’s how it compares to the non-bootstrapped version:

predictions(
  model_fancier,
  newdata = datagrid(quota = unique, country_name = unique),
  by = "quota",
  re.form = NULL
)

 quota Estimate Std. Error    z Pr(>|z|)     S 2.5 % 97.5 %
 FALSE    0.174    0.00767 22.7   <0.001 377.0 0.159  0.189
  TRUE    0.238    0.00988 24.1   <0.001 423.0 0.219  0.257

Type: response

In this case, the estimates and confidence intervals are basically the same. For this particular model—with many countries and relatively modest random effect variance—the simpler re.form = NULL approach provides reasonable uncertainty estimates. For models with fewer clusters, larger random effect variance, or more complex random effect structures, bootstrapping might show more of a difference.