
High-Frequency Synthetic Distributions of Consumption, Income, and Wealth

Christian Bayer (Bonn, CEPR, IZA, CESifo), Luis Calderon (Bonn), Moritz Kuhn (Mannheim, CEPR, IZA, CESifo)

Quarterly estimates from 1962 to 2024 combining functional data analysis and Bayesian state-space models


1962–2024 · Quarterly · PSID, SCF, CEX, SIPP, CPS

Methodology

From Microdata to Synthetic Distributions

We develop a new method for deriving high-frequency synthetic distributions of consumption, income, and wealth. The core of the method is to treat the distributional data as a time series of functions whose underlying factor structure follows a state-space model, estimated using Bayesian techniques. The method incorporates different sources of microdata regardless of their frequency and variable coverage.

Microdata

PSID, SCF, CEX, SIPP, and CPS — each with different frequency and variable coverage

Functional Representation

Quantile functions for marginals plus copula for joint dependence

Dimensionality Reduction

PCA on Legendre polynomial coefficients captures variation in a few factors

State-Space Model

Factor dynamics follow a VAR in companion form with aggregate covariates

Bayesian Estimation

MCMC with Kalman smoother produces posterior draws of the latent factors

Synthetic Distributions

High-frequency quarterly output: full distributions for all 247 quarters
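
The functional-representation and dimensionality-reduction steps listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the simulated log-income samples, the probability grid, the polynomial degree, and the helper names `legendre_coefficients` and `pca_factors` are all assumptions made for illustration. It covers only a single marginal distribution and leaves out the copula.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_coefficients(sample, degree=6, n_grid=99):
    """Fit Legendre coefficients to the empirical quantile function of `sample`."""
    probs = np.linspace(0.01, 0.99, n_grid)   # probability grid inside (0, 1)
    quantiles = np.quantile(sample, probs)    # empirical quantile function
    x = 2.0 * probs - 1.0                     # map probabilities onto (-1, 1)
    return legendre.legfit(x, quantiles, degree)


def pca_factors(coef_matrix, n_factors=2):
    """PCA via SVD on a (periods x coefficients) matrix."""
    mean = coef_matrix.mean(axis=0)
    centered = coef_matrix - mean
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[:n_factors].T               # (n_coef, n_factors)
    factors = centered @ loadings             # (periods, n_factors)
    explained = (s[:n_factors] ** 2).sum() / (s ** 2).sum()
    return factors, loadings, mean, explained


# Hypothetical data: one simulated cross-section of log income per period,
# with slowly drifting location and dispersion.
rng = np.random.default_rng(0)
n_periods = 40
coefs = np.vstack([
    legendre_coefficients(rng.normal(10 + 0.01 * t, 0.5 + 0.005 * t, size=2000))
    for t in range(n_periods)
])

factors, loadings, mean, explained = pca_factors(coefs, n_factors=2)
print(f"share of coefficient variance captured by 2 factors: {explained:.3f}")
```

Each period's distribution is summarized by a short coefficient vector, and the factor scores in `factors` form the low-dimensional time series that a state-space model could then describe.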

State equation $$\begin{bmatrix} F_t \\ Y_t \end{bmatrix} = \begin{bmatrix} \Phi_{FF} & \Phi_{FY} \\ \Phi_{YF} & \Phi_{YY} \end{bmatrix} \begin{bmatrix} F_{t-1} \\ Y_{t-1} \end{bmatrix} + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, \Omega)$$

Observation equation $$\tilde{\theta}^j_t = H^j_t \left( \alpha^j \, \hat{\Gamma}^{\text{MF}} F_t + \nu^j_{F,t} \right)$$

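Here \(F_t\) are the latent distributional factors, \(Y_t\) are aggregate covariates, \(\hat{\Gamma}^{\text{MF}}\) is the factor loading matrix from PCA, and \(H^j_t\) is a selector matrix for each dataset \(j\). The blocks of \(\Phi\) are the VAR coefficient matrices, \(\Omega\) is the covariance matrix of the innovations \(\varepsilon_t\), and \(\nu^j_{F,t}\) is an error term specific to dataset \(j\). The model handles mixed frequencies through the time-aggregation parameter \(\alpha^j\).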
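
To make the role of the selector matrix concrete, here is a minimal Python sketch of a Kalman filter for a state equation and observation equation of this form. It rests on several simplifying assumptions that are not taken from the paper: all parameters are known rather than estimated by MCMC, \(\alpha^j\) is treated as a scalar equal to one, \(Y_t\) is left unobserved, only one hypothetical dataset is used, and only the forward filter is run (no smoother). The function name `kalman_filter` and all numerical values are illustrative.

```python
import numpy as np


def kalman_filter(obs, Phi, Omega, Z, R, x0, P0):
    """Filter the state x_t = [F_t; Y_t] from irregularly observed coefficients.

    obs[t] is None when the dataset has no wave in period t; otherwise it is a
    pair (H, y) with selector matrix H and observation y = H (Z x_t + nu_t).
    x0 and P0 describe the state one period before the first entry of obs.
    """
    x, P = x0.copy(), P0.copy()
    filtered = []
    for entry in obs:
        # Prediction step from the state equation.
        x = Phi @ x
        P = Phi @ P @ Phi.T + Omega
        if entry is not None:
            H, y = entry
            HZ = H @ Z                        # loadings of the observed rows
            S = HZ @ P @ HZ.T + H @ R @ H.T   # innovation covariance
            K = np.linalg.solve(S, HZ @ P).T  # Kalman gain P HZ' S^{-1}
            x = x + K @ (y - HZ @ x)
            P = P - K @ HZ @ P
        filtered.append(x.copy())
    return np.array(filtered)


# Hypothetical setup: 2 factors, 1 aggregate covariate, 5 coefficients.
rng = np.random.default_rng(1)
n_f, n_y, n_coef, T = 2, 1, 5, 80
Phi = np.array([[0.9, 0.0, 0.1],
                [0.0, 0.8, 0.0],
                [0.0, 0.0, 0.7]])
Omega = 0.05 * np.eye(n_f + n_y)
Gamma = rng.normal(size=(n_coef, n_f))      # stand-in for the PCA loadings
alpha = 1.0                                 # assumed scalar aggregation weight
Z = np.hstack([alpha * Gamma, np.zeros((n_coef, n_y))])
R = 0.01 * np.eye(n_coef)
H = np.eye(n_coef)[:3]                      # dataset covers 3 of 5 coefficients

# Simulate the state and observe the dataset only every fourth quarter.
x = np.zeros(n_f + n_y)
obs = []
for t in range(T):
    x = Phi @ x + rng.multivariate_normal(np.zeros(n_f + n_y), Omega)
    if t % 4 == 0:
        nu = rng.multivariate_normal(np.zeros(n_coef), R)
        obs.append((H, H @ (Z @ x + nu)))
    else:
        obs.append(None)

filtered = kalman_filter(obs, Phi, Omega, Z, R,
                         np.zeros(n_f + n_y), np.eye(n_f + n_y))
print(filtered.shape)
```

In quarters without a survey wave the filter only propagates the state through \(\Phi\), which is how a low-frequency dataset can still inform a quarterly factor path.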

Data & Replication

Data Sources and Downloads

The synthetic distributional data integrate five major U.S. household surveys, each covering different time periods and variables. C = Consumption, I = Income, W = Wealth.

CPS: 1962–2024 (I)
PSID: 1968–2021 (C, I, W)
CEX: 1984–2023 (C, I)
SCF: 1983–2022 (I, W)
SIPP: 1984–2022 (I, W)



Loading the Data

Julia

using CSV, DataFrames

# Load the consensus functional data
df = CSV.read("consensus_functional_data.csv", DataFrame)

# Extract income shares at the median
income_median = df[!, "sharesincome_0.5"]

# Plot consumption quantiles over time
using Plots
plot(df.time, df[!, "quantilesconsum_0.5"],
     label="Median", xlabel="Quarter")

Python

import pandas as pd

# Load the consensus functional data
df = pd.read_csv("consensus_functional_data.csv")

# Extract income shares at the median
income_median = df["sharesincome_0.5"]

# Plot consumption quantiles over time
import matplotlib.pyplot as plt
plt.plot(df["time"], df["quantilesconsum_0.5"],
         label="Median")
plt.xlabel("Quarter"); plt.legend(); plt.show()
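
For work with several quantiles at once, it can help to reshape the wide columns into long form. The Python sketch below assumes that the remaining columns follow the same `<prefix>_<probability>` pattern as `quantilesconsum_0.5` and `sharesincome_0.5`. Only those two column names and `time` appear in the examples above; the helper `columns_to_long`, the extra `quantilesconsum_0.1` column, and the demo values are hypothetical.

```python
import pandas as pd


def columns_to_long(df, prefix, time_col="time"):
    """Stack all columns named `<prefix>_<prob>` into (time, value, prob) rows."""
    cols = [c for c in df.columns if c.startswith(prefix + "_")]
    long = df.melt(id_vars=time_col, value_vars=cols,
                   var_name="column", value_name="value")
    # The probability is whatever follows "<prefix>_" in the column name.
    long["prob"] = long["column"].str[len(prefix) + 1:].astype(float)
    return (long.drop(columns="column")
                .sort_values([time_col, "prob"])
                .reset_index(drop=True))


# Tiny stand-in for consensus_functional_data.csv with made-up values.
demo = pd.DataFrame({
    "time": [1962.0, 1962.25],
    "quantilesconsum_0.1": [1.0, 1.1],
    "quantilesconsum_0.5": [2.0, 2.1],
    "sharesincome_0.5": [0.20, 0.21],
})

long = columns_to_long(demo, "quantilesconsum")
print(long)
```

With the real file, the same call would be `columns_to_long(df, "quantilesconsum")`, provided the column naming assumption holds.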

Reference

Citation

Bayer, C., Calderon, L. and Kuhn, M. (2025). “Distributional Dynamics.” CEPR Discussion Paper 19829.

BibTeX

@article{bayer2025distributional,
  author  = {Bayer, Christian and Calderon, Luis and Kuhn, Moritz},
  title   = {Distributional Dynamics},
  year    = {2025},
  note    = {CEPR Discussion Paper 19829},
  url     = {https://github.com/LuisCald/DistributionalDynamics}
}

Distribution Dynamics Over Time