Quarterly estimates from 1962 to 2024 combining functional data analysis and Bayesian state-space models
1962–2024 · Quarterly · PSID, SCF, CEX, SIPP, CPS
Explore the synthetic distributional data across consumption, income, and wealth. Select a variable, metric, and toggle individual deciles. Gray bands indicate NBER recession periods.
Source: Bayer, Calderon & Kuhn (2025). PSID implied estimates from the state-space model. Levels in trillions USD; quantiles in USD.
Source: Bayer, Calderon & Kuhn (2025). Bivariate joint distribution of population shares from marginalized CIW copula grid.
We develop a new method for deriving high-frequency synthetic distributions of consumption, income, and wealth. These synthetic data incorporate different sources of microdata, and our method can exploit these sources regardless of their frequency or variable coverage. Core to the method is treating distributional data as a time series of functions, whose underlying factor structure follows a state-space model estimated using Bayesian techniques. The method is generic enough to cover the dynamics of joint distributions in general. Using a wide range of U.S. microdata, we demonstrate that this novel approach yields high-quality, high-frequency distributional data on consumption, income, and wealth that are of interest to modern theories of macroeconomic dynamics.
PSID, SCF, CEX, SIPP, and CPS — each with different frequency and variable coverage
Quantile functions for marginals plus copula for joint dependence
PCA on Legendre polynomial coefficients captures variation in a few factors
Factor dynamics follow a VAR in companion form with aggregate covariates
MCMC with Kalman smoother produces posterior draws of the latent factors
High-frequency quarterly output: full distributions for all 247 quarters
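The functional-representation and dimension-reduction steps listed above can be illustrated with a small sketch. It is not the paper's implementation: the data are synthetic log-normal draws, and the function names (`legendre_coefficients`, `pca_factors`), the polynomial degree, the probability grid, the use of log quantiles, and the number of factors are all assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial import legendre as L

def legendre_coefficients(sample, degree=5, n_grid=99):
    """Fit Legendre coefficients to the empirical log-quantile function."""
    p = np.linspace(0.01, 0.99, n_grid)      # probability grid
    q = np.quantile(np.log(sample), p)       # empirical log quantiles
    x = 2.0 * p - 1.0                        # map (0, 1) into (-1, 1)
    return L.legfit(x, q, degree)            # least-squares Legendre fit

def pca_factors(coefs, n_factors=2):
    """PCA via SVD on the demeaned T x K panel of coefficients."""
    mean = coefs.mean(axis=0)
    _, s, vt = np.linalg.svd(coefs - mean, full_matrices=False)
    loadings = vt[:n_factors].T              # K x n_factors
    factors = (coefs - mean) @ loadings      # T x n_factors
    explained = (s[:n_factors] ** 2).sum() / (s ** 2).sum()
    return mean, loadings, factors, explained

rng = np.random.default_rng(0)
T = 40
# Synthetic panel: one log-normal cross-section per period,
# with slowly drifting location and scale (assumed, not real data).
mu = 10.0 + 0.02 * np.arange(T)
sigma = 0.8 + 0.1 * np.sin(np.arange(T) / 5.0)
coefs = np.vstack([
    legendre_coefficients(rng.lognormal(mu[t], sigma[t], size=5000))
    for t in range(T)
])

mean, loadings, factors, explained = pca_factors(coefs, n_factors=2)
recon = mean + factors @ loadings.T          # rank-2 approximation of the coefficients
print(f"Share of coefficient variance captured by 2 factors: {explained:.3f}")
```

Because the synthetic distributions move only through a location and a scale parameter, two factors should capture almost all variation in the coefficients here; real survey data would call for choosing the number of factors empirically.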
State equation
$$\begin{bmatrix} F_t \\ Y_t \end{bmatrix} = \begin{bmatrix} \Phi_{FF} & \Phi_{FY} \\ \Phi_{YF} & \Phi_{YY} \end{bmatrix} \begin{bmatrix} F_{t-1} \\ Y_{t-1} \end{bmatrix} + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, \Omega)$$
Observation equation
$$\tilde{\theta}^j_t = H^j_t \left( \alpha^j \, \hat{\Gamma}^{\text{MF}} F_t + \nu^j_{F,t} \right)$$
Here \(F_t\) are the latent distributional factors, \(Y_t\) are aggregate covariates, \(\hat{\Gamma}^{\text{MF}}\) is the factor loading matrix from PCA, and \(H^j_t\) is a selector matrix for each dataset \(j\). The model handles mixed frequencies through the time-aggregation parameter \(\alpha^j\).
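To make the two equations concrete, the following sketch simulates them forward for a single hypothetical dataset. Everything numeric is assumed: the dimensions, the entries of \(\Phi\) and \(\Omega\), the random stand-in for \(\hat{\Gamma}^{\text{MF}}\), the measurement-noise scale, and the annual observation schedule. In particular, \(\alpha^j\) is treated as a scalar equal to one, which is a simplification and not necessarily the form used in the paper. Estimation (MCMC with a Kalman smoother) is not shown.

```python
import numpy as np

rng = np.random.default_rng(1)
n_f, n_y, n_coef, T = 2, 1, 6, 60   # factors, covariates, coefficients, quarters

# Transition matrix partitioned as [[Phi_FF, Phi_FY], [Phi_YF, Phi_YY]]
# (illustrative values; upper triangular, so eigenvalues are 0.9, 0.8, 0.7).
Phi = np.array([[0.90, 0.05, 0.10],
                [0.00, 0.80, 0.05],
                [0.00, 0.00, 0.70]])
Omega = 0.01 * np.eye(n_f + n_y)           # innovation covariance (assumed)
Gamma = rng.normal(size=(n_coef, n_f))     # stand-in for the PCA loading matrix
alpha = 1.0                                # time-aggregation parameter, scalar here (assumption)
H = np.eye(n_coef)[:3]                     # selector: dataset j covers 3 of the 6 coefficients
sigma_nu = 0.05                            # measurement-noise std (assumed)

state = np.zeros((T, n_f + n_y))                  # stacked [F_t, Y_t]
theta_obs = np.full((T, H.shape[0]), np.nan)      # NaN marks quarters without a survey wave
chol = np.linalg.cholesky(Omega)

for t in range(1, T):
    eps = chol @ rng.standard_normal(n_f + n_y)
    state[t] = Phi @ state[t - 1] + eps           # state equation
    if t % 4 == 0:                                # hypothetical survey fielded once a year
        F_t = state[t, :n_f]
        nu = sigma_nu * rng.standard_normal(n_coef)
        theta_obs[t] = H @ (alpha * Gamma @ F_t + nu)   # observation equation

n_waves = int((~np.isnan(theta_obs).any(axis=1)).sum())
print(f"{n_waves} observed waves out of {T} quarters")
```

The NaN rows illustrate why a state-space treatment is convenient: the latent factors evolve every quarter, while each dataset contributes observations only in the quarters and for the coefficients it covers.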
The synthetic distributional data integrate five major U.S. household surveys, each covering different time periods and variables. C = Consumption, I = Income, W = Wealth.
# Julia
using CSV, DataFrames, Plots

# Load the consensus functional data
df = CSV.read("consensus_functional_data.csv", DataFrame)

# Extract income shares at the median
income_median = df[!, "sharesincome_0.5"]

# Plot consumption quantiles over time
plot(df.time, df[!, "quantilesconsum_0.5"], label="Median", xlabel="Quarter")

# Python
import matplotlib.pyplot as plt
import pandas as pd

# Load the consensus functional data
df = pd.read_csv("consensus_functional_data.csv")

# Extract income shares at the median
income_median = df["sharesincome_0.5"]

# Plot consumption quantiles over time
plt.plot(df["time"], df["quantilesconsum_0.5"], label="Median")
plt.xlabel("Quarter")
plt.legend()
plt.show()
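For work across several deciles at once, a long ("tidy") layout is often easier to handle than one column per series. The sketch below assumes the column names follow the pattern `<metric><variable>_<quantile>`, which is inferred only from the two names used in the snippets above (`sharesincome_0.5`, `quantilesconsum_0.5`); the actual file may differ. The helper `to_long` and the demo frame, including its time labels, are hypothetical.

```python
import re
import pandas as pd

# Assumed naming pattern, inferred from 'sharesincome_0.5' and 'quantilesconsum_0.5'.
COLUMN_PATTERN = r"^(shares|quantiles)([a-z]+)_(\d*\.?\d+)$"

def to_long(df, time_col="time"):
    """Reshape wide columns such as 'sharesincome_0.5' into long format."""
    value_cols = [c for c in df.columns if re.match(COLUMN_PATTERN, c)]
    long = df.melt(id_vars=time_col, value_vars=value_cols,
                   var_name="column", value_name="value")
    # Split each column name into its metric, variable, and quantile parts.
    parts = long["column"].str.extract(COLUMN_PATTERN)
    parts.columns = ["metric", "variable", "quantile"]
    parts["quantile"] = parts["quantile"].astype(float)
    return pd.concat([long[[time_col]], parts, long[["value"]]], axis=1)

# Tiny made-up frame standing in for the real file (values are not real data).
demo = pd.DataFrame({
    "time": ["1962Q1", "1962Q2"],
    "sharesincome_0.5": [0.10, 0.11],
    "quantilesconsum_0.5": [30000.0, 30500.0],
})
tidy = to_long(demo)
print(tidy)
```

With the data in this shape, selecting a variable and metric becomes a simple filter, for example `tidy[(tidy["metric"] == "quantiles") & (tidy["variable"] == "consum")]`.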
@article{bayer2025distributional,
author = {Bayer, Christian and Calderon, Luis and Kuhn, Moritz},
title = {Distributional Dynamics},
year = {2025},
note = {CEPR Discussion Paper 19829},
url = {https://github.com/LuisCald/DistributionalDynamics}
}