Quarterly estimates from 1962 to 2024 combining functional data analysis and Bayesian state-space models
1962–2024 · Quarterly · PSID, SCF, CEX, SIPP, CPS
Explore the synthetic distributional data across consumption, income, and wealth. Select a variable, metric, and toggle individual deciles. Gray bands indicate NBER recession periods.
Source: Bayer, Calderon & Kuhn (2025). PSID-implied estimates from the state-space model. Levels in trillions USD; quantiles in USD.
Source: Bayer, Calderon & Kuhn (2025). Bivariate joint distribution of population shares from marginalized CIW copula grid.
We develop a new method for deriving high-frequency synthetic distributions of consumption, income, and wealth. The core of the method is to treat the distributional data as a time series of functions whose underlying factor structure follows a state-space model, estimated using Bayesian techniques. The method incorporates different sources of microdata regardless of their frequency and variable coverage.
PSID, SCF, CEX, SIPP, and CPS — each with different frequency and variable coverage
Quantile functions for marginals plus copula for joint dependence
PCA on Legendre polynomial coefficients captures variation in a few factors
Factor dynamics follow a VAR in companion form with aggregate covariates
MCMC with Kalman smoother produces posterior draws of the latent factors
High-frequency quarterly output: full distributions for all 247 quarters
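The functional-data steps listed above (quantile functions, Legendre coefficients, PCA) can be illustrated with a minimal sketch. It assumes toy log-normal quantile functions, a degree-5 Legendre basis, and two retained factors. None of these choices, nor the variable names, come from the paper, and the copula step is left out.

```python
from statistics import NormalDist

import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(0)

# Hypothetical toy setup (not the paper's data): T periods, each with a
# log-normal quantile function evaluated on a grid of probabilities u.
T, n_grid, degree, n_factors = 40, 99, 5, 2
u = np.linspace(0.01, 0.99, n_grid)
z = np.array([NormalDist().inv_cdf(p) for p in u])  # standard normal quantiles

# Location and scale drift over time, so exactly two things move the curves.
mu = 10.0 + 0.02 * np.arange(T) + 0.05 * rng.standard_normal(T)
sigma = 0.8 + 0.1 * np.sin(np.arange(T) / 6.0)
log_q = mu[:, None] + sigma[:, None] * z[None, :]   # shape (T, n_grid)

# Step 1: project each log quantile function onto Legendre polynomials.
# Legendre polynomials live on [-1, 1], so rescale u from (0, 1).
x = 2.0 * u - 1.0
coefs = legendre.legfit(x, log_q.T, degree).T       # shape (T, degree + 1)

# Step 2: PCA on the coefficient vectors via an SVD of the centered matrix.
mean_coef = coefs.mean(axis=0)
U, s, Vt = np.linalg.svd(coefs - mean_coef, full_matrices=False)
explained = s**2 / np.sum(s**2)
factors = U[:, :n_factors] * s[:n_factors]          # factor scores, (T, n_factors)
loadings = Vt[:n_factors]                           # (n_factors, degree + 1)

# Step 3: rebuild the quantile functions from the retained factors only.
coefs_hat = mean_coef + factors @ loadings
log_q_hat = legendre.legval(x, coefs_hat.T)         # shape (T, n_grid)
fitted = legendre.legval(x, coefs.T)                # full-coefficient fit

print("variance share of retained factors:", explained[:n_factors].sum())
```

Because the toy curves are driven by only a location and a scale parameter, two factors reproduce the fitted coefficients almost exactly. With real survey data the number of retained factors would be a modelling choice.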
State equation
$$\begin{bmatrix} F_t \\ Y_t \end{bmatrix} = \begin{bmatrix} \Phi_{FF} & \Phi_{FY} \\ \Phi_{YF} & \Phi_{YY} \end{bmatrix} \begin{bmatrix} F_{t-1} \\ Y_{t-1} \end{bmatrix} + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, \Omega)$$
Observation equation
$$\tilde{\theta}^j_t = H^j_t \left( \alpha^j \, \hat{\Gamma}^{\text{MF}} F_t + \nu^j_{F,t} \right)$$
Here \(F_t\) are the latent distributional factors, \(Y_t\) are aggregate covariates, \(\hat{\Gamma}^{\text{MF}}\) is the factor loading matrix from PCA, and \(H^j_t\) is a selector matrix for each dataset \(j\). The model handles mixed frequencies through the time-aggregation parameter \(\alpha^j\).
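To make the role of the selector matrix concrete, here is a minimal sketch of a Kalman filter that only updates in quarters where a survey is available. It is a simplification under stated assumptions: all parameter values are hypothetical, the aggregate block \(Y_t\) is dropped, \(\alpha^j\) is set to 1, the parameters are treated as known, and only the forward filter is run (no smoother and no MCMC).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dimensions and parameters (illustrative, not estimated values).
T, k, n = 60, 2, 6                       # quarters, factors, coefficients
Phi = np.array([[0.9, 0.05],
                [0.0, 0.8]])             # stable VAR(1) transition for F_t
Omega = 0.1 * np.eye(k)                  # state innovation covariance
Gamma = np.column_stack([np.ones(n), np.linspace(-1.0, 1.0, n)])  # loadings
R = 0.05**2 * np.eye(n)                  # measurement error covariance

# Simulate latent factors and the measurement vector for every quarter.
F = np.zeros((T, k))
for t in range(1, T):
    F[t] = Phi @ F[t - 1] + rng.multivariate_normal(np.zeros(k), Omega)
theta = F @ Gamma.T + rng.multivariate_normal(np.zeros(n), R, size=T)

# Assumption: a single low-frequency survey, fielded every fourth quarter,
# that reports all n coefficients when it is available.
surveyed = np.arange(T) % 4 == 0
coverage = np.arange(n)                  # rows the hypothetical survey reports

x, P = np.zeros(k), np.eye(k)            # initial state mean and covariance
F_filt = np.zeros((T, k))
P_trace = np.zeros(T)
for t in range(T):
    # Prediction step: propagate mean and covariance through the VAR.
    x = Phi @ x
    P = Phi @ P @ Phi.T + Omega
    if surveyed[t]:
        # Update step: H picks the rows this survey reports in quarter t.
        H = np.eye(n)[coverage]
        Z = H @ Gamma
        S = Z @ P @ Z.T + H @ R @ H.T
        K = P @ Z.T @ np.linalg.inv(S)
        x = x + K @ (H @ theta[t] - Z @ x)
        P = (np.eye(k) - K @ Z) @ P
    # In quarters without a survey the prediction is carried forward as is.
    F_filt[t] = x
    P_trace[t] = np.trace(P)
```

In this toy setup the filtered uncertainty, summarized by `P_trace`, collapses in survey quarters and grows again in between, which is the gap a smoother and the additional data sources would help to close.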
The synthetic distributional data integrate five major U.S. household surveys, each covering different time periods and variables. C = Consumption, I = Income, W = Wealth.
using CSV, DataFrames, Plots

# Load the consensus functional data
df = CSV.read("consensus_functional_data.csv", DataFrame)

# Extract income shares at the median
income_median = df[!, "sharesincome_0.5"]

# Plot consumption quantiles over time
plot(df.time, df[!, "quantilesconsum_0.5"], label="Median", xlabel="Quarter")
import matplotlib.pyplot as plt
import pandas as pd

# Load the consensus functional data
df = pd.read_csv("consensus_functional_data.csv")

# Extract income shares at the median
income_median = df["sharesincome_0.5"]

# Plot consumption quantiles over time
plt.plot(df["time"], df["quantilesconsum_0.5"], label="Median")
plt.xlabel("Quarter")
plt.legend()
plt.show()
@article{bayer2025distributional,
author = {Bayer, Christian and Calderon, Luis and Kuhn, Moritz},
title = {Distributional Dynamics},
year = {2025},
note = {CEPR Discussion Paper 19829},
url = {https://github.com/LuisCald/DistributionalDynamics}
}