Conditional Sampling of GP Coefficients#

Discovery provides methods to sample GP coefficients conditionally on hyperparameter values. This allows drawing realizations of stochastic processes given their amplitude and spectral properties, which is useful for simulations and diagnostics.

Overview#

In Discovery’s GP framework (see Data Model), a Gaussian process is represented by:

Basis matrix \(F\) (ntoas × ngp)
Coefficients \(a\) (latent variables)
Prior covariance \(\Phi(\Lambda)\) determined by hyperparameters \(\Lambda\)

The marginalized likelihood integrates out the coefficients \(a\). However, we can sample from the conditional distribution:

\[p(a | y, \Lambda) \propto p(y | a) \, p(a | \Lambda)\]

This gives us realizations of the GP coefficients that are consistent with both the data and the hyperparameters.

Conditional Distribution#

Given data \(y\), noise \(N\), basis \(F\), and hyperparameters \(\Lambda\), the conditional posterior for coefficients is Gaussian:

\[a | y, \Lambda \sim \mathcal{N}(\mu_a, \Sigma_a)\]

where:

\[\begin{split}\Sigma_a &= (\Phi^{-1} + F^T N^{-1} F)^{-1} \\ \mu_a &= \Sigma_a F^T N^{-1} y\end{split}\]

The mean \(\mu_a\) represents the maximum a posteriori (MAP) estimate of the coefficients, while \(\Sigma_a\) quantifies the uncertainty given the data and hyperparameters.

Single Pulsar#

The `conditional` Method#

Use the conditional method to get the mean and covariance:

import discovery as ds

# Build likelihood
psr = ds.Pulsar.read_feather('data/v1p1_de440_pint_bipm2019-B1855+09.feather')

signals = [
    psr.residuals,
    ds.makenoise_measurement(psr, noisedict, ecorr=True),
    ds.makegp_timing(psr, svd=True),
    ds.makegp_fourier(psr, ds.powerlaw, 30, name='rn')
]

logl = ds.PulsarLikelihood(signals)

# Sample hyperparameters from priors
params = ds.sample_uniform(logl.params, priordict)

# Get conditional distribution
mu, cf = logl.conditional(params)

Returns:

mu: Mean of conditional distribution (concatenated coefficients for all GPs)
cf: Cholesky factor of covariance matrix \(\Sigma_a\)

The covariance can be reconstructed as \(\Sigma_a = L L^T\) where L = cf[0].

Note: makegp_timing() does not produce coefficients in the conditional distribution (it uses an improper prior by default).

See conditional().

Sampling from the Conditional#

Draw samples from the conditional distribution:

import jax

# Get conditional sampler
sampler = logl.sample_conditional

# Draw a sample
key = jax.random.PRNGKey(42)
key, coefficients = sampler(key, params)

Returns:

key: Updated PRNG key for subsequent sampling
coefficients: Dictionary with coefficient arrays for each GP

Example coefficient dictionary:

{
    'B1855+09_rn_coefficients': array([...]),  # shape (60,) for 30 components
}

See sample_conditional().

Multiple Pulsars (GlobalLikelihood)#

For multi-pulsar analyses, use GlobalLikelihood:

import discovery as ds
import glob

# Load pulsars
files = sorted(glob.glob('data/v1p1_de440_pint_bipm2019-*.feather'))
psrs = [ds.Pulsar.read_feather(f) for f in files]
Tspan = ds.getspan(psrs)

# Build per-pulsar likelihoods with common processes
pulsarlikelihoods = []
for psr in psrs:
    signals = [
        psr.residuals,
        ds.makenoise_measurement(psr, noisedict, ecorr=True),
        ds.makegp_timing(psr, svd=True),
        ds.makegp_fourier(psr, ds.powerlaw, 30, name='rn')
    ]
    pulsarlikelihoods.append(ds.PulsarLikelihood(signals))

# Build global likelihood with correlated GW signal
gbl = ds.GlobalLikelihood(
    pulsarlikelihoods,
    globalgp=ds.makegp_fourier_global(
        psrs, ds.powerlaw, ds.hd_orf, 14, T=Tspan, name='gw'
    )
)

# Sample hyperparameters from priors
params = ds.sample_uniform(gbl.logL.params, priordict)

# Sample conditional coefficients
key = jax.random.PRNGKey(0)
key, coefficients = gbl.sample_conditional(key, params)

Coefficient structure:

For GlobalLikelihood, the coefficients dictionary contains separate entries for each pulsar’s red noise GP and for each pulsar’s contribution to the global GW signal:

{
    'B1855+09_rn_coefficients': array([...]),  # shape (60,) - pulsar 1 red noise
    'B1937+21_rn_coefficients': array([...]),  # shape (60,) - pulsar 2 red noise
    # ... other pulsars' red noise
    'B1855+09_gw_coefficients': array([...]),  # shape (28,) - pulsar 1 GW contribution
    'B1937+21_gw_coefficients': array([...]),  # shape (28,) - pulsar 2 GW contribution
    # ... other pulsars' GW contributions
}

Each pulsar has separate coefficient arrays for its intrinsic processes (red noise) and its contribution to the global correlated signal (GW).

See sample_conditional().

Multiple Samples#

To draw multiple independent samples:

# Draw 100 samples
key = jax.random.PRNGKey(0)
samples = []

for _ in range(100):
    key, coeffs = gbl.sample_conditional(key, params)
    samples.append(coeffs)

Note: Each call updates the PRNG key to ensure independent samples.

Use Cases#

Conditional sampling is used in several analysis techniques:

Outlier identification: Used to identify timing outliers by examining residuals after removing the conditional GP realization. See Wang and Taylor 2021.
Signal reconstruction and posterior predictive tests: Used to reconstruct stochastic signals and perform posterior predictive checks. See NANOGrav 2024.

Limitations#

Deterministic Delays:

Currently, conditional sampling does not support models with deterministic delays:

# This will raise NotImplementedError
signals = [
    psr.residuals,
    ds.makenoise_measurement(psr, noisedict, ecorr=True),
    ds.makegp_timing(psr, svd=True),
    ds.makegp_fourier(psr, ds.powerlaw, 30, name='rn'),
    ds.makedelay(psr, delay_func, name='delay')  # Not supported
]

logl = ds.PulsarLikelihood(signals)
# logl.conditional(params)  # raises NotImplementedError

Workaround: Build a separate likelihood without delays for conditional sampling.