The Wiener–Khinchin theorem provides a clever way of building Gaussian processes for regression. I’ll show you this theorem in action with some Python code and how to use it to model a process.

The Wiener–Khinchin theorem states that the autocovariance function of a weakly stationary process is the Fourier transform of its power spectral density, and vice versa. The two are called Fourier duals and can be written as[^1]: \begin{align} \kappa(h) &= \int S(\omega) e^{2 \pi i \omega h} d \omega \label{1}\tag{1} \\ S(\omega) &= \int \kappa(h) e^{-2 \pi i \omega h} d h \end{align} where $$\kappa(h)$$ is the autocovariance function of the process at lag $$h$$ and $$S(\omega)$$ is the power spectral density of the same process at frequency $$\omega$$. They are Fourier transforms of each other, hence the term Fourier duals.
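To see the duality concretely, here is a small numerical sketch of my own (the names ell, kappa and S_true are illustration choices, not from the theorem itself). The squared-exponential kernel $$\kappa(h) = e^{-h^2 / 2\ell^2}$$ and the Gaussian spectral density $$S(\omega) = \ell \sqrt{2\pi}\, e^{-2\pi^2 \ell^2 \omega^2}$$ are a known Fourier-dual pair, so approximating the second integral above with a Riemann sum should reproduce the closed form:

```python
import numpy as np

# Squared-exponential autocovariance and its known spectral density.
# The lengthscale ell is an arbitrary choice for this illustration.
ell = 0.5
kappa = lambda h: np.exp(-h**2 / (2 * ell**2))
S_true = lambda w: ell * np.sqrt(2 * np.pi) * np.exp(-2 * np.pi**2 * ell**2 * w**2)

# Approximate S(w) = integral of kappa(h) exp(-2 pi i w h) dh with a
# Riemann sum over a lag grid wide enough for kappa to decay to zero
dh = 0.01
h = np.arange(-20, 20, dh)
for w in [0.0, 0.1, 0.3]:
    S_num = np.real(np.sum(kappa(h) * np.exp(-2j * np.pi * w * h)) * dh)
    print(w, S_num, S_true(w))  # the two values agree closely
```

The same trick works in the other direction: summing $$S(\omega) e^{2 \pi i \omega h}$$ over a frequency grid recovers $$\kappa(h)$$, which is exactly what we do below with the periodogram.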

The idea here is that if you want to model a process, you could estimate its power spectrum ($$S$$), calculate the autocovariance function using equation ($$\ref{1}$$) and construct a Gaussian process.

The problem is that equation ($$\ref{1}$$) is in continuous time, while we generally work with discretely sampled time series. Luckily, there is a standard discrete estimate of the power spectral density known as the periodogram. SciPy provides this estimate as scipy.signal.periodogram. Equation ($$\ref{1}$$) can then be approximated discretely as:

$$\kappa(h) \approx \sum_{\omega} S(\omega) e^{2 \pi i \omega h} \Delta \omega \label{2}\tag{2}$$

where $$\Delta \omega$$ is the spacing between the sampled frequencies.

Let’s try this out with some Python code.

# Estimating the autocovariance function

Generate a time series made up of two frequencies and some noise:

```python
from numpy import sin, pi, linspace
from numpy.random import seed, randn

seed(1)

N = 500  # Number of samples
x = linspace(0, 1, N)  # One interval
y = (
    sin(x * 2 * pi * 16)    # Oscillate 16 times per interval
    + sin(x * 2 * pi * 30)  # + 30 times per interval
    + randn(N)              # + some noise
)
```


We can get the periodogram with:

```python
from scipy.signal import periodogram

frequency, power = periodogram(y)
```
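As a quick sanity check of my own (regenerating the series so the snippet runs standalone), the two largest periodogram values should sit at the two frequencies we built into the series:

```python
import numpy as np
from scipy.signal import periodogram

# Recreate the series from above
np.random.seed(1)
N = 500
x = np.linspace(0, 1, N)
y = (
    np.sin(x * 2 * np.pi * 16)
    + np.sin(x * 2 * np.pi * 30)
    + np.random.randn(N)
)

frequency, power = periodogram(y)

# The two strongest frequencies, in cycles per sample
peaks = np.sort(frequency[np.argsort(power)[-2:]])
print(peaks, [16 / N, 30 / N])
```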


The array power is a noisy power spectrum with two spikes: one at $$\frac{16}{N}$$ and the other at $$\frac{30}{N}$$. We can create a function that uses periodogram and equation ($$\ref{2}$$) to give us the autocovariance at each $$h$$ from $$0$$ to $$N-1$$:

```python
import numpy as np
from scipy.signal import periodogram

def spectral_autocov(y):
    N = len(y)

    frequency, power = periodogram(y, nfft=N)
    d = np.mean(np.diff(frequency))

    autocov = np.real([
        np.sum(np.exp(2 * np.pi * 1j * frequency * h) * power * d)
        for h in range(N)
    ])

    return autocov
```
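A quick check on this function (repeating the earlier setup so the snippet runs on its own): at lag $$0$$ the autocovariance is just the variance, so the first element of the output should match the sample variance of the series:

```python
import numpy as np
from scipy.signal import periodogram

def spectral_autocov(y):
    N = len(y)
    frequency, power = periodogram(y, nfft=N)
    d = np.mean(np.diff(frequency))
    return np.real([
        np.sum(np.exp(2 * np.pi * 1j * frequency * h) * power * d)
        for h in range(N)
    ])

# Recreate the series from earlier
np.random.seed(1)
N = 500
x = np.linspace(0, 1, N)
y = (
    np.sin(x * 2 * np.pi * 16)
    + np.sin(x * 2 * np.pi * 30)
    + np.random.randn(N)
)

autocov = spectral_autocov(y)
print(autocov[0], np.var(y))  # lag-0 autocovariance matches the variance
```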


Running the time series y through the spectral_autocov function gives us an array where the $$h^{\text{th}}$$ element is the autocovariance at lag $$h$$:

```python
autocov = spectral_autocov(y)
```

# Building a Gaussian process

We can use this autocovariance function to construct a covariance matrix and make a prediction with a conditional Gaussian. For a zero-mean Gaussian vector split into an observed block $$a$$ and an unobserved block $$b$$, the conditional mean is $$\mathbb{E}[b \mid a] = \Sigma_{BA} \Sigma_{AA}^{-1} a$$, where $$\Sigma_{AA}$$ is the covariance of $$a$$ and $$\Sigma_{BA}$$ is the cross-covariance between $$b$$ and $$a$$. I’ve talked about the maths of conditional Gaussians in the appendix of a previous article. Knocking this together looks like:

```python
import numpy as np
from scipy.linalg import toeplitz

# Forecast this many steps ahead
horizon = 100

# Turn the autocovariance series into a
# covariance matrix plus a noise term
cov = toeplitz(autocov) + np.eye(N) * 20

# Make a forecast using a conditional Gaussian process,
# conditioning on the most recent N - horizon observations
AA = cov[:(N - horizon), :(N - horizon)]
BA = cov[(N - horizon):, :(N - horizon)]
forecast = BA @ np.linalg.inv(AA) @ y[horizon:]
```
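Putting everything together, here is the whole pipeline as one self-contained script. The only change is my own: swapping the explicit inverse for np.linalg.solve, a common numerical-stability substitution rather than something from the original steps:

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.signal import periodogram

def spectral_autocov(y):
    N = len(y)
    frequency, power = periodogram(y, nfft=N)
    d = np.mean(np.diff(frequency))
    return np.real([
        np.sum(np.exp(2 * np.pi * 1j * frequency * h) * power * d)
        for h in range(N)
    ])

# Generate the noisy two-frequency series
np.random.seed(1)
N = 500
x = np.linspace(0, 1, N)
y = (
    np.sin(x * 2 * np.pi * 16)
    + np.sin(x * 2 * np.pi * 30)
    + np.random.randn(N)
)

# Covariance matrix from the estimated autocovariance, plus a noise term
autocov = spectral_autocov(y)
cov = toeplitz(autocov) + np.eye(N) * 20

# Conditional Gaussian forecast, horizon steps past the end of the series
horizon = 100
AA = cov[:(N - horizon), :(N - horizon)]
BA = cov[(N - horizon):, :(N - horizon)]
forecast = BA @ np.linalg.solve(AA, y[horizon:])

print(forecast.shape)  # (100,)
```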


# Summary

You can calculate the autocovariance function of a process from its power spectral density. Using this relationship, we estimated the power spectral density of a noisy time series, calculated its autocovariance function, built a covariance matrix, and made a forecast with a conditional Gaussian.

# Footnotes

## References & Notes

[^1]: Andrew Gordon Wilson and Ryan Prescott Adams, “Gaussian Process Kernels for Pattern Discovery and Extrapolation”, International Conference on Machine Learning (ICML) 28(3), pp. 1067–1075 (2013).

## Corrections

If you see any mistakes or room for improvement, please reach out to me on Twitter @DrAdrian.

## Citation

Letchford (2023), "Wiener–Khinchin theorem and Gaussian processes", OS Quant.
```bibtex
@article{Letchford2023,
    author  = {Letchford, Adrian},
    title   = {Wiener–Khinchin theorem and Gaussian processes},
    journal = {OS Quant},
    year    = {2023},
}
```