Bayesian Linear Regression


This notebook shows the advantage of Bayesian Linear Regression over regular Least-Squares-Estimation-based Linear Regression with respect to outliers. The Bayesian part of the approach shown here is that the error’s/noise’s standard deviation $\sigma$ is no longer necessarily constant, but itself a random variable, bounded below by $\sigma_0$ and unbounded above.

Standard Linear Regression

Assumption: $f(x) = a + bx$ with measurement $y_i = f(x_i) + \epsilon_i$, where $\epsilon_i \sim \mathcal{N}\left(0,\sigma_0^2\right)$. So the noise level $\sigma_0$ is constant (technical term: homoscedasticity).

If there is an outlier, the homoscedasticity assumption is no longer valid. This can easily happen in real life, e.g. if students in the lab are not 100% focused after a long day at university and make a mistake in the experiment at the 100th measurement, or accidentally type one 0 too many into the Excel spreadsheet. Of course, we could delete these values from the data later - but that is not good scientific practice!

Standard Linear Regression via the least squares estimator (which is equivalent to the maximum likelihood estimator for Gaussian noise) is solved via the condition

\begin{equation} -\sum_k R_k^2 \overset{!}{=} \underset{a,b}{\max} \end{equation}

with the residual $R_k=(f(x_k|a,b)-y_k)/\sigma_0$.
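
For the linear model $f(x|a,b) = a + bx$, this maximization has the well-known closed-form solution (the same formulation that is used in the code below):

\begin{equation} b = \frac{s_{xy}}{s_{xx}}, \qquad a = \bar{y} - b\,\bar{x} \end{equation}

with the sample means $\bar{x}$, $\bar{y}$, the sample variance $s_{xx} = \frac{1}{n-1}\sum_k (x_k-\bar{x})^2$ and the sample covariance $s_{xy} = \frac{1}{n-1}\sum_k (x_k-\bar{x})(y_k-\bar{y})$.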

Bayesian Linear Regression

Here, we assume that the original noise level $\sigma_0$ is only the minimum level of noise that we have to expect in the experiment/measurement. E.g., if we use a digital measurement device that can only show integers and not decimals, then we know that “true” values of $10.5$ and $11.4$ would both be shown as $11$, leading to an intrinsic uncertainty of approximately $\pm 0.5 = \pm\sigma_0$. But as discussed above, there may always be an additional source of uncertainty (like human error) that can lead to, say, the 100th measurement being disturbed and $\sigma_{100} \gg \sigma_0$. We try to capture this with an a priori distribution for $\sigma$.

\begin{equation} p(\sigma|\sigma_0) = \frac{\sigma_0}{\sigma^2}\mathbb{1}_{[\sigma_0,\infty)} \end{equation}
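
This prior is properly normalized, since

\begin{equation} \int_{\sigma_0}^\infty \frac{\sigma_0}{\sigma^2}d\sigma = \left[-\frac{\sigma_0}{\sigma}\right]_{\sigma_0}^{\infty} = 1. \end{equation}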

For the measured data $D$ and the functional relationship $f$, this leads to the likelihood

\begin{equation} p(D|f,\sigma_0) = \int_0^\infty p(D|f,\sigma)p(\sigma|\sigma_0)d\sigma \end{equation}

which can be expressed with the substitution $t=1/\sigma$ as \begin{equation} p(D|f,\sigma_0) = \frac{\sigma_0}{\sqrt{2\pi}}\int_0^{1/\sigma_0} t\exp\left( -t^2 (f-D)^2/2\right)dt \end{equation}

and can now be solved analytically. Simplified via the residual $R=(f(x|a,b)-y)/\sigma_0$, this results in

\begin{equation} p(D|f,\sigma_0) = \frac{1}{\sigma_0\sqrt{2\pi}} \frac{1-\exp(-R^2/2)}{R^2}. \end{equation}
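
For completeness, the integral above follows from the elementary antiderivative of the integrand: with $\Delta = f - D$,

\begin{equation} \int_0^{1/\sigma_0} t\exp\left( -t^2\Delta^2/2\right)dt = \left[-\frac{1}{\Delta^2}\exp\left( -t^2\Delta^2/2\right)\right]_0^{1/\sigma_0} = \frac{1-\exp\left(-\Delta^2/(2\sigma_0^2)\right)}{\Delta^2} = \frac{1-\exp(-R^2/2)}{\sigma_0^2 R^2}, \end{equation}

and multiplying by the prefactor $\frac{\sigma_0}{\sqrt{2\pi}}$ gives the expression above.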

The linear parameters $a$ and $b$ are now hidden in the residual $R = R(a,b)$ via the functional form $f$. So only the residual has a direct connection to estimating $a$ and $b$. Hence, if we have $n$ residuals $(R_k)_{k=1,\dots,n}$ from $n$ measurements, we can sum the log-likelihoods to obtain

\begin{equation} \ln p(a,b|D) = \text{constant} + \sum_k \ln \left(\frac{1-\exp(-R_k^2/2)}{R_k^2} \right)\overset{!}{=} \underset{a,b}{\max}. \end{equation}

Actual Code

import numpy as np
import matplotlib.pyplot as plt
from numpy import random
from scipy.optimize import fmin
random.seed(4321)
# Generate some Data
N = 10 # number of data points
sigmasquare = 2 # used as the standard deviation of the simulated measurement noise
a = 2.1 # linear parameters
b = 1.2

def f(z): # linear function
    return(b*z+a)

x = random.uniform(0,10,N)
y = f(x)
y = y+random.normal(0,sigmasquare,N) # add Gaussian noise with standard deviation sigmasquare

# first outlier
x[2] = 1.0
y[2] = 15.

# second outlier
x[3] = 8.0
y[3] = -10.

plt.scatter(x,y)
plt.xlabel("x")
plt.ylabel("y")
plt.show()

(Figure: scatter plot of the generated data, including the two outliers.)

# Standard Least Squared Estimator

# This formulation may be unknown to some of you, 
# but it is one of the many equivalent formulations 
# of the Least Squares Estimator
x_m = sum(x)/N
y_m = sum(y)/N
sxx = sum((x-x_m)**2.0) /(N-1)
syy = sum((y-y_m)**2.0 )/(N-1)
sxy = sum((x-x_m)*(y-y_m)) / (N-1)
b_ls = sxy / sxx
a_ls = y_m - b_ls * x_m
# Now the Bayesian Posterior

def BayesPost(Params): # Params = (a, b), a length-2 parameter array
    L = 0.0 # log-posterior
    R = 0.0 # k^th residual
    
    for k in range(N):
        R = (Params[1]*x[k]+Params[0]-y[k])/sigmasquare**2.0 # k^th residual; sigmasquare**2.0 plays the role of sigma_0 here
        L = L + np.log( (1-np.e**(-R**2.0 / 2.0)) / R**2.0)
        
    return(-L)
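
The same negative log-posterior can also be written without the explicit loop. Here is a vectorized NumPy sketch that reuses the global variables x, y and sigmasquare from above and reproduces BayesPost exactly:

def BayesPostVec(Params): # vectorized variant of BayesPost (equivalent sketch)
    R = (Params[1]*x + Params[0] - y)/sigmasquare**2.0 # all N residuals at once
    return -np.sum(np.log((1.0 - np.exp(-R**2.0/2.0))/R**2.0)) # negative log-posterior
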
# Comparison

Bayes = fmin(BayesPost, [a_ls, b_ls]) # use the LSE result as the initial guess for fmin

print("LSE:")
print(a_ls, b_ls)
print("Bayes:")
print(Bayes[0],Bayes[1])
print("True:")
print(a,b)

xvec = np.arange(0,10,0.01)

plt.scatter(x,y, color = 'k')
plt.plot(xvec, a_ls+b_ls*xvec, '--k', label="Standard")
plt.plot(xvec, Bayes[0]+Bayes[1]*xvec, 'k', label="Bayes")
plt.legend()
plt.show()
Optimization terminated successfully.
         Current function value: 11.876205
         Iterations: 44
         Function evaluations: 83
LSE:
4.387967965679147 0.39364304052875415
Bayes:
2.1090568854164244 1.1515029780327333
True:
2.1 1.2

(Figure: data with the standard least-squares fit (dashed) and the Bayesian fit (solid).)

We can see that the Bayesian estimate is much closer to the true parameters than the least-squares estimate.
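
To quantify this, one can compare the absolute parameter errors directly (a short sketch reusing the variables from above):

print("LSE errors:  ", abs(a_ls - a), abs(b_ls - b))
print("Bayes errors:", abs(Bayes[0] - a), abs(Bayes[1] - b))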