Particle Filtering Tutorial
This tutorial is two-fold: the first part gives a more gentle introduction about Bayesian estimation, and the second part dives into the specifics of the particle filtering algorithm, which is the Monte Carlo method of performing Bayesian estimation.
PART 1 — Bayesian Estimation
Have you ever stopped to wonder: how do scientists predict the weather? The weather, among other things, is a really random process that seems like it would be impossible to predict in the future. What mathematical tools allows scientists to do this seemingly impossible task?
Scientists could just use a mathematical model that is derived from the laws of physics (i.e. see below):
For weather, you could imagine this to be some physics equation that models how the state of the system will change over time. A scientist trying to predict the weather wants to estimate the state of the system described by the mathematical model above, where the state of the system xt in this model might be the temperature or the likelihood of precipitation at a given location.
BUT, our mathematical models have some randomness (in the equation above, the nt) that captures the uncertainty in the model. This uncertainty makes it difficult to estimate (or predict) the state of a physical system over time.
Because of this uncertainty, if scientists only used a mathematical model derived from the laws of physics to estimate (and predict) the weather, the weather forecasts would be really terrible! This is why scientists ALSO collect all the measurements (or data) they can about the weather. Weather measurements come from sensors like barometers, thermometers, humidity sensors, pressure sensors and more.
With this additional information, scientists can combine their mathematical model with the data they collect, to estimate the state of a physical process like the weather — and even make accurate predictions of it for up to 7 days.
In this article, we will describe one of the many mathematical frameworks (Kalman filters, Extended Kalman Filters, Unscented Kalman Filters, etc.) that can be used to estimate the state of a random process when given a 1) mathematical model of the process with some additive uncertainty and 2) some measurements (that also have uncertainty) of the system as it is evolving over time. These techniques are generally referred to as Bayesian Estimation.
Particle filtering, in particular, is a technique where we use Monte Carlo methods to iteratively estimate the state of the system at every time step!
Before we jump ahead to the math, let’s think about some other domains where this math might be useful. Imagine any random process that has a lot of uncertainty, and therefore seems very difficult to estimate or predict. But also imagine that you can take some measurements that give you some information about the state of your system. Can you stop to think about where else we could use Bayesian estimation? Here are a just few that I could come up with:
Bayesian filtering is the umbrella term for how you can combine a mathematical model with measurements to make estimates of the state of a system. Just keep in mind that particle filtering is just one of many methods to do Bayesian filtering.
THE MATHEMATICAL MODEL
Now, let us dig into the math! Let’s say we have a nonlinear dynamics model that captures the evolution of the dynamics of a system as shown below:
The nonlinear model above is an equation that predicts how we think the system will evolve over time.
Let me introduce an example that will make these equations easier to interpret and understand. Let us consider a car that is moving with constant velocity (zero acceleration) in a straight line (1-D example) The mathematical model for this car will capture how the state of the car (its position x and its velocity v) will evolve over time. Using the laws of physics, we can write how the state of the car (its position x and velocity v) we evolve over time. In particular, the discrete mathematical model looks as follows:
Note, since the car might experience some unmodeled physical perturbations like the slipperyness of the road or some wind, there is some inherent uncertainty in the process! This uncertainty is modeled to be random white noise and is captured by the Gaussian random variables n and w, which are modeled to be identically and independently distributed (i.i.d.).
THE MEASUREMENT MODEL
Now that we have covered the mathematical model, we now want to figure out how the measurements that we take reflect the state of our system. Nonlinear measurements have the following form:
Let’s see what measurements might look like in our original car example! Imagine we had a GPS that could measure the car’s position. The measurement equation is an equation that captures how the GPS position is a function of the system’s state. The measurement equation for a GPS reading can be written as:
In other words, we can think of our GPS measurement as an indicator of the position of the car at a given time with some additive noise. Another example of a measurement (perhaps using vision or some other range sensor) could be the distance to an object with a known location. If the object’s position is given by px, we can write a range measurement as:
Again, since sensor measurements are not perfect, there is noise in our sensor measurement model as well. We use a Gaussian random variable (in this case n) to capture this randomness in our model.
Now, let us take a look at what this car example looks like graphically. In the figure below, we can see if the model of the physical process was entirely deterministic, we would already know the exact position of the car, and we would not need to use particle filtering (as shown in the top image of the figure). The orange represents the location of the car at every time-step and the green represents the GPS measurement taken at each time step.
Since the physical process has some uncertainty, however, there is a range of values that the car position (and velocity) could be at (as shown by the ellipsoid region in the diagram). Similarly, there is uncertainty in the measurements, which is indicated by the lighter green region that indicates that the measurement can be in a range of positions, as well. Since it is uncertain where the car is actually located at each given time-step, we will derive an algorithm that will tell us our best estimate of the state at each time step of the process given all the measurements we have taken up to that time.
In other words, we want to find the following probability distribution: p(xt|y0:t)
Before deriving equations that show how we get this posterior distribution, let’s look at one more graphical representation of what we are computing.
Here, the xt is the state of the system and the yt is the corresponding measurement taken at that time. So the question we want to ask in Bayesian filtering is: at each time t, how do we find the posterior distribution p(xt|y>0:t? In other words, what is the probability distribution over the current state, given the measurements that we have collected from time 0 up to time t.
We use Bayes’ theorem to answer this question! Bayes’ rule says the following:
Bayes’ theorem tells us how to mathematically solve for a probability distribution over the system’s state at time t: p(xt|y0:t). It is saying this posterior probability distribution depends on two probabilities: 1) the prior distribution of the state and 2) the measurement likelihood function. Note, the normalization term in the denominator is necessary since probabilities must take on values between 0 and 1. In the next sections, I will describe the intuition of each of the pieces of the equation on the right-hand side of the equation, and their connection to the mathematical model and the measurement model that we described earlier.
1) THE PRIOR DISTRIBUTION
The prior distribution is the probability distribution of the state at xt given all past measurements up to time t-1: (p(xt|y0:t-1). This distribution can be derived using our mathematical model of our system. In particular, the prior distribution tells us where we think the state of the system will be if it evolves according to the mathematical model.
Let’s see what this means in the particle filtering setting where the distributions are represented by samples. Imagine we have samples of the previous posterior distribution: p(xt-1|y0:t-1). Therefore, if we took each of those samples from that distribution and propagated the state forward by one time-step according to the mathematical model, then these new samples would represent a new distribution. In particular, this new distribution would represent the prior distribution: p(xt|y0:t-1). This propagation of samples would result in a distribution where the mean of the prior distribution is centered around the prediction of the state by the model f(xt-1}) and its variance is given by SIGMA. This can be seen in the figure below:
2) THE LIKELIHOOD FUNCTION
While the prior distribution is closely associated to the mathematical model, the likelihood function (as shown on the right) can be derived from the measurement equations that we introduced earlier.
In particular, the likelihood function says if you are at state xt, it will give you the likelihood of observing the measurement yt. Let’s think back to the car example to get a more intuitive understanding of what this means. If our car is at a position xt = 5 m, then the likelihood p(xt|y0:t) will tell us the probability of a given measurement ytgiven that the car’s position is at xt = 5 m. We can imagine that a measurement yt close to 5 m will have a much higher probability than a state far away from 5 m, since the GPS is just measuring the position of the car. The mean of this distribution is given by h(xt) and the covariance is given by Gamma, as can be seen in the figure above.
PART 2 — Particle Filtering
Particle filtering is a specific Monte-Carlo method to iteratively derive the posterior distribution using the likelihood and prior distributions! Now that we are familiar with the intuition behind the likelihood and prior distribution, let’s dive into the particle filtering algorithm itself.
If we could sample directly from the posterior distribution (p(xt|y0:t), given by the blue distribution below, we would already be done! We could simply use Monte-Carlo sampling to approximate the distribution p(xt|y0:t).
Unfortunately, we do not have the ability to directly sample from this posterior distribution. The distribution we can sample from, however, is the prior distribution. Luckily, we can use a mathematical technique called importance sampling to approximate the posterior distribution that we are actually looking for by using samples from the prior distribution!
IMPORTANCE SAMPLING (IS)
Importance sampling is a mathematical tool we can use so that samples from the prior distribution can be used to approximate the posterior distribution.
Remember Bayes’ theorem? Bayes’ theorem tells us the relationship between the prior distribution and the posterior distribution. Once we sample from the prior distribution, we can reweigh the samples (particles) according to the likelihood function. After some normalization, the reweighted samples from the prior distribution represent the posterior distribution. Note, samples and particles are meant to be equivalent and are used interchangeably in the rest of this tutorial.
THE PARTICLE FILTERING ALGORITHM!
Now, let’s dive into the details of how to do this in practice! Suppose we have some initial Gaussian distribution approximating our estimate of the initial state of our system at time t=0. In the first step of the algorithm, we sample n particles from this initial distribution. Note, these particles are a bunch of different x-positions on the car representing potential locations that the car is actually at. (Aside from the first step of this iterative algorithm, these samples will actually come from a non-Gaussian posterior distribution (p(xt-1|y0:t-1) instead of p(x0|y0).
Now that we have these samples, we want to propagate them forward according to our mathematical model of the physical process. Of course, we propagate the particles forward with the noise in the mathematical model as well! Once we propagate these particles forward, we get a new distribution. This distribution represents the prior distribution p(xt|y0:t-1). The red x in the diagram corresponds to the most probable position of the vehicle according to the prior distribution.
Now, that we have samples from the prior distribution, we want to use our information we gain from having a measurement at time t=1 to reshape the weights on the samples. For the third step of the algorithm, we compute the weights of each particle. We can see the formula for deriving the weights with the equation given for finding wt+1n. In this case, we can see that particles whose mapping h(xt+1) matches the measurement well will have significantly higher weights. Of course, we must normalize the probability distribution, which we can do by normalizing the weights.
Now that we have reweighed the particles according to the likelihood function, we define our posterior distribution as follows:
We can see that our initial prior estimate without considering the measurements got shifted to the right after we added in our measurement information! This shows how the predicted estimate from our mathematical model is readjusted to accommodate for the measurements that we are simultaneously collecting! Remember this is an iterative process.
However, there is a main issue that we must address before iterating! The main issue has been termed particle degeneration.
There is some math (which I will not go into details into) which shows that after some time, all particles except one will have negligible weight! So, instead of getting a great approximation of the posterior distribution, you get a distribution that looks like the green distribution shown above. Luckily, there are many ways to remedy particle degeneration. One way is to do a resampling step as shown below:
In this case, we have some particles with large weights. If we resample from this distribution, we get a new set of samples where each of them has the same weight.
Resampling in this way is just a way to represent a distribution with samples such that instead of having varying weights (where some may be very large and some may be very small), they all have the same weight.
This may not be the most effective way to avoid particle degeneration, but it is one of the techniques that is the simplest and easiest to implement! After this resampling process, we now have a set of samples that represent the posterior distribution (p(xt|y0:t). Now, we can repeat the process in order to estimate the next state.
SUMMARY OF THE ALGORITHM
In summary, in particle filtering, we are trying to identify the best estimate of the state of an uncertain or stochastic system as it is evolving over time. We do this by taking into account the mathematical model we have of our system and the measurements we are taking of the system as it is evolving over time.
Particle filtering allows us to iteratively estimate the state by doing Importance Sampling (i.e. taking samples from the prior distribution and reweighing the particles according to the new measurements taken at the new time-step). Note, there are some nuances when it comes to predicting the model forward a few steps (like the weather), but the mathematical tools used are very similar in flavor.
Note, there are plenty more resources to help you to understand this process more in depth! Here is a thorough tutorial that goes through the mathematical details: here.