What real-life situations is the Poisson distribution a good model of?

A Poisson random variable is a discrete random variable that lends itself to a wide variety of real-life situations. All of these situations share certain identifiable characteristics.

In this article, we will discuss these characteristics and provide examples.

A Poisson random variable with parameter $\lambda > 0$ has the probability mass function $p(k) = \frac{e^{-\lambda}\lambda^k}{k!}$ for $k = 0, 1, 2, \ldots$
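To make the formula concrete, here is a minimal Python sketch that evaluates $p(k)$ directly. The helper name `poisson_pmf` is our own, not from any library, and $\lambda = 2$ is just an illustrative value. The sketch also checks numerically that the probabilities add up to $1$ and that the mean equals $\lambda$.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a Poisson(lam) random variable equals k."""
    if k < 0:
        return 0.0
    # e^{-lam} * lam^k / k!, straight from the formula
    return math.exp(-lam) * lam**k / math.factorial(k)

# Sanity checks for lam = 2: the probabilities sum to (almost) 1, and the mean is lam.
lam = 2.0
probs = [poisson_pmf(k, lam) for k in range(50)]  # terms beyond k = 49 are negligible
print(round(sum(probs), 6))
print(round(sum(k * p for k, p in enumerate(probs)), 6))
```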

It can be shown that a Poisson random variable approximates a binomial random variable in certain cases, and it is this fact that makes the Poisson random variable applicable to so many real-life situations.

Many scenarios require counting the number of occurrences of something. For example, if we toss a coin $n$ times, we may wonder how many tosses will show heads. Or, if a state has several flood zones, we may want to estimate how many zones will flood in a given year. Such scenarios can be thought of as consisting of several sub-experiments, each of which either has an occurrence (success) or no occurrence (failure). In the flood zone example, each flood zone is a sub-experiment, and a flood either occurs or does not occur in it. In these scenarios, a binomial random variable can be used to count the number of occurrences (sub-experiments that succeed). A binomial random variable $B(n,p)$ has parameters $n$ and $p$, where $n$ is the number of sub-experiments and $p$ is the probability of an occurrence (success) in each sub-experiment. Returning to the flood zone example, if there are $50$ flood zones and the probability of a flood occurring in a zone is $0.04$, then the random variable $B(50, 0.04)$ counts the number of zones that flood.
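The flood zone numbers can be plugged into the binomial formula $P(B(n,p) = k) = \binom{n}{k} p^k (1-p)^{n-k}$. In the sketch below, the helper `binomial_pmf` is hypothetical (our own name), while `math.comb` is part of Python's standard library (Python 3.8+). It prints the probabilities that $0$ to $3$ zones flood, together with the expected number of flooded zones $np$.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials with success probability p."""
    if k < 0 or k > n:
        return 0.0
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 50, 0.04  # 50 flood zones, each flooding with probability 0.04
for k in range(4):
    print(f"P({k} zones flood) = {binomial_pmf(k, n, p):.4f}")
print("expected number of flooded zones:", n * p)
```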

It is important to note that, for a counting scenario to be modeled by a binomial random variable, the $n$ sub-experiments must be identical and independent (such sub-experiments are called trials).

While it is natural to think of many counting scenarios in terms of a binomial random variable, modeling and calculations become unwieldy when $n$ is large or when the exact value of $n$ is unknown. For example, suppose each chapter of a book has approximately $1000$ characters, and we want to estimate the number of mistyped characters. Here, $n = 1000$ is large, and computing the probability of a given number of typos in a chapter with the binomial random variable requires unwieldy calculations. In such scenarios, a Poisson random variable serves as a very useful approximation to the binomial random variable.

A Poisson random variable is a good approximation of a binomial random variable $B(n,p)$ when $n$ is large and $p$ is small. The parameter of the Poisson random variable is then $\lambda = np$. When the exact values of $n$ and $p$ are not known, $\lambda$ can be estimated as the average (expected value) of the number of occurrences.
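The quality of the approximation can be checked numerically. The mistyping example does not state a per-character error probability, so this sketch assumes, purely for illustration, $p = 0.002$, which gives $\lambda = np = 2$. It prints the binomial and Poisson probabilities side by side, plus the largest pointwise difference over $k = 0, \ldots, 50$ (larger counts have negligible probability). The helper names are our own.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam**k / math.factorial(k)

# Mistyping example: n = 1000 characters per chapter.
# p = 0.002 per character is a hypothetical value chosen for illustration.
n, p = 1000, 0.002
lam = n * p  # Poisson parameter lambda = np

print(" k  binomial  Poisson")
for k in range(6):
    print(f"{k:2d}  {binomial_pmf(k, n, p):.5f}   {poisson_pmf(k, lam):.5f}")

# Largest pointwise gap for k = 0..50; higher counts have negligible probability.
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(51))
print(f"largest difference: {max_gap:.5f}")
```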

It is worth noting that, while a binomial random variable requires the sub-experiments to be identical and independent, a Poisson random variable can also model scenarios in which the sub-experiments are not identical or are not completely independent, as long as each occurrence probability is small and any dependence is weak.
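The non-identical case can be illustrated with a small sketch, assuming independent sub-experiments with different small occurrence probabilities $p_i$ (the $200$ zones and their probabilities below are invented for illustration). The exact distribution of the number of occurrences is computed by dynamic programming and compared with a Poisson distribution whose $\lambda$ is the expected count $\sum_i p_i$. The sketch does not cover the weak-dependence case, which is harder to demonstrate in a few lines.

```python
import math

def poisson_binomial_pmf(ps):
    """Exact pmf of the number of occurrences among independent sub-experiments
    whose occurrence probabilities ps may all differ (dynamic programming)."""
    dist = [1.0]  # before any sub-experiment: 0 occurrences with probability 1
    for p in ps:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1 - p)  # this sub-experiment has no occurrence
            new[k + 1] += q * p    # this sub-experiment has an occurrence
        dist = new
    return dist

def poisson_pmf(k: int, lam: float) -> float:
    # Evaluated in log space so that large k does not overflow k!.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

# Hypothetical data: 200 zones with flood probabilities 0.005, 0.0075, ..., 0.015.
ps = [0.005 + 0.0025 * (i % 5) for i in range(200)]
lam = sum(ps)  # lambda = expected number of occurrences (2 here)

exact = poisson_binomial_pmf(ps)
for k in range(6):
    print(k, round(exact[k], 4), round(poisson_pmf(k, lam), 4))
```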

To illustrate how to recognize Poisson scenarios in real life, here are some examples:

1. At a mass wedding ritual, $1000$ marriages are planned for a certain date. We want to model the number of couples in which both partners have their birthday on the day of the ritual. This is a binomial scenario in which each couple is a sub-experiment and the occurrence (success) is both partners of the couple having the special birthday. Couples are independent of each other, so the sub-experiments are independent. Here, $n = 1000$ is large and $p = \frac{1}{365^2}$ is small (assuming each partner's birthday is equally likely to be any of the $365$ days, independently of the other partner's), so the requirements of the Poisson approximation are met with parameter $\lambda = np = \frac{1000}{365^2} \approx 0.0075$.

2. Suppose that, on average, $2$ cars are abandoned on a busy highway per week, and we want the probability that a given number of cars is abandoned in a particular week. We think of a binomial distribution in which each car is a sub-experiment, the occurrence (success) is the car being abandoned, and each car is independent of the others. We do not know $n$, but it is clearly large, and since the expected value $np = 2$ is moderate, $p$ must be small. Thus, we can use a Poisson distribution with $\lambda = 2$.

3. A person is waiting for a cab by a street. The person has not called for a cab, so no cab is scheduled to arrive; if a cab happens to pass by, he will take it. Suppose that, on average, $3$ cabs pass by his street in one hour, and we want to model the number of cabs passing by in one hour. For instance, we may be interested in the probability that he has to wait more than one hour, which is the same as the probability that $0$ cabs pass by his street in one hour.

   Let us divide the hour into tiny intervals. We can now say that a “pseudo” binomial distribution models this scenario as follows. Each tiny interval is a sub-experiment, and an occurrence is the arrival of a cab in that interval. The intervals are independent because we can assume that the system (the outside traffic) remains more or less unchanged during the hour: cabs going in all directions, people getting on and off, traffic patterns, and so on, all leave the probability of a cab arriving in one interval unaffected by whether a cab arrived in an earlier interval. Because the intervals are tiny, $n$ is large, and because the average is moderate, $p$ is small. Thus, we have a Poisson distribution with $\lambda = 3$.

4. A person is waiting for a cab or a bus by a street. Here, unlike the previous example, we assume that the person has called for a cab or that the bus is scheduled to arrive at a specific time. Because arrival times vary, how long the person has to wait can be considered a random experiment. However, this experiment cannot be modeled by a binomial distribution, because the time intervals are not independent in this case. For example, if the called cab or the scheduled bus arrives in one interval, the probability of another cab or bus arriving in a later interval clearly diminishes.

5. A radioactive element emits an average of $3$ particles per hour. As in example 3, this scenario is modeled by a Poisson distribution, here with $\lambda = 3$. Again, if we divide time into tiny intervals, we assume that the state of the element remains unchanged regardless of whether it emitted particles in earlier intervals; hence, the intervals are identical and independent.
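The quantities in the mass-wedding, abandoned-car, cab and radioactive-element examples can be evaluated directly from the Poisson formula. In this sketch the specific questions (at least one couple, $0$ to $4$ abandoned cars, no cab within an hour, more than $5$ particles in an hour) are our own choices for illustration; for the wedding, the exact binomial answer is printed alongside as a check.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam**k / math.factorial(k)

# Mass wedding: lambda = np = 1000 / 365^2; question: at least one such couple
lam_wedding = 1000 / 365**2
p_at_least_one = 1 - poisson_pmf(0, lam_wedding)
exact = 1 - (1 - 1 / 365**2) ** 1000  # exact binomial answer, for comparison
print(f"wedding: lambda = {lam_wedding:.5f}, P(at least one) = {p_at_least_one:.5f}, "
      f"binomial = {exact:.5f}")

# Abandoned cars: lambda = 2 per week
for k in range(5):
    print(f"cars: P({k} abandoned in a week) = {poisson_pmf(k, 2.0):.4f}")

# Cabs passing by: lambda = 3 per hour; waiting more than an hour means 0 cabs
print(f"cabs: P(wait > 1 hour) = {poisson_pmf(0, 3.0):.4f}")

# Radioactive element: lambda = 3 per hour
p_more_than_5 = 1 - sum(poisson_pmf(k, 3.0) for k in range(6))
print(f"radioactive: P(more than 5 particles in an hour) = {p_more_than_5:.4f}")
```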

Here, we elaborate on some aspects of example 3. First, we called the distribution a “pseudo” binomial distribution rather than a binomial distribution. This is because a binomial distribution requires each sub-experiment to have only two outcomes, an occurrence (success) or no occurrence (failure), whereas in example 3 an interval can contain more than one arrival. However, we assume that the probability of more than one cab arriving in an interval falls off drastically as the interval shrinks (mathematically, the probability of two or more cabs in an interval of length $h$ is $o(h)$, meaning that this probability divided by $h$ tends to $0$ as $h \to 0$).
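Both points can be illustrated numerically. Assuming the hour is split into $n$ equal intervals, each with arrival probability $p = \lambda / n = 3/n$, the binomial probabilities approach the Poisson ones as $n$ grows. The second loop assumes, as in the Poisson model, that the number of arrivals in an interval of length $h$ is Poisson with mean $\lambda h$, and shows that the probability of two or more arrivals divided by $h$ shrinks toward $0$, which is what $o(h)$ means. The interval counts and values of $h$ are arbitrary choices.

```python
import math

LAM = 3.0  # average number of cabs per hour, as in the cab example

def binomial_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam**k / math.factorial(k)

# "Pseudo" binomial: n intervals of length h = 1/n, arrival probability p = LAM / n each.
for n in (10, 60, 3600, 360000):
    p = LAM / n
    print(f"n = {n:6d}: P(0 cabs) = {binomial_pmf(0, n, p):.6f}, "
          f"P(2 cabs) = {binomial_pmf(2, n, p):.6f}")
print(f"Poisson(3):  P(0 cabs) = {poisson_pmf(0, LAM):.6f}, P(2 cabs) = {poisson_pmf(2, LAM):.6f}")

# o(h): P(two or more arrivals in an interval of length h) / h tends to 0 as h shrinks,
# assuming the count in such an interval is Poisson with mean LAM * h.
for h in (0.1, 0.01, 0.001):
    p_two_or_more = 1 - math.exp(-LAM * h) * (1 + LAM * h)
    print(f"h = {h}: P(2 or more) / h = {p_two_or_more / h:.6f}")
```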

Second, we modeled the number of cabs arriving in $1$ hour by a Poisson random variable with $\lambda = 3$. Suppose we are now interested in the number of cabs arriving in $2$ hours. How does $\lambda$ change? Since the expected number of cabs arriving in $2$ hours is $3 + 3 = 6$, we use $\lambda = 6$.
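One way to see that $\lambda = 6$ is consistent: if the counts in the two non-overlapping hours are independent Poisson random variables with $\lambda = 3$ (the same independence assumption made for the tiny intervals), their sum should have the Poisson distribution with $\lambda = 6$. This sketch checks that by directly convolving the two hourly distributions; the variable names are our own.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam**k / math.factorial(k)

rate = 3.0                 # average number of cabs per hour
lam_two_hours = 2 * rate   # lambda scales with the length of the time window

# Distribution of (cabs in hour 1) + (cabs in hour 2), assuming the two hourly
# counts are independent Poisson(3) variables: a convolution of the two pmfs.
for k in range(8):
    summed = sum(poisson_pmf(j, rate) * poisson_pmf(k - j, rate) for j in range(k + 1))
    print(k, round(summed, 6), round(poisson_pmf(k, lam_two_hours), 6))

# Probability of waiting more than two hours for a passing cab
print("P(no cab in 2 hours) =", round(poisson_pmf(0, lam_two_hours), 6))
```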