[Probability Review] Top 6 Basic Probability Distributions You Should Know!

Debanjan Saha
18 min readFeb 11, 2023

--

As a data scientist or machine learning engineer, you should be quite familiar with a few probability distributions. We may often model a real-world occurrence using these probability distributions in real-world circumstances, but it is necessary to understand the complexities of each distribution as well as their applications. In this article, we will go through the six most fundamental probability distributions that everyone should be familiar with.

Probability Distributions

Bernoulli Distribution

Bernoulli Distribution is a discrete probability distribution that models the outcome of a single binary event, i.e., a trial with only two possible outcomes, commonly referred to as “success” or “failure”. A random variable X that follows a Bernoulli distribution takes the value 1 (success) with probability p and the value 0 (failure) with probability 1 - p.

PMF, Expectation and Variance

A Bernoulli distribution with parameter p has the following Probability Mass Function (PMF), expectation and variance:

P(X = x) = p^x * (1 - p)^(1 - x), for x in {0, 1}

E(X) = p gives the expected value (mean) of a Bernoulli random variable X with parameter p.

The variance of X is given by Var(X) = p * (1-p).

Bernoulli Trial with p = 0.30

Because a Bernoulli distribution is a discrete distribution, its PMF can be represented as a bar graph with two bars corresponding to the two potential outcomes (success and failure) and their respective probabilities. Each bar’s height represents the probability of the related outcome.

As the parameter p changes, the probabilities of the two outcomes change, and so does the plot of the PMF.
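As a quick sketch (assuming SciPy is available, which the article does not state), the PMF, mean, and variance of a Bernoulli variable with p = 0.30 can be verified as follows:

```python
# Sketch: check the Bernoulli(p) PMF, mean, and variance with scipy.stats.
from scipy.stats import bernoulli

p = 0.30
rv = bernoulli(p)

print(rv.pmf(0))   # P(X = 0) = 1 - p = 0.7
print(rv.pmf(1))   # P(X = 1) = p = 0.3
print(rv.mean())   # E(X) = p = 0.3
print(rv.var())    # Var(X) = p * (1 - p) = 0.21
```

The same frozen-distribution pattern works for every distribution in this article.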

Usage

Here are a few examples of how a Bernoulli distribution might be used:

  1. Coin flip: A fair coin flip results in either heads (success) or tails (failure), and both outcomes are equally likely, so p = 0.5. The Bernoulli distribution can be used to model the outcome of the coin toss in this scenario.
  2. Click-through rate: In internet advertising, the Bernoulli distribution can be used to model the probability that a user will click on an advertisement after viewing it. Suppose there is a 10 percent chance that a user clicks; then p = 0.1.
  3. Medical treatment success/failure: The Bernoulli distribution can be used to model the success or failure of a medical therapy. For example, it can model the likelihood of a cancer patient surviving following therapy, with p denoting the probability of survival.
  4. Purchase conversion: The probability that a customer will make a purchase after browsing an online store can also be modeled using the Bernoulli distribution. If, for example, the probability of a consumer making a purchase is 0.3, then p = 0.3.

These are just a few instances of how the Bernoulli distribution might be applied in practice. The Bernoulli distribution is a straightforward but versatile model that can be used in a wide range of subjects and disciplines.

Relation with other distributions

Several probability distributions are connected to the Bernoulli distribution:

  • Binomial Distribution: If X1, X2,…, Xn are independently and identically distributed (i.i.d) Bernoulli random variables with parameter p, then the sum Y = X1 + X2 +… + Xn has a Binomial distribution with parameters n and p. In n independent Bernoulli trials, the Binomial distribution models the number of successes.
  • Poisson Distribution: The Poisson distribution is a limiting case of the Binomial distribution in which the number of trials n approaches infinity and the probability of success p approaches 0 while np remains constant. In other words, if the average number of successes in n independent Bernoulli trials is fixed at np, the Binomial distribution approaches a Poisson distribution with parameter λ = np as n grows large.
  • Geometric Distribution: The Geometric distribution models the number of Bernoulli trials required to reach the first success. It is related to the Bernoulli distribution because each trial is an independent and identically distributed Bernoulli trial.
  • Negative Binomial Distribution: In a series of independent and identically distributed Bernoulli trials, the Negative Binomial Distribution models the number of failures prior to the r-th success.

Finally, the Bernoulli distribution serves as a foundation for many other significant distributions and is widely utilized in statistical modeling and analysis.
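The Binomial connection can also be checked empirically. The following sketch (assuming NumPy is available) sums i.i.d. Bernoulli draws and confirms that the sample mean and variance land near np and np(1 - p):

```python
# Sketch: a sum of n i.i.d. Bernoulli(p) draws behaves like a Binomial(n, p) variable.
import numpy as np

rng = np.random.default_rng(0)
n, p, experiments = 10, 0.3, 100_000

# Each row is one experiment of n Bernoulli trials; row sums are Binomial(n, p) draws.
bernoulli_draws = rng.random((experiments, n)) < p
binomial_samples = bernoulli_draws.sum(axis=1)

print(binomial_samples.mean())  # close to n * p = 3.0
print(binomial_samples.var())   # close to n * p * (1 - p) = 2.1
```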

Binomial Distribution

The Binomial Distribution is a discrete probability distribution that models the number of successes in n independent Bernoulli trials, each having the same chance of success p. The Binomial distribution is used to address questions such as, “How many successful outcomes can we expect from a particular number of trials?” or, “What is the probability of exactly k successful outcomes in n trials?”

PMF, Expectation and Variance

A Binomial distribution with parameters n and p has the following Probability Mass Function (PMF), expectation and variance:

P(X = k) = C(n, k) * p^k * (1 - p)^(n - k), for k = 0, 1, …, n

E(X) = np, Var(X) = np(1 - p)

The expectation (mean) suggests that we can expect np successful outcomes in n trials on average. The variance quantifies the distribution’s spread or dispersion.

Plotting the PMF of a Binomial distribution is similar to plotting the PMF of a Bernoulli distribution, except that the Binomial distribution contains several bars corresponding to the various values of X (displayed as small circular markers in the line plots for aesthetic purposes), and the height of each bar represents the probability of that number of successes in n trials. The shape of the PMF changes as the parameters n and p change. For instance, as the number of trials increases, the distribution becomes increasingly spread out and approaches a normal distribution. As p increases, the distribution shifts to the right, and as the success probability p decreases, the distribution shifts to the left.
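As a hedged sketch (assuming SciPy is available), the Binomial PMF P(X = k) = C(n, k) p^k (1 - p)^(n - k) can be computed by hand and compared against scipy.stats.binom:

```python
# Sketch: the Binomial PMF computed two ways, by the closed-form formula and via scipy.
from math import comb
from scipy.stats import binom

n, p, k = 10, 0.5, 4
manual = comb(n, k) * p**k * (1 - p) ** (n - k)  # C(10, 4) / 2**10

print(manual)              # 0.205078125
print(binom.pmf(k, n, p))  # same value
```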

Binomial Distribution PMF plot by varying the parameters

Finally, the Binomial distribution is a valuable tool for predicting the probability of a particular number of successful outcomes and modeling the number of successful outcomes in a fixed number of trials.

Usage

Here are some scenarios that can be described using a Binomial distribution:

  1. Tossing a coin: Assume you toss a fair coin n times. Let X be the number of heads in n consecutive flips. Because each flip is an independent Bernoulli trial with a success probability of 0.5, X has a Binomial distribution with parameters n and p = 0.5.
  2. Medical treatment success: Suppose you test a new medical treatment on n patients, with X being the number of patients who respond positively to the treatment. Then X has a Binomial distribution with parameters n and p, where p is the treatment’s unknown success probability.
  3. Quality control in a manufacturing process: Assume a manufacturing process produces n items, with X representing the number of defective items in the batch. Then X has a Binomial distribution with parameters n and p, where p is the manufacturing process’s unknown defect probability.
  4. Marketing campaign: Assume a marketing campaign has a goal of reaching n customers, and X is the number of customers who respond to the campaign. Then X has a Binomial distribution with parameters n and p, where p is the consumers’ unknown response rate.

The Binomial distribution can be used to model real-world circumstances in a variety of ways, as seen here. In statistics and data analysis, the Binomial distribution is a versatile and extensively used distribution.

Relation with other distributions

In various ways, the Binomial distribution is related to several other probability distributions. Here are a couple such examples:

  • Poisson Distribution: The Poisson distribution is a limiting instance of the Binomial distribution that occurs when the number of trials n is very large and the success probability p is very small, so that np remains constant. In other words, if X has a Binomial distribution with parameters n and p, then the distribution of X converges to a Poisson distribution with parameter λ = np as n grows very large and p becomes very small.
  • Normal Distribution: The sum of a large number of independent and identically distributed random variables approaches a normal distribution, according to the Central Limit Theorem. If X1, X2,…, Xn are independent Bernoulli random variables with the same success probability p, then the sum Y = X1 + X2 +… + Xn has a Binomial distribution with n and p as parameters. As n increases, Y approaches a normal distribution, according to the Central Limit Theorem.
  • Negative Binomial Distribution: The Negative Binomial distribution is a generalization of the Binomial distribution that models the number of failures until the k-th success in a series of independent Bernoulli trials. The Negative Binomial distribution is a discrete distribution similar to the Binomial distribution in that both model the number of successful outcomes in a certain number of trials.
  • Hypergeometric Distribution: The Hypergeometric distribution is similar to the Binomial distribution in that it models the number of successful outcomes in a set number of trials. Unlike the Binomial distribution, however, the Hypergeometric distribution considers both the overall number of items and the number of items of each kind. The Hypergeometric distribution is used to model scenarios in which trials are not independent and the number of items of each category varies from trial to trial.

These are some of the most prevalent relationships between the Binomial distribution and other probability distributions. Understanding these relationships is critical for selecting the best distribution to model a given circumstance.
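The Poisson limit is easy to illustrate numerically. This sketch (SciPy assumed) holds λ = np fixed while n grows, and the Binomial PMF visibly converges to the Poisson PMF:

```python
# Sketch: the Binomial(n, lam / n) PMF approaches the Poisson(lam) PMF as n grows.
from scipy.stats import binom, poisson

lam, k = 3.0, 2
for n in (10, 100, 10_000):
    p = lam / n  # hold np = lam fixed while n grows
    print(n, binom.pmf(k, n, p), poisson.pmf(k, lam))
```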

Uniform Distribution

The uniform probability distribution is a continuous probability distribution that models a situation in which all conceivable outcomes have the same probability. It is defined by two parameters: a and b, which reflect the distribution’s minimum and maximum values, respectively.

PDF, Expectation and Variance

A uniform distribution’s probability density function (PDF) is defined as follows:

f(x) = 1 / (b - a) for a ≤ x ≤ b, and f(x) = 0 otherwise

In other words, the PDF of a uniform distribution is a constant value within the interval [a,b] and 0 outside of this interval. The uniform distribution is frequently used to model circumstances in which there is no knowledge about the underlying distribution of the data and all values within the defined range are equally likely.

A uniform distribution’s expected value (mean) is the midpoint of the interval, E(X) = (a + b) / 2. Its variance, Var(X) = (b - a)^2 / 12, is always positive, and it shrinks as the interval [a,b] narrows. The uniform distribution is frequently used in real-world situations where we do not have much information about the underlying distribution of the population, and it is often regarded as a good starting point for statistical analysis.

The plot of the PDF of a uniform distribution with different parameters a and b, as shown below, looks rectangular and constant within the interval described by the parameters a and b.

The purple curve in this plot depicts a uniform distribution with parameters a = 0 and b = 1, the blue curve a uniform distribution with parameters a = 1 and b = 2, the green curve a uniform distribution with parameters a = 2 and b = 4, and the yellow curve a uniform distribution with parameters a = 3 and b = 6. As we can see, the uniform distribution’s shape remains constant, but its range shifts as the parameters a and b change.
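The mean and variance formulas for Uniform(a, b), E(X) = (a + b) / 2 and Var(X) = (b - a)^2 / 12, can be checked with a short sketch (SciPy assumed; note that scipy parameterizes the uniform by loc = a and scale = b - a):

```python
# Sketch: mean and variance of a Uniform(a, b) distribution via scipy.stats.
from scipy.stats import uniform

a, b = 2.0, 4.0
rv = uniform(loc=a, scale=b - a)  # scipy's Uniform(a, b) parameterization

print(rv.mean())  # (a + b) / 2 = 3.0
print(rv.var())   # (b - a)**2 / 12 = 1/3
```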

Usage

Here are a few examples of when uniform distribution can be used:

  1. Rolling a fair die: All possible outcomes (1, 2, 3, 4, 5, 6) are equally likely when rolling a fair die. In this situation, the uniform distribution with a = 1 and b = 6 can be used to model the probability of each outcome.
  2. Choosing a random number between 0 and 1: If a computer creates a random number between 0 and 1, the uniform distribution with a = 0 and b = 1 can be used to model the probability of each outcome.
  3. Waiting time at a traffic light: If you arrive at a traffic light at a random moment in its cycle, your waiting time is equally likely to fall anywhere between 0 and the full cycle length, so it can be modeled with a uniform distribution with a = 0 and b equal to the cycle length.
  4. Choosing a random point on a circle: If we were to choose a random point on a circle, we could use the uniform distribution to model the probability of each angle, with a = 0 and b = 360 degrees.

These are just a few examples of how uniform distribution might be applied. The parameters a and b define the range of the uniform distribution, which models a situation in which all possible outcomes are equally likely.

Relation with other distributions

The uniform distribution is related to other distributions in several ways:

  • As a building block: The uniform distribution is often used as a building block for more complex distributions. For example, the triangular distribution, in which the likelihood of an outcome peaks at a certain point and decreases linearly to either side, arises as the sum of two independent uniform random variables.
  • As a model for lack of information: As mentioned earlier, the uniform distribution is often used when there is a lack of information about the underlying distribution of the data. In these cases, the uniform distribution is used as a default or prior distribution to capture the uncertainty in the data.
  • As a model for continuous data: The uniform distribution is a continuous distribution, which means it models data that can take on any value within a specified range. This makes it well-suited for modeling continuous data, such as the height of adult males or the weight of a package.
  • As a special case of other distributions: The uniform distribution can also be seen as a special case of other distributions. For example, the uniform distribution on [0, 1] is a special case of the beta distribution with both shape parameters equal to 1.

Overall, the uniform distribution is a simple but versatile distribution that has a wide range of applications in probability and statistics. It is related to other distributions in various ways, and is often used as a building block or model for data with a lack of information.

Gaussian (Normal) Distribution

The Gaussian distribution, often known as the normal distribution, is a continuous probability distribution characterized by its probability density function (PDF):

f(x) = (1 / (σ√(2π))) * exp(-(x - μ)^2 / (2σ^2))

where μ is the mean and σ is the standard deviation.

The Gaussian distribution is bell-shaped and symmetrical, and it models data that is distributed symmetrically on both sides of the mean.

Standard Gaussian PDF Curve

The Gaussian distribution PDF plot is a bell-shaped curve, with the mean in the center and the standard deviation dictating the breadth of the curve. The curve becomes wider and flatter as the standard deviation increases, and narrower and higher as the standard deviation falls.

The mean is the distribution’s center and indicates the expected value of the data. The standard deviation is a measure of data spread that reflects how far the data deviates from the mean. If we vary the mean (mu) while keeping the standard deviation (sigma) fixed, the shape of the bell curve does not change; the curve merely shifts along the x-axis. Changing the standard deviation, however, dramatically alters the graph: a higher standard deviation implies that the data is dispersed over a larger range, whereas a lower standard deviation suggests that the data is more concentrated around the mean.

The Gaussian distribution is extensively used to model data that is spread out symmetrically about a central value in numerous domains, including engineering, economics, and biology. It is also extensively used as a default or prior distribution in Bayesian statistics, as well as in error and residual modeling in a variety of applications.
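As a minimal sketch (SciPy assumed), the Gaussian PDF can be evaluated from the closed-form formula and cross-checked against scipy.stats.norm:

```python
# Sketch: Gaussian PDF f(x) = exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi)).
import math
from scipy.stats import norm

mu, sigma, x = 0.0, 1.0, 1.0
manual = math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

print(manual)                  # ~0.2420, the standard normal density at x = 1
print(norm.pdf(x, mu, sigma))  # same value
```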

Usage

Here are some Gaussian distribution examples:

  1. Adult male height: It is sometimes represented as a Gaussian distribution, with the mean being around 5 feet 10 inches and the standard deviation being approximately 3 inches.
  2. Test grades: Test grades can alternatively be modeled as a Gaussian distribution, with the mean being the average grade and the standard deviation indicating how much the grades deviate from the mean.
  3. Stock Returns: Daily stock returns are often modeled as approximately Gaussian, with the mean representing the average daily return and the standard deviation indicating how much the daily returns deviate from the mean.
  4. Regression Analysis: In regression analysis, the errors or residuals between the expected and actual values are frequently described as a Gaussian distribution, with the mean being zero and the standard deviation being a measure of the prediction’s error or uncertainty.
  5. Human Blood Pressure: Human blood pressure can alternatively be described as a Gaussian distribution, with the mean representing the average blood pressure and the standard deviation indicating how much the blood pressure deviates from the mean.

These are just a few instances of how the Gaussian distribution might be used to model real-world data. The Gaussian distribution is a versatile and adaptable distribution that is frequently utilized in a wide range of applications. Its symmetrical and bell-shaped shape gives it a suitable fit for a wide range of data sources.

Relation with other distributions

In numerous ways, the Gaussian distribution is related to other distributions. Here are a few examples of important relationships:

  • Central Limit Theorem: The Central Limit Theorem says that, regardless of the distribution of the individual random variables, the total of a large number of independent and identically distributed (i.i.d.) random variables will converge to a Gaussian distribution. This indicates that the Gaussian distribution can be used to approximate the distribution of a large number of random variables.
  • Gaussian Mixture Models (GMM): Gaussian mixture models are a sort of probabilistic model that consists of a combination of various Gaussian distributions. These models are useful for simulating complex data distributions that can be broken down into numerous Gaussian distributions.
  • Gaussian Processes (GP): Gaussian processes are a form of probabilistic model based on the Gaussian distribution. They are used for many different applications, such as regression, classification, probabilistic forecasting, and hyper-parameter optimization, and they form the basis of Bayesian optimization.
  • Linear regression with Gaussian noise: Linear regression models the relationship between one or more independent variables and a dependent variable. The errors or residuals between predicted and actual values are frequently modeled as a Gaussian distribution, with mean zero and a standard deviation representing the prediction’s error or uncertainty.

These are only a few of the many connections between the Gaussian distribution and other distributions. The Gaussian distribution is a well-known and commonly used distribution that is related to a wide range of other distributions and models in statistics and machine learning.

Poisson Distribution

The Poisson distribution is a discrete probability distribution that models the number of events that occur within a given time or space interval. It is often used to model questions like “How many occurrences of any particular event can be modeled within a given time-frame?”, hence it is widely used in the real world and the industry.

PMF, Expectation and Variance

The probability mass function (PMF), expectation, and variance of the Poisson distribution are shown below. The expected number of events per interval, denoted λ (lambda), is assumed to be constant over time: the average number of events is the same for any two intervals of equal length. Consistent with this, the expectation of a Poisson random variable equals its variance.

P(X = k) = (λ^k * e^(-λ)) / k!, for k = 0, 1, 2, …

E(X) = Var(X) = λ

The Poisson distribution has several notable characteristics:

  • The Poisson distribution’s mean and variance are both equal to λ. Relatedly, a Poisson process has independent increments: the expected number of future events does not depend on how many events have already occurred.
  • The Poisson distribution is a limiting case of the Binomial distribution: the Binomial distribution approaches the Poisson distribution as n approaches infinity and p approaches 0 while λ = np remains constant.

The Poisson distribution is frequently used to model the number of events that occur within a specified span of time or location, such as the number of calls received by a call center, emails received by a user, or cosmic rays detected by a detector.

The plot of the Poisson distribution is affected by the value of the expectation. The Poisson distribution is noticeably skewed to the right for small values of the parameter λ (lambda), but it approaches a symmetrical, bell-like shape for large values. The Poisson distribution plot for various values of λ is presented below:

Poisson Distribution
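The equal-mean-and-variance property is easy to confirm with a short sketch (SciPy assumed):

```python
# Sketch: for a Poisson(lam) variable, the mean and variance are both equal to lam.
from scipy.stats import poisson

for lam in (1, 4, 10):
    rv = poisson(lam)
    print(lam, rv.mean(), rv.var(), rv.pmf(lam))  # mean == var == lam
```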

Usage

The Poisson distribution can be used to model the number of events that occur in a specific interval of time or space given the average number of events per unit of time or space in each of these cases. It could be used to model the number of events in a few different real-world scenarios, such as :

  1. The number of calls a call center receives in a given hour.
  2. The number of customers who enter a store during a specific hour.
  3. The amount of emails a user receives in a given hour.
  4. The number of accidents at a specific intersection in a given year.
  5. The number of defective products in a large shipment of items.

And many more such real world questions.

Relation with other Distributions

Several other distributions are connected to the Poisson distribution, including:

  • Binomial Distribution: The Poisson distribution is a limiting case of the Binomial distribution: the Binomial distribution approaches the Poisson distribution as n approaches infinity and p approaches 0 while λ = np remains constant.
  • Exponential Distribution: In a Poisson process, the time between subsequent events is exponentially distributed.
  • Gaussian Distribution: According to the central limit theorem, the sum of a large number of independent and identically distributed Poisson variables will tend to follow a Gaussian distribution.
  • Negative Binomial Distribution: The Poisson distribution is a limiting case of the Negative Binomial distribution, obtained as the required number of successes grows large while the mean is held fixed.
  • Gamma Distribution: In a Poisson process, the waiting time until the k-th event follows a Gamma distribution; the exponential distribution is the special case with shape parameter equal to one.

Each of these distributions has its own distinct qualities, and the distribution chosen will be determined by the precise requirements of the task at hand. The Poisson distribution, on the other hand, is a good starting point for modeling count data and is frequently employed as a first approximation in many applications.

Exponential Distribution

In a Poisson process, the Exponential Distribution is a continuous probability distribution that models the time between events. The average rate of events is the single quantity that characterizes the exponential distribution.

The exponential distribution’s Probability Density Function (PDF) is given by:

f(t) = λ * e^(-λt), for t ≥ 0

where λ is the rate parameter, the average number of events per unit of time.

As the rate parameter is increased, the PDF steepens and the time between events shortens. In contrast, as the rate falls, the PDF flattens and the interval between events increases.

Exponential Distribution
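A minimal sketch of the exponential PDF f(t) = λ e^(-λt), assuming SciPy (which parameterizes the exponential by scale = 1 / rate):

```python
# Sketch: exponential PDF evaluated by hand and via scipy.stats.expon.
import math
from scipy.stats import expon

rate, t = 2.0, 0.5
manual = rate * math.exp(-rate * t)  # f(t) = rate * exp(-rate * t)

print(manual)                        # 2 * exp(-1), about 0.7358
print(expon.pdf(t, scale=1 / rate))  # same value
print(expon(scale=1 / rate).mean())  # mean waiting time = 1 / rate = 0.5
```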

Usage

Here are a few instances of scenarios that the exponential distribution can model:

  1. The interval between machine component failures
  2. The amount of time that passes between client arrivals at a store.
  3. The amount of time that passes between radioactive decay events
  4. The interval between calls to a call center
  5. The amount of time that passes between bus arrivals at a bus stop.

The rate parameter can be computed from data and used to model the time between occurrences using the exponential distribution in each of these cases.

Relation with other distributions

In a few ways, the exponential distribution is related to other distributions:

  • As a variant of the gamma distribution: The exponential distribution is a variant of the gamma distribution with shape parameter k = 1.
  • As the time between events in a Poisson process: A Poisson process is a continuous-time counting process, and the exponential distribution models the time between successive events.
  • Memoryless property: The exponential distribution has a memoryless property, which means that the distribution of the remaining waiting time does not depend on how long you have already waited. It is the only continuous distribution with this property; the normal and gamma distributions, for example, do not have it.
  • As a foundation for other distributions: The exponential distribution can be used to construct other distributions: a power transformation of an exponential random variable yields a Weibull distribution, and a sum of independent exponential random variables yields a Gamma distribution.

These relationships highlight the exponential distribution’s versatility and utility in simulating real-world occurrences.
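The memoryless property, P(X > s + t | X > s) = P(X > t), can be verified numerically with a short sketch (SciPy assumed):

```python
# Sketch: memoryless property of the exponential distribution via survival functions.
from scipy.stats import expon

rate, s, t = 1.5, 1.0, 2.0
rv = expon(scale=1 / rate)

lhs = rv.sf(s + t) / rv.sf(s)  # P(X > s + t | X > s)
rhs = rv.sf(t)                 # P(X > t)
print(lhs, rhs)                # both equal exp(-rate * t)
```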

Conclusion

In conclusion, we took a closer look at each probability distribution and attempted to understand why it behaves as it does. For each distribution, we varied its parameters and observed how those changes shifted the shape of the plots. We saw that the uniform and normal distributions are among the most commonly used distributions in the real world. We also encountered the memoryless property of the exponential distribution and the closely related independence properties of the Poisson process, which clarify their applicability in real-world simulations.

We examined the article’s detailed examples and scenarios to gain a thorough understanding of the many circumstances in which these distributions may be applied. The significance of probability distributions lies in applying these fundamentals to real-world data science challenges. Using analytics tools such as Tableau, Power BI, etc., it is possible to perform more advanced analytics on the data distributions and build elaborate interactive visualizations and dashboards.

Congratulations, if you made it this far. Can’t thank you enough!! This is probably the longest article I’ve written on Medium, but I wanted to talk about these distributions in more depth because they are very important and form the basis of any data science. If you want to talk to me or grab some coffee, you can add me on LinkedIn. I’ll accept your request.

Oh, and if you are still not following me on Medium, I would appreciate it if you did. It really helps me find the motivation to write more such articles on Data Science, Machine Learning and Artificial Intelligence. I have been thinking of writing about linear algebra and matrices. Let me know if you want me to review Linear Algebra.

Author: “Deb”

Hit me up on LinkedIn.


Debanjan Saha

Trying to solve a variety of issues with an emphasis on computer vision as a budding data scientist, ML engineer, and data engineering veteran.