Probability Density Functions Overview - NAVADMC/ADSM GitHub Wiki

Probability density functions supported by ADSM

Probability density functions are distributions of values representative of the natural range of possible values for some parameter. Values are drawn stochastically from these distributions as a simulation runs.

ADSM supports 22 general types of probability density functions (PDFs), described in the following sections along with the parameter order expected by ADSM. Some distributions are more suitable to some applications than others, but all are provided to ensure maximum flexibility to model users. References are provided for users who wish to obtain more detailed information about these distributions; in particular, Vose (2000) provides a very helpful discussion regarding suitable applications of the different types of probability density functions. A discussion of distribution fitting is beyond the scope of this guide, but readers are referred to other sources (Law, 2006; Vose, 2000).

Most of the distributions described here are continuous. Recall, though, that ADSM operates in discrete time steps of one day. Consequently, for parameters that have x-axis units of days, values obtained from these distributions are rounded to the nearest whole day.
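The rounding step can be illustrated with a minimal sketch. It assumes that exact halves round up, which this guide does not specify, and the names `round_to_day` and `draw_days` are hypothetical rather than part of ADSM.

```python
import math
import random

def round_to_day(value):
    """Round a continuous draw to the nearest whole day.

    Assumption: exact halves round up. ADSM's own tie-breaking rule
    is not stated in this guide.
    """
    return math.floor(value + 0.5)

def draw_days(sampler, rng):
    """Draw one value from a continuous sampler and express it in whole days."""
    return round_to_day(sampler(rng))

rng = random.Random(42)
# Hypothetical latent period: Triangular(min=1, mode=3, max=7) days.
# Note that random.triangular takes (low, high, mode) in that order.
latent_period = draw_days(lambda r: r.triangular(1.0, 7.0, 3.0), rng)
```

The result is always an integer number of days between the minimum and maximum of the underlying distribution.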


The Beta distribution

A Beta distribution (Figure 0-1) is a continuous distribution defined by four parameters: α1, α2, a minimum value, and a maximum value. Parameters α1 and α2 must be greater than 0, and the minimum value must be less than the maximum value. The probability density function f(x) is calculated as:

[Figure 0-1]
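As an illustration, the standard four-parameter Beta density can be sketched as follows. The formula used is the textbook form, f(x) = (x − min)^(α1−1) (max − x)^(α2−1) / (B(α1, α2) (max − min)^(α1+α2−1)), and is assumed rather than transcribed from ADSM; the function name `beta_pdf` is hypothetical.

```python
import math

def beta_pdf(x, alpha1, alpha2, minimum, maximum):
    """Density of a four-parameter Beta distribution (textbook form, assumed)."""
    if alpha1 <= 0 or alpha2 <= 0:
        raise ValueError("alpha1 and alpha2 must be greater than 0")
    if minimum >= maximum:
        raise ValueError("minimum must be less than maximum")
    # Simplification: the density is treated as 0 at and outside the endpoints.
    if x <= minimum or x >= maximum:
        return 0.0
    # Beta function B(alpha1, alpha2) expressed through the gamma function.
    beta_fn = math.gamma(alpha1) * math.gamma(alpha2) / math.gamma(alpha1 + alpha2)
    width = maximum - minimum
    numerator = (x - minimum) ** (alpha1 - 1) * (maximum - x) ** (alpha2 - 1)
    return numerator / (beta_fn * width ** (alpha1 + alpha2 - 1))
```

With α1 = α2 = 1 the function reduces to a flat density of 1 / (max − min) inside the range.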


The BetaPERT distribution

The BetaPERT distribution (Figure 0-2) is a continuous distribution defined by its minimum, its most likely value (mode), and its maximum. In this way, the BetaPERT distribution is similar to the Triangular distribution.

The BetaPERT distribution is related to the Beta distribution; the α1 and α2 parameters used to define a Beta distribution are obtained from the minimum, mode, and maximum values of a BetaPERT. There are several slight variations in how α1 and α2 are calculated from the minimum, mode, and maximum. ADSM uses the same approach as the risk analysis package @RISK (Palisade Corporation, 2008). In this approach, α1 and α2 are calculated with the following formulas:

[Figure 0-2]
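The conversion from minimum, mode, and maximum to α1 and α2 can be sketched as below. The formulas are those commonly documented for the @RISK PERT distribution (mean = (min + 4 × mode + max) / 6, with α1 and α2 scaled by 6); they are stated here as an assumption, and the function names are hypothetical.

```python
import random

def betapert_shape_parameters(minimum, mode, maximum):
    """Derive Beta shape parameters from BetaPERT inputs (assumed @RISK-style formulas)."""
    if minimum >= maximum or not (minimum <= mode <= maximum):
        raise ValueError("require minimum <= mode <= maximum and minimum < maximum")
    mean = (minimum + 4.0 * mode + maximum) / 6.0
    width = maximum - minimum
    alpha1 = 6.0 * (mean - minimum) / width
    alpha2 = 6.0 * (maximum - mean) / width
    return alpha1, alpha2

def betapert_sample(minimum, mode, maximum, rng):
    """Draw one value by rescaling a standard Beta(alpha1, alpha2) variate."""
    alpha1, alpha2 = betapert_shape_parameters(minimum, mode, maximum)
    return minimum + rng.betavariate(alpha1, alpha2) * (maximum - minimum)

rng = random.Random(1)
alpha1, alpha2 = betapert_shape_parameters(0.0, 2.0, 10.0)
draw = betapert_sample(0.0, 2.0, 10.0, rng)
```

Under these formulas α1 + α2 always equals 6, and a mode at the midpoint gives a symmetric distribution with α1 = α2 = 3.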


The Bernoulli distribution

The Bernoulli distribution is a discrete distribution used to model whether an event will occur. A single parameter p represents the probability of the event. For example, getting a head on one toss of a coin could be modeled with the distribution Bernoulli(0.5), and getting a 6 on a single roll of a die could be modeled with Bernoulli(0.1667). The outcome of a Bernoulli trial is always 0 or 1. The Bernoulli distribution is a special case of the Binomial distribution in which the number of trials is 1.

[Figure 0-3]


The Binomial distribution

The Binomial distribution is a discrete distribution used most often to model the number of successes x in n independent trials, where each trial has a probability p of success (Figure 0-4). It is used when a trial has exactly two mutually exclusive outcomes. The probability mass function f(x) is calculated as:

[Figure 0-4]
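A minimal sketch of the standard Binomial probability mass function, f(x) = C(n, x) p^x (1 − p)^(n − x), is shown below. The function name `binomial_pmf` is hypothetical; setting n = 1 gives the Bernoulli case.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    if n < 0 or not 0.0 <= p <= 1.0:
        raise ValueError("require n >= 0 and 0 <= p <= 1")
    if x < 0 or x > n:
        return 0.0
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)

# Bernoulli(0.5) is Binomial with a single trial: P(0) and P(1) are both 0.5.
coin_head = binomial_pmf(1, 1, 0.5)
```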


The Discrete Uniform distribution

The Discrete Uniform distribution, sometimes called the “equally likely outcomes” distribution, has a set of n elements in which each element i has the same probability of occurring (Figure 0-5). A simple example of this distribution is the outcome of throwing one die. The probability mass function f(x) is calculated as:

[Figure 0-5]


As noted above, ADSM rounds values from continuous distributions to the nearest integer when they are applied to discrete quantities, such as the duration in days of a disease state. For parameters like these, the Discrete Uniform distribution might be a better choice than the Uniform distribution, because it avoids the errors introduced by rounding.
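The rounding effect can be made concrete with a small sketch that computes the exact probability of each integer after a Uniform(min, max) draw is rounded to the nearest integer. It assumes halves round up; the function names are hypothetical.

```python
import math

def rounded_uniform_probabilities(minimum, maximum):
    """Exact probability of each integer after rounding Uniform(minimum, maximum)."""
    probabilities = {}
    first = math.floor(minimum + 0.5)
    last = math.floor(maximum + 0.5)
    for k in range(first, last + 1):
        # Length of the part of [k - 0.5, k + 0.5] that lies inside the range.
        overlap = min(k + 0.5, maximum) - max(k - 0.5, minimum)
        if overlap > 0:
            probabilities[k] = overlap / (maximum - minimum)
    return probabilities

def discrete_uniform_probabilities(minimum, maximum):
    """Each integer from minimum to maximum (inclusive) is equally likely."""
    n = maximum - minimum + 1
    return {k: 1.0 / n for k in range(minimum, maximum + 1)}

rounded = rounded_uniform_probabilities(1.0, 6.0)
discrete = discrete_uniform_probabilities(1, 6)
```

For the range 1 to 6, the rounded continuous distribution gives the end values 1 and 6 a probability of 0.1 each and the interior values 0.2 each, whereas the Discrete Uniform distribution gives every value 1/6.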

The Exponential distribution

An Exponential distribution (Figure 0-6) is a continuous, highly skewed distribution defined by its mean μ, which must be greater than 0. The probability density function f(x) is calculated as:

[Figure 0-6]


The Fixed Value “distribution”

Fixed values are not distributions but are used by ADSM in much the same way. A Fixed Value “distribution” is defined by a fixed value y. The probability density function f(x) for a Fixed Value “distribution” always returns this fixed value: f(x) = y.

The Gamma distribution

A Gamma distribution (Figure 0-7) is continuous and defined by two parameters: its shape (α) and its scale (β), both of which must be greater than 0. Gamma distributions can take a wide variety of shapes, depending on the values of the shape and scale parameters. This distribution has a mean of α×β. The probability density function f(x) is calculated as:

[Figure 0-7]
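The standard Gamma density with shape α and scale β, f(x) = x^(α−1) e^(−x/β) / (β^α Γ(α)), can be sketched as follows. This textbook form is assumed rather than taken from the ADSM source, and the function name `gamma_pdf` is hypothetical.

```python
import math
import random

def gamma_pdf(x, alpha, beta):
    """Density of a Gamma distribution with shape alpha and scale beta."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be greater than 0")
    # Simplification: the density is treated as 0 at and below zero.
    if x <= 0:
        return 0.0
    return x ** (alpha - 1) * math.exp(-x / beta) / (beta ** alpha * math.gamma(alpha))

rng = random.Random(7)
# random.gammavariate uses the same (shape, scale) convention.
draw = rng.gammavariate(2.0, 3.0)
```

With α = 1 the density reduces to that of an Exponential distribution with mean β.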


The Gaussian (Normal) distribution

A Gaussian or Normal distribution (Figure 0-8) is a continuous, bell-shaped curve described by two parameters: its mean μ and its standard deviation σ. The standard deviation must be greater than 0. The Gaussian distribution is inherently symmetric. The probability density function f(x) is calculated as:

[Figure 0-8]


The Histogram distribution

The Histogram distribution (Figure 0-9) is a continuous empirical distribution. A Histogram distribution uses data directly to define its shape and properties, rather than a mathematical formula and its parameters. The probability density function f(x) is calculated from the set of minimum and maximum values for each histogram bin as follows:

[Figure 0-9]
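One common way to turn histogram bins into a density is sketched below: within each bin, the density is the bin's share of the total weight divided by the bin width. The bin format (minimum, maximum, weight) and the function name are assumptions for illustration and may differ from the input format ADSM expects.

```python
def histogram_pdf(x, bins):
    """Density at x for a histogram given as (bin_min, bin_max, weight) tuples.

    Assumption: bins do not overlap and each has bin_min < bin_max.
    """
    total_weight = sum(weight for _, _, weight in bins)
    if total_weight <= 0:
        raise ValueError("total bin weight must be greater than 0")
    for bin_min, bin_max, weight in bins:
        if bin_min <= x < bin_max:
            # The bin's probability is spread evenly across its width.
            return weight / (total_weight * (bin_max - bin_min))
    return 0.0

# Hypothetical data: 1 observation in [0, 2) and 3 observations in [2, 3).
example_bins = [(0.0, 2.0, 1.0), (2.0, 3.0, 3.0)]
```

For these example bins the density is 0.125 on the first bin and 0.75 on the second, so the total area is 1.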


The Hypergeometric distribution

A Hypergeometric distribution (Figure 0-10) is a discrete distribution commonly used to estimate the number of items of type X in a sample of size n when the sample is drawn from a population of size M that contains D items of type X. For example, the number of infected animals in a shipment of animals selected at random from a herd of a particular size with a known prevalence of disease could be modeled using a Hypergeometric distribution. In this example, n is the number of animals in the shipment, M is the herd size, and D is the herd size times the prevalence. The probability mass function f(x) for a Hypergeometric distribution is calculated as:

[Figure 0-10]
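The shipment example can be worked through with a short sketch of the standard Hypergeometric probability mass function, f(x) = C(D, x) C(M − D, n − x) / C(M, n). The herd size, prevalence, and shipment size below are hypothetical values chosen for illustration, as is the function name.

```python
import math

def hypergeometric_pmf(x, n, D, M):
    """Probability of x items of type X in a sample of n drawn from M, of which D are type X."""
    if not (0 <= D <= M and 0 <= n <= M):
        raise ValueError("require 0 <= D <= M and 0 <= n <= M")
    # x cannot exceed the sample size or D, and the sample cannot hold
    # more non-X items than the population contains.
    if x < max(0, n - (M - D)) or x > min(n, D):
        return 0.0
    return math.comb(D, x) * math.comb(M - D, n - x) / math.comb(M, n)

# Hypothetical herd of 100 animals with 10% prevalence, shipment of 5 animals.
herd_size = 100
infected = round(herd_size * 0.10)
shipment_size = 5
p_no_infected = hypergeometric_pmf(0, shipment_size, infected, herd_size)
```

Here `p_no_infected` is the probability that the shipment contains no infected animals.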


The Inverse Gaussian distribution

An Inverse Gaussian distribution (Figure 0-11) is a continuous distribution characterized by two parameters: a mean (μ) and a scaling factor (λ). This distribution has been used in epidemiology to model mean infectious and latent periods. The probability density function f(x) is calculated as:

[Figure 0-11]


The Logistic distribution

A Logistic distribution (Figure 0-12) is a continuous distribution defined by two parameters: its location α and scale β. The scale parameter must be greater than 0. The probability density function f(x) is calculated as:

[Figure 0-12]


The Loglogistic distribution

The Loglogistic distribution (Figure 0-13) is a continuous distribution defined by three parameters: its shape α, scale β, and location γ. Scale and shape must be greater than 0. The probability density function f(x) is calculated as:

[Figure 0-13]


The Lognormal distribution

The Lognormal distribution (Figure 0-14) is a logarithmic transformation of the Normal distribution. It is described using the same parameters as the Normal distribution: its mean μ and its standard deviation σ, where both μ and σ are greater than 0.

The Lognormal distribution, which is continuous, is extremely asymmetric (skewed to the right) when the mean is close to 0; the further the mean is from 0, the more the Lognormal distribution approaches the symmetry and shape of a Normal distribution. The probability density function f(x) for a Lognormal distribution is calculated as:

[Figure 0-14]


A Lognormal distribution may also be defined by zeta (ζ) and σ’, as shown previously. ADSM supports both sets of parameters for Lognormal distributions and automatically handles the conversion of μ and σ to ζ and σ’.
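The conversion between the two parameter sets can be sketched with the standard relationships σ’ = sqrt(ln(1 + (σ/μ)²)) and ζ = ln(μ) − σ’²/2, where ζ and σ’ are the mean and standard deviation of the logarithm of the variable. These are the textbook identities, assumed to match the conversion ADSM performs; the function name is hypothetical.

```python
import math
import random

def lognormal_log_scale_parameters(mean, stddev):
    """Convert the mean and standard deviation of a Lognormal variable
    to the mean (zeta) and standard deviation (sigma_prime) of its logarithm."""
    if mean <= 0 or stddev <= 0:
        raise ValueError("mean and stddev must be greater than 0")
    sigma_prime = math.sqrt(math.log(1.0 + (stddev / mean) ** 2))
    zeta = math.log(mean) - 0.5 * sigma_prime ** 2
    return zeta, sigma_prime

zeta, sigma_prime = lognormal_log_scale_parameters(5.0, 2.0)

rng = random.Random(3)
# random.lognormvariate expects the log-scale parameters, not mean and stddev.
draw = rng.lognormvariate(zeta, sigma_prime)
```

Converting back with mean = exp(ζ + σ’²/2) recovers the original μ, which is a quick way to check the conversion.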

The Negative Binomial distribution

A Negative Binomial distribution (Figure 0-15) is a discrete distribution often used to estimate the number of failures that will occur before s successes are achieved, given a probability p of success in each trial. The parameters s and p are specified. The probability mass function f(x) is calculated as:

[Figure 0-15]
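A minimal sketch of the standard Negative Binomial probability mass function for the number of failures x before s successes, f(x) = C(s + x − 1, x) p^s (1 − p)^x, follows. It assumes s is a positive integer, and the function name is hypothetical.

```python
import math

def negative_binomial_pmf(x, s, p):
    """Probability of exactly x failures before the s-th success."""
    if s < 1 or not 0.0 < p <= 1.0:
        raise ValueError("require s >= 1 and 0 < p <= 1")
    if x < 0:
        return 0.0
    return math.comb(s + x - 1, x) * p ** s * (1.0 - p) ** x

# With s = 1 this is the chance of x failures before the first success.
p_two_failures_first = negative_binomial_pmf(2, 1, 0.5)
```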


The Pareto distribution

A Pareto distribution (Figure 0-16) is a power-law probability distribution. A power law implies that small occurrences are extremely common, whereas large occurrences are extremely rare. This distribution is heavily skewed to the right and has a mode and minimum that are equal. The distribution starts at the mode a and has a rate of decrease determined by the parameter θ. The probability density function f(x) is calculated as:

[Figure 0-16]
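The standard Pareto density with minimum a and shape θ, f(x) = θ a^θ / x^(θ+1) for x ≥ a, can be sketched as below. The textbook form is assumed, and the function names are hypothetical.

```python
import random

def pareto_pdf(x, theta, a):
    """Density of a Pareto distribution with shape theta and minimum (mode) a."""
    if theta <= 0 or a <= 0:
        raise ValueError("theta and a must be greater than 0")
    if x < a:
        return 0.0
    return theta * a ** theta / x ** (theta + 1)

def pareto_sample(theta, a, rng):
    """Draw one value; random.paretovariate has a minimum of 1, so rescale by a."""
    return a * rng.paretovariate(theta)

rng = random.Random(11)
draw = pareto_sample(3.0, 2.0, rng)
```

The density is highest at x = a, where it equals θ / a, and decreases for all larger x.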


The Pearson 5 distribution

The Pearson 5 distribution (Figure 0-17) is defined by its shape α and scale β, both of which must be greater than 0. A Pearson 5 distribution has a mean of β/(α-1), which is defined only when α is greater than 1. The probability density function f(x) is calculated as:

[Figure 0-17]


The Piecewise (General) distribution

A Piecewise or General distribution (Figure 0-18) is an empirical distribution defined by an array of points, each of which has an x and a y value. Each x value must be larger than the previous x value. Each y value must be at least 0. Finally, the y values of the first and last points must be 0. The probability density function f(x) is calculated as:

[Figure 0-18]
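A sketch of one way to evaluate such a density is given below: the y values are linearly interpolated between neighbouring points and divided by the total area under the curve so that the result integrates to 1. Whether ADSM normalizes the points in this way or expects them to enclose unit area already is an assumption here, and the function name is hypothetical.

```python
def piecewise_pdf(x, points):
    """Density at x for a piecewise-linear curve given as (x, y) points."""
    if len(points) < 3:
        raise ValueError("at least three points are required")
    if points[0][1] != 0 or points[-1][1] != 0:
        raise ValueError("first and last y values must be 0")
    segments = list(zip(points, points[1:]))
    for (x0, y0), (x1, y1) in segments:
        if x1 <= x0:
            raise ValueError("x values must be strictly increasing")
        if y0 < 0 or y1 < 0:
            raise ValueError("y values must be at least 0")
    # Total area under the curve, summed over trapezoids.
    area = sum(0.5 * (y0 + y1) * (x1 - x0) for (x0, y0), (x1, y1) in segments)
    for (x0, y0), (x1, y1) in segments:
        if x0 <= x <= x1:
            y = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
            return y / area
    return 0.0

# Hypothetical shape: rises over one day, stays flat for two days, falls over one day.
example_points = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
```

The example curve has an area of 6, so the flat section between x = 1 and x = 3 has a density of 2/6.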


The Poisson distribution

The Poisson distribution (Figure 0-19) is discrete rather than continuous. Poisson distributions have one specific role in an ADSM scenario: they determine the number of contacts that will be initiated by each herd that is a source of disease.

A Poisson distribution is defined by its mean, designated λ. The probability mass function f(x) is calculated as:

[Figure 0-19]
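The standard Poisson probability mass function, f(x) = λ^x e^(−λ) / x!, and a simple way to draw a number of contacts can be sketched as follows. The sampler uses Knuth's multiplication method, chosen here for brevity and not necessarily the method ADSM uses; the function names and the contact rate are hypothetical.

```python
import math
import random

def poisson_pmf(x, mean):
    """Probability of exactly x events when the expected number is mean."""
    if mean <= 0:
        raise ValueError("mean must be greater than 0")
    if x < 0:
        return 0.0
    return mean ** x * math.exp(-mean) / math.factorial(x)

def poisson_sample(mean, rng):
    """Draw one count using Knuth's method (suitable for small means)."""
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    # Multiply uniform draws until the product falls to the threshold or below.
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

rng = random.Random(5)
# Hypothetical contact rate: 2.5 contacts per day on average.
contacts_today = poisson_sample(2.5, rng)
```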


The Triangular distribution

A Triangular distribution (Figure 0-20) is, as its name implies, triangular in shape. It is described by three parameters: minimum, peak (mode or “most likely”), and maximum values. The Triangular distribution can be symmetric or asymmetric, depending on the position of the peak relative to the minimum and maximum values. The probability density function f(x) is calculated as:

[Figure 0-20]
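The standard Triangular density rises linearly from the minimum to a height of 2 / (max − min) at the peak and then falls linearly to the maximum. A minimal sketch, with a hypothetical function name, is:

```python
def triangular_pdf(x, minimum, mode, maximum):
    """Density of a Triangular distribution with the given minimum, peak, and maximum."""
    if minimum >= maximum or not (minimum <= mode <= maximum):
        raise ValueError("require minimum <= mode <= maximum and minimum < maximum")
    if x < minimum or x > maximum:
        return 0.0
    width = maximum - minimum
    if x < mode:
        # Rising side: only reachable when mode > minimum, so no division by zero.
        return 2.0 * (x - minimum) / (width * (mode - minimum))
    if x > mode:
        # Falling side: only reachable when mode < maximum.
        return 2.0 * (maximum - x) / (width * (maximum - mode))
    return 2.0 / width  # height at the peak
```

For example, with a minimum of 0, a peak of 5, and a maximum of 10, the density is 0.2 at the peak and 0.1 halfway up either side.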


The Uniform distribution

The Uniform distribution (Figure 0-21) is rectangular in shape, indicating that all values within a range occur with equal frequency. It is described by two parameters: the minimum and maximum of the range. The Uniform distribution is inherently symmetric. The probability density function f(x) is calculated as:

[Figure 0-21]


The Weibull distribution

A Weibull distribution (Figure 0-22) is defined by its shape α and scale β, both of which must be greater than 0. The probability density function f(x) is calculated as:

[Figure 0-22]
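The standard Weibull density with shape α and scale β, f(x) = (α/β) (x/β)^(α−1) exp(−(x/β)^α), can be sketched as below. The textbook form is assumed, and the function name is hypothetical. Note that Python's `random.weibullvariate` takes the scale first and the shape second, the reverse of the order used in this guide.

```python
import math
import random

def weibull_pdf(x, alpha, beta):
    """Density of a Weibull distribution with shape alpha and scale beta."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be greater than 0")
    # Simplification: the density is treated as 0 at and below zero.
    if x <= 0:
        return 0.0
    scaled = x / beta
    return (alpha / beta) * scaled ** (alpha - 1) * math.exp(-(scaled ** alpha))

shape, scale = 2.0, 3.0
rng = random.Random(9)
# random.weibullvariate(scale, shape): argument order differs from (alpha, beta) above.
draw = rng.weibullvariate(scale, shape)
```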