tail approximation

Extreme values are useful to quantify the risk of catastrophic floods, and much more.

This is a brief set of notes with an introduction to extreme value theory. For reviews, see Leadbetter et al (1983) and Davison and Huser (2015) [references at the end]. The classical book by Gumbel (1958) may also be of some (historical) interest.

Let $X_1, \dots, X_n$ be a sequence of independent and identically distributed random variables with cumulative distribution function (cdf) $F(x)$, and let $M_n =\max(X_1,\dots,X_n)$ denote their maximum.

If $F(x)$ is known, the distribution of the maximum is:

$\begin{array}{lll} P(M_n \leqslant x) &=&P(X_1 \leqslant x, \dots, X_n \leqslant x) \\ &=& P(X_1 \leqslant x) \cdots P(X_n \leqslant x) = F^n(x). \end{array}$
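As a quick numerical check of $P(M_n \leqslant x) = F^n(x)$, the sketch below assumes, purely for illustration, that the $X_i$ are exponential with rate 1, so that $F(x) = 1 - e^{-x}$, and compares the formula with a Monte Carlo estimate. The last lines also illustrate the point made in the next paragraph: raising $F$ to the power $n$ magnifies small differences in $F$. The function names and the values 0.999 and 0.998 are hypothetical choices.

```python
import math
import random


def max_cdf_exact(x, n, rate=1.0):
    """P(M_n <= x) = F(x)**n for n iid Exponential(rate) variables (assumed parent)."""
    if x <= 0.0:
        return 0.0
    return (1.0 - math.exp(-rate * x)) ** n


def max_cdf_simulated(x, n, reps=20000, rate=1.0, seed=1):
    """Monte Carlo estimate of P(M_n <= x): share of simulated maxima at or below x."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        m = max(rng.expovariate(rate) for _ in range(n))
        if m <= x:
            hits += 1
    return hits / reps


exact = max_cdf_exact(3.0, 10)       # about 0.600
approx = max_cdf_simulated(3.0, 10)  # Monte Carlo value, close to the exact one

# Sensitivity of F**n: F(x) = 0.999 versus 0.998 at the same x, with n = 1000
p_a = 0.999 ** 1000  # about 0.368
p_b = 0.998 ** 1000  # about 0.135
print(exact, approx, p_a, p_b)
```

An error of only 0.001 in $F(x)$ changes the probability that the maximum of 1000 observations stays below $x$ from about 0.37 to about 0.14.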

The distribution function $F(x)$ might, however, not be known. If data are available, it can be estimated but, because $F(x)$ is raised to the power $n$, small errors in its estimate can lead to large errors in the distribution of the maximum. Instead, an asymptotic result is given by the extremal types theorem, also known as the Fisher-Tippett-Gnedenko theorem, the First Theorem of Extreme Values, or the extreme value trinity theorem (the last name being used by Pickands III, 1975).

But before that, let’s make a small change of variable. Working with $M_n$ directly is problematic because, as $n \rightarrow \infty$, $F^n(x) \rightarrow 0$ for every $x$ with $F(x) < 1$, so the limiting distribution is degenerate. Redefining the problem in terms of the normalised maximum $M_n^* = \frac{M_n-b_n}{a_n}$ makes the treatment simpler. The theorem can then be stated as follows: if there exist sequences of constants $a_n \in \mathbb{R}_{+}$ and $b_n \in \mathbb{R}$ such that, as $n \rightarrow \infty$:

$P\left(M_{n}^{*} \leqslant x \right) \rightarrow G(x)$

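for some non-degenerate distribution function $G(x)$, then $G(x)$ must be of one of the following three types, and $F$ is said to lie in the domain of attraction of that type: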

Type I (Gumbel law): $\Lambda(x) = e^{-e^{-\frac{x-b}{a}}}$, for $x \in \mathbb{R}$, indicating that the distribution of $M_n$ has an exponential tail.
Type II (Fréchet law): $\Phi(x) = \begin{cases} 0 & x \leqslant b \\ e^{-\left(\frac{x-b}{a}\right)^{-\alpha}} & x > b \end{cases}$, indicating that the distribution of $M_n$ has a heavy tail with polynomial decay.
Type III (Weibull law): $\Psi(x) = \begin{cases} e^{-\left( -\frac{x-b}{a}\right)^\alpha} & x < b \\ 1 & x \geqslant b \end{cases}$, indicating that the distribution of $M_n$ has a light tail with a finite upper bound $b$.

Note that, in the above formulation, the Weibull law is reversed, so that the distribution has an upper bound rather than the lower bound of the usual Weibull distribution. The parameterisation also differs slightly from the one usually adopted for the Weibull distribution.
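The three limiting laws can be written directly as Python functions following the parameterisation above ($a > 0$, $b$, $\alpha > 0$); the function names are illustrative. The last function checks the remark about the reversed Weibull: assuming $Z$ follows the usual (lower-bounded) Weibull distribution with shape $\alpha$ and scale $a$, the variable $b - Z$ should have cdf $\Psi$. It relies on the standard library's `random.weibullvariate`, whose first argument is the scale and second the shape.

```python
import math
import random


def gumbel_cdf(x, a, b):
    """Type I: exp(-exp(-(x - b) / a)) for all real x."""
    return math.exp(-math.exp(-(x - b) / a))


def frechet_cdf(x, a, b, alpha):
    """Type II: 0 for x <= b, exp(-((x - b) / a) ** -alpha) for x > b."""
    if x <= b:
        return 0.0
    return math.exp(-((x - b) / a) ** (-alpha))


def rweibull_cdf(x, a, b, alpha):
    """Type III (reversed Weibull): exp(-(-(x - b) / a) ** alpha) for x < b, 1 for x >= b."""
    if x >= b:
        return 1.0
    return math.exp(-(-(x - b) / a) ** alpha)


def rweibull_empirical(x, a, b, alpha, reps=50000, seed=2):
    """Share of simulated values b - Z at or below x, with Z ~ Weibull(scale a, shape alpha)."""
    rng = random.Random(seed)
    # random.weibullvariate(scale, shape) samples the usual lower-bounded Weibull
    hits = sum(1 for _ in range(reps) if b - rng.weibullvariate(a, alpha) <= x)
    return hits / reps


a, b, alpha = 2.0, 1.0, 1.5
for x in (-3.0, -1.0, 0.5):
    print(x, rweibull_cdf(x, a, b, alpha), rweibull_empirical(x, a, b, alpha))
```

Each law equals $e^{-1}$ at its reference point ($x = b$ for Gumbel, $x = b + a$ for Fréchet, $x = b - a$ for the reversed Weibull), which is a convenient sanity check on the parameterisation.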

These three families have parameters $a > 0$, $b$ and, for families II and III, $\alpha > 0$. Which of the three a particular $F(x)$ is attracted to is determined by the behaviour of the tail of the distribution for large $x$. Thus, we can make inferences about the asymptotic properties of the maximum while having only limited knowledge of the properties of $F(x)$.
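To see the theorem at work, the sketch below takes three textbook parent distributions, one per domain of attraction, with standard choices of $a_n$ and $b_n$ (these examples are not taken from the text above, and the tail index $\alpha = 2$ is arbitrary): the unit exponential with $a_n = 1$, $b_n = \log n$ tends to the Gumbel law; the Pareto $F(x) = 1 - x^{-\alpha}$, $x \geqslant 1$, with $a_n = n^{1/\alpha}$, $b_n = 0$ tends to the Fréchet law; and the uniform on $[0, 1]$ with $a_n = 1/n$, $b_n = 1$ tends to the reversed Weibull law with $\alpha = 1$. In each case $P(M_n^* \leqslant x) = F^n(a_n x + b_n)$ is computed exactly, so no simulation is needed.

```python
import math

ALPHA = 2.0  # Pareto tail index, an arbitrary choice for this example


def exp_sf(y):
    """Survival 1 - F(y) of the unit exponential (Gumbel domain)."""
    return 1.0 if y <= 0.0 else math.exp(-y)


def pareto_sf(y):
    """Survival of the Pareto F(y) = 1 - y**(-ALPHA), y >= 1 (Frechet domain)."""
    return 1.0 if y <= 1.0 else y ** (-ALPHA)


def uniform_sf(y):
    """Survival of the uniform on [0, 1] (Weibull domain, finite upper bound 1)."""
    return 1.0 if y <= 0.0 else max(0.0, 1.0 - y)


def normalised_max_cdf(sf, x, n, a_n, b_n):
    """Exact P((M_n - b_n) / a_n <= x) = F(a_n * x + b_n) ** n, computed via log1p."""
    s = sf(a_n * x + b_n)
    if s >= 1.0:
        return 0.0
    return math.exp(n * math.log1p(-s))


def gumbel(x):
    return math.exp(-math.exp(-x))


def frechet(x):
    return 0.0 if x <= 0.0 else math.exp(-x ** (-ALPHA))


def rweibull_alpha1(x):
    return 1.0 if x >= 0.0 else math.exp(x)  # exp(-(-x) ** 1)


n = 10 ** 6
cases = [
    ("exponential -> Gumbel", exp_sf, 1.0, math.log(n), gumbel),
    ("Pareto -> Frechet", pareto_sf, n ** (1.0 / ALPHA), 0.0, frechet),
    ("uniform -> Weibull", uniform_sf, 1.0 / n, 1.0, rweibull_alpha1),
]
for name, sf, a_n, b_n, limit in cases:
    for x in (-1.0, 0.5, 2.0):
        print(name, x, normalised_max_cdf(sf, x, n, a_n, b_n), limit(x))
```

With $n = 10^6$ the exact and limiting values agree to several decimal places in all three cases. Which limit appears depends only on the tail of each parent: exponential decay, polynomial decay, or a finite upper endpoint.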

These three limiting cases are collectively termed extreme value distributions. Types II and III were identified by Fréchet (1927), whereas type I was found by Fisher and Tippett (1928). In his work, Fréchet used $M_n^* = \frac{M_n}{a_n}$ , whereas Fisher and Tippett used $M_n^* = \frac{M_n-b_n}{a_n}$ . Von Mises (1936) identified various sufficient conditions for convergence to each of these forms, and Gnedenko (1943) established a complete generalisation.

Generalised extreme value distribution

As shown above, the rescaled maxima converge in distribution to one of three families. However, all three are special cases of a single family that can be represented as:

$G(x) = e^{-\left(1-\xi\left(\frac{x-\mu}{\sigma}\right)\right)^{\frac{1}{\xi}}}$

defined on the set $\left\{x : 1-\xi\frac{x-\mu}{\sigma} > 0\right\}$, with parameters $-\infty < \mu < \infty$ (location), $\sigma > 0$ (scale), and $-\infty < \xi < \infty$ (shape). This is the generalised extreme value (GEV) family of distributions. As $\xi \rightarrow 0$, it converges to the Gumbel (type I) law; if $\xi < 0$, it corresponds to the Fréchet (type II) law; and if $\xi > 0$, it corresponds to the Weibull (type III) law. Estimating $\xi$ from data therefore allows a particular family to be chosen for a given problem. Note that many texts use the opposite sign for $\xi$, in which case the roles of $\xi < 0$ and $\xi > 0$ are swapped.
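A sketch of the GEV cdf in the sign convention used here, with a check of its correspondence with the three types. Matching terms in the formulas shows that, for $\xi < 0$, the GEV is a Fréchet law with $\alpha = 1/|\xi|$, $a = \sigma/|\xi|$ and $b = \mu + \sigma/\xi$, while for $\xi > 0$ it is the reversed Weibull law with $\alpha = 1/\xi$, $a = \sigma/\xi$ and the same $b = \mu + \sigma/\xi$. The parameter values are arbitrary, and treating $\xi = 0$ as the Gumbel case is a coding choice.

```python
import math


def gev_cdf(x, mu, sigma, xi):
    """GEV cdf of the text: exp(-(1 - xi*z)**(1/xi)) with z = (x - mu)/sigma.

    xi == 0 is taken as the Gumbel limit exp(-exp(-z)). Outside the support the cdf is
    0 below the lower endpoint (xi < 0) and 1 above the upper endpoint (xi > 0).
    """
    z = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-z))
    t = 1.0 - xi * z
    if t <= 0.0:
        return 0.0 if xi < 0.0 else 1.0
    return math.exp(-t ** (1.0 / xi))


def frechet_cdf(x, a, b, alpha):
    return 0.0 if x <= b else math.exp(-((x - b) / a) ** (-alpha))


def rweibull_cdf(x, a, b, alpha):
    return 1.0 if x >= b else math.exp(-(-(x - b) / a) ** alpha)


mu, sigma = 1.0, 2.0
for xi in (-0.5, 0.5):
    # Matching terms: alpha = 1/|xi|, a = sigma/|xi|, b = mu + sigma/xi
    alpha, a, b = 1.0 / abs(xi), sigma / abs(xi), mu + sigma / xi
    for x in (-1.5, 0.0, 4.0):
        reference = frechet_cdf(x, a, b, alpha) if xi < 0 else rweibull_cdf(x, a, b, alpha)
        print(xi, x, gev_cdf(x, mu, sigma, xi), reference)

# A very small xi gives (numerically) the Gumbel law
print(gev_cdf(1.5, mu, sigma, 1e-7), gev_cdf(1.5, mu, sigma, 0.0))
```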

Generalised Pareto distribution

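For a sufficiently high threshold $u$, and provided $F$ is in the domain of attraction of the GEV above, the distribution of the excess $Y=X-u$, conditional on $X > u$, is approximately (with the approximation improving as $u \rightarrow \infty$):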

$H(y) = 1-\left(1-\frac{\xi y}{\tilde{\sigma}}\right)^{\frac{1}{\xi}}$

defined for $y > 0$ and $1-\frac{\xi y}{\tilde{\sigma}} > 0$. The two parameters are $\xi$ (shape) and $\tilde{\sigma}$ (scale). The shape is the same parameter $\xi$ as in the GEV, whereas the scale is related to the GEV parameters by $\tilde{\sigma}=\sigma-\xi(u-\mu)$.

The above result is sometimes called the Pickands-Balkema-de Haan theorem or the Second Theorem of Extreme Values, and it defines another family of distributions, known as the generalised Pareto distribution (GPD). It reduces to an exponential distribution with rate $\frac{1}{\tilde{\sigma}}$ as $\xi \rightarrow 0$, to a uniform distribution on the interval $\left[0, \tilde{\sigma}\right]$ when $\xi = 1$, and to a Pareto-type (heavy-tailed) distribution when $\xi < 0$.
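The sketch below writes the GPD in the same sign convention and checks two statements above: the special cases ($\xi = 1$ uniform on $[0, \tilde{\sigma}]$, $\xi \to 0$ exponential, $\xi < 0$ a polynomially decaying tail), and the approximation of GEV exceedances over a high threshold $u$ by a GPD with $\tilde{\sigma} = \sigma - \xi(u - \mu)$. The helper names, thresholds and parameter values are illustrative assumptions; `math.expm1` is used only to keep $1 - G(x)$ accurate far in the upper tail.

```python
import math


def gpd_sf(y, sigma_t, xi):
    """1 - H(y): (1 - xi*y/sigma_t)**(1/xi) for y > 0, or exp(-y/sigma_t) when xi == 0."""
    if y <= 0.0:
        return 1.0
    if xi == 0.0:
        return math.exp(-y / sigma_t)
    t = 1.0 - xi * y / sigma_t
    if t <= 0.0:  # beyond the upper endpoint sigma_t / xi, reachable only when xi > 0
        return 0.0
    return t ** (1.0 / xi)


def gpd_cdf(y, sigma_t, xi):
    return 1.0 - gpd_sf(y, sigma_t, xi)


def gev_sf(x, mu, sigma, xi):
    """1 - G(x) for the GEV of the text (xi != 0); expm1 keeps accuracy far in the upper tail."""
    t = 1.0 - xi * (x - mu) / sigma
    if t <= 0.0:
        return 1.0 if xi < 0.0 else 0.0
    return -math.expm1(-t ** (1.0 / xi))


def exceedance_vs_gpd(y, u, mu, sigma, xi):
    """(exact P(X > u + y | X > u) under the GEV, GPD value with sigma_t = sigma - xi*(u - mu))."""
    exact = gev_sf(u + y, mu, sigma, xi) / gev_sf(u, mu, sigma, xi)
    sigma_t = sigma - xi * (u - mu)
    return exact, gpd_sf(y, sigma_t, xi)


# Special cases
print(gpd_cdf(0.3, 2.0, 1.0))                        # xi = 1: uniform on [0, 2], i.e. 0.3 / 2
print(gpd_cdf(1.0, 2.0, 1e-8), 1 - math.exp(-0.5))  # xi -> 0: exponential with rate 1/2
print(gpd_sf(10.0, 1.0, -0.5))                       # xi < 0: tail (1 + 0.5 * 10) ** -2

# GEV (mu = 0, sigma = 1) exceedances over a high threshold u
for xi, u in ((-0.2, 20.0), (0.3, 3.0)):
    sigma_t = 1.0 - xi * u
    for f in (0.5, 1.0, 2.0):
        print(xi, u, exceedance_vs_gpd(f * sigma_t, u, 0.0, 1.0, xi))
```

For these thresholds the exact conditional probabilities and the GPD values differ by well under 0.1% in relative terms; lowering $u$ makes the approximation worse.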

Parameter estimation

Restricting attention to the most common case, $-\frac{1}{2}<\xi<\frac{1}{2}$, which covers distributions with tails not too far from exponential, the parameters of the GPD can be estimated using at least three methods: maximum likelihood, moments, and probability-weighted moments. These are described in Hosking and Wallis (1987). For $\xi$ outside this interval, methods have been discussed elsewhere (Oliveira, 1984). The method of moments is probably the simplest and fastest and, according to Hosking and Wallis (1987) and Knijnenburg et al (2009), it performs well in the typical case $-\frac{1}{2}<\xi<\frac{1}{2}$.

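For a set of extreme observations, i.e., the excesses $y = x - u$ over the threshold, let $\bar{x}$ and $s^2$ be, respectively, their sample mean and variance. Equating these to the GPD mean $\frac{\tilde{\sigma}}{1+\xi}$ and variance $\frac{\tilde{\sigma}^2}{(1+\xi)^2(1+2\xi)}$ gives the moment estimators of $\tilde{\sigma}$ and $\xi$: $\hat{\tilde{\sigma}} = \frac{\bar{x}}{2}\left(\frac{\bar{x}^2}{s^2}+1\right)$ and $\hat{\xi}=\frac{1}{2}\left(\frac{\bar{x}^2}{s^2}-1\right)$.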
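A minimal sketch of the moment estimators, applied to excesses over the threshold. To check them without real data, it draws excesses from a GPD with known parameters by inverting $H$, i.e. $y = \tilde{\sigma}\,\left(1 - (1-p)^{\xi}\right)/\xi$ for $p$ uniform on $[0, 1)$. The helper names and the chosen values $\tilde{\sigma} = 2$, $\xi = 0.2$ are illustrative assumptions.

```python
import math
import random


def gpd_sample(n, sigma_t, xi, seed=0):
    """Draw n GPD excesses by inverting H(y); sign convention of the text (xi > 0 bounded)."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        p = rng.random()  # uniform on [0, 1), so 1 - p is in (0, 1]
        if xi == 0.0:
            sample.append(-sigma_t * math.log1p(-p))  # exponential case
        else:
            sample.append(sigma_t * (1.0 - (1.0 - p) ** xi) / xi)
    return sample


def gpd_fit_moments(excesses):
    """Method-of-moments estimates (sigma_t_hat, xi_hat) from the sample mean and variance."""
    m = len(excesses)
    xbar = math.fsum(excesses) / m
    s2 = math.fsum((y - xbar) ** 2 for y in excesses) / (m - 1)
    ratio = xbar * xbar / s2
    return 0.5 * xbar * (ratio + 1.0), 0.5 * (ratio - 1.0)


sample = gpd_sample(100000, sigma_t=2.0, xi=0.2, seed=0)
sigma_hat, xi_hat = gpd_fit_moments(sample)
print(sigma_hat, xi_hat)  # should be close to 2.0 and 0.2
```

Because only the mean and variance are needed, the fit is a single pass over the data; the price is that it relies on the variance being finite, which with this sign convention requires $\xi > -\frac{1}{2}$.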

The adequacy of the fit can be tested with, e.g., the Anderson-Darling goodness-of-fit test (Anderson and Darling, 1952; Choulakian and Stephens, 2001), based on the fact that, if the model is accurate, the fitted cdf evaluated at the observed excesses (or, equivalently, the corresponding tail probabilities, i.e., the p-values) should be uniformly distributed.
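A sketch of the Anderson-Darling statistic applied to a fitted GPD. If the fit is adequate, the values $z_i = \hat{H}(y_i)$ at the observed excesses should behave like a uniform sample, and $A^2 = -m - \frac{1}{m}\sum_{i=1}^{m}(2i-1)\left[\ln z_{(i)} + \ln\left(1 - z_{(m+1-i)}\right)\right]$, with $z_{(1)} \leqslant \dots \leqslant z_{(m)}$, measures the departure from uniformity, with extra weight on the tails. Converting $A^2$ into a p-value when $\tilde{\sigma}$ and $\xi$ have been estimated requires the tables of Choulakian and Stephens (2001), which are not reproduced here, so the code only computes the statistic. Clipping the $z_i$ away from 0 and 1 is an implementation choice to keep the logarithms finite, and the simulated data are purely illustrative.

```python
import math
import random


def gpd_cdf(y, sigma_t, xi):
    """H(y) = 1 - (1 - xi*y/sigma_t)**(1/xi) for y > 0 (sign convention of the text)."""
    if y <= 0.0:
        return 0.0
    if xi == 0.0:
        return -math.expm1(-y / sigma_t)
    t = 1.0 - xi * y / sigma_t
    if t <= 0.0:
        return 1.0
    return 1.0 - t ** (1.0 / xi)


def gpd_fit_moments(excesses):
    """Moment estimates (sigma_t_hat, xi_hat) from the sample mean and variance."""
    m = len(excesses)
    xbar = math.fsum(excesses) / m
    s2 = math.fsum((y - xbar) ** 2 for y in excesses) / (m - 1)
    ratio = xbar * xbar / s2
    return 0.5 * xbar * (ratio + 1.0), 0.5 * (ratio - 1.0)


def anderson_darling(z, eps=1e-12):
    """A^2 statistic for the hypothesis that the values z come from Uniform(0, 1).

    Values are clipped to [eps, 1 - eps] so that the logarithms stay finite.
    """
    z = sorted(min(max(v, eps), 1.0 - eps) for v in z)
    m = len(z)
    total = 0.0
    for i in range(m):
        # Weight (2k - 1) for the k-th smallest value, k = i + 1, paired with the k-th largest
        total += (2 * i + 1) * (math.log(z[i]) + math.log1p(-z[m - 1 - i]))
    return -m - total / m


# Illustration: 2000 excesses simulated from a GPD with sigma_t = 1, xi = 0.1 (inverse cdf)
rng = random.Random(4)
excesses = [(1.0 - (1.0 - rng.random()) ** 0.1) / 0.1 for _ in range(2000)]
sigma_hat, xi_hat = gpd_fit_moments(excesses)
a2 = anderson_darling(gpd_cdf(y, sigma_hat, xi_hat) for y in excesses)
print(sigma_hat, xi_hat, a2)
```

Small values of $A^2$ are consistent with a good fit; how small is small enough depends on the sample size and on the parameters having been estimated, which is what the Choulakian and Stephens (2001) tables account for.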