coefficient of kinship

Coefficients to assess the genetic resemblance between individuals were presented in the last post. Among these, the coefficient of kinship, $\phi$ , is probably the most interesting. It gives the probability that a random gene from a given subject $i$ is identical by descent (ibd) to a random gene at the same locus from a subject $j$ . For $N$ subjects, these probabilities can be assembled in an $N \times N$ matrix termed the kinship matrix, usually represented as $\mathbf{\Phi}$ , with elements $\phi_{ij}$ , which can be used to model the covariance between individuals in quantitative genetics.

Consider the pedigree in the figure below, consisting of 14 subjects:

The corresponding kinship matrix, already multiplied by two to indicate expected covariances between subjects (i.e., $2\cdot\mathbf{\Phi}$ ), is:

Note that the diagonal elements can have values above unity, given the consanguineous mating in the family (between s09 and s12, indicated by the double line in the pedigree diagram).

In the next post, details on how the kinship matrix can be used to investigate heritabilities and genetic correlations, and to perform association studies, will be presented.

How similar?

The degree of relationship between two related individuals can be estimated by the probability that a gene in one subject is identical by descent to the corresponding gene (i.e., at the same locus) in the other. Two genes are said to be identical by descent (ibd) if both are copies of the same ancestral gene. Genes that are not ibd may still be identical through separate mutations, and therefore be identical by state (ibs), though these will not be considered in what follows.

The coefficients below were introduced by Jacquard in 1970, in a book originally published in French and translated into English in 1974. Similar content appeared in an article by the same author in the journal Biometrics in 1972 (see the references at the end).

Coefficients of identity

Consider a particular autosomal gene $G$ . Each individual has two copies, one of paternal and another of maternal origin; these can be indicated as $G_i^P$ and $G_i^M$ for individual $i$ . There are exactly 15 distinct ways (states) in which the copies of $G$ can be identical or not identical between two individuals, as shown in the figure below.

To each of these states $S_{1, \ldots , 15}$ , a respective probability $\delta_{1, \ldots , 15}$ can be assigned; these are called coefficients of identity by descent. These probabilities can be calculated at every generation following very elementary rules. For most problems, however, the distinction between paternal and maternal origin of a gene is irrelevant, and some of the above states are equivalent to others. If these are condensed, we can retain 9 distinct ways, shown in the figure below:

As before, to each of these states $\Sigma_{1, \ldots , 9}$ , a respective probability $\Delta_{1, \ldots , 9}$ can be assigned; these are called condensed coefficients of identity by descent, and relate to the former as:

$\Delta_1 = \delta_1$
$\Delta_2 = \delta_6$
$\Delta_3 = \delta_2 + \delta_3$
$\Delta_4 = \delta_7$
$\Delta_5 = \delta_4 + \delta_5$
$\Delta_6 = \delta_8$
$\Delta_7 = \delta_9 + \delta_{12}$
$\Delta_8 = \delta_{10} + \delta_{11} + \delta_{13} + \delta_{14}$
$\Delta_9 = \delta_{15}$
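
The relations above amount to a partition of the 15 detailed states into 9 groups, which can be written down as a lookup table. The snippet below is only a sketch: the names `CONDENSED_FROM_DETAILED` and `condense` are made up for this illustration, and the input probabilities are arbitrary numbers chosen to exercise the mapping, not the coefficients of any real relationship.

```python
from fractions import Fraction

# Detailed states S_1..S_15 merged into each condensed state Sigma_1..Sigma_9,
# transcribed from the relations listed above.
CONDENSED_FROM_DETAILED = {
    1: (1,),
    2: (6,),
    3: (2, 3),
    4: (7,),
    5: (4, 5),
    6: (8,),
    7: (9, 12),
    8: (10, 11, 13, 14),
    9: (15,),
}


def condense(delta):
    """Turn detailed coefficients {1..15: prob} into condensed ones {1..9: prob}."""
    if set(delta) != set(range(1, 16)):
        raise ValueError("expected one probability for each of the states 1..15")
    return {
        k: sum((delta[s] for s in states), Fraction(0))
        for k, states in CONDENSED_FROM_DETAILED.items()
    }


# Arbitrary, made-up probabilities (an assumption for illustration only):
# 1/15 for every detailed state.
delta = {s: Fraction(1, 15) for s in range(1, 16)}
Delta = condense(delta)

# Every detailed state is used exactly once, so total probability is preserved.
used = sorted(s for states in CONDENSED_FROM_DETAILED.values() for s in states)
print(used == list(range(1, 16)))
print(sum(Delta.values()) == 1)
print(Delta[8])  # four detailed states merge into Sigma_8
```

Using exact fractions rather than floats keeps the check that the probabilities sum to one free of rounding issues.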

A similar method was proposed by Cotterman (1940), in his highly influential but only much later published doctoral thesis. For pairs of non-inbred individuals, $\Delta_9$ , $\Delta_8$ and $\Delta_7$ correspond to his coefficients $k_0$ , $k_1$ and $k_2$ .

Coefficient of kinship

The above refer to probabilities of finding particular genes as identical among subjects. However, a different coefficient can be defined for random genes: the probability that a random gene from subject $i$ is identical by descent to a random gene at the same locus from subject $j$ is the coefficient of kinship, represented as $\phi_{ij}$ :

$\phi_{ij} = \Delta_1 + \frac{1}{2}(\Delta_3 + \Delta_5 + \Delta_7) + \frac{1}{4}\Delta_8$
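
As a quick illustration, the formula above can be evaluated directly. This is a minimal sketch, with a hypothetical function name (`kinship_from_condensed`), using the condensed coefficients for full sibs and for parent-offspring pairs as listed in the table of particular cases near the end of this post.

```python
from fractions import Fraction


def kinship_from_condensed(D):
    """Coefficient of kinship from the condensed coefficients (Delta_1..Delta_9)."""
    if len(D) != 9:
        raise ValueError("expected nine condensed coefficients")
    if sum(D) != 1:
        raise ValueError("condensed coefficients must sum to one")
    d1, _, d3, _, d5, _, d7, d8, _ = D
    return d1 + Fraction(1, 2) * (d3 + d5 + d7) + Fraction(1, 4) * d8


F = Fraction
full_sibs = (0, 0, 0, 0, 0, 0, F(1, 4), F(1, 2), F(1, 4))
parent_offspring = (0, 0, 0, 0, 0, 0, 0, 1, 0)

print(kinship_from_condensed(full_sibs))         # 1/4
print(kinship_from_condensed(parent_offspring))  # 1/4
```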

If $i$ and $j$ are in fact the same individual, then $\phi_{ii}$ is the kinship of a subject with himself. Two genes taken from the same individual can either be the same gene (probability $\frac{1}{2}$ of being the same) or be the genes inherited from father and mother, in which case the probability is given by the coefficient of kinship between the parents. In other words, $\phi_{ii} = \frac{1}{2} + \frac{1}{2}\phi_{\text{FM}}$ . If both parents are unrelated, $\phi_{\text{FM}}=0$ , such that the kinship of a subject with himself is $\phi_{ii} = \frac{1}{2}$ .

The value of $\phi_{ij}$ can be determined from the number of generations up to a common ancestor $k$ . A random gene from individual $i$ can be identical by descent to a random gene from individual $j$ at the same locus if both come from the common ancestor $k$ , an event that can happen if either (1) both are copies of the same gene in $k$ , which has probability $\frac{1}{2}$ , or (2) they are copies of different genes in $k$ , but $k$ is inbred, which has probability $\frac{1}{2}f_k$ (see below about the coefficient of inbreeding, $f$ ). Each generation separating a subject from $k$ halves the chance that a particular gene is passed down, so if there are $m$ generations between $i$ and $k$ , and $n$ generations between $j$ and $k$ , the coefficient of kinship can be computed as $\phi_{ij} = \left(\frac{1}{2}\right)^{m+n+1}(1+f_k)$ . If $i$ and $j$ have more than one common ancestor, then there is more than one possible line of descent, and the kinship is determined by summing over all $K$ such common ancestors:

$\phi_{ij} = \sum_{k=1}^K \left(\frac{1}{2}\right)^{m_k+n_k+1}(1+f_k)$
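
The sum above is easy to check by hand for simple pedigrees. The sketch below assumes that each line of descent is supplied as a tuple $(m_k, n_k, f_k)$ worked out by the reader from the pedigree; the function name `kinship_by_paths` and the example relationships are illustrative choices, not part of any particular software.

```python
from fractions import Fraction


def kinship_by_paths(lines_of_descent):
    """Sum (1/2)^(m+n+1) * (1 + f_k) over lines of descent given as (m, n, f_k)."""
    total = Fraction(0)
    for m, n, f_k in lines_of_descent:
        total += Fraction(1, 2) ** (m + n + 1) * (1 + f_k)
    return total


# First cousins: two shared grandparents, each two generations above both
# cousins, neither of them inbred.
first_cousins = [(2, 2, 0), (2, 2, 0)]

# Uncle-nephew: the uncle's two parents are the nephew's grandparents.
uncle_nephew = [(1, 2, 0), (1, 2, 0)]

# Half sibs whose single shared parent is inbred, with f_k = 1/4 (assumed value).
half_sibs_inbred_parent = [(1, 1, Fraction(1, 4))]

print(kinship_by_paths(first_cousins))            # 1/16
print(kinship_by_paths(uncle_nephew))             # 1/8
print(kinship_by_paths(half_sibs_inbred_parent))  # 5/32
```

For pedigrees with many loops, enumerating the lines of descent by hand becomes error-prone, which is one reason recursive algorithms are generally preferred in practice.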

For a set of subjects, the pairwise coefficients of kinship $\phi_{ij}$ can be arranged in a square matrix $\boldsymbol{\Phi}$ , and used to model the covariance between subjects as $2\cdot\boldsymbol{\Phi}$ (see here).
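
One common way to fill such a matrix is a recursion over the pedigree, using $\phi_{ii} = \frac{1}{2} + \frac{1}{2}\phi_{\text{FM}}$ for the diagonal and, for a subject $i$ born after $j$ , the average of the kinships between $j$ and each parent of $i$ . The sketch below assumes that parents are listed before their offspring and that founders are unrelated; the function name `kinship_matrix` and the six-subject pedigree (two founders, their two children, and two offspring of a mating between those children) are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def kinship_matrix(pedigree):
    """pedigree: list of (id, father, mother), parents first, None for unknown."""
    ids = [row[0] for row in pedigree]
    index = {name: k for k, name in enumerate(ids)}
    n = len(ids)
    phi = [[Fraction(0)] * n for _ in range(n)]

    def get(a, b):
        # Kinship involving an unknown parent is taken as zero (assumption:
        # founders are unrelated and not inbred).
        if a is None or b is None:
            return Fraction(0)
        return phi[index[a]][index[b]]

    for i, (name, father, mother) in enumerate(pedigree):
        for parent in (father, mother):
            if parent is not None and index.get(parent, n) >= i:
                raise ValueError(f"parent {parent!r} must be listed before {name!r}")
        # Kinship with every subject listed earlier: average over i's parents.
        for j in range(i):
            value = (get(father, ids[j]) + get(mother, ids[j])) / 2
            phi[i][j] = phi[j][i] = value
        # Kinship of the subject with itself.
        phi[i][i] = (1 + get(father, mother)) / 2
    return ids, phi


# Hypothetical pedigree: P1 and P2 are full sibs who have offspring X and Y.
pedigree = [
    ("A", None, None), ("B", None, None),
    ("P1", "A", "B"), ("P2", "A", "B"),
    ("X", "P1", "P2"), ("Y", "P1", "P2"),
]
ids, phi = kinship_matrix(pedigree)
x, y = ids.index("X"), ids.index("Y")
print(phi[x][y])      # 3/8
print(2 * phi[x][x])  # 5/4
```

The last value shows how a diagonal element of $2\cdot\boldsymbol{\Phi}$ exceeds one when the parents of a subject are related.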

Coefficient of inbreeding

The coefficient of inbreeding $f$ of a given subject $i$ is the coefficient of kinship between their parents. While the above coefficients provide information about pairs of individuals, the coefficient of inbreeding gives information about a particular subject. Still, the inbreeding coefficients of $i$ and $j$ can be computed from the condensed coefficients of identity of the pair:

$f_{i} = \Delta_1 + \Delta_2 + \Delta_3 + \Delta_4$
$f_{j} = \Delta_1 + \Delta_2 + \Delta_5 + \Delta_6$
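
As a small numerical check of these two sums, the sketch below uses the condensed coefficients for two offspring of a sib-mating, taken from the table of particular cases in this post. The function name `inbreeding_from_condensed` is made up for this example.

```python
from fractions import Fraction as F


def inbreeding_from_condensed(D):
    """Return (f_i, f_j) from the condensed coefficients (Delta_1..Delta_9)."""
    if len(D) != 9:
        raise ValueError("expected nine condensed coefficients")
    d1, d2, d3, d4, d5, d6, _, _, _ = D
    f_i = d1 + d2 + d3 + d4  # states in which the two genes of i are ibd
    f_j = d1 + d2 + d5 + d6  # states in which the two genes of j are ibd
    return f_i, f_j


# Two offspring of a mating between full sibs (row from the table in this post).
sib_mating_offspring = (
    F(1, 16), F(1, 32), F(1, 8), F(1, 32), F(1, 8),
    F(1, 32), F(7, 32), F(5, 16), F(1, 16),
)
print(inbreeding_from_condensed(sib_mating_offspring))
```

Both values come out as $\frac{1}{4}$ , which agrees with the kinship between full sibs, as expected from the definition of $f$ as the kinship between the parents.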

Note that all these coefficients are based on probabilities, but it is now possible to identify the actual presence of a particular gene using marker data. Also note that while the illustrations above suggest application to livestock, the same applies to studies of human populations.

Some particular cases

The above coefficients can be computed algorithmically, and this is done automatically by software that allows analyses of pedigree data, such as solar. Some common particular cases are shown below:

Relationship	$\Delta_1$	$\Delta_2$	$\Delta_3$	$\Delta_4$	$\Delta_5$	$\Delta_6$	$\Delta_7$	$\Delta_8$	$\Delta_9$	$\phi_{ij}$
Self	$0$	$0$	$0$	$0$	$0$	$0$	$1$	$0$	$0$	$\frac{1}{2}$
Parent-offspring	$0$	$0$	$0$	$0$	$0$	$0$	$0$	$1$	$0$	$\frac{1}{4}$
Half sibs	$0$	$0$	$0$	$0$	$0$	$0$	$0$	$\frac{1}{2}$	$\frac{1}{2}$	$\frac{1}{8}$
Full sibs/dizygotic twins	$0$	$0$	$0$	$0$	$0$	$0$	$\frac{1}{4}$	$\frac{1}{2}$	$\frac{1}{4}$	$\frac{1}{4}$
Monozygotic twins	$0$	$0$	$0$	$0$	$0$	$0$	$1$	$0$	$0$	$\frac{1}{2}$
First cousins	$0$	$0$	$0$	$0$	$0$	$0$	$0$	$\frac{1}{4}$	$\frac{3}{4}$	$\frac{1}{16}$
Double first cousins	$0$	$0$	$0$	$0$	$0$	$0$	$\frac{1}{16}$	$\frac{6}{16}$	$\frac{9}{16}$	$\frac{1}{8}$
Second cousins	$0$	$0$	$0$	$0$	$0$	$0$	$0$	$\frac{1}{16}$	$\frac{15}{16}$	$\frac{1}{64}$
Uncle-nephew	$0$	$0$	$0$	$0$	$0$	$0$	$0$	$\frac{1}{2}$	$\frac{1}{2}$	$\frac{1}{8}$
Offspring of sib-matings	$\frac{1}{16}$	$\frac{1}{32}$	$\frac{1}{8}$	$\frac{1}{32}$	$\frac{1}{8}$	$\frac{1}{32}$	$\frac{7}{32}$	$\frac{5}{16}$	$\frac{1}{16}$	$\frac{3}{8}$
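
The rows of the table can be checked for internal consistency: the nine condensed coefficients should sum to one, and the last column should follow from $\phi_{ij} = \Delta_1 + \frac{1}{2}(\Delta_3 + \Delta_5 + \Delta_7) + \frac{1}{4}\Delta_8$ . The snippet below is a sketch of such a check, with the values copied from the table; the names `TABLE`, `kinship` and `problems` are arbitrary.

```python
from fractions import Fraction as F

# (Delta_1, ..., Delta_9) and the tabulated phi_ij, copied from the table above.
TABLE = {
    "Self": ((0, 0, 0, 0, 0, 0, 1, 0, 0), F(1, 2)),
    "Parent-offspring": ((0, 0, 0, 0, 0, 0, 0, 1, 0), F(1, 4)),
    "Half sibs": ((0, 0, 0, 0, 0, 0, 0, F(1, 2), F(1, 2)), F(1, 8)),
    "Full sibs/dizygotic twins": ((0, 0, 0, 0, 0, 0, F(1, 4), F(1, 2), F(1, 4)), F(1, 4)),
    "Monozygotic twins": ((0, 0, 0, 0, 0, 0, 1, 0, 0), F(1, 2)),
    "First cousins": ((0, 0, 0, 0, 0, 0, 0, F(1, 4), F(3, 4)), F(1, 16)),
    "Double first cousins": ((0, 0, 0, 0, 0, 0, F(1, 16), F(6, 16), F(9, 16)), F(1, 8)),
    "Second cousins": ((0, 0, 0, 0, 0, 0, 0, F(1, 16), F(15, 16)), F(1, 64)),
    "Uncle-nephew": ((0, 0, 0, 0, 0, 0, 0, F(1, 2), F(1, 2)), F(1, 8)),
    "Offspring of sib-matings": (
        (F(1, 16), F(1, 32), F(1, 8), F(1, 32), F(1, 8),
         F(1, 32), F(7, 32), F(5, 16), F(1, 16)),
        F(3, 8),
    ),
}


def kinship(D):
    """Coefficient of kinship from the nine condensed coefficients."""
    d1, _, d3, _, d5, _, d7, d8, _ = D
    return d1 + F(1, 2) * (d3 + d5 + d7) + F(1, 4) * d8


# Collect the names of rows that fail either check.
problems = [
    name
    for name, (D, phi) in TABLE.items()
    if sum(D) != 1 or kinship(D) != phi
]
print(problems)  # an empty list means every row is consistent
```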

References

Cotterman C. A calculus for statistico-genetics. PhD Thesis. Ohio State University, 1940.
Jacquard A. Structures génétiques des populations. Masson, Paris, France, 1970. Later translated and republished as: Jacquard A. The genetic structure of populations. Springer, Heidelberg, 1974.
Jacquard A. Genetic information given by a relative. Biometrics. 1972;28(4):1101-1114.

The photograph at the top (sheep) is in the public domain.