International Tables for Crystallography
Volume H
Powder diffraction
Edited by C. J. Gilmore, J. A. Kaduk and H. Schenk

International Tables for Crystallography (2018). Vol. H, ch. 3.8, p. 327

Section Dendrograms

C. J. Gilmore,^a G. Barr^a and W. Dong^a*

^a Department of Chemistry, University of Glasgow, University Avenue, Glasgow, G12 8QQ, UK
Correspondence e-mail:

Dendrograms


Using d and s, agglomerative hierarchical cluster analysis is now carried out, in which the patterns are grouped into clusters according to their distances from each other. [Gordon (1981, 1999) and Everitt et al. (2001) provide excellent and detailed introductions to the subject. Note that the two editions of Gordon's monograph are quite distinct and complementary.] The method begins with each pattern in a separate cluster. It then searches for the two patterns with the shortest distance between them and joins them into a single cluster. The two closest clusters are then joined in the same way, and this continues in a stepwise fashion until all the patterns form a single cluster.

When two clusters ($C_i$ and $C_j$) are merged, the distance between the newly formed cluster $C_i \cup C_j$ and any other cluster $C_k$ must be defined. There are a number of different ways of doing this, and each one gives rise to a different clustering of the patterns, although often the difference can be quite small. A general algorithm has been proposed by Lance & Williams (1967), and is summarized in a simplified form by Gordon (1981). The distance from the new cluster formed by merging $C_i$ and $C_j$ to any other cluster $C_k$ is given by

\[
d(C_i \cup C_j,\, C_k) = \alpha_i\, d(C_i, C_k) + \alpha_j\, d(C_j, C_k) + \beta\, d(C_i, C_j) + \gamma \left| d(C_i, C_k) - d(C_j, C_k) \right|. \tag{3.8.11}
\]

Table 3.8.1 defines six commonly used clustering methods in terms of the parameters α, β and γ. All these methods can be used with powder data; in general, the group-average-link or single-link formalism is the most effective, although differences between the methods are often slight.
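The stepwise procedure and the update rule of equation (3.8.11) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by the authors: the function name `agglomerate`, the `COEFFS` dictionary and the choice to return merges as `(members_i, members_j, distance)` tuples are all assumptions made for the example, and only the three methods whose coefficients do not involve β are included.

```python
from itertools import combinations

# Hypothetical lookup: method name -> function returning
# (alpha_i, alpha_j, beta, gamma) from the cluster sizes n_i, n_j, n_k.
COEFFS = {
    "single":   lambda ni, nj, nk: (0.5, 0.5, 0.0, -0.5),
    "complete": lambda ni, nj, nk: (0.5, 0.5, 0.0, 0.5),
    "average":  lambda ni, nj, nk: (ni / (ni + nj), nj / (ni + nj), 0.0, 0.0),
}


def agglomerate(dist, method="average"):
    """Agglomerative clustering with the Lance-Williams update.

    dist is a symmetric n x n matrix (list of lists) of pattern distances.
    Returns the merges in order as (members_i, members_j, distance).
    """
    coeff = COEFFS[method]
    n = len(dist)
    # Every pattern starts in its own cluster; keys are cluster ids.
    members = {i: (i,) for i in range(n)}
    # Distances between active clusters, keyed by the unordered id pair.
    d = {frozenset((i, j)): dist[i][j] for i, j in combinations(range(n), 2)}
    merges = []
    next_id = n
    while len(members) > 1:
        # Find the closest pair of active clusters.
        i, j = min(combinations(sorted(members), 2),
                   key=lambda pair: d[frozenset(pair)])
        dij = d[frozenset((i, j))]
        ni, nj = len(members[i]), len(members[j])
        # Distance from the merged cluster to every other cluster C_k.
        for k in members:
            if k in (i, j):
                continue
            ai, aj, beta, gamma = coeff(ni, nj, len(members[k]))
            dik = d[frozenset((i, k))]
            djk = d[frozenset((j, k))]
            d[frozenset((next_id, k))] = (ai * dik + aj * djk
                                          + beta * dij
                                          + gamma * abs(dik - djk))
        merges.append((members[i], members[j], dij))
        members[next_id] = members.pop(i) + members.pop(j)
        next_id += 1
    return merges


# Made-up distances for four patterns, for illustration only.
example = [[0.0, 0.1, 0.6, 0.7],
           [0.1, 0.0, 0.5, 0.8],
           [0.6, 0.5, 0.0, 0.2],
           [0.7, 0.8, 0.2, 0.0]]
for step in agglomerate(example, "single"):
    print(step)
```

The sketch searches all pairs at every step, which is adequate for a few hundred patterns but is not optimized; the distance matrix in the example is invented and does not come from any data set discussed here.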

Table 3.8.1
Six commonly used clustering methods

For each method, the coefficients α_i, β and γ in equation (3.8.11) are given. n_i is the number of patterns in cluster C_i, and α_j is obtained from α_i by interchanging i and j.

Method                   α_i                              β                              γ
Single link              ½                                0                              −½
Complete link            ½                                0                              ½
Average link             n_i/(n_i + n_j)                  0                              0
Weighted-average link    ½                                0                              0
Centroid                 n_i/(n_i + n_j)                  −n_i n_j/(n_i + n_j)^2         0
Sum of squares           (n_i + n_k)/(n_i + n_j + n_k)    −n_k/(n_i + n_j + n_k)         0
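As a quick check on the tabulated coefficients, the first three rows can be substituted into equation (3.8.11). The shorthand $a = d(C_i, C_k)$ and $b = d(C_j, C_k)$ is introduced here only for this worked example, and $\alpha_j$ is taken to be $\alpha_i$ with $i$ and $j$ interchanged.

```latex
% Single link: alpha_i = alpha_j = 1/2, beta = 0, gamma = -1/2
d(C_i \cup C_j, C_k) = \tfrac12 a + \tfrac12 b - \tfrac12 |a - b| = \min(a, b)

% Complete link: alpha_i = alpha_j = 1/2, beta = 0, gamma = +1/2
d(C_i \cup C_j, C_k) = \tfrac12 a + \tfrac12 b + \tfrac12 |a - b| = \max(a, b)

% Average link: alpha_i = n_i/(n_i + n_j), alpha_j = n_j/(n_i + n_j), beta = gamma = 0
d(C_i \cup C_j, C_k) = \frac{n_i\, a + n_j\, b}{n_i + n_j}
```

So single link keeps the nearer of the two old distances, complete link keeps the farther, and average link weights them by cluster size. For example, with $a = 0.6$, $b = 0.5$ and $n_i = n_j = 1$ the three methods give 0.5, 0.6 and 0.55, respectively.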

The results of cluster analysis are usually displayed as a dendrogram, a typical example of which is shown in Fig. 3.8.6(a), where a set of 13 powder patterns is analysed using the centroid method. Each pattern begins at the bottom of the plot as a separate cluster, and these amalgamate in stepwise fashion linked by horizontal tie bars. The height of a tie bar represents the similarity of the clusters it joins, as measured by the relevant distance. As an indication of the differences that can be expected between the various algorithms used for dendrogram generation, Fig. 3.8.6(e) shows the same data analysed using the single-link method. The resulting clustering is slightly different: the similarity measures are larger and, in consequence, the tie bars are higher on the graph. [For further examples see Barr et al. (2004b,c) and Barr, Dong, Gilmore & Faber (2004).]
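The geometry of such a plot follows directly from the merge history. The sketch below is an illustration only: the function name `dendrogram_layout`, the input format (a complete list of `(left_members, right_members, height)` merges in the order they occurred) and the choice to place each cluster midway between its two children are all assumptions, not a description of the software used to draw Fig. 3.8.6.

```python
def dendrogram_layout(merges):
    """Compute a leaf order and tie-bar coordinates for a dendrogram.

    merges: list of (left_members, right_members, height), in merge order,
            ending with the merge that joins everything into one cluster.
    Returns (leaf_order, bars), where each bar is (x_left, x_right, height).
    """
    # Pass 1: build a left-to-right leaf order in which no tie bars cross.
    leaf_order = {}  # frozenset of leaves -> list of leaves, left to right
    for left, right, _ in merges:
        l_order = leaf_order.pop(frozenset(left), list(left))
        r_order = leaf_order.pop(frozenset(right), list(right))
        leaf_order[frozenset(left) | frozenset(right)] = l_order + r_order
    (final_order,) = leaf_order.values()

    # Pass 2: each leaf sits at an integer x position; a merged cluster
    # sits midway between the two clusters that its tie bar joins.
    x = {frozenset([leaf]): float(pos) for pos, leaf in enumerate(final_order)}
    bars = []
    for left, right, height in merges:
        x_left, x_right = x[frozenset(left)], x[frozenset(right)]
        bars.append((x_left, x_right, height))
        x[frozenset(left) | frozenset(right)] = (x_left + x_right) / 2
    return final_order, bars


# Hypothetical merge history for four patterns.
history = [((0,), (1,), 0.1), ((2,), (3,), 0.2), ((0, 1), (2, 3), 0.5)]
order, bars = dendrogram_layout(history)
print(order)
print(bars)
```

Each returned bar can be drawn as a horizontal line at its height, with vertical lines dropping from its ends to the clusters below; a method that yields larger merge distances therefore places its tie bars higher on the plot.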


Barr, G., Dong, W. & Gilmore, C. J. (2004a). High-throughput powder diffraction. II. Applications of clustering methods and multivariate data analysis. J. Appl. Cryst. 37, 243–252.
Barr, G., Dong, W. & Gilmore, C. J. (2004b). High-throughput powder diffraction. IV. Cluster validation using silhouettes and fuzzy clustering. J. Appl. Cryst. 37, 874–882.
Barr, G., Dong, W., Gilmore, C. & Faber, J. (2004). High-throughput powder diffraction. III. The application of full-profile pattern matching and multivariate statistical analysis to round-robin-type data sets. J. Appl. Cryst. 37, 635–642.
Everitt, B. S., Landau, S. & Leese, M. (2001). Cluster Analysis, 4th ed. London: Arnold.
Gordon, A. D. (1981). Classification, 1st ed., pp. 46–49. London: Chapman and Hall.
Gordon, A. D. (1999). Classification, 2nd ed. Boca Raton: Chapman and Hall/CRC.
Lance, G. N. & Williams, W. T. (1967). A general theory of classificatory sorting strategies: 1. Hierarchical systems. Comput. J. 9, 373–380.
