International Tables for Crystallography
Volume H
Powder diffraction
Edited by C. J. Gilmore, J. A. Kaduk and H. Schenk

International Tables for Crystallography (2018). Vol. H, ch. 3.8, pp. 333-337

Section 3.8.6. Examples

C. J. Gilmore,ᵃ G. Barrᵃ and W. Dongᵃ*

ᵃDepartment of Chemistry, University of Glasgow, University Avenue, Glasgow, G12 8QQ, UK
Correspondence e-mail: chris@chem.gla.ac.uk

3.8.6. Examples

All the elements needed for clustering and visualization are now in place; Fig. 3.8.4 summarizes them as a flowchart. So far we have used parts of the aspirin data to demonstrate how the individual methods work; we now examine the aspirin data in detail as a single analysis.

[Figure 3.8.4]

Figure 3.8.4

Flowchart for the cluster-analysis and data-visualization procedure described in this chapter. The light grey boxes denote data-visualization elements and the dark grey objects are optional data pre-processing operations.

3.8.6.1. Aspirin data

In this example we use 13 powder patterns from commercial aspirin samples collected in reflection mode on a Bruker D8 diffractometer. Because these samples are formulated products that contain fillers and other excipients as well as the active pharmaceutical ingredient (API), it is not surprising that the peaks are broad: the full width at half maximum (FWHM) is ∼0.5°. The data were collected over the range 10–43° in 2θ using Cu Kα radiation. Fig. 3.8.5 shows the 13 powder patterns arranged into groups based on similarity. We have already described the methods of analysis and shown typical results in Figs. 3.8.6 to 3.8.8; we now present the complete analysis.

The correlation matrix derived from equation (3.8.3) is shown in Fig. 3.8.9(a), colour coded to reflect the values of the coefficients: the darker the shade, the higher the correlation. The resulting dendrogram and MMDS plot are shown in Figs. 3.8.9(b) and (c), respectively. Four clusters are identified in the dendrogram and have been coloured accordingly.

The remaining visualization tools are shown in the other panels. Fig. 3.8.9(d) displays the pie chart; the number of rows can be adjusted to match the arrangement of the samples in a multiple sample holder. Fig. 3.8.9(e) shows the default minimum spanning tree with 12 links. In Fig. 3.8.9(f) the scree plot indicates that three clusters account for more than 95% of the data variability; the steep initial slope is a clear indication that the number of clusters has been well estimated. The silhouettes, discussed in Section 3.8.5.1, are shown in Figs. 3.8.9(g)–(i). Fig. 3.8.9(j) shows the default parallel-coordinates plot for the same data, and Fig. 3.8.9(k) shows another view taken from the grand tour. These two plots validate the clustering and also indicate that truncating the MMDS plot to three dimensions introduces no significant error.
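The first stages of this analysis (correlation matrix, distance matrix, dendrogram and a cut line that fixes the number of clusters) can be sketched in Python with NumPy and SciPy. This is a minimal illustration under stated assumptions, not the software used to produce Fig. 3.8.9: equation (3.8.3) is not reproduced in this section, so a single Pearson coefficient on a common 2θ grid stands in for it, correlations are converted to distances with d = (1 - ρ)/2, and the two "forms" are hypothetical synthetic patterns. Average linkage is used because SciPy documents its centroid method as valid only for Euclidean distances; the `method` argument accepts other SciPy linkage names such as `"single"`.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def correlation_matrix(patterns):
    """Pearson correlation between every pair of rows (one row per pattern).

    Assumption: a single Pearson coefficient on a common 2-theta grid stands in
    for the combined coefficient of equation (3.8.3)."""
    return np.corrcoef(patterns)


def distance_matrix(rho):
    """Map correlations in [-1, 1] to distances in [0, 1] via d = (1 - rho) / 2."""
    d = 0.5 * (1.0 - rho)
    d = 0.5 * (d + d.T)        # remove any round-off asymmetry
    np.fill_diagonal(d, 0.0)
    return d


def cluster_patterns(patterns, cut_height, method="average"):
    """Agglomerative clustering; returns one integer cluster label per pattern."""
    d = distance_matrix(correlation_matrix(patterns))
    # linkage() expects the condensed (upper-triangle) form of the matrix
    z = linkage(squareform(d, checks=False), method=method)
    # Cut the dendrogram at cut_height: a lower cut gives more clusters.
    return fcluster(z, t=cut_height, criterion="distance")


def synthetic_pattern(two_theta, peaks, fwhm=0.5):
    """Sum of Gaussian peaks given as (position, height), with ~0.5 deg FWHM."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return sum(h * np.exp(-0.5 * ((two_theta - c) / sigma) ** 2) for c, h in peaks)


two_theta = np.arange(10.0, 43.0, 0.02)            # 10-43 deg, as for the aspirin data
rng = np.random.default_rng(0)
form_a = [(15.0, 1.0), (20.0, 0.6), (25.0, 0.8)]   # hypothetical peak lists
form_b = [(18.0, 1.0), (30.0, 0.7), (38.0, 0.5)]
patterns = np.array([
    synthetic_pattern(two_theta, peaks) + 0.01 * rng.random(two_theta.size)
    for peaks in [form_a] * 3 + [form_b] * 3
])
labels = cluster_patterns(patterns, cut_height=0.25)
print(labels)
```

Lowering `cut_height` splits clusters further, which is the operation illustrated in Figs. 3.8.6(c) and (d).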

[Figure 3.8.5]

Figure 3.8.5

Powder patterns for 13 commercial aspirin samples partitioned into five sets. The patterns are in highly correlated sets: (a) comprises patterns 1, 3, 5, 6, 9 and 12; (b) comprises patterns 10, 11 and 13; (c) contains patterns 2 and 4; (d) contains pattern 7 and (e) contains pattern 8.

[Figure 3.8.6]

Figure 3.8.6

(a) The initial default dendrogram obtained with the centroid clustering method for 13 PXRD patterns from 13 commercial aspirin samples. (b) The corresponding MMDS plot. Both clusters show a natural break and should each be partitioned into two. (c) The dendrogram with the cut line lowered. (d) The corresponding MMDS plot. The red cluster is now partitioned into two; the remaining patterns form a light-blue singleton and a green three-membered cluster. (e) The default dendrogram obtained with the single-link method.

[Figure 3.8.7]

Figure 3.8.7

The use of minimum spanning trees (MSTs). (a) The MST with 12 links. (b) The MST with 10 links; three clusters are now present.
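The link counts in Fig. 3.8.7 follow from a general property: an MST on n samples has n - 1 links, so 13 aspirin patterns give 12 links, and deleting the k - 1 longest links leaves k connected pieces, which is why keeping 10 links gives three clusters. A minimal NumPy sketch, assuming a dense distance matrix and using Prim's algorithm with a small union-find (our choice of algorithm, not necessarily the chapter's software), on hypothetical one-dimensional toy distances:

```python
import numpy as np


def minimum_spanning_tree(d):
    """Prim's algorithm on a dense, symmetric distance matrix.

    Returns the n - 1 links of the tree as (i, j, length) tuples."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = d[0].copy()                  # shortest known link from the tree to each sample
    parent = np.zeros(n, dtype=int)     # tree member at the other end of that link
    links = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        links.append((int(parent[j]), j, float(d[parent[j], j])))
        in_tree[j] = True
        closer = d[j] < best
        parent[closer] = j
        best[closer] = d[j][closer]
    return links


def clusters_from_links(links, n, n_links_kept):
    """Keep the n_links_kept shortest links; each connected piece is one cluster."""
    root = list(range(n))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]     # path halving
            i = root[i]
        return i

    for i, j, _ in sorted(links, key=lambda link: link[2])[:n_links_kept]:
        root[find(i)] = find(j)
    first_seen = {}
    # relabel clusters as 0, 1, 2, ... in order of first appearance
    return [first_seen.setdefault(find(i), len(first_seen)) for i in range(n)]


# Hypothetical toy data: 13 samples on a line in three well-separated groups.
x = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 5.0, 5.1, 5.2, 5.3, 10.0, 10.1, 10.2])
d = np.abs(x[:, None] - x[None, :])
links = minimum_spanning_tree(d)                  # 12 links, cf. Fig. 3.8.7(a)
labels = clusters_from_links(links, len(x), 10)   # 10 links, cf. Fig. 3.8.7(b)
print(labels)
```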

[Figure 3.8.8]

Figure 3.8.8

The use of silhouettes in defining the details of the clustering. (a) The silhouettes for the red cluster in the dendrogram from Fig. 3.8.6(a). (b) The corresponding orange cluster. Both sets of silhouettes have values below 0.5, which indicates that the clustering is not well defined. (c) The silhouettes for the red cluster corresponding to the dendrogram in Fig. 3.8.6(c). The entry centred on a silhouette value of 0.15 is pattern 3. This implies that pattern 3 is only loosely connected to the cluster, as demonstrated in part (d), where pattern 3 and the most representative pattern for the cluster (No. 9) are superimposed. Although the two patterns are broadly similar, there are significant differences, and the combined correlation coefficient is only 0.62. (e) The silhouettes for the orange cluster corresponding to the dendrogram in Fig. 3.8.6(c). The silhouettes imply that this is a single cluster without outliers. (f) The silhouettes for the green cluster corresponding to the dendrogram in Fig. 3.8.6(c). The clustering is poorly defined here.
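Silhouette widths like those in Fig. 3.8.8 can be computed directly from the distance matrix. The sketch below assumes the standard definition, s = (b - a)/max(a, b), with singletons set to 0; the one-dimensional toy data are hypothetical, with one sample placed between two groups to mimic a loosely connected member such as pattern 3.

```python
import numpy as np


def silhouettes(d, labels):
    """Silhouette width of every sample from a precomputed distance matrix.

    a = mean distance from sample i to the other members of its own cluster;
    b = smallest mean distance from i to the members of any other cluster;
    s = (b - a) / max(a, b). Singletons get 0. Needs at least two clusters."""
    d = np.asarray(d, dtype=float)
    labels = np.asarray(labels)
    s = np.zeros(len(labels))
    for i, own in enumerate(labels):
        same = labels == own
        same[i] = False
        if not same.any():
            continue                      # singleton cluster: s = 0 by convention
        a = d[i, same].mean()
        b = min(d[i, labels == other].mean()
                for other in np.unique(labels) if other != own)
        s[i] = (b - a) / max(a, b)
    return s


# Hypothetical toy data: sample 3 lies between the groups but is assigned to group 0.
x = np.array([0.0, 1.0, 2.0, 5.0, 10.0, 11.0])
labels = np.array([0, 0, 0, 0, 1, 1])
s = silhouettes(np.abs(x[:, None] - x[None, :]), labels)
print(np.round(s, 3))
```

Sample 3 gets s = 1.5/5.5 ≈ 0.27, well below every other sample, which is the same kind of signature that singles out pattern 3 in Fig. 3.8.8(c).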

[Figure 3.8.9]

Figure 3.8.9

The complete cluster analysis for the aspirin samples. (a) The correlation matrix, which is the source of all the clustering results. The entries are colour coded: the darker the shade, the higher the correlation. (b) The dendrogram. The colours assigned to the samples are used in all the visualization tools. (c) The corresponding MMDS plot. The clusters defined by the dendrogram are clearly separated. (d) The pie-chart view. (e) The minimum spanning tree. (f) The scree plot. It indicates that three clusters explain 95% of the variance of the distance matrix derived from (a). (g)–(i) The silhouettes for the red, orange and green clusters, respectively. These are discussed in detail in the caption to Fig. 3.8.8. (j) The default parallel-coordinates plot. The clusters are well maintained into the 4th, 5th and 6th dimensions. (k) Another view of the parallel coordinates using the grand tour. The clustering remains well maintained in higher dimensions.
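Panels (j) and (k) check visually that nothing important is lost when the MMDS representation is cut to three dimensions. A complementary numerical check, assuming the MMDS coordinates come from classical (Torgerson) scaling, is the fraction of the positive eigenvalue sum carried by the first three dimensions. The NumPy sketch below is illustrative only, and the random test points are hypothetical.

```python
import numpy as np


def classical_mds(d, n_dims=3):
    """Classical (Torgerson) scaling of a distance matrix.

    Returns n_dims coordinates per sample and the fraction of the total
    positive eigenvalue sum captured by those dimensions."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.full((n, n), 1.0 / n)     # centring matrix
    b = -0.5 * j @ (d ** 2) @ j                  # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(b)         # eigh returns ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    eigvals = np.clip(eigvals, 0.0, None)        # negative values: non-Euclidean part
    coords = eigvecs[:, :n_dims] * np.sqrt(eigvals[:n_dims])
    return coords, eigvals[:n_dims].sum() / eigvals.sum()


def pairwise(points):
    """Euclidean distance matrix between the rows of points."""
    return np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)


rng = np.random.default_rng(1)
in_3d = rng.normal(size=(13, 3))                 # distances that truly live in 3-D
coords, captured = classical_mds(pairwise(in_3d))
print(round(float(captured), 6))
```

Distances that genuinely live in three dimensions give a captured fraction of essentially 1; structure in higher dimensions lowers it, and that is when plots such as (j) and (k) become important.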

3.8.6.1.1. Aspirin data with amorphous samples included

To demonstrate the handling of data from amorphous samples, five patterns from amorphous samples were added to the aspirin data and the clustering calculation was repeated. The results are shown in Fig. 3.8.10. The dendrogram (Fig. 3.8.10(a)) places the amorphous samples at its right-hand end as isolated clusters. They are also clearly separated from the crystalline samples in the MMDS and parallel-coordinates plots (Figs. 3.8.10(b) and (c)). It could be argued that these samples should be treated as a single five-membered cluster rather than as five individuals, but we have found that this confuses the clustering algorithms, and it is clearer to the user if data from amorphous samples are presented as separate classes.
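The design choice described above can be expressed as a small relabelling step applied after the crystalline samples have been clustered. This is a hypothetical helper for illustration only: how samples are flagged as amorphous is not covered in this section, so the flags are simply taken as input.

```python
def with_amorphous_singletons(labels, is_amorphous):
    """Give every amorphous sample a class of its own.

    labels: integer cluster labels for all samples (values at amorphous
    positions are ignored). is_amorphous: one boolean per sample, assumed to
    come from a separate test that is not shown here."""
    crystalline = [lab for lab, amo in zip(labels, is_amorphous) if not amo]
    next_label = max(crystalline, default=0) + 1
    result = []
    for lab, amo in zip(labels, is_amorphous):
        if amo:
            result.append(next_label)    # fresh class number, shared with no other sample
            next_label += 1
        else:
            result.append(lab)
    return result


# Hypothetical example: four crystalline samples in three clusters, five amorphous.
labels = [1, 1, 2, 3, 0, 0, 0, 0, 0]
flags = [False] * 4 + [True] * 5
print(with_amorphous_singletons(labels, flags))   # [1, 1, 2, 3, 4, 5, 6, 7, 8]
```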

[Figure 3.8.10]

Figure 3.8.10

The aspirin data including data from five amorphous samples. (a) The resulting dendrogram and (b) the corresponding MMDS plot. (c) The parallel-coordinates plot.

3.8.6.2. Phase transitions in ammonium nitrate

Ammonium nitrate undergoes a series of temperature-induced phase transformations. Between 256 and 305 K (form IV) it is orthorhombic, space group Pmmm, with a = 5.745, b = 5.438, c = 4.942 Å and Z = 2; from 305 to 357 K (form III) it is orthorhombic, space group Pbnm, with a = 7.14, b = 7.65, c = 5.83 Å and Z = 4; between 357 and 398 K (form II) it is tetragonal, space group P4̄2₁m, with a = 5.719, c = 4.932 Å and Z = 2; and above 398 K it transforms to form I, cubic space group Pm3̄m, with a = 4.40 Å and Z = 1. The data set comprises 75 powder patterns collected at 3 K intervals from 203 to 425 K on a Siemens D5000 diffractometer with Cu Kα radiation over the 2θ range 10–100° (Herrmann & Engel, 1997). Fig. 3.8.11(a) shows the data in the 2θ range 17–45°.
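As a consistency check on the data set, 75 patterns at 3 K steps from 203 K end at 203 + 74 × 3 = 425 K, matching the range in Fig. 3.8.11. The plain-Python sketch below, illustrative only, also tallies how many of those temperatures fall in each stability range quoted above; the form labels IV to I follow the usual Roman-numeral numbering for ammonium nitrate. These are expected assignments only: the measured sequence can differ, for example through the mixture of forms IV and V observed at low temperature.

```python
# Stability ranges quoted in the text, in kelvin (lower bound inclusive, upper exclusive).
PHASE_FIELDS = [
    (256.0, 305.0, "IV", "Pmmm"),
    (305.0, 357.0, "III", "Pbnm"),
    (357.0, 398.0, "II", "P-42_1m"),
    (398.0, float("inf"), "I", "Pm-3m"),
]


def expected_form(temperature_k):
    """Form expected from the quoted ranges; None below 256 K (not tabulated)."""
    for low, high, form, _space_group in PHASE_FIELDS:
        if low <= temperature_k < high:
            return form
    return None


temperatures = [203.0 + 3.0 * k for k in range(75)]   # 75 patterns at 3 K steps
counts = {}
for t in temperatures:
    form = expected_form(t)
    counts[form] = counts.get(form, 0) + 1
print(temperatures[-1], counts)
```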

[Figure 3.8.11]

Figure 3.8.11

Ammonium nitrate phase transitions. (a) The raw powder data measured between 203 and 425 K. Reproduced with permission from Herrmann & Engel (1997). Copyright (1997) John Wiley and Sons. (b) The MMDS plot. The purple line follows the temperature change from 203 to 425 K.

The visualization of these data after cluster analysis is shown in Fig. 3.8.11(b) as an MMDS plot onto which a line has been superimposed that follows the sequence of temperature steps. The purple line traces the transition from a mixture of forms IV and V at low temperature (red) through form IV (yellow) and form II (blue) to form I at high temperature (green). This is an elegant and concise representation of the data in a single diagram.
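One simple way to read transitions off such a trajectory is to look for the longest steps between consecutive temperatures, since a change of phase moves the pattern, and hence its MMDS point, abruptly. This is our illustrative sketch, not the chapter's procedure, and the three-plateau path below is synthetic.

```python
import numpy as np


def trajectory_jumps(coords, temperatures, n_jumps):
    """Largest steps along a temperature-ordered path through MMDS space.

    Returns (T_before, T_after, step_length) for the n_jumps longest steps,
    listed in temperature order."""
    order = np.argsort(temperatures)
    path = np.asarray(coords, dtype=float)[order]
    t = np.asarray(temperatures, dtype=float)[order]
    steps = np.linalg.norm(np.diff(path, axis=0), axis=1)
    largest = np.sort(np.argsort(steps)[-n_jumps:])
    return [(float(t[k]), float(t[k + 1]), float(steps[k])) for k in largest]


# Synthetic path: three plateaus plus a slow drift, loosely mimicking Fig. 3.8.11(b).
temperatures = np.array([203.0 + 3.0 * k for k in range(75)])
centres = np.where(temperatures[:, None] < 305.0, [0.0, 0.0, 0.0],
                   np.where(temperatures[:, None] < 398.0, [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]))
drift = 0.001 * np.arange(75)[:, None] * np.array([0.0, 0.0, 1.0])
jumps = trajectory_jumps(centres + drift, temperatures, n_jumps=2)
for before, after, length in jumps:
    print(f"jump of {length:.2f} between {before:.0f} and {after:.0f} K")
```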

References

Herrmann, M. J. & Engel, W. (1997). Phase transitions and lattice dynamics of ammonium nitrate. Propellants Explos. Pyrotech. 22, 143–147.
