International Tables for Crystallography
Volume H: Powder diffraction
Edited by C. J. Gilmore, J. A. Kaduk and H. Schenk

International Tables for Crystallography (2018). Vol. H, ch. 3.8, pp. 333-335

Section 3.8.6.1. Aspirin data

C. J. Gilmore,ᵃ G. Barrᵃ and W. Dongᵃ*

ᵃDepartment of Chemistry, University of Glasgow, University Avenue, Glasgow, G12 8QQ, UK
Correspondence e-mail: chris@chem.gla.ac.uk



In this example we use 13 powder patterns from commercial aspirin samples collected in reflection mode on a Bruker D8 diffractometer. Since these samples include fillers, the active pharmaceutical ingredient (API) and other formulations, it is not surprising that the peak widths are large: ∼0.5° full width at half maximum (FWHM). The data-collection range was 10–43° in 2θ using Cu Kα radiation. The 13 powder data sets are shown in Fig. 3.8.5, arranged into groups based on similarity. We have already described the methods of analysis and have shown typical results in Figs. 3.8.6 to 3.8.8, and now present detailed examples.

The correlation matrix derived from equation (3.8.3) is shown in Fig. 3.8.9(a), colour coded to reflect the values of the coefficients; the darker the shade, the higher the correlation. The resulting dendrogram and MMDS plot are shown in Figs. 3.8.9(b) and (c), respectively. Four clusters are identified in the dendrogram and these have been appropriately coloured. The remaining panels show the other visualization tools. In Fig. 3.8.9(d) the pie chart is displayed; the number of rows can be adjusted to reflect the arrangement of the samples in a multiple sample holder. Fig. 3.8.9(e) shows the default minimum spanning tree with 12 links. In Fig. 3.8.9(f) the scree plot indicates that three clusters will account for more than 95% of the data variability. The steep initial slope is a clear indication of good cluster estimation. The silhouettes, which were discussed in Section 3.8.5.1, are shown in Fig. 3.8.9(g)–(i). In Fig. 3.8.9(j) the default parallel-coordinates plot for the same data is shown, and in Fig. 3.8.9(k) there is another view taken from the grand tour. These two plots validate the clustering and also indicate that no significant error is introduced into the MMDS plot by truncating it to three dimensions.
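The first step described above, turning a set of patterns into a correlation matrix and then into distances, can be sketched as follows. This is only an illustration under stated assumptions: the three patterns are synthetic (the peak positions and heights are invented and are not the aspirin data), a plain Pearson coefficient stands in for the coefficient of equation (3.8.3), which is not reproduced in this section, and the conversion d = 0.5(1 − r) is one common choice assumed here. Only the 10–43° range and the ∼0.5° FWHM are taken from the text.

```python
import math

def gaussian_pattern(peaks, two_theta, fwhm=0.5):
    """Sum of Gaussian peaks; `peaks` is a list of (position, height) pairs."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return [sum(h * math.exp(-0.5 * ((x - p) / sigma) ** 2) for p, h in peaks)
            for x in two_theta]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# 10-43 degrees in 2theta, 0.02 degree steps (the step size is an assumption)
two_theta = [10.0 + 0.02 * i for i in range(1651)]

# Hypothetical patterns: 0 and 1 share peak positions, 2 is a different "phase"
patterns = [
    gaussian_pattern([(15.6, 100), (22.7, 60), (27.1, 80)], two_theta),
    gaussian_pattern([(15.6, 90), (22.7, 70), (27.1, 75)], two_theta),
    gaussian_pattern([(12.0, 100), (19.5, 50), (33.0, 70)], two_theta),
]

n = len(patterns)
corr = [[pearson(patterns[i], patterns[j]) for j in range(n)] for i in range(n)]

# Assumed conversion: r = 1 gives d = 0, r = -1 gives d = 1
dist = [[0.5 * (1.0 - corr[i][j]) for j in range(n)] for i in range(n)]

for row in corr:
    print(["%.3f" % v for v in row])
```

In this toy case patterns 0 and 1 are highly correlated and both are nearly uncorrelated with pattern 2, so a distance matrix built this way would place the first two in one cluster and the third in another.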

[Figure 3.8.5]

Figure 3.8.5

Powder patterns for 13 commercial aspirin samples partitioned into five sets. The patterns are in highly correlated sets: (a) comprises patterns 1, 3, 5, 6, 9 and 12; (b) comprises patterns 10, 11 and 13; (c) contains patterns 2 and 4; (d) contains pattern 7 and (e) contains pattern 8.

[Figure 3.8.6]

Figure 3.8.6

(a) The initial default dendrogram using the centroid clustering method on 13 PXRD patterns from 13 commercial aspirin samples. (b) The corresponding MMDS plot. It can be seen that both clusters have a natural break in them and should be partitioned into two clusters. (c) The dendrogram cut line is reduced. (d) The corresponding MMDS plot. The red cluster is now partitioned into two; the remaining patterns are a light-blue singleton and a green triplet cluster. (e) The default dendrogram using the single-link method.
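The effect of lowering the dendrogram cut line, as in parts (c) and (d) of this caption, can be illustrated with a minimal single-link sketch. The assumptions are significant: the seven one-dimensional coordinates are invented stand-ins for a distance matrix, the function name `single_link_clusters` is hypothetical, and single link is used only because it is the simplest method to write down (the default dendrogram in the caption uses the centroid method).

```python
def single_link_clusters(dist, cut):
    """Merge clusters while the smallest inter-cluster distance is <= cut."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single link: distance between the closest pair of members
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > cut:
            break  # every remaining merge lies above the cut line
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return sorted(clusters)

# Hypothetical 1-D coordinates standing in for seven patterns
points = [0.0, 0.1, 0.5, 0.6, 2.0, 2.1, 2.2]
dist = [[abs(p - q) for q in points] for p in points]

high_cut = single_link_clusters(dist, cut=1.0)
low_cut = single_link_clusters(dist, cut=0.2)
print(high_cut)
print(low_cut)
```

With the higher cut the first four points form one cluster; lowering the cut exposes the natural break inside it and splits it into two, while the tight triplet is unaffected.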

[Figure 3.8.7]

Figure 3.8.7

The use of minimum spanning trees (MSTs). (a) The MST with 12 links. (b) The MST with 10 links; three clusters are now present.
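The relationship between the number of links and the number of clusters can be sketched in a few lines. A spanning tree on 13 samples has 12 links, and removing the two longest leaves 10 links and three connected groups. The sketch below assumes Prim's algorithm for building the tree (the source does not say which algorithm is used) and uses 13 invented one-dimensional coordinates in place of the real distance matrix; the function names are hypothetical.

```python
def minimum_spanning_tree(dist):
    """Prim's algorithm on a full distance matrix; returns (i, j, d) links."""
    n = len(dist)
    in_tree = {0}
    links = []
    while len(in_tree) < n:
        # shortest link joining a node in the tree to a node outside it
        d, i, j = min((dist[i][j], i, j) for i in in_tree
                      for j in range(n) if j not in in_tree)
        links.append((i, j, d))
        in_tree.add(j)
    return links

def clusters_after_cut(n, links, n_remove):
    """Drop the n_remove longest links and return the connected components."""
    kept = sorted(links, key=lambda e: e[2])[:len(links) - n_remove]
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i, j, _ in kept:
        parent[find(i)] = find(j)
    groups = {}
    for k in range(n):
        groups.setdefault(find(k), []).append(k)
    return sorted(groups.values())

# Hypothetical coordinates for 13 samples in three well separated groups
points = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5,
          2.0, 2.1, 2.2,
          4.0, 4.1, 4.2, 4.3]
dist = [[abs(p - q) for q in points] for p in points]

links = minimum_spanning_tree(dist)
clusters = clusters_after_cut(len(points), links, n_remove=2)
print(len(links), "links in the full tree")
print(clusters)
```

Removing k − 1 links from a spanning tree always yields k components, so the number of links retained is a direct control on the number of clusters.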

[Figure 3.8.8]

Figure 3.8.8

The use of silhouettes in defining the details of the clustering. (a) The silhouettes for the red cluster in the dendrogram from Fig. 3.8.6(a). (b) The corresponding orange cluster. Both sets of silhouettes have values that are less than 0.5, which indicates that the clustering is not well defined. (c) The silhouettes for the red cluster corresponding to the dendrogram in Fig. 3.8.6(c). The entry centred on a silhouette value of 0.15 is pattern 3. This implies that pattern 3 is only loosely connected to the cluster and this is demonstrated in part (d), where pattern 3 and the most representative pattern for the cluster (No. 9) are superimposed. Although there is a general sense of similarity there are significant differences and the combined correlation coefficient is only 0.62. (e) The silhouettes for the orange cluster corresponding to the dendrogram in Fig. 3.8.6(c). The silhouettes imply that this is a single cluster without outliers. (f) The silhouettes for the green cluster corresponding to the dendrogram in Fig. 3.8.6(c). The clustering is poorly defined here.
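How a loosely attached member shows up as a low silhouette can be seen in a small sketch. It assumes the standard definition s = (b − a)/max(a, b), where a is the mean distance to the other members of the sample's own cluster and b is the smallest mean distance to any other cluster; the chapter's own definition is in Section 3.8.5.1 and is not reproduced here. The coordinates and labels are invented, and the function name `silhouettes` is hypothetical.

```python
def silhouettes(dist, labels):
    """Silhouette value for each sample; assumes at least two clusters."""
    n = len(dist)
    values = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            values.append(0.0)  # convention assumed here for singleton clusters
            continue
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == other)
            / sum(1 for j in range(n) if labels[j] == other)
            for other in set(labels) if other != labels[i]
        )
        values.append((b - a) / max(a, b))
    return values

# Hypothetical 1-D coordinates: sample 3 sits between the two groups
points = [0.0, 0.1, 0.2, 0.9, 2.0, 2.1, 2.2]
labels = [0, 0, 0, 0, 1, 1, 1]
dist = [[abs(p - q) for q in points] for p in points]

s = silhouettes(dist, labels)
for idx, value in enumerate(s):
    print(idx, round(value, 3))
```

Sample 3 is assigned to the first cluster but receives a silhouette well below 0.5, whereas the tightly grouped members score much higher; this is the same kind of signal that singles out a weakly connected pattern in a silhouette plot.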

[Figure 3.8.9]

Figure 3.8.9

The complete cluster analysis for the aspirin samples. (a) The correlation matrix, which is the source of all the clustering results. The entries are colour coded: the darker the shade, the higher the correlation. (b) The dendrogram. The colours assigned to the samples are used in all the visualization tools. (c) The corresponding MMDS plot. The clustering defined by the dendrogram is well defined. (d) The pie-chart view. (e) The minimum spanning tree. (f) The scree plot. It indicates that three clusters explain 95% of the variance of the distance matrix derived from (a). (g)–(i) The silhouettes for the red, the orange and the green clusters, respectively. These are discussed in detail in the caption to Fig. 3.8.8. (j) The default parallel-coordinates plot. The clusters are well maintained into the 4th, 5th and 6th dimensions. (k) Another view of the parallel coordinates using the grand tour. The clustering remains well maintained in higher dimensions.

3.8.6.1.1. Aspirin data with amorphous samples included


As a demonstration of the handling of data from amorphous samples, five patterns for amorphous samples were included in the aspirin data and the clustering calculation was repeated. The results are shown in Fig. 3.8.10. Fig. 3.8.10(a) shows the dendrogram. It can be seen that the amorphous samples are positioned as isolated clusters on the right-hand end. They also appear as an isolated cluster in the MMDS plot and the parallel-coordinates plots, as shown in Figs. 3.8.10(b) and (c). It could be argued that these samples should be treated as a single, five-membered cluster rather than five individuals, but we have found that this confuses the clustering algorithms, and it is clearer to the user if the data from amorphous samples are presented as separate classes.

[Figure 3.8.10]

Figure 3.8.10

The aspirin data including data from five amorphous samples. (a) The resulting dendrogram and (b) the corresponding MMDS plot. (c) The parallel-coordinates plot.