International
Tables for Crystallography Volume B Reciprocal space Edited by U. Shmueli © International Union of Crystallography 2010 
International Tables for Crystallography (2010). Vol. B, ch. 1.3, pp. 5254

The presentation of Gentleman & Sande (1966) will be followed first [see also Cochran et al. (1967)]. It will then be reinterpreted in geometric terms which will prepare the way for the treatment of multidimensional transforms in Section 1.3.3.3.
Suppose that the number of sample points N is composite, say . We may write k to the base and to the base as follows:The defining relation for may then be written:The argument of may be expanded asand the last summand, being an integer, may be dropped:This computation may be decomposed into five stages, as follows:
If the intermediate transforms in stages (ii) and (iv) are performed in place, i.e. with the results overwriting the data, then at stage (v) the result will be found at address . This phenomenon is called scrambling by `digit reversal', and stage (v) is accordingly known as unscrambling.
The initial Npoint transform has thus been performed as transforms on points, followed by transforms on points, thereby reducing the arithmetic cost from to . The phase shifts applied at stage (iii) are traditionally called `twiddle factors', and the transposition between and can be performed by the fast recursive technique of Eklundh (1972). Clearly, this procedure can be applied recursively if and are themselves composite, leading to an overall arithmetic cost of order N log N if N has no large prime factors.
The Cooley–Tukey factorization may also be derived from a geometric rather than arithmetic argument. The decomposition is associated to a geometric partition of the residual lattice into copies of , each translated by and `blown up' by a factor . This partition in turn induces a (direct sum) decomposition of X aswhere
According to (i), is related to by decimation by and offset by . By Section 1.3.2.7.2, is related to by periodization by and phase shift by , so thatthe periodization by being reflected by the fact that does not depend on . Writing and expanding shows that the phase shift contains both the twiddle factor and the kernel of . The Cooley–Tukey algorithm is thus naturally associated to the coset decomposition of a lattice modulo a sublattice (Section 1.3.2.7.2).
It is readily seen that essentially the same factorization can be obtained for , up to the complex conjugation of the twiddle factors. The normalizing constant arises from the normalizing constants and in and , respectively.
Factors of 2 are particularly simple to deal with and give rise to a characteristic computational structure called a `butterfly loop'. If , then two options exist:
By repeated factoring of the number N of sample points, the calculation of and can be reduced to a succession of stages, the smallest of which operate on single prime factors of N. The reader is referred to Gentleman & Sande (1966) for a particularly lucid analysis of the programming considerations which help implement this factorization efficiently; see also Singleton (1969). Powers of two are often grouped together into factors of 4 or 8, which are advantageous in that they require fewer complex multiplications than the repeated use of factors of 2. In this approach, large prime factors P are detrimental, since they require a full size computation according to the defining formula.
References
Cochran, W. T., Cooley, J. W., Favin, D. L., Helms, H. D., Kaenel, R. A., Lang, W. W., Maling, G. C., Nelson, D. E., Rader, C. M. & Welch, P. D. (1967). What is the fast Fourier transform? IEEE Trans. Audio, 15, 45–55.Eklundh, J. O. (1972). A fast computer method for matrix transposing. IEEE Trans. C21, 801–803.
Gentleman, W. M. & Sande, G. (1966). Fast Fourier transforms – for fun and profit. In AFIPS Proc. 1966 Fall Joint Computer Conference, pp. 563–578. Washington: Spartan Books.
Singleton, R. C. (1969). An algorithm for computing the mixed radix fast Fourier transform. IEEE Trans. AU, 17, 93–103.