International
Tables for Crystallography Volume B Reciprocal space Edited by U. Shmueli © International Union of Crystallography 2010 
International Tables for Crystallography (2010). Vol. B, ch. 1.3, pp. 5861

Substantial reductions in the arithmetic cost, as well as gains in flexibility, can be obtained if the factoring of the DFT is carried out in several dimensions simultaneously. The presentation given here is a generalization of that of Mersereau & Speake (1981), using the abstract setting established independently by Auslander, Tolimieri & Winograd (1982).
Let us return to the general ndimensional setting of Section 1.3.2.7.4, where the DFT was defined for an arbitrary decimation matrix N by the formulae (where denotes ):with
Let us now assume that this decimation can be factored into d successive decimations, i.e. thatand henceThen the coset decomposition formulae corresponding to these successive decimations (Section 1.3.2.7.1) can be combined as follows:with . Therefore, any may be written uniquely asSimilarly:so that any may be written uniquely aswith . These decompositions are the vector analogues of the multiradix number representation systems used in the Cooley–Tukey factorization.
We may then write the definition of with factors asThe argument of e(–) may be expanded asThe first summand may be recognized as a twiddle factor, the second and third as the kernels of and , respectively, while the fourth is an integer which may be dropped. We are thus led to a `vectorradix' version of the Cooley–Tukey algorithm, in which the successive decimations may be introduced in all n dimensions simultaneously by general integer matrices. The computation may be decomposed into five stages analogous to those of the onedimensional algorithm of Section 1.3.3.2.1:
The initial point transform can thus be performed as transforms on points, followed by transforms on points. This process can be applied successively to all d factors. The same decomposition applies to , up to the complex conjugation of twiddle factors, the normalization factor being obtained as the product of the factors in the successive partial transforms .
The geometric interpretation of this factorization in terms of partial transforms on translates of sublattices applies in full to this ndimensional setting; in particular, the twiddle factors are seen to be related to the residual translations which place the sublattices in register within the big lattice. If the intermediate transforms are performed in place, then the quantitywill eventually be found at locationso that the final results will have to be unscrambled by a process which may be called `coset reversal', the vector equivalent of digit reversal.
Factoring by 2 in all n dimensions simultaneously, i.e. taking , leads to `ndimensional butterflies'. Decimation in time corresponds to the choice , so that is an ndimensional parity class; the calculation then proceeds byDecimation in frequency corresponds to the choice , , so that labels `octant' blocks of shape M; the calculation then proceeds through the following steps:i.e. the parity classes of results, corresponding to the different , are obtained separately. When the dimension n is 2 and the decimating matrix is diagonal, this analysis reduces to the `vector radix FFT' algorithms proposed by Rivard (1977) and Harris et al. (1977). These lead to substantial reductions in the number M of multiplications compared to the row–column method: M is reduced to by simultaneous factoring, and to by simultaneous factoring.
The use of a nondiagonal decimating matrix may bring savings in computing time if the spectrum of the bandlimited function under study is of such a shape as to pack more compactly in a nonrectangular than in a rectangular lattice (Mersereau, 1979). If, for instance, the support K of the spectrum Φ is contained in a sphere, then a decimation matrix producing a close packing of these spheres will yield an aliasingfree DFT algorithm with fewer sample points than the standard algorithm using a rectangular lattice.
Suppose that the decimation matrix N is diagonaland let each diagonal element be written in terms of its prime factors:where m is the total number of distinct prime factors present in the .
The CRT may be used to turn each 1D transform along dimension i into a multidimensional transform with a separate `pseudodimension' for each distinct prime factor of ; the number , of these pseudodimensions is equal to the cardinality of the set:The full ndimensional transform thus becomes μdimensional, with .
We may now permute the μ pseudodimensions so as to bring into contiguous position those corresponding to the same prime factor ; the m resulting groups of pseudodimensions are said to define `pprimary' blocks. The initial transform is now written as a tensor product of m pprimary transforms, where transform j is onpoints [by convention, dimension i is not transformed if ]. These pprimary transforms may be computed, for instance, by multidimensional Cooley–Tukey factorization (Section 1.3.3.3.1), which is faster than the straightforward row–column method. The final results may then be obtained by reversing all the permutations used.
The extra gain with respect to the multidimensional Cooley–Tukey method is that there are no twiddle factors between pprimary pieces corresponding to different primes p.
The case where N is not diagonal has been examined by Guessoum & Mersereau (1986).
Suppose that the CRT has been used as above to map an ndimensional DFT to a μdimensional DFT. For each [κ runs over those pairs (i, j) such that ], the Rader/Winograd procedure may be applied to put the matrix of the κth 1D DFT in the CBA normal form of a Winograd small FFT. The full DFT matrix may then be written, up to permutation of data and results, as
A well known property of the tensor product of matrices allows this to be rewritten asand thus to form a matrix in which the combined preaddition, multiplication and postaddition matrices have been precomputed. This procedure, called nesting, can be shown to afford a reduction of the arithmetic operation count compared to the row–column method (Morris, 1978).
Clearly, the nesting rearrangement need not be applied to all μ dimensions, but can be restricted to any desired subset of them.
Nussbaumer's approach views the DFT as the evaluation of certain polynomials constructed from the data (as in Section 1.3.3.2.4). For instance, putting , the 1D Npoint DFTmay be writtenwhere the polynomial Q is defined by
Let us consider (Nussbaumer & Quandalle, 1979) a 2D transform of size :By introduction of the polynomialsthis may be rewritten:
Let us now suppose that is coprime to N. Then has a unique inverse modulo N (denoted by ), so that multiplication by simply permutes the elements of and hencefor any function f over . We may thus write:whereSince only the value of polynomial at is involved in the result, the computation of may be carried out modulo the unique cyclotomic polynomial such that . Thus, if we define:we may write:or equivalently
For N an odd prime p, all nonzero values of are coprime with p so that the point DFT may be calculated as follows:
Step (1) is a set of p `polynomial transforms' involving no multiplications; step (2) consists of p DFTs on p points each since ifthenstep (3) is a permutation; and step (4) is a ppoint DFT. Thus the 2D DFT on points, which takes 2p ppoint DFTs by the row–column method, involves only ppoint DFTs; the other DFTs have been replaced by polynomial transforms involving only additions.
This procedure can be extended to n dimensions, and reduces the number of 1D ppoint DFTs from for the row–column method to , at the cost of introducing extra additions in the polynomial transforms.
A similar algorithm has been formulated by Auslander et al. (1983) in terms of Galois theory.
References
Auslander, L., Feig, E. & Winograd, S. (1983). New algorithms for the multidimensional discrete Fourier transform. IEEE Trans. Acoust. Speech Signal Process. 31, 388–403.Auslander, L., Tolimieri, R. & Winograd, S. (1982). Hecke's theorem in quadratic reciprocity, finite nilpotent groups and the Cooley–Tukey algorithm. Adv. Math. 43, 122–172.
Guessoum, A. & Mersereau, R. M. (1986). Fast algorithms for the multidimensional discrete Fourier transform. IEEE Trans. Acoust. Speech Signal Process. 34, 937–943.
Harris, D. B., McClellan, J. H., Chan, D. S. K. & Schuessler, H. W. (1977). Vector radix fast Fourier transform. Rec. 1977 IEEE Internat. Conf. Acoust. Speech Signal Process. pp. 548–551.
Mersereau, R. & Speake, T. C. (1981). A unified treatment of Cooley–Tukey algorithms for the evaluation of the multidimensional DFT. IEEE Trans. Acoust. Speech Signal Process. 29, 1011–1018.
Mersereau, R. M. (1979). The processing of hexagonally sampled twodimensional signals. Proc. IEEE, 67, 930–949.
Morris, R. L. (1978). A comparative study of time efficient FFT and WFTA programs for general purpose computers. IEEE Trans. Acoust. Speech Signal Process. 26, 141–150.
Nussbaumer, H. J. & Quandalle, P. (1979). Fast computation of discrete Fourier transforms using polynomial transforms. IEEE Trans. Acoust. Speech Signal Process. 27, 169–181.
Rivard, G. E. (1977). Direct fast Fourier transform of bivariate functions. IEEE Trans. Acoust. Speech Signal Process. 25, 250–252.