International
Tables for
Crystallography
Volume B
Reciprocal space
Edited by U. Shmueli

International Tables for Crystallography (2010). Vol. B, ch. 1.3, pp. 58-61   | 1 | 2 |

## Section 1.3.3.3.2. Multidimensional factorization

G. Bricognea

aGlobal Phasing Ltd, Sheraton House, Suites 14–16, Castle Park, Cambridge CB3 0AX, England, and LURE, Bâtiment 209D, Université Paris-Sud, 91405 Orsay, France

#### 1.3.3.3.2. Multidimensional factorization

| top | pdf |

Substantial reductions in the arithmetic cost, as well as gains in flexibility, can be obtained if the factoring of the DFT is carried out in several dimensions simultaneously. The presentation given here is a generalization of that of Mersereau & Speake (1981), using the abstract setting established independently by Auslander, Tolimieri & Winograd (1982).

Let us return to the general n-dimensional setting of Section 1.3.2.7.4, where the DFT was defined for an arbitrary decimation matrix N by the formulae (where denotes ):with

#### 1.3.3.3.2.1. Multidimensional Cooley–Tukey factorization

| top | pdf |

Let us now assume that this decimation can be factored into d successive decimations, i.e. thatand henceThen the coset decomposition formulae corresponding to these successive decimations (Section 1.3.2.7.1) can be combined as follows:with . Therefore, any may be written uniquely asSimilarly:so that any may be written uniquely aswith . These decompositions are the vector analogues of the multi-radix number representation systems used in the Cooley–Tukey factorization.

We may then write the definition of with factors asThe argument of e(–) may be expanded asThe first summand may be recognized as a twiddle factor, the second and third as the kernels of and , respectively, while the fourth is an integer which may be dropped. We are thus led to a vector-radix' version of the Cooley–Tukey algorithm, in which the successive decimations may be introduced in all n dimensions simultaneously by general integer matrices. The computation may be decomposed into five stages analogous to those of the one-dimensional algorithm of Section 1.3.3.2.1:

 (i) form the vectors of shape by (ii) calculate the transforms on points: (iii) form the vectors of shape by (iv) calculate the transforms on points: (v) collect as .

The initial -point transform can thus be performed as transforms on points, followed by transforms on points. This process can be applied successively to all d factors. The same decomposition applies to , up to the complex conjugation of twiddle factors, the normalization factor being obtained as the product of the factors in the successive partial transforms .

The geometric interpretation of this factorization in terms of partial transforms on translates of sublattices applies in full to this n-dimensional setting; in particular, the twiddle factors are seen to be related to the residual translations which place the sublattices in register within the big lattice. If the intermediate transforms are performed in place, then the quantitywill eventually be found at locationso that the final results will have to be unscrambled by a process which may be called coset reversal', the vector equivalent of digit reversal.

Factoring by 2 in all n dimensions simultaneously, i.e. taking , leads to n-dimensional butterflies'. Decimation in time corresponds to the choice , so that is an n-dimensional parity class; the calculation then proceeds byDecimation in frequency corresponds to the choice , , so that labels octant' blocks of shape M; the calculation then proceeds through the following steps:i.e. the parity classes of results, corresponding to the different , are obtained separately. When the dimension n is 2 and the decimating matrix is diagonal, this analysis reduces to the vector radix FFT' algorithms proposed by Rivard (1977) and Harris et al. (1977). These lead to substantial reductions in the number M of multiplications compared to the row–column method: M is reduced to by simultaneous factoring, and to by simultaneous factoring.

The use of a nondiagonal decimating matrix may bring savings in computing time if the spectrum of the band-limited function under study is of such a shape as to pack more compactly in a nonrectangular than in a rectangular lattice (Mersereau, 1979). If, for instance, the support K of the spectrum Φ is contained in a sphere, then a decimation matrix producing a close packing of these spheres will yield an aliasing-free DFT algorithm with fewer sample points than the standard algorithm using a rectangular lattice.

#### 1.3.3.3.2.2. Multidimensional prime factor algorithm

| top | pdf |

Suppose that the decimation matrix N is diagonaland let each diagonal element be written in terms of its prime factors:where m is the total number of distinct prime factors present in the .

The CRT may be used to turn each 1D transform along dimension i into a multidimensional transform with a separate pseudo-dimension' for each distinct prime factor of ; the number , of these pseudo-dimensions is equal to the cardinality of the set:The full n-dimensional transform thus becomes μ-dimensional, with .

We may now permute the μ pseudo-dimensions so as to bring into contiguous position those corresponding to the same prime factor ; the m resulting groups of pseudo-dimensions are said to define p-primary' blocks. The initial transform is now written as a tensor product of m p-primary transforms, where transform j is onpoints [by convention, dimension i is not transformed if ]. These p-primary transforms may be computed, for instance, by multidimensional Cooley–Tukey factorization (Section 1.3.3.3.1), which is faster than the straightforward row–column method. The final results may then be obtained by reversing all the permutations used.

The extra gain with respect to the multidimensional Cooley–Tukey method is that there are no twiddle factors between p-primary pieces corresponding to different primes p.

The case where N is not diagonal has been examined by Guessoum & Mersereau (1986).

#### 1.3.3.3.2.3. Nesting of Winograd small FFTs

| top | pdf |

Suppose that the CRT has been used as above to map an n-dimensional DFT to a μ-dimensional DFT. For each [κ runs over those pairs (i, j) such that ], the Rader/Winograd procedure may be applied to put the matrix of the κth 1D DFT in the CBA normal form of a Winograd small FFT. The full DFT matrix may then be written, up to permutation of data and results, as

A well known property of the tensor product of matrices allows this to be rewritten asand thus to form a matrix in which the combined pre-addition, multiplication and post-addition matrices have been precomputed. This procedure, called nesting, can be shown to afford a reduction of the arithmetic operation count compared to the row–column method (Morris, 1978).

Clearly, the nesting rearrangement need not be applied to all μ dimensions, but can be restricted to any desired subset of them.

#### 1.3.3.3.2.4. The Nussbaumer–Quandalle algorithm

| top | pdf |

Nussbaumer's approach views the DFT as the evaluation of certain polynomials constructed from the data (as in Section 1.3.3.2.4). For instance, putting , the 1D N-point DFTmay be writtenwhere the polynomial Q is defined by

Let us consider (Nussbaumer & Quandalle, 1979) a 2D transform of size :By introduction of the polynomialsthis may be rewritten:

Let us now suppose that is coprime to N. Then has a unique inverse modulo N (denoted by ), so that multiplication by simply permutes the elements of and hencefor any function f over . We may thus write:whereSince only the value of polynomial at is involved in the result, the computation of may be carried out modulo the unique cyclotomic polynomial such that . Thus, if we define:we may write:or equivalently

For N an odd prime p, all nonzero values of are coprime with p so that the -point DFT may be calculated as follows:

 (1) form the polynomialsfor ; (2) evaluate for ; (3) put ; (4) calculate the terms for separately by

Step (1) is a set of p polynomial transforms' involving no multiplications; step (2) consists of p DFTs on p points each since ifthenstep (3) is a permutation; and step (4) is a p-point DFT. Thus the 2D DFT on points, which takes 2p p-point DFTs by the row–column method, involves only p-point DFTs; the other DFTs have been replaced by polynomial transforms involving only additions.

This procedure can be extended to n dimensions, and reduces the number of 1D p-point DFTs from for the row–column method to , at the cost of introducing extra additions in the polynomial transforms.

A similar algorithm has been formulated by Auslander et al. (1983) in terms of Galois theory.

### References

Auslander, L., Feig, E. & Winograd, S. (1983). New algorithms for the multidimensional discrete Fourier transform. IEEE Trans. Acoust. Speech Signal Process. 31, 388–403.
Auslander, L., Tolimieri, R. & Winograd, S. (1982). Hecke's theorem in quadratic reciprocity, finite nilpotent groups and the Cooley–Tukey algorithm. Adv. Math. 43, 122–172.
Guessoum, A. & Mersereau, R. M. (1986). Fast algorithms for the multidimensional discrete Fourier transform. IEEE Trans. Acoust. Speech Signal Process. 34, 937–943.
Harris, D. B., McClellan, J. H., Chan, D. S. K. & Schuessler, H. W. (1977). Vector radix fast Fourier transform. Rec. 1977 IEEE Internat. Conf. Acoust. Speech Signal Process. pp. 548–551.
Mersereau, R. & Speake, T. C. (1981). A unified treatment of Cooley–Tukey algorithms for the evaluation of the multidimensional DFT. IEEE Trans. Acoust. Speech Signal Process. 29, 1011–1018.
Mersereau, R. M. (1979). The processing of hexagonally sampled two-dimensional signals. Proc. IEEE, 67, 930–949.
Morris, R. L. (1978). A comparative study of time efficient FFT and WFTA programs for general purpose computers. IEEE Trans. Acoust. Speech Signal Process. 26, 141–150.
Nussbaumer, H. J. & Quandalle, P. (1979). Fast computation of discrete Fourier transforms using polynomial transforms. IEEE Trans. Acoust. Speech Signal Process. 27, 169–181.
Rivard, G. E. (1977). Direct fast Fourier transform of bivariate functions. IEEE Trans. Acoust. Speech Signal Process. 25, 250–252.