Tables for
Volume B
Reciprocal space
Edited by U. Shmueli

International Tables for Crystallography (2010). Vol. B, ch. 1.3, pp. 52-54   | 1 | 2 |

Section The Cooley–Tukey algorithm

G. Bricognea

aGlobal Phasing Ltd, Sheraton House, Suites 14–16, Castle Park, Cambridge CB3 0AX, England, and LURE, Bâtiment 209D, Université Paris-Sud, 91405 Orsay, France The Cooley–Tukey algorithm

| top | pdf |

The presentation of Gentleman & Sande (1966)[link] will be followed first [see also Cochran et al. (1967)[link]]. It will then be reinterpreted in geometric terms which will prepare the way for the treatment of multidimensional transforms in Section[link]

Suppose that the number of sample points N is composite, say [N = N_{1} N_{2}]. We may write k to the base [N_{1}] and [k^{*}] to the base [N_{2}] as follows:[\eqalign{k &= k_{1} + N_{1} k_{2} \quad\, k_{1} \in {\bb Z}/N_{1} {\bb Z}, \quad k_{2} \in {\bb Z}/N_{2} {\bb Z} \cr k^{*} &= k_{2}^{*} + k_{1}^{*} N_{2} \quad\,k_{1}^{*} \in {\bb Z}/N_{1} {\bb Z}, \quad k_{2}^{*} \in {\bb Z}/N_{2} {\bb Z}.}]The defining relation for [\bar{F}(N)] may then be written:[\eqalign{X^{*} (k_{2}^{*} + k_{1}^{*} N_{2}) &= {\sum\limits_{k_{1} \in {\bb Z}/N_{1} {\bb Z}}}\, {\sum\limits_{k_{2} \in {\bb Z}/N_{2} {\bb Z}}} X (k_{1} + N_{1} k_{2}) \cr &\quad \times e \left[{(k_{2}^{*} + k_{1}^{*} N_{2}) (k_{1} + N_{1} k_{2}) \over N_{1} N_{2}}\right].}]The argument of [e[.]] may be expanded as[{k_{2}^{*} k_{1} \over N} + {k_{1}^{*} k_{1} \over N_{1}} + {k_{2}^{*} k_{2} \over N_{2}} + k_{1}^{*} k_{2},]and the last summand, being an integer, may be dropped:[\eqalign{&X^{*} (k_{2}^{*} + k_{1}^{*} N_{2})\cr &\quad = {\sum\limits_{k_{1}}} \left\{e \left({k_{2}^{*} k_{1} \over N}\right) \left[{\sum\limits_{k_{2}}}\, X (k_{1} + N_{1} k_{2}) e \left({k_{2}^{*} k_{2} \over N_{2}}\right)\right]\right\}\cr &\qquad \times e \left({k_{1}^{*} k_{1} \over N_{1}}\right).}]This computation may be decomposed into five stages, as follows:

  • (i) form the [N_{1}] vectors [{\bf Y}_{k_{1}}] of length [N_{2}] by the prescription[Y_{k_{1}} (k_{2}) = X (k_{1} + N_{1} k_{2}),\quad k_{1} \in {\bb Z}/N_{1} {\bb Z},\quad k_{2} \in {\bb Z}/N_{2} {\bb Z}\hbox{\semi}]

  • (ii) calculate the [N_{1}] transforms [{\bf Y}_{k_{1}}^{*}] on [N_{2}] points:[{\bf Y}_{k_{1}}^{*} = \bar{F} (N_{2}) [{\bf Y}_{k_{1}}],\quad k_{1} \in {\bb Z}/N_{1} {\bb Z}\hbox{\semi}]

  • (iii) form the [N_{2}] vectors [{\bf Z}_{k_{2}^{*}}] of length [N_{1}] by the prescription[{\bf Z}_{k_{2}^{*}} (k_{1}) = e \left({k_{2}^{*} k_{1} \over N}\right) Y_{k_{1}}^{*} (k_{2}^{*}),\quad k_{1} \in {\bb Z}/N_{1} {\bb Z},\quad k_{2}^{*} \in {\bb Z}/N_{2} {\bb Z}\hbox{\semi}]

  • (iv) calculate the [N_{2}] transforms [{\bf Z}_{k_{2}^{*}}^{*}] on [N_{1}] points:[{\bf Z}_{k_{2}^{*}}^{*} = \bar{F} (N_{1}) [{\bf Z}_{k_{2}^{*}}],\quad k_{2}^{*} \in {\bb Z}/N_{2} {\bb Z}\hbox{\semi}]

  • (v) collect [X^{*} (k_{2}^{*} + k_{1}^{*} N_{2})] as [Z_{k_{2}^{*}}^{*} (k_{1}^{*})].

If the intermediate transforms in stages (ii)[link] and (iv)[link] are performed in place, i.e. with the results overwriting the data, then at stage (v)[link] the result [X^{*} (k_{2}^{*} + k_{1}^{*} N_{2})] will be found at address [k_{1}^{*} + N_{1} k_{2}^{*}]. This phenomenon is called scrambling by `digit reversal', and stage (v)[link] is accordingly known as unscrambling.

The initial N-point transform [\bar{F} (N)] has thus been performed as [N_{1}] transforms [\bar{F} (N_{2})] on [N_{2}] points, followed by [N_{2}] transforms [\bar{F} (N_{1})] on [N_{1}] points, thereby reducing the arithmetic cost from [(N_{1} N_{2})^{2}] to [N_{1} N_{2} (N_{1} + N_{2})]. The phase shifts applied at stage (iii)[link] are traditionally called `twiddle factors', and the transposition between [k_{1}] and [k_{2}^{*}] can be performed by the fast recursive technique of Eklundh (1972)[link]. Clearly, this procedure can be applied recursively if [N_{1}] and [N_{2}] are themselves composite, leading to an overall arithmetic cost of order N log N if N has no large prime factors.

The Cooley–Tukey factorization may also be derived from a geometric rather than arithmetic argument. The decomposition [k = k_{1} + N_{1} k_{2}] is associated to a geometric partition of the residual lattice [{\bb Z}/N {\bb Z}] into [N_{1}] copies of [{\bb Z}/N_{2} {\bb Z}], each translated by [k_{1} \in {\bb Z}/N_{1} {\bb Z}] and `blown up' by a factor [N_{1}]. This partition in turn induces a (direct sum) decomposition of X as[{\bf X} = {\textstyle\sum\limits_{k_{1}}}\, {\bf X}_{k_{1}},]where[\eqalign{X_{k_{1}} (k) &= X (k)\quad \hbox{if } k \equiv k_{1} \hbox{ mod } N_{1},\cr &= 0{\hbox to 23pt{}} \hbox{otherwise}.}]

According to (i)[link], [{\bf X}_{k_{1}}] is related to [{\bf Y}_{k_{1}}] by decimation by [N_{1}] and offset by [k_{1}]. By Section[link], [\bar{F} (N) [{\bf X}_{k_{1}}]] is related to [\bar{F} (N_{2}) [{\bf Y}_{k_{1}}]] by periodization by [N_{2}] and phase shift by [e (k^{*} k_{1}/N)], so that[X^{*} (k^{*}) = {\sum\limits_{k_{1}}} e \left({k^{*} k_{1} \over N}\right) Y_{k_{1}}^{*} (k_{2}^{*}),]the periodization by [N_{2}] being reflected by the fact that [Y_{k_{1}}^{*}] does not depend on [k_{1}^{*}]. Writing [k^{*} = k_{2}^{*} + k_{1}^{*} N_{2}] and expanding [k^{*} k_{1}] shows that the phase shift contains both the twiddle factor [e (k_{2}^{*} k_{1}/N)] and the kernel [e (k_{1}^{*} k_{1}/N_{1})] of [\bar{F} (N_{1})]. The Cooley–Tukey algorithm is thus naturally associated to the coset decomposition of a lattice modulo a sublattice (Section[link]).

It is readily seen that essentially the same factorization can be obtained for [F(N)], up to the complex conjugation of the twiddle factors. The normalizing constant [1/N] arises from the normalizing constants [1/N_{1}] and [1/N_{2}] in [F (N_{1})] and [F (N_{2})], respectively.

Factors of 2 are particularly simple to deal with and give rise to a characteristic computational structure called a `butterfly loop'. If [N = 2M], then two options exist:

  • (a) using [N_{1} = 2] and [N_{2} = M] leads to collecting the even-numbered coordinates of X into [{\bf Y}_{0}] and the odd-numbered coordinates into [{\bf Y}_{1}][\eqalign{Y_{0} (k_{2}) &= X (2k_{2}),\quad \qquad k_{2} = 0, \ldots, M - 1,\cr Y_{1} (k_{2}) &= X (2k_{2} + 1),\quad \,k_{2} = 0, \ldots, M - 1,}]and writing:[\eqalign{X^{*} (k_{2}^{*}) = \,&Y_{0}^{*} (k_{2}^{*}) + e (k_{2}^{*}/N) Y_{1}^{*} (k_{2}^{*}),\cr \quad &k_{2}^{*} = 0, \ldots, M - 1\hbox{\semi}\cr X^{*} (k_{2}^{*} + M) =\, &Y_{0}^{*} (k_{2}^{*}) - e (k_{2}^{*}/N) Y_{1}^{*} (k_{2}^{*}),\cr &k_{2}^{*} = 0, \ldots, M - 1.}]This is the original version of Cooley & Tukey, and the process of formation of [{\bf Y}_{0}] and [{\bf Y}_{1}] is referred to as `decimation in time' (i.e. decimation along the data index k).

  • (b) using [N_{1} = M] and [N_{2} = 2] leads to forming[\eqalign{Z_{0} (k_{1}) &= X (k_{1}) + X (k_{1} + M),\,\,\quad \quad \quad \qquad k_{1} = 0, \ldots, M - 1,\cr Z_{1} (k_{1}) &= [X (k_{1}) - X (k_{1} + M)] e \left({k_{1} \over N}\right),{\hbox to 20pt{}} k_{1} = 0, \ldots, M - 1,}]then obtaining separately the even-numbered and odd-numbered components of [{\bf X}^{*}] by transforming [{\bf Z}_{0}] and [{\bf Z}_{1}]:[\eqalign{X^{*} (2k_{1}^{*}) &= Z_{0}^{*} (k_{1}^{*}),\quad k_{1}^{*} = 0, \ldots, M - 1\hbox{\semi}\cr X^{*} (2k_{1}^{*} + 1) &= Z_{1}^{*} (k_{1}^{*}),\quad k_{1}^{*} = 0, \ldots, M - 1.}]This version is due to Sande (Gentleman & Sande, 1966[link]), and the process of separately obtaining even-numbered and odd-numbered results has led to its being referred to as `decimation in frequency' (i.e. decimation along the result index [k^{*}]).

By repeated factoring of the number N of sample points, the calculation of [F(N)] and [\bar{F} (N)] can be reduced to a succession of stages, the smallest of which operate on single prime factors of N. The reader is referred to Gentleman & Sande (1966)[link] for a particularly lucid analysis of the programming considerations which help implement this factorization efficiently; see also Singleton (1969)[link]. Powers of two are often grouped together into factors of 4 or 8, which are advantageous in that they require fewer complex multiplications than the repeated use of factors of 2. In this approach, large prime factors P are detrimental, since they require a full [P^{2}]-size computation according to the defining formula.


Cochran, W. T., Cooley, J. W., Favin, D. L., Helms, H. D., Kaenel, R. A., Lang, W. W., Maling, G. C., Nelson, D. E., Rader, C. M. & Welch, P. D. (1967). What is the fast Fourier transform? IEEE Trans. Audio, 15, 45–55.
Eklundh, J. O. (1972). A fast computer method for matrix transposing. IEEE Trans. C-21, 801–803.
Gentleman, W. M. & Sande, G. (1966). Fast Fourier transforms – for fun and profit. In AFIPS Proc. 1966 Fall Joint Computer Conference, pp. 563–578. Washington: Spartan Books.
Singleton, R. C. (1969). An algorithm for computing the mixed radix fast Fourier transform. IEEE Trans. AU, 17, 93–103.

to end of page
to top of page