International
Tables for
Crystallography
Volume B
Reciprocal space
Edited by U. Shmueli

International Tables for Crystallography (2010). Vol. B, ch. 3.5, pp. 474-480

## Section 3.5.4. Computational efficiency

T. A. Dardena*

aLaboratory of Structural Biology, National Institute of Environmental Health Sciences, 111 T. W. Alexander Drive, Research Triangle Park, NC 27709, USA
Correspondence e-mail: darden@gamera.niehs.nih.gov

### 3.5.4. Computational efficiency

| top | pdf |

The Ewald-sum methods discussed above lead to calculations that are much faster than straightforward summation, as well as clarifying the nature of the conditional convergence. However, there are opportunities for further significant speed-ups, which we cover in this section. The computational cost of the Ewald sum is dominated by the direct and reciprocal sum. We cover these in turn.

#### 3.5.4.1. The direct sum

| top | pdf |

For all the cases discussed above – point charge, normalized spherical Gaussians and the more complex densities made up of sums over a basis of normalized Hermite Gaussians – the Ewald direct sum involves terms like and in the Hermite case partial derivatives with respect to the x, y and z components of these terms. Given the efficiency opportunities in the reciprocal sum to be discussed below, it is advantageous to choose an Ewald parameter (referred to above as α in the point-charge case or β for the compact–compact Gaussian or Hermite Gaussian interactions) so that direct-sum terms can be neglected past a fixed cutoff, i.e. when . Given a fixed Ewald parameter α, the cutoff may be fixed independently of the unit-cell size. At constant or near constant material density (number of atoms per cubic ångstrom) this leads to `linear scaling', i.e. the computational cost of the Ewald direct sum grows linearly with the unit-cell volume or number of atoms in the unit cell U.

For the point-charge or normalized spherical Gaussian case, the remaining issue is efficient evaluation of the erfc function. Various table lookup schemes have been advanced, as well as a polynomial expression (Allen & Tildesley, 1987). Ewald summation for ideal multipoles up to quadrupole have been discussed by Smith (1982, 1998) and Aguado & Madden (2003), using optimized combinations of terms to minimize the cost of the direct sum. A more general scheme, covering higher-order multipoles, was proposed by Sagui et al. (2004). In the latter scheme the McMurchie–Davidson recursion (McMurchie & Davidson, 1978; Helgaker et al., 2000) was used to calculate the higher-order derivatives of for the multipole–multipole interaction tensor. This same approach was used by Cisneros et al. (2006) to calculate the interaction tensor for normalized Hermite Gaussian interactions. Although originally formulated for Hermite Gaussian interactions, it applies to more general partial derivative calculations. Let r = (x, y, z) be a point in with , and let g(r) be a smooth function of r. For example, . We are interested in the partial derivatives with respect to x, y and z of g(r). Let R(0, 0, 0, 0) = g(r), and for n > 0 let R(0, 0, 0, n) = (1/r)(d/dr)R(0, 0, 0, n − 1). For non-negative integers t, u, v, let = . Then we have the following recursion:

The proof of the last recursion depends on the fact that and on Leibniz' generalization of the product rule for differentiation, namely , and is as follows:The other two recursions are proved similarly. To apply this to the case we recall from equation (3.5.3.4) that and that for the higher-order Boy's functions , . Efficient evaluation of the Boy's functions is discussed by Helgaker et al. (2000).

Further economies for point multipoles can be achieved by using symmetry. For example, it is only necessary to keep track of six quadrupolar terms, with a similar reduction for Hermite Gaussians. A general discussion, including rotation of multipoles or Hermite Gaussians from local molecule frame to laboratory frame, is provided in Sagui et al. (2004) as well as in Cisneros et al. (2006).

#### 3.5.4.2. Reciprocal-sum speed-ups: the smooth particle mesh Ewald and fast Fourier Poisson methods

| top | pdf |

The smooth particle mesh Ewald (PME) (Essmann et al., 1995) and fast Fourier Poisson (FFP) (York & Yang, 1994) methods both rely on the three-dimensional fast Fourier transform (3DFFT) for computational efficiency. In combination with the above use of cutoffs in the direct sum, they lower the computational cost of the Ewald sum from (Perram et al., 1988) to , where N is the number of particles and e.g. denotes a quantity such that remains bounded as . For the point-charge case, the key insight is that if the point charges happen to be located on a regular grid, the structure factors [equation (3.5.2.14)] become discrete Fourier transforms and can be rapidly calculated using FFTs.

Like other variants of the original particle–particle particle–mesh (P3M) method (Hockney & Eastwood, 1981), the smooth PME method interpolates charge onto grid positions. The smooth PME, like the original PME algorithm (Darden et al., 1993), formulates this problem as one of interpolation of the complex exponentials appearing in the structure factor. This interpolation process is very accurate at low frequency, i.e. for small , and becomes progressively less accurate as grows. However, the weight function multiplying the structure factor in the reciprocal sum severely damps the high-frequency (large ) terms, thus preserving overall accuracy. In contrast, the FFP method relies on the fact that is the Fourier transform of a superposition of normalized spherical Gaussian charge distributions , centred at , as discussed above. This Fourier transform is approximated by the FFT of the gridded charge density obtained by direct sampling of the Gaussians onto the grid (similar to the use of the FFT in macromolecular X-ray crystallography). Note that the smooth PME method has been easily extended to the treatment of London dispersion interactions, since these involve similar structure factors with a different weight function . The extension is not so straightforward with the FFP approach. However, as discussed below, the FFP approach has advantages in the case of continuous charge density when more than one Gaussian exponent is involved.

For the case of point charges, P3M methods based on least-squares approximation of the exact Ewald reciprocal sum gradient (Pollock & Glosli, 1996; Darden et al., 1997) can be made more efficient than the smooth PME method at the same accuracy, despite requiring more FFTs (Deserno & Holm, 1998). However, for the case of ideal multipoles or normalized Hermite Gaussians, the number of such FFTs becomes prohibitive, whereas the smooth PME, which relies on analytic differentiation of B-splines, does not need more FFTs as the order of differentiation (t + u + v) in equation (3.5.3.12) grows.

#### 3.5.4.2.1. The smooth PME approach

| top | pdf |

We begin by re-expressing the complex exponentials appearing in the structure factors in equations (3.5.2.14), (3.5.2.26) and (3.5.3.12). Recall the form of m from equation (3.5.2.2), and let be positive integers of the scale of the unit-cell dimensions. Then for we can writewhere and , , and thus

To motivate the smooth PME approach, suppose we can approximate by some function of the formwhere is a well behaved function with continuous derivatives as needed for the general structure factor in equation (3.5.3.12). We first examine the structure factor for point-charge Ewald sums, defined in equation (3.5.2.14). This can be approximated aswhere is given byand we have used the fact that = for .

Next we turn to the general structure factors from equation (3.5.3.12). Define , as above for expansion points , and note thatwhere are constants depending on the reciprocal bases and on , . Similar expansions hold for the other partial derivatives. Thus we can approximate the general structure factor bywhere are the transformed Hermite coefficients obtained from by change of variables, using and where is given by

In equation (3.5.4.2) we recognize the approximation to as times the three-dimensional discrete Fourier transform of , which can be calculated rapidly by the 3DFFT. If the function B(w) has finite support then can be calculated rapidly [in order(N) time] as well. Similarly we have the 3DFFT of in equation (3.5.4.4). Thus the efficient calculation of structure factors depends on the approximation given in equation (3.5.4.1), which in turn rests on the choice of B(w). The smooth PME relies on B-splines.

#### 3.5.4.2.1.1. Properties of B-splines

| top | pdf |

We proceed by deriving the facts needed about B-splines. The B-splines of order n, Bn(w) are functions of a real variable defined as follows:

We use the following properties of the B-splines:

 (1) for and all w, and for or ; (2) for and all w, ; (3) for , ; (4) for , , ; (5) for , , + .

The first property to be established is clearly true for n = 1 and if true for n, then in the above integral recursively defining Bn+1 if or the integrand must be zero for , so . Thus it is true in general by mathematical induction.

The second property is also clearly true for n = 1. Assume it is true for n. Then = = . Thus it is true in general by mathematical induction.

Next we show symmetry: . Again this is clearly true for n = 1. Assuming it is true for n, we havewhere we substituted in the second line above.

The fourth property is an efficient recursion for the derivatives of B-splines. It can be seen directly for n = 2 for . Assume it is true for . Then for ,thus it is true in general by induction.

The final property is an efficient recursion for . It can be seen directly for n = 2. Assume it is true for . Then for ,where we have used property (4) in the third line and integration by parts in the fifth and sixth lines. The induction step follows by moving the last term on the right-hand side over to the left side of the equation.

#### 3.5.4.2.1.2. B-spline approximation to trigonometric functions

| top | pdf |

Next we examine properties of the B-spline approximations to the complex exponentials that appear in the structure factors. Let be given byThe complex number is chosen to optimize the approximation of by as shown below. It is convenient to first study the properties of the function given bywhich is an approximation to the constant one. From its definition it is clear that is periodic in w with period one, i.e. . Therefore it can be expanded in a Fourier series:The Fourier coefficients are given bywhere denotes the Fourier transform of . Since are defined recursively by convolutions, its Fourier transform is obtained simply from that of : . Here we are using the fact that if f, g and h are related by convolution, , then . Since can be obtained by straightforward integration, we have

The original smooth PME method set , where was chosen to enforce interpolation: or for integer values of w. This was related to the so-called Euler spline, studied for more general complex-valued interpolation problems by Schoenberg (1973). The special case of interpolating the function is simpler, due to the Fourier-series representation, allowing us to to derive necessary properties in a self-contained way. From equations (3.5.4.6) and the above expression for , noting that for integral z with , the interpolation condition can be written

Note that except for , the expressions appearing in the interpolation condition are periodic in z with period one. The interpolation condition thus forces to be periodic as well, and so we focus attention on the range . If the sum in the above interpolation condition is bounded, so in order to be able to choose to satisfy the interpolation condition it is sufficient to demonstrate that for . If n is even, the terms in this sum are all positive so it is always possible to interpolate. If n is odd and z = ½ then the sum is zero, since , etc. Similarly the interpolation condition fails for z = −½. However, for , , etc. Similarly for . Thus we can say that for and the interpolation condition holds, and in this case we can write aswhereThe interpolating is obtained from this by multiplying by . Numerically, it is more convenient to use the alternative expression from equation (3.5.4.5), with given byNote that due to the above results for , if and .

Subsequent to the work by Essmann et al. (1995), we found (Darden et al., 1997) that a more accurate approximation was given by least-squares fitting of by , or equivalently 1 by . Thus, with given by equation (3.5.4.7) we are looking for a complex number that minimizes the mean-square error MSE, given bywhere and denote the real and imaginary parts of . Since , it is clear that we can set to zero, that is, restrict ourselves to real . Then, minimizing over real we havewith given in equation (3.5.4.8).

Thus we arrive at the least-squares optimal approximation to by setting in equation (3.5.4.5). The mean-square error due to this can be seen to beand thus the error in the approximation is seen to be small for small but increasing as , as discussed above (the reciprocal Ewald weight term compensates for the increased error by rapidly damping the larger terms). For the error decreases to zero as the B-spline order . The accuracy of the PME is thus increased by raising the order of interpolation and/or the grid size , which lowers for a given m. When derivatives with respect to w of are considered in this latter representation, their effect is to effectively lower the order of interpolation in the coefficients . Thus the order of interpolation needs to be higher, the higher the order of differentiation considered.

#### 3.5.4.2.2. The fast Fourier Poisson approach

| top | pdf |

The fast Fourier Poisson (FFP) method (York & Yang, 1994) accelerates the calculation of the structure factors in equations (3.5.2.14) and (3.5.3.12) in much the same way as FFT methods accelerate structure-factor and density-map calculations in macromolecular structure determination. Originally the method was used to calculate structure factors appearing in Ewald sums over point charges. It is easier to see the basic idea if we rewrite the Ewald reciprocal sum from equation (3.5.2.15) as follows:where the modified structure factor is given by

We recognize as the three-dimensional Fourier transform, evaluated at m, of a density given bywhere is a normalized Gaussian with Gaussian exponent 2α.

In turn we approximate this integral by a discrete Fourier transform. As above let be positive integers of the scale of the unit-cell dimensions. Let , where V is the volume of the unit cell. Recall the lattice basis vectors , . Given integers define by . Then we can approximate the modified structure factor bywhere is given by

The FFP can be generalized to Hermite Gaussians by sampling that density in place of the superposition of spherical Gaussians. In equation (3.5.3.11) the termcan be handled by the methods just outlined. However the termcan present problems. The difficulty comes in the case when, for example, is compact but is not. Then the combined Gaussian weights are sufficiently damping to efficiently converge the series, but it can be very inefficient to directly sample the superposition of Hermite Gaussians with a compact exponent. A similar problem was faced by Fusti-Molnar & Pulay (2002). Their solution was to filter out high frequencies in the Fourier-transformed compact densities and back transform them to obtain diffuse real-space densities. Instead, Cisneros et al. (2006) adjusted the exponents of both the compact and diffuse Gaussians. Choose a such that if is diffuse . For example, choose . As before, define by , and byThen, if and ,Thus instead of approximating by sampling the Gaussian followed by applying the 3DFFT, they approximated or . Note that with the above choice of , for any , and for any , . That is, all the modified Gaussians can be sampled on a uniform grid. Finally, note that for the diffuse–diffuse interactionsi.e. the interaction between adjusted diffuse Gaussians can be corrected efficiently in reciprocal space.

To test the efficiency of these reciprocal-space approaches, the electrostatic energy and intermolecular forces were calculated for a series of water boxes (64, 128, 256, 512 and 1024 water molecules) under periodic boundary conditions. Similar calculations were performed by Cisneros et al. (2006). As in that paper, the electron density of the water molecule was modelled by a five-site expansion consisting of the oxygen, the two hydrogens and the two bond midpoints, and using standard auxiliary basis functions, denoted A2 and P1, from the density-functional literature. These basis functions were obtained from the Environmental Molecular Sciences Laboratory at Pacific Northwest Laboratories (http://www.emsl.pnl.gov/forms/basisform.html ). The A2 basis set expansion uses 21 Hermite primitives with 102 Hermite coefficients per water molecule, resulting in a root-mean-square relative molecular force error of about 8% compared to 6–31G* B3LYP intermolecular electrostatics (Cisneros et al., 2006). The more complex P1 basis set expansion uses 38 Hermite primitives with 200 Hermite coefficients per water molecule, reducing the relative molecular force error to about 2%. For each water box the electrostatic energy and forces for the model A2 or P1 densities were calculated to a relative root-mean-square molecular force error of 10−3 or 10−4, compared to exact Ewald summation of the respective model density.

The time to perform these calculations is given in Table 3.5.4.1. Note that the times given here are considerably faster than those reported in Figs. 13 and 14 of Cisneros et al. (2006). This is partly due to the faster processor used here (3.6 GHz Intel Xeon versus 3.2 GHz Xeon used previously; Intel compiler with similar options used in both cases) but mainly due to the more efficient algorithms described herein. Note that the PME and FFP timings increase in an approximately linear fashion as a function of box size whereas the Ewald sum timings behave quadratically. For each method the timings are a continuous function of the compact–diffuse separating exponent 2β described above in Section 3.5.3.1, except when 2β equals a Gaussian exponent in the model basis set. When this happens, a set of Hermite primitives change from compact to diffuse or vice versa and the relative balance of direct-sum or reciprocal-sum calculation changes discontinuously. For the smooth PME calculations it was optimal to choose the separating exponent at one of these boundaries. This separating exponent value was then retained for the FFP and Ewald calculations, although the FFP method could be made slightly more efficient by further varying this parameter, and the Ewald summation could be considerably accelerated to an order N3/2 method, where N is the number of water molecules, by making the separating exponent a decreasing function of system size. Neither of these optimizations would change the relative efficiencies, however.

 Table 3.5.4.1| top | pdf | Timing versus accuracy for smooth PME, FFP and regular Ewald sum reciprocal-space approaches
 (a) Time in seconds for A2 model density.
Box sizeRMS force error 10−3RMS force error 10−4
PMEFFPEwaldPMEFFPEwald
64 0.106 0.142 0.365 0.144 0.274 0.520
128 0.218 0.271 1.336 0.287 0.380 1.869
256 0.387 0.528 5.239 0.517 0.846 7.256
512 0.837 1.100 17.881 1.104 1.534 25.511
1024 1.701 2.236 71.513 2.249 3.152 108.158
 (b) Time in seconds for P1 model density
Box sizeRMS force error 10−3RMS force error 10−4
PMEFFPEwaldPMEFFPEwald
64 0.310 0.399 2.321 0.478 0.688 3.858
128 0.591 0.832 7.706 0.923 1.576 11.056
256 1.186 1.920 35.178 1.805 2.736 49.107
512 2.549 3.825 119.863 3.794 5.684 183.487
1024 4.953 6.890 486.384 6.947 11.307 714.644

The error in the FFP for a given Gaussian exponent is controlled by the size of the grid as in the PME method, and also by the completeness in the Gaussian sampling (the cutoff in the tails of the Gaussian). Typically, for equivalent accuracy, the FFP requires more extensive sampling onto grid points than does the smooth PME. That is, each Gaussian centred at an expansion point must be sampled over a considerably larger set of points than the tensor product B-spline in equation (3.5.4.3). The additional cost due to this has been ameliorated somewhat by the Gaussian split Ewald approach (Shan et al., 2005). Additionally, in rectangular unit cells fast Poisson solvers such as multigrid (Beck, 2000; Sagui & Darden, 2001) can be applied, which in principle should parallelize more efficiently than the 3DFFT. Finally, although the smooth PME is typically more efficient for a single Gaussian exponent structure factor, the FFP approach has the strong advantage that the various Gaussian superpositions having possibly different exponents can be sampled and then fast Fourier transformed together, whereas they must be kept separate in the PME approach. Thus, as here Cisneros et al. (2006) found that the smooth PME approach was more efficient for smaller auxiliary basis sets, whereas they found that the FFP approach became more efficient for large auxiliary basis sets involving many different Gaussian exponents.

### References

Aguado, A. & Madden, P. A. (2003). Ewald summation of electrostatic multipole interactions up to the quadrupolar level. J. Chem. Phys. 119, 7471–7483.
Allen, M. P. & Tildesley, D. J. (1987). Computer Simulation of Liquids. Oxford: Clarendon Press.
Beck, T. L. (2000). Real-space mesh techniques in density functional theory. Rev. Mod. Phys. 72, 1041–1080.
Cisneros, G. A., Piquemal, J. P. & Darden, T. A. (2006). Generalization of the Gaussian electrostatic model: Extension to arbitrary angular momentum, distributed multipoles, and speedup with reciprocal space methods. J. Chem. Phys. 125, 184101.
Darden, T. A., Toukmaji, A. & Pedersen, L. (1997). Long-range electrostatic effects in biomolecular simulations. J. Chim. Phys. 94, 1346–1364.
Darden, T. A., York, D. M. & Pedersen, L. G. (1993). Particle mesh Ewald: An N log(N) method for Ewald sums in large systems. J. Chem. Phys. 98, 10089–10092.
Deserno, M. & Holm, C. (1998). How to mesh up Ewald sums I: A theoretical and numerical comparison of various particle mesh routines. J. Chem. Phys. 109, 7678–7693.
Essmann, U., Perera, L., Berkowitz, M. L., Darden, T., Lee, H. & Pedersen, L. G. (1995). A smooth particle mesh Ewald method. J. Chem. Phys. 103, 8577–8593.
Fusti-Molnar, L. & Pulay, P. (2002). The Fourier transform Coulomb method: Efficient and accurate calculation of the Coulomb operator in a Gaussian basis. J. Chem. Phys. 117, 7827–7835.
Helgaker, T., Jorgensen, P. & Olsen, J. (2000). Molecular Electronic-Structure Theory. Chichester: John Wiley and Sons.
Hockney, R. W. & Eastwood, J. W. (1981). Computer Simulation Using Particles. New York: McGraw-Hill.
McMurchie, L. E. & Davidson, E. R. (1978). One-electron and 2-electron integrals over Cartesian Gaussian functions. J. Comput. Phys. 26, 218–231.
Perram, J. W., Petersen, H. G. & DeLeeuw, S. W. (1988). An algorithm for the simulation of condensed matter that grows as the 3/2 power of the number of particles. Mol. Phys. 65, 875–893.
Pollock, E. L. & Glosli, J. (1996). Comments on p(3)m, fmm and the Ewald method for large periodic Coulombic systems. Comput. Phys. Comm. 95, 93–110.
Sagui, C. & Darden, T. (2001). Multigrid methods for classical molecular dynamics simulations of biomolecules. J. Chem. Phys. 114, 6578–6591.
Sagui, C., Pedersen, L. G. & Darden, T. A. (2004). Towards an accurate representation of electrostatics in classical force fields: Efficient implementation of multipolar interactions in biomolecular simulations. J. Chem. Phys. 120, 73–87.
Schoenberg, I. J. (1973). Cardinal Spline Interpolation. Philadelphia: Society for Industrial and Applied Mathematics.
Shan, Y. B., Klepeis, J. L., Eastwood, M. P., Dror, R. O. & Shaw, D. E. (2005). Gaussian split Ewald: A fast Ewald mesh method for molecular simulations. J. Chem. Phys. 122, 054101.
Smith, W. (1982). Point multipoles in the Ewald summation. CCP5 Inf. Q. 4, 13–25.
Smith, W. (1998). Point multipoles in the Ewald summation (revisited). CCP5 Inf. Q. 46, 18–30.
York, D. & Yang, W. (1994). The fast Fourier Poisson (FFP) method for calculating Ewald sums. J. Chem. Phys. 101, 3298–3300.