A. Shayeghi,*a D. Götz,b J. B. A. Davis,c R. Schäfera and R. L. Johnston*c
aEduard-Zintl-Institut, Technische Universität Darmstadt, Alarich-Weiss-Straße 8, 64287 Darmstadt, Germany. E-mail: shayeghi@cluster.pc.chemie.tu-darmstadt.de
bErnst-Berl-Institut, Technische Universität Darmstadt, Alarich-Weiss-Straße 8, 64287 Darmstadt, Germany
cSchool of Chemistry, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK. E-mail: r.l.johnston@bham.ac.uk
First published on 1st December 2014
The Birmingham cluster genetic algorithm is a package that performs global optimisations for homo- and bimetallic clusters based on either first principles methods or empirical potentials. Here, we present a new parallel implementation of the code which employs a pool strategy in order to eliminate sequential steps and significantly improve performance. The new approach meets all requirements of an evolutionary algorithm and contains the main features of the previous implementation. The performance of the pool genetic algorithm is tested using the Gupta potential for the global optimisation of the Au10Pd10 cluster, which demonstrates the high efficiency of the method. The new implementation is also used for the global optimisation of the Au10 and Au20 clusters directly at the density functional theory level.
A detailed structural characterisation of this transition regime is therefore of high interest in order to rationalise the exceptional characteristics of nanoscale materials. Generating geometric structure candidates for comparison with experimental observations is laborious for large systems and eventually becomes infeasible. From a theoretical point of view, it is therefore useful to carry out a global optimisation of the potential energy surface (PES) as a function of all coordinates, where the level of theory employed must adequately represent the system under study.
Since the electronic structure of large nanoparticles is expected to resemble the bulk phase, tailored model or empirical potentials (EPs) such as Gupta,2 Sutton–Chen,3 and Murrell–Mottram,4 fitted to properties of the solid phase, enable a reasonable description of the PES. For smaller nanoparticles, i.e. nanoclusters, a quantum chemical treatment becomes necessary, for which the computational costs are considerably greater than for EPs. Unbiased global optimisation at this higher level of theory therefore requires the development of efficient algorithms.
Nanoalloys (nanoparticles composed of more than one metal) are of considerable interest for their catalytic, optical and magnetic properties.5 Their global optimisation is further complicated by the presence of a large number of homotops – inequivalent permutational isomers.6,8 For this reason, a two-step strategy was developed in which the conformational space is first searched with atomistic models, such as the second-moment approximation to the tight-binding model (SMATB),7 and selected structures are then optimised with DFT. Evolutionary algorithms such as the Lamarckian Birmingham cluster genetic algorithm (BCGA),9 which combines local minimisation with a genetic algorithm (GA), are useful tools for searching the conformational space for the global minimum (GM) structure and the lowest-energy local minima, especially when combined with first principles methods in the density functional theory (DFT) based BCGA approach.10 This procedure notably enables the theoretical investigation of elaborate mono- and bimetallic clusters using a GA, with results consistent with experiments.11–16 For details on global optimisation algorithms, with particular focus on genetic algorithms and basin hopping techniques, the reader is referred to the literature.17,18
The first use of GAs for global geometry optimisation of molecular clusters was reported by Hartke,19 and by Xiao and Williams,20 using binary encoded geometries and bitwise genetic operators acting on binary strings.21–23 Later, a GA approach that operated on the Cartesian coordinates of the atoms was introduced by Zeiri,24 which removed the requirement for encoding and decoding binary genes.9 This was followed by the development of GAs for cluster optimisation by Deaven and Ho,25 who performed gradient-driven local minimisations for newly generated cluster structures. Doye and Wales further established how local minimisations effectively transform a multidimensional PES into a staircase-like surface, where the steps represent basins of attraction.26 This coarse-grained representation of the PES reduces the conformational space and therefore simplifies the PES that the GA has to search. The local minimisations correspond to a Lamarckian evolution, since individuals pass on a proportion of their acquired characteristics to their offspring. This procedure has been found to improve the efficiency of global optimisations and is implemented within the BCGA program, following the approach of Zeiri using real-valued Cartesian coordinates.9,24 Recent GA implementations include the OGOLEM code of Dieterich and Hartke for arbitrary mixtures of flexible molecules,27 the hybrid ab initio genetic algorithm (HAGA) for surface and gas-phase structures,28,29 and the gradient embedded genetic algorithm program (GEGA) for the global optimisation of mixed clusters formed by molecules and atoms.30,31 Very recently, the surface BCGA (S-BCGA)32 and the first principles based GA of Vilhelmsen and Hammer33 for the global optimisation of supported clusters have been reported, and the perturbation theory re-assignment extended GA has proven very useful for mixed-metallic clusters.34
The traditional generation-based BCGA program is a sequential code in which the local optimisations of individuals are not independent of one another. In fact, the sequentially performed geometry optimisations act as a bottleneck, limiting the treatable cluster sizes or, rather, the affordable level of computational sophistication.35 Newly created individuals of a given population are geometrically relaxed with respect to their total energy. The best population members, ranked by their fitness (determined by a fitness function which depends on the total energy), are then selected for mating and mutation in order to create novel structures and form the next generation. This cycle is repeated until the energy of the lowest-lying isomers changes by less than a specified threshold within a certain number of generations. Thus, if more than the optimum number of processors is used in a first principles based global optimisation, the overall CPU time plateaus and the cores are used inefficiently due to the imperfect parallelisation of the local optimisations. To improve the efficiency of this approach, the goal must be to enable the independent relaxation of several geometries at the same time, as shown schematically in Fig. 1, where several GA processes simultaneously optimise geometries managed by a global database (pool). This, however, cannot be implemented efficiently within the generation-based BCGA program.
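The generational cycle described above can be sketched in a few lines of Python. This is a deliberately simplified toy model (structures as coordinate lists, a dummy energy in place of an EP or DFT relaxation), not the actual BCGA code, but it makes the serial relaxation bottleneck explicit:

```python
import random

def local_minimise(structure):
    """Stand-in for a local relaxation (EP or DFT): returns the relaxed
    structure and its total energy. A toy energy (sum of coordinates)
    replaces the expensive calculation that forms the serial bottleneck."""
    return structure, sum(structure)

def generational_step(population, n_offspring):
    """One generation of a generation-based GA: every offspring must be
    relaxed before selection can proceed, so the relaxation loop below
    runs strictly sequentially."""
    offspring = []
    for _ in range(n_offspring):
        parent = random.choice(population)[0]
        child = [g + random.uniform(-0.1, 0.1) for g in parent]
        offspring.append(local_minimise(child))   # sequential relaxations
    merged = sorted(population + offspring, key=lambda m: m[1])
    return merged[:len(population)]               # keep the fittest members
```

Because `generational_step` cannot return until its slowest relaxation has finished, adding processors beyond what a single relaxation can use brings no further speedup.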
Since the DFT-BCGA code employed here makes use of a plane-wave self-consistent field (PWscf) pseudopotential approach, a benchmark calculation of a geometry optimisation for the predicted GM structure of Au20 (Td symmetry)36–38 has been performed in order to demonstrate the importance of an improved GA parallelisation to counter the imperfect DFT parallelisation. The total CPU time for these minimisations, starting from a randomly atom-displaced version of the already optimised structure, is shown in Fig. 2. The Au20 cluster was chosen for the benchmark calculations since, especially for such a large system, local optimisations lead to a slowdown of the global optimisation. The corresponding benchmark calculations indicate that the optimum number of processors should be below 100 cores (the best price-performance ratio is obtained for 36–64 cores, as shown in the inset of Fig. 2), since a larger number of cores would not speed up the calculations efficiently. The total CPU time is reduced by one order of magnitude going from 10 cores to 100, but does not improve significantly when using up to 300 cores. Benchmark calculations for a local optimisation of the Au10 cluster show the same tendency, with lower absolute CPU time, and are therefore not shown here. This indicates the importance of developing a parallelised GA code in which several GA subprocesses perform local minimisations on a near-optimal number of processors (48 cores in this case) at the same time, managed by a global database (see Fig. 1).
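This kind of plateau is the behaviour expected from Amdahl's law for a code with a fixed serial fraction. The sketch below is purely illustrative: the parallel fraction of 0.99 is an assumption chosen to reproduce the qualitative trend, not a value fitted to the benchmark data.

```python
def amdahl_speedup(n_cores, parallel_fraction):
    """Amdahl's law: maximum speedup on n cores of a task whose
    parallelisable fraction is p; the remainder (1 - p) stays serial."""
    p = parallel_fraction
    return 1.0 / ((1.0 - p) + p / n_cores)

# With an assumed 99% parallel fraction, going from 10 to 100 cores gives
# more than a fivefold gain, while 100 -> 300 cores gains less than 1.5x:
speedups = {n: amdahl_speedup(n, 0.99) for n in (10, 48, 100, 300)}
```

Running many such relaxations concurrently at their individually efficient core counts, as the pool approach does, sidesteps this per-calculation scaling limit.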
In this work, we present a significantly improved GA implementation which incorporates the BCGA and eliminates serial bottlenecks by replacing the generation-based GA approach with a flexible pool model,35 here denoted pool-BCGA. Within this pool strategy, individual subprocesses share the entire workload, leading to a parallelisation of the algorithm. This procedure allows several geometry optimisations to be run at the same time. The gain in speed is substantial, as local optimisations are the bottleneck in a global optimisation, especially when ab initio methods are used in the local relaxations. In principle, one could also run parallel geometry optimisation tasks within the generation-based BCGA. However, concurrent optimisations have different time demands, so each generation would have to wait for its slowest members, leading to processor idle time.
The development of parallel GA implementations has previously been reported for both atomic and molecular clusters.27,33,35,40,41 Global geometry optimisation at the DFT42 or ab initio43 level is generally found to be very expensive and not suitable for larger clusters, for which global optimisation at a lower level of theory would be appropriate. This leads to the commonly used two-stage procedure of performing the global search at e.g. the force-field or semi-empirical level, followed by a DFT or ab initio refinement of the best candidates.44 In the DFT-BCGA code used in this work, the global optimisation is performed at the relatively cheap pseudopotential PWscf level, which enables larger systems to be treated at the DFT level, while the best candidates can still be refined using a higher level of theory. The direct GA method can, however, easily be combined with higher level approaches such as MP2 and CC calculations. The flexible pool concept replaces the generation-based algorithm with a global database containing geometric and energetic information for a specified number of individuals. Several independent subprocesses make use of this database, applying mating and mutation operators to the pool members to form new individuals. These new individuals compete with current members of the pool and are immediately added to the pool if they are lower in energy.
We first test the method for the global optimisation of the Au10Pd10 cluster, using the Gupta potential, in an extensive statistical analysis of the new implementation. The 20-atom cluster is also interesting from a catalytic point of view,45 and offers an ideal test system, especially due to the large number of homotops, N = (NAu + NPd)!/(NAu! NPd!) ≈ 185 000, for a given geometry.8 The knowledge gained from these investigations, in terms of mating and mutation, is then used for the DFT-based global optimisation of the Au10 cluster, which represents a suitable test system for the DFT case, having been well studied in the past,38,46,47 and allows the efficiency of both implementations to be compared. Finally, the parallelisation of the code is tested by carrying out the global optimisation of Au20 at the DFT level, a system that has been well studied experimentally36,37 and for which geometries have been found by DFT-based genetic algorithms38,48 and the basin-hopping approach.39
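The quoted homotop count is simply a binomial coefficient and can be verified directly; the function name below is chosen here for illustration:

```python
from math import comb

def homotop_count(n_a, n_b):
    """Number of homotops of a fixed geometry containing n_a atoms of one
    element and n_b of the other: (n_a + n_b)! / (n_a! n_b!), which equals
    the binomial coefficient C(n_a + n_b, n_a)."""
    return comb(n_a + n_b, n_a)

# Au10Pd10: 20!/(10! 10!) = 184756 homotops, i.e. roughly 185 000
```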
In the DFT calculations, the Perdew–Burke–Ernzerhof (PBE) xc functional50 and ultrasoft pseudopotentials of the Rabe–Rappe–Kaxiras–Joannopoulos type,51 with nonlinear core corrections, are employed. For the calculation of electronic energies, a kinetic energy cutoff of 40 Ry and an electronic self-consistency criterion of 10−5 eV are used. The efficiency of electronic convergence for metallic states is improved using the Methfessel–Paxton smearing scheme.52 Local relaxations are performed with total energy and force convergence thresholds of 10−3 eV and 10−2 eV Å−1, respectively. All DFT calculations are performed with the Quantum Espresso (QE) package.53
In contrast to the default settings of the generation based GA, where the number of offspring grows with an increasing mutation rate, in a pool-GA calculation mutation and mating are performed with a certain probability as the pool size is kept fixed. This must be taken into account when setting the parameters in a typical pool-GA run. The offspring structures compete with the structures present in the pool according to their total energy after their local optimisation. Offspring with a better fitness (lower total energy) replace higher lying pool members. After checking for repeated optimised structures using a moments of inertia selection routine, the pool is sorted by ascending total energy. Finally, convergence is achieved when the minimum energy in the pool changes by less than a pre-defined energy difference (typically 10−3 eV) within a specified total number of optimised geometries. This ensures an elitist behaviour of the GA in combination with good diversity in the pool. If convergence is not reached, the subprocesses start a new cycle, repeating the steps described above.
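The pool update described above (competition by total energy, duplicate screening via moments of inertia, and the convergence test on the lowest pool energy) can be sketched as follows. Members are represented as (energy, sorted principal moments) pairs; all names, data layouts and tolerances are illustrative stand-ins, not the BCGA's actual routines:

```python
def is_duplicate(new_moments, pool, tol=1e-3):
    """Simplified stand-in for the moments-of-inertia selection routine:
    two structures count as the same isomer if their sorted principal
    moments of inertia agree within a tolerance."""
    return any(all(abs(a - b) < tol for a, b in zip(new_moments, m))
               for _, m in pool)

def update_pool(pool, candidate):
    """Candidate (energy, moments) replaces the highest-lying pool member
    if it is lower in energy and not a repeat; the fixed-size pool stays
    sorted by ascending total energy."""
    energy, moments = candidate
    if is_duplicate(moments, pool):
        return pool
    if energy < pool[-1][0]:
        pool = sorted(pool[:-1] + [candidate])
    return pool

def converged(best_energies, window, tol=1e-3):
    """Convergence test: the lowest pool energy changed by less than tol
    over the last `window` recorded optimisations."""
    return (len(best_energies) >= window
            and best_energies[-window] - best_energies[-1] < tol)
```

Because replacement only ever lowers pool energies, the scheme is elitist, while the duplicate screen preserves diversity by rejecting repeated isomers however low they lie.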
When executing the pool-GA, general runtime configuration settings are read from input files before the GA initially synchronises with the global database. The GA then enters the pool convergence loop. If the convergence criterion is not reached, the GA continues with a check for the current mode (“initial-” or “pool-mode”). As mentioned above, initial-mode means that new structures are created by randomly choosing atom coordinates inside the simulation cell while the pool-mode uses either mating or mutation operators in order to form new individuals. The new structures are then locally optimised by either passing the atom coordinates to an external ab initio quantum chemistry program (e.g. QE53 or NWChem54) or one of the empirical potentials (e.g. Gupta) embedded in the code. This pool-based approach allows the code to be easily restarted if it runs out of CPU time. The user is left free to restart as many subprocesses as preferred, depending on the available computational resources. However, aborted local optimisations are not restarted. Instead, new subprocesses are initiated, starting with new geometries which are generated from the current pool configuration by the evolutionary principles mentioned above.
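A single subprocess cycle, switching between initial-mode and pool-mode as described above, might look as follows in outline. The crossover and mutation operators shown are generic stand-ins rather than the BCGA's actual operators, and the returned coordinates would be handed to the external local optimiser (QE, NWChem or an embedded EP):

```python
import random

def subprocess_cycle(pool, pool_size, mutation_rate=0.1):
    """One cycle of a pool-GA subprocess (illustrative): fill the pool with
    random structures in 'initial-mode'; otherwise create offspring by
    mutation (with some probability) or by mating two pool members."""
    if len(pool) < pool_size:                       # initial-mode
        coords = [random.uniform(0.0, 5.0) for _ in range(9)]  # 3 atoms x 3D
    elif random.random() < mutation_rate:           # pool-mode: mutation
        parent = random.choice(pool)
        coords = [x + random.gauss(0.0, 0.3) for x in parent]
    else:                                           # pool-mode: mating
        p1, p2 = random.sample(pool, 2)
        cut = random.randrange(1, len(p1))          # one-point crossover
        coords = p1[:cut] + p2[cut:]
    return coords  # passed on to the external local optimisation
```

Since each cycle reads the current pool and returns an independent geometry, any number of such subprocesses can be started, killed, or restarted without losing state beyond the aborted relaxation itself, which is exactly the restartability described above.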
Fig. 4 compares the pool-GA, for different pool sizes, to a random structure search. The same mutation rates are used in all calculations: an atom exchange mutation rate of 0.5, chosen because of the homotops, in addition to the cluster replacement mutation, which adds new random structures. Applying the atom exchange mutation operator in addition to the replacement mutation makes the GA considerably more efficient.17,55 The solid lines represent averaged evolutionary progress plots from 1000 GA runs for each case. Evolutionary progress plots describe the evolution of the lowest-lying structure found so far with the number of generations or optimised structures, respectively. The runs are averaged in order to test reproducibility and permit a meaningful statistical statement. Increasing the pool size tends to reduce the efficiency of finding the GM. Since the same roulette selection scheme and parameters are used in all calculations, a higher probability of selecting poor parents is to be expected as the number of individuals in the pool increases. The optimum pool size should therefore be large enough to accommodate a high structural diversity, but small enough to remain largely elitist. A comparison to the generation-based GA, carried out in the same way, shows the same behaviour and is therefore not depicted here. The random structure search, which in both cases acts as an internal standard, illustrates the high efficiency of both GA implementations in general and shows that a single pool-GA subprocess has an efficiency comparable to the generation-based GA. The pool-GA and the generation-based approach compare well, as shown in Fig. 5, where both implementations are compared to a random structure search. Typically, the random search is not able to find the GM.
Fig. 6 shows lognormal fits to the probability densities of finding the GM after a certain number of optimised structures within the 1000 GA runs, for several pool sizes. An additional plot embedded in this figure shows the linear scaling of the maximum number of optimisations needed with the pool size. The good agreement between the two GA approaches makes the pool-BCGA implementation a powerful tool for the prediction of cluster structures, since many subprocesses can be run at the same time, while the convergence of the pool using a single subprocess compares well to the generation-based code. This allows a much higher efficiency through the communication of several subprocesses via the global database.
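The roulette selection scheme referred to above amounts to fitness-proportional sampling of parents. In the sketch below, the exponential fitness function and the parameter `rho` are illustrative assumptions, not the BCGA's exact fitness definition; it only illustrates why a larger pool dilutes the selection pressure towards the best members:

```python
import math, random

def roulette_select(pool, rho=3.0):
    """Fitness-proportional (roulette-wheel) parent selection. Fitness is
    derived from the total energy via an exponential weighting of the
    normalised energy; lower energy -> higher fitness -> larger slice."""
    energies = [e for e, _ in pool]
    e_min, e_max = min(energies), max(energies)
    span = (e_max - e_min) or 1.0
    fitness = [math.exp(-rho * (e - e_min) / span) for e in energies]
    r = random.uniform(0.0, sum(fitness))
    acc = 0.0
    for member, f in zip(pool, fitness):
        acc += f
        if r <= acc:
            return member
    return pool[-1]
```

With fixed `rho`, adding more mid- and high-energy members enlarges the total wheel while each low-energy slice stays the same size, so poor parents are drawn more often as the pool grows.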
In order to test how the mutation rate influences both a single pool-GA subprocess and the generation-based code, Fig. 7 shows averaged evolutionary progress plots in which both GAs are compared for different mutation rates, using a population size of 10. The general trend is that mutation reduces the efficiency of finding the GM structure, which means that mutation on average produces higher-lying structures. While the pool-GA, shown in Fig. 7(a), rapidly loses efficiency with increasing mutation rate, the generational GA (Fig. 7(b)) is less affected, which initially might appear to be an unexpected result. It becomes clearer, however, if one considers that in the pool implementation the population size is kept fixed. In the traditional BCGA the number of offspring is, by default, 0.8 times the generation size, and the expected number of mutants is the mutation rate multiplied by the sum of the generation size and the number of offspring. For a population size of 10 and a mutation rate of 0.2, this means 8 offspring are generated from mating and, on average, (10 + 8) × 0.2 = 3.6 mutants. For the pool-GA, the efficiency is therefore lowered with increasing mutation rate because of the reduced mating rate, which makes the implementation less elitist. However, a low mutation rate increases the structural diversity in a given population and, therefore, mutation should not be neglected completely. Again, lognormal fits to the probability densities of finding the GM after a certain number of optimised structures within 1000 GA runs are shown in Fig. 8, as a function of the mutation rate. The plot embedded in this figure shows an exponential scaling of the maximum number of optimisations needed with the mutation rate. The probability densities for mutation rates larger than 0.8 could not be fitted well due to the very low efficiency of finding the GM.
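The offspring bookkeeping of the traditional generation-based BCGA described above reduces to simple arithmetic:

```python
def offspring_counts(generation_size, mutation_rate, offspring_fraction=0.8):
    """Generation-based BCGA bookkeeping as described in the text: mating
    produces a fixed fraction of the generation size in offspring, and the
    expected number of mutants is the mutation rate multiplied by the sum
    of the generation size and the number of mating offspring."""
    n_mating = int(offspring_fraction * generation_size)
    n_mutants = mutation_rate * (generation_size + n_mating)
    return n_mating, n_mutants

# Generation size 10, mutation rate 0.2:
# 8 offspring from mating and (10 + 8) x 0.2 = 3.6 mutants on average.
```

In the pool-GA, by contrast, the population size is fixed, so a higher mutation probability directly displaces mating events instead of adding extra mutants on top of them.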
The resulting structures lying within 0.4 eV of the predicted GM, as obtained at the PWscf/PBE level of theory, are shown in Fig. 10. Both implementations find identical local minima when optimising a comparable number of structures. The evolutionary progress plot (Fig. 11) shows an example for the pool-GA case, where the GM is found after the optimisation of about 50 structures. This number, however, varies from run to run due to the stochastic nature of the GA, which originates from constructing the initial population from random structures. In any case, it shows how the current best (lowest energy) solution evolves towards the planar GM isomer 10-a with D2h symmetry.
The potential lowest energy isomers within 0.4 eV, as obtained at this level of theory, including the planar GM isomer 10-a, are in agreement with the previous findings of Götz et al.47 However, the trigonal prism with both triangular faces and two rectangular faces capped, suggested by Choi et al.,56 is found to lie high in energy at this level of theory, as are all other isomers reported in those previous studies. A new planar isomer 10-g, which has been described for the Au10− cluster,57 and a 3D structure 10-e were also found to lie below 0.4 eV. Nevertheless, it should be mentioned that relative energies obtained at this level of theory, using loose convergence criteria, should always be treated with care. A reminimisation of the structures at a higher level of theory, or the use of tighter convergence conditions, can unpredictably change the energetic ordering, although 10-a is expected to remain the GM.
The PES can be described by a sequence of local minima inter-connected by transition states where monotonic sequences form funnels.58 A given topology, once in a funnel, must eventually overcome several energy barriers in order to reach the GM or another specific local minimum as the PES is explored. This means that a given local optimisation within a GA optimisation task could potentially relax into a so-called metabasin with small geometrical deviation from the minimum. Therefore energetic discrepancies should not only be discussed as depending on the xc functional and pseudopotentials used, but should also be attributed to the cases where local optimisations end in metabasins near a local minimum, leading to an apparently wrong energy ordering.
However, this should not be interpreted as a problem. Genetic algorithms used in this manner can be thought of as a coarse-grained filter: the idea is to reduce a large configuration space to a manageable size. The reduced configurational space then opens up the possibility of a more detailed description of only a few isomers at a higher level of theoretical complexity, as is often required for the description of binary clusters in combination with experiments.
The structures of the putative pool-GA GM and minima lying below 0.5 eV are shown in Fig. 12. The pool-GA successfully finds the tetrahedral structure, 20-a, as the GM. The tetrahedron is first found after the optimisation of only 56 structures. There is a large gap between the GM and the next lowest-lying structure, a distorted geometry with C1 symmetry. Structures similar to 20-b are seen in minima 20-e and 20-g, while structures 20-c, 20-f and 20-h are C1 geometries based on more subtle distortions of the tetrahedron.
By replacing the sequentially working generation concept, serial bottlenecks are eliminated. A typical pool calculation can be started as a job array of several pool-GA subprocesses, enabling the treatment of larger cluster sizes than previously studied and even opening up the possibility of using a higher level of theory. Alternatively, one can envisage using wavefunction-based methods for the geometry relaxations in the global optimisation of small cluster systems, as implemented in program packages such as CFOUR59 or NWChem v6.3,54 which enable geometry optimisations based on coupled cluster methods. Such a pool implementation would emerge as the method of choice, especially for the sophisticated task of performing global optimisation using multi-electron wavefunctions to account for electron correlation with higher accuracy.
The very recently developed S-BCGA could also be improved by using the flexible pool concept, which would allow the study of more complicated supported systems, such as larger clusters and nanoalloys, and permit calculations at a higher level of theory.
A comparison of the results obtained by the generation- and pool-based BCGA shows that the pool-GA is able to find all isomers predicted by the generation-based implementation, while both GAs give results in good agreement with existing global optimisation calculations reported in the literature.
This journal is © the Owner Societies 2015 |