Structure Solution with a Genetic Algorithm

Files needed: ga_wo3.zip

Learning Outcomes: Other tutorials discuss how symmetry adapted distortion modes can be used to describe the structure of materials that have undergone phase transitions. In many cases it can be difficult to identify the true space group of a daughter phase. In this tutorial we use a Genetic Algorithm to determine which modes in a P1 description of WO3 are actually needed to fit the diffraction data. From this analysis we determine the true space group of the material.

Pre-tutorial preparation

This tutorial requires python 2.7 and some additional packages for data analysis. The easiest way to install these is via the Enthought Canopy python 2.7 distribution available at https://store.enthought.com/downloads/#default, though other Python distributions can also be used. The xlsxwriter package (for writing to Microsoft Excel files) is also required and can be installed by typing "pip install xlsxwriter" into the Canopy command prompt (open Canopy and go to: Tools -> Canopy Command Prompt). Ensure that the Keep Directory Synced to Editor option is checked in the drop down menu in the top right of your python command window.

Extract all the files from the .zip linked above to a directory on your computer.

If you haven't updated your jEdit recently you might need to set colour coding on .inp0 files so you can view them like normal .inp files. Go to you user/.jedit/modes directory and edit the file called catalog. Append ".inp0" to the list on the "MODE NAME" line.

If you're missing macros to allow files to run try going to the macros page on the topas wiki.

Tutorial

In this tutorial we determine the RT structure of WO3, which has space group P21/n in a 2x2x2 supercell of the parent cubic structure. The principal distortions from the cubic parent are tilts of WO6 octahedra and off centering of W within the octahedra. Here we start from the position of having a parent CIF file - wo3_pm3m_parent.cif plus X-ray diffraction data (d8_03901_030c.xy) and TOF neutron data (HRP37287_bs.xye) of RT WO3. The tutorial shows how to identify the 7 most important modes and hence the true structure and space group with the only prior assumption being that the unit cell is no bigger than 2x2x2 [Kerman, S.; Campbell, B. J.; Satyavarapu, K. K.; Stokes, H. T.; Perselli, F.; Evans, J. S. Acta Crystallographica Section A: Foundations of Crystallography 2012, 68, 222.].

  1. Reduce symmetry of parent to P1
    1. Go to http://stokes.byu.edu/iso/isodistort.php
    2. Load wo3_pm3m_parent.cif as parent structure
    3. Under "types of distortion": uncheck the "strain modes" box and click "Change"
    4. Under "Space Group Preferences": check monoclinic cell choice 2 and click "Change"
    5. Under method 3: Select space group P1 from the drop down list and enter basis = {(2,0,0),(0,2,0),(0,0,2)} and click OK
    6. On the distorted structure page select the only option "1 P1, basis={(0,0,2),(0,2,0),(-2,0,0)}, origin=(0,0,0), s=8, i=384" and click OK
    7. On the distortion page check the "TOPAS.str" radio button and click OK. Save the file as wo3_pm3m_parent-p1_2x2x2.str.
    8. Look at your new .str file (e.g. in jEdit where it should be colour coded and folded). You will see 96 displacive distortion modes are available (prm a1-a96)
  2. When we use a GA to find the important modes it's best to fix parameters describing peakshape, background, etc. We can begin by confirming that a 2x2x2 supercell and 96 modes are sufficient to provide a good fit to diffraction data. We can then fix many parameters derived from this fit. To do the fit:
    1. Open wo3_pm3m_parent-p1_2x2x2.inp. It is a fit to the diffraction data using wo3_pm3m_parent-p1_2x2x2.str. Lattice parameters and other background, peakshape, thermal and instrumental parameters have been refined. All 96 distortion modes have been used. The refinement has converged with Rwp ~ 6%. You might want to confirm that the information in the "for xdds" of this file describing the structure is equivalent to that in wo3_pm3m_parent-p1_2x2x2.str prepared in the previous step.
    2. Confirm you get a good fit to the data is found using this .inp file and that the parameters have converged to a global minima. They do so repeatedly on simulated annealing. Make your own .inp file from scratch if you wish. Use the guidelines at the top of wo3_pm3m_parent-p1_2x2x2.inp to help. In Topas you might want to select "Window/one scan per window" to see both patterns. Or select "Window/All Scans in window", switch to a d-spacing view and right click on the pattern to select "Normalise yobs".
  3. Next we set up a standard format .inp0 file for use in the GA. This will contain two sections: a data section describing the data files and your predetermined instrumental parameters and corrections; and a second section containing the structural model where pre-determined parameters are included. A template file - displacive_GA_template.inp0 has been provided for you to set up your .inp0 file. Look for the "->" markers indicating where input is required from you. On this occasion all the required information has been filled in so a shortcut to this step is simply to remove all the "->" instances from the template.
    1. Rename the template file WO3_RT_GA_tutorial.inp0.
    2. Under control information, make sure the output name matches the .inp0 file name (i.e. WO3_RT_GA_tutorial.txt)
    3. Under TOF neutron data:
      1. Give the data file name and apply a weighting between the two data files by entering "TOF_XYE_JOHNWEIGHT(data/hrp37287_bs.xye, 3)". See macros section for details on the weighting
      2. Limit the data range included by entering "start_X 59250 finish_X 110000"
      3. Apply a fixed Chebyshev background polynomial by entering " bkg 0.34052782 0.12890988 -0.00189312227 -0.0184402777 -0.00818158862 -0.0124777714"
      4. Provide the experimental dummy wavelength by entering the following macro "TOF_LAM(0.001)"
      5. Add x axis calibration parameters " TOF_x_axis_calibration(!t0_hrpd_bs,-26.50646,!difc_hrpd_bs, 48346.59717,!difa_hrpd_bs,-10.89208)"
      6. Include TOF peakshape function " TOF_Exponential(!a1_bs, 202.14847,!a2_bs, 0.00005, 4, difc_hrpd_bs, +)"
      7. Under "str" give a scale factor "scale @ 0.212204986`" and an additional peakshape function " tof_sample_peakshape(!lor, 0.391158137,!dsp, 63.92319,!dspsq, 0.00000)"
    4. Under xray data information:
      1. Enter the data file name "xdd data/d8_03901_030c_corr.xy"
      2. Limit the data range included by entering "start_X 10 finish_X 75"
      3. Add a Chebyshev background polynomial function " bkg 202.072987 -267.613899 131.473243 -61.1394651 28.0301198 -13.5536605 10.3325933 -3.54909069 0.701434223"
      4. Add the monochromator angle " LP_Factor(!th2_monochromator, 0)" and experimental wavelength using a macro "CuKa2(0.0001)"
      5. Apply a height correction "Specimen_Displacement(!height, 0.16896)"
      6. Under "str" give a scale factor "scale @ 5.49105968e-005`", a Thomson-Cox-Hastings type peakshape function "TCHZ_Peak_Type(!pku,-0.17401,!pkv, 0.11466,!pkw, -0.01403,!pkz, 0.0000,!pky, 0.24069,!pkxz, 0.00010)" and a peak asymmetry parameter "Simple_Axial_Model(!axial, 8.09341)"
    5. In the for xdds section…
      1. Copy the space group and lattice parameters from your structure file wo3_pm3m_parent-p1_2x2x2.str (no need for the scale factor that you have already provided for each data set)
      2. Also copy the mode definitions, mode-amplitude to delta transformation, distorted parameters and mode-dependent sites sections
      3. Add isotropic temperature factors to your sites by replacing the current "beq 0.0" with "beq 1.43932" for W sites and "beq 1.66845" for O sites
      4. Add simulated annealing to your mode sites by adding the drand macro (see macros section) to your mode definitions. They should now read, for example "prm !a1 0.00000 min -2.83 max 2.83 drand 'Pm-3m[0,0,0]GM4-(a,b,c)[W:a:dsp] T1u(a)"
  4. You might want to check that there are no errors in your .inp0 file. Save a copy as temp.inp and run topas. You should get Rwp ~30%.
  5. We now turn to the GA itself which is controlled by the script gacontrol.py, which calls some other scripts. To set up the script:
    1. Open get_irrep_block_lists.py in Canopy. This script is responsible for identifying the modes and grouping them according to which irrep they belong to.
    2. Copy the mode definitions text from wo3_pm3m_parent-p1_2x2x2.str as "mode text". This has already been done for you in this case.
    3. Open gacontrol.py in Canopy
    4. Lines 21, 23, 24, 25 and 85 need your attention
      1. On line 21 give the number of GA repeat runs you want. For this tutorial stick with one but in reality you would want to repeat this experiment several times to confirm the algorithm is consistently finding the best solution
      2. On line 23 give the name of the folder for your results. This folder will be created in your working directory
      3. On line 24 give the directory where all the files in this tutorial (including gacontrol.py) are saved
      4. On line 25 give a list of all penalty per mode values you wish to try in the GA (comma separated). You will find that 0.1 is an effective penalty for isolating the top 7 modes. In reality you might have to sweep through several values of penalty per mode and make an assessment on when the improvement in fit to the data stops being significant (i.e compensating for peakshape effects rather than fitting genuine features). An obvious indication of this is when a large reduction in penalty per mode is required for new modes to be included
      5. On line 85 enter the directory of your TOPAS command line application tc.exe. Usually this is located on your harddisk at C:\topasa_v6\tc.exe or similar
    5. Additional Notes/Advanced Tutorial (you can skip from here to the next full number).

    6. Lines 27-31 contain additional GA controls. Each one is explained by a comment. If you find the facts window that appears while the GA runs annoying, you can turn it off here for example.
    7. Lines 42-53 contain GA parameters concerning the probability of mutation and crossover events occurring, the size of the population and the length of time the algorithm will run. Change them if you wish (you might be able to improve the speed of the GA with good choices - my record is isolating the best 7 modes within 20s!).
    8. Lines 56-57 dictate the total time the GA will run. You can reduce these if you like although you will simultaneously reduce the probability of finding the best mode for the distortion. In this case the best mode will probably be isolated very early on and the GA will end 10 minutes after that. In theory, the algorithm could fail even over this long time - although I have never seen it!
    9. Lines 99-102 give information for the fitness function under "if key=="WO3"". The fitness function used here is defined as (Rwp/fitmin)+(penalty_per_mode*mode_count). Normalising the Rwp to fitmin helps you choose an appropriate penalty_per_mode. You will notice fitmin = 6 has been entered (recall we achieved Rwp ~ 6% for all modes in step 2). You could try different functions if you wish.
  6. Run the GA.
    1. Ensure that your working directory matches your tutorial directory (this should happen automatically if you synced your directory to your editor but if not type "cd <directory_name>" into your python command prompt). Click the green "play" button just below the toolbar at the top of the canopy command prompt. Your python window at the bottom of Canopy will indicate your current working directory and the current value of penalty_per_mode. If you are following the standard tutorial instructions then you can expect to wait about 11 minutes for the GA to finish. If you are trying additional repeat runs or several penalty_per_mode values then the time will be proportionately longer.
    2. During the runs you may wish to monitor the facts window that appears. See how quickly the modepool [12,16,37,38,60,66,77] is found. Typically it is found in less than 100s. If your frustration in waiting 10 minutes to check that no lower fitness individual can be found outweighs your commitment to scientific rigour, you can click the "Stop at end of generation" button. Note that this window may be minimised by default; if so it will be a small red icon. The text in the facts window will toggle between green, amber and red as the fit improves or moves towards stagnation.
    3. You should also look at the graph that appears. The green plot shows the best fit of any member of the population. The red plot shows the worst fit of any member. The blue plot shows the average fit of all 30 members of the population. As the GA runs you should see the green line moving to lower values and the blue and red lines follow it.
    4. At the end of the run you should get a "bestfit" of ~2. This is significantly lower than the starting value of 4.95.
  7. To evaluate the results, all we need to do is look at the final modepool output by the GA. This is in the facts window, but also stored in .pic files along with other potentially important information about the GA and the model identified. Open the folder that is named "Experiment name_1". Open the .pic file. It will tell you the following: The modes active in the best model; The time the best model was obtained; The number of modes in the best model; The raw Rwp of the best model; The number of different models trialled by the GA.
  8. You can see how good the fit of the best model to the data is by running the input file WO3_RT_GA_tutorial.inp in topas. You should get Rwp=7.7%.
  9. The next step is to see what these 7 modes mean in terms of the true symmetry of RT-WO3? We can check that 7 modes gives a "perfect" fit to the data by running WO3_7_mode_fit.inp in TOPAS. Doing this creates a CIF file of our 7-mode model - WO3_7_mode_fit.cif. We can then use FINDSYM to tell us the true spacegroup symmetry.
    1. Go to http://stokes.byu.edu/iso/findsym.php
    2. Under Import Data load WO3_7_mode_fit.cif and click OK
    3. At the bottom of the page click OK to find the true symmetry of your best model using default tolerances of the lattice and atomic positions
    4. On the next page you will see your P1 7 mode model CIF. About halfway down the page you will see another CIF file with space group P21/n. This is the correct model for RT WO3. Copy and save the text as WO3_RT_P21_n.cif if you wish

Additional/Advanced Evaluation

In a real-world example where you did not know in advance precisely what modes you were looking to extract from the GA you would want to test a range of penalty per mode values. You would also want to be sure that the model identified by the GA was the true lowest fitness individual and it was identified repeatedly.

  1. Try repeating step 5 but this time set num_runs = 5 on line 21 and try a range of penalty per mode values on line 25 [1,0.8,0.6,0.4,0.3,0.2,0.1,0.08,0.06,0.04,0.02,0]. Allow about 11 hours (overnight) for the run to complete. Remember to change the experiment name on line 23 from the one you chose previously.
  2. When the run is complete you will find all the results in your experiment folder. However it would be wearisome to look through all 60 .pic files and extract the information from each. Instead you should notice a file called paste_me.xlsx. Copy the information in paste_me.xlsx into the template file results_summary.xlsx (into the spaces provided in the "raw_results tab " ta. Results appear in paste_me.xlsx in the same order as they were given in gacontrol.py - so just copy the blocks of information in the order they appear in paste_me.xslx in the exact same order into results_summary.xlsx.
  3. Now switch view to the "summary" tab. On the right you will see several graphs. Turn first to the plot of "penpermode" vs "modesum". Notice that for high penalties no modes are found at all. Then at around penpermode = 0.6 the first 4 modes overcome the penalty. At around penpermode = 0.3 the next 3 modes also appear, showing how the model identified by the GA changes with penalty per mode. There are then further plateaus at very low penpermode values.
  4. Next turn to the plot of "penpermode" vs "Rwp". Notice in particular the sharp falls in Rwp when the first 4 modes turn on and when the next 3 turn on. Further additional modes lead to only very small decreases in Rwp.
  5. Lastly turn to the plot of "modesum" vs "Rwp". The plot resembles y = e^-x in shape with each additional mode offering diminishing improvement in Rwp. The graph is almost flat for modesum >= 7.
  6. You can confirm 7 modes is the right choice by inspecting the fit to the diffraction data. Try turning on 7 modes and the fit should be visually as good as when we used all 96. However, if only 4 modes are active there are considerable deficiencies in the TOF neutron fit. This is because the missing modes are O modes to which neutrons are particularly sensitive. The change in the X-ray fit is more difficult to spot.

 

 

[Modified 07-Aug-2017 by John S.O. Evans. Pages checked for Google Chrome.]