Exhaustive Symmetry Searches

Files needed: search_start_template.inp;wo3_pm3m_parent.cif; wo3_850c_simulated.xy; SGCOM5_april2016.CPP; Results_summary_tutorial.xlsx; All_in_one_Exhustive_Searcher_tutorial.py.

Bonus file: d8_03901_850c.xy.

Cheat files: Results_summary_tutorial_final.xlsx, WO3_850c_start.inp, result_summary.txt.

Learning Outcomes: The aim of this tutorial is to perform an exhaustive subgroup search to automatically determine the symmetry and structure of WO3 at 850 C. This is done by Rietveld refinement of all possible symmetry-allowed distortions of the Pm-3m parent structure down to a specified child. You will first anneal the unit cell parameters and distortion modes together before annealing the distortion modes only for each of 69 possible subgroups. You'll then use the fit of each model to make an intelligent choice of the "best" answer. It exemplifies the method in [1].

[1]. Lewis, J. W.; Payne, J. L.; Evans, I. R.; Stokes, H. T.; Campbell, B. J.; Evans, J. S.O., "An Exhaustive Symmetry Approach to Structure Determination: Phase Transitions in Bi2Sn2O7", Journal of the American Chemical Society 2016.

Exhaustive Search to find the Structure of WO3 at 850 C

1. Save the required files to a new, empty directory of your choice.

2. This tutorial requires python 2.7 and some additional packages for data analysis. The easiest way to install these is via the Enthought Canopy python 2.7 distribution available at https://store.enthought.com/downloads/#default, though other Python distributions can also be used. The xlsxwriter package (for writing to Microsoft Excel files) is also required and can be installed by typing "pip install xlsxwriter" into the Canopy command prompt (open Canopy and go to: Tools -> Canopy Command Prompt).

3. For this tutorial let's assume you want to explore all the possible subgroups between the parent cubic Pm-3m structure and a P4 child with a 2x2x2 subcell. We can then generate a list of possible subgroups using ISODISTORT. To do this:

i) Go to http://stokes.byu.edu/iso/isodistort.php.
ii) Next to "Import parent structure from a CIF structure file", browse to the wo3_pm3m_parent.cif on your computer and click "OK".
iii) Under Method 3, select space group symmetry "75 P4 C4-1", enter the basis {(1,1,0),(-1,1,0),(0,0,2)} row by row, and click "OK".
iv) Choose the first of two options: "75 P4, basis={(-1,1,0),(-1,-1,0),(0,0,2)}, origin=(0,0,0), s=4, i=48" and click "OK".
v) On the next page you can view the distortion modes available with the "view distortion" tool if you wish. To obtain the subgroup tree, scroll to the bottom of the page and check the "Generate TOPAS.STR output for subgroup tree" box. Then return to the top of the page, check the "Subgroup tree" radio button, and click "OK".
vi) After a few seconds you should find that 72 subgroups have been generated. Note that subgroups 1 (the parent), 2 and 31 do not have any active distortions. Click "Download zip file" at the bottom of the page to download 69 topas.str structure files of the remaining subgroups plus the parent. Extract them to your tutorial directory.

4. The next step is to set up the template topas .inp file that will be used as a starting point for each of the refinements to be performed and defines the process of annealing that will be used. It will contain a series of flags to indicate where automated changes are to be made for each subgroup. Thus, it is important to format these flags correctly.

i) Open the file called search_start_template.inp, and resave it as WO3_850c_start.inp. Sections of the file that require you to do some manual editing are marked with ->. Don't overwrite anything else in the template file. If you have up-to-date jEdit menus, these sections should be coloured purple; otherwise, update your menus! In this tutorial, appropriate entries have already been included for you, so you can quickly complete steps ii)-vi) by simply removing "->" wherever it appears in the .inp file.
ii) In the data information section, take each of the steps that follow. The items created in this section will remain unchanged (except for the refined Strain_G peak width) throughout your subsequent model testing.

- Enter "xdd wo3_850c_simulated.xy".
- Employ durham_d9 instrumental parameters (use the Durham Menus macro).
- Add "x_calculation_step = 0.01;" (the data was simulated in 0.01 steps).
- Add the following fixed background: bkg 208.6279 -55.466639 14.0695846 -1.59061875 9.24325286 0.730935477
- Enter keyword "str" followed by a refineable scale factor (e.g. "scale @ 0.001").
- Add the following fixed peakshape parameters:
TCHZ_Peak_Type(!pku, 0.02369,!pkv, -0.03068,!pkw, 0.00835,!pkz, 0.0000,!pky, 0.14902,!pkx, 0.0000)
- Add "Simple_Axial_Model(!axial, 8.43407)".

The peakshape should be broad initially to provide the peak overlap required to successfully determine the distorted lattice parameters. Therefore:

- Add a simple refineable peakshape parameter as "Strain_G(@, 1.1 val_on_continue = 1.1; min = 0.01; max = 2;)".

iii) Within the "for xdds" and "for strs" loops, immediately after "Lattice_Prm_Section", enter the space group and lattice parameters of the parent phase (taken from parent CIF) including suitable refinement flags based on the crystal system. For the present example, this information should appear as follows:
space_group Pm-3m
a lpa 3.76000
b lpa 3.76000
c lpa 3.76000
al 90.00000
be 90.00000
ga 90.00000

iv) Under "Sites", enter the atom sites of the parent phase. For the present example, which employs simple isotropic thermal parameters, enter the following:
site W x 0.00000 y 0.00000 z 0.00000 occ W 1 beq 1.5
site O x 0.50000 y 0.00000 z 0.00000 occ O 1 beq 3

Note that this parent information will be replaced automatically by appropriate subgroup information during the testing of each subgroup in the tree.

v) Under "control information" at the top of the template, you will see that we are set up for a two-step annealing process. You can add more than two steps, though two is sufficient in this case. Here you need to determine whether modes will be refined, and also set the number of iterations and the percentage-randomisation of the cell parameters to be applied after each convergence cycle. For the present example, an effective scheme is "iters 2000" with "!cell_perc 3" under "#ifdef run_1", and "iters 1000" with "!cell_perc 0" under "#ifdef run_2", with "#define modes_on" used (i.e. uncommented) in both steps. The first step should optimise the cell parameters and achieve close-to-optimal mode amplitudes. The second step allows the mode amplitudes to be further optimised without significantly changing the cell parameters.
vi) At this point, check that you have set up your template correctly by running it in Topas. As you have entered an unrefined parent structure you are not expecting to get a good fit to the data, but the file should still run without any issues. If your .inp doesn't run, check for "->" flags remaining in your .inp file. Look at WO3_850c_start.inp in the cheat folder if you are unsure. When you are satisfied you have set up the start .inp file correctly, save and close it.

5. You are now ready to run the exhaustive search:

i) Open All_in_one_Exhaustive_Searcher_tutorial.py in Canopy.
ii) On line 31: Change working_directory to match your tutorial directory.
iii) On line 32: Change topas_directory to match the directory where topas is installed.
iv) On line 33: Enter the isotropic thermal parameters for each atom type. For the present example, we can use: W: 1.5, O: 3.
v) Type "cd <Tutorial Directory>" into the Python prompt at the bottom of the screen (if the prompt does not appear, ensure that the View->Python option is checked).
vi) Press the green run button on the top toolbar. The search of all 72 possible structures will take ~10 mins on a typical desktop computer. It takes this long because you're running 216,000 Rietveld iterations. You can watch progress in results_summary.txt.
CAUTION: If you want to run the search more than once you need to move your result files (results_summary.txt, paste_me.xlsx and the best_inp and start_inp folders) to a different directory or delete them. If they are already present in your run folder when you begin the search, the python routine will return an error (indicating that it cannot create files that already exists).

6. Review the results

i) When the search is finished you'll find various .txt and .xlsx files summarising the results. You will also find best_inp and start_inp folders for each subgroup tested, which contain the best Rwp solution found for each subgroup and the pre-refinement starting .inp file. To better understand this output, copy topas0072_best.inp to your working directory and run it in Topas.
ii) Open the Results_summary_tutorial.xlsx spreadsheet, which is a template that summarises your results. Then open paste_me.xlsx. Paste the contents of cells C2 to K70 into the corresponding cells of Results_summary_tutorial.xlsx. You can then sort the data according to "Rwp_min" from smallest to largest using the drop down list in column C. Then fill in the "Rank" column D with the integers 1-69 in order -- just type 1,2 in the first two rows and drag down to complete the list. To the right of the spreadsheet you will see graphs and coloured "Maximal Subgroups".
iii) Examine the Rwp vs Subgroup plot (a) in the spreadsheet, which should have three clusters: very poor (Rwp > 50%), poor (Rwp 25-30%) and "goodish" (Rwp < 11%). We are interested in the goodish solutions with Rwp < 11%.
iv) Examine the Rwp vs Rank plots (b and c) in the spreadsheet. You should see 16 goodish solutions with Rwp < 11%. If you look closely (plot c) there are further "steps" amongst these solutions. Those on the lowest plateau with Rwp < 10.9% have the best fit.
v) Examine the Rwp vs number-of-parameters plots (d and e) in the spreadsheet. We want to choose a solution that is amongst the best performers in terms of Rwp but which also has a relatively small number of free structural parameters. Verify that the solution with 5 structural parameters on the lowest Rwp plateau is subgroup #58. This is our winner.
vi) If you're unsure which subgroups really belong on the same plateau, you can visually inspect the quality of each fit by looking in the "best_inp" folder. Notice that subgroup #58 provides as good a fit to the structural features of the data as any other. Any slightly lower Rwp of subgroups with more parameters results from subtle improvements to fitting peakshape. (This effect would be more notable when using real data).
vii) Finally, look at the coloured "maximal subgroups" columns of the spreadsheet. A maximal subgroup is coloured green when it ranks better than the subgroup in question (as you would expect), and red otherwise. If you see red maximal subgroups, you will likely find the subgroup indicated has very similar Rwp to the subgroup in question and is only ranked slightly worse. This is not a problem. However, if a maximal subgroup has a significantly worse Rwp than its parent, you know that the annealing process did not explore the parameter space effectively.

Bonus Tutorial Parts for the Super Keen

Part 1 - Refining your own experimental, peakshape and thermal parameters

In the tutorial above the peakshape and thermal parameters were given to you. For real data you would have to find them yourself. Determining appropriate values using the daughter structure (topas0072.str) is also a good check that true structure lies in the subgroup tree: if a good fit to the data cannot be achieved with the daughter structure then no other subgroup in the tree will work. To do this check and determine peakshape and thermal parameters for the search, follow the steps below.

1. Work through the Durham Symmetry Mode Refinement Menus selecting data file wo3_850c_simulated.xy, instrument Durham_d9 and read ISODISTORT str topas0072.str. Then add a refineable peakshape to help optimise your cell parameters: "Strain_G(@, 1.1 val_on_continue = 1.1; min = 0.01; max = 2;)".

2. Set the tetragonal cell parameters to anneal by changing them to read:

prm dummy_a 5.31744 val_on_continue = Rand(0.9, 1.1)*Constant(Get(a));
prm dummy_b 5.31744 val_on_continue = Rand(0.9, 1.1)*Constant(Get(b));
prm dummy_c 7.52000 val_on_continue = Rand(0.9, 1.1)*Constant(Get(c));
a = dummy_a;
b = dummy_b;
c = dummy_c;
al 90.00000
be 90.00000
ga 90.00000

Now uncomment the "continue_after_convergence" keyword from the top of the file. Temporarily fix (add "!" in front of parameter name) TCHz and axial peakshape parameters and refine. After a few cycles you should find Rwp ~ 31.8%. You should now remove the annealing from the cell parameters (comment out or delete the "val_on_continue …" part) and remove the Strain_G peakshape you added.

Now refine the peakshape parameters that were fixed (by removing "!" before each parameter name). Also refine the modes under "mode definitions" and set them to anneal by adding "val_on_continue = Rand(-0.01, 0.01);" (or use a macro) to read:

prm a1 0.00000 min -2.00 max 2.00 val_on_continue = Rand(-0.01, 0.01);
prm a2 0.00000 min -2.00 max 2.00 val_on_continue = Rand(-0.01, 0.01);
prm a3 0.00000 min -2.00 max 2.00 val_on_continue = Rand(-0.01, 0.01);

Also refine thermal parameters for each atom type by adding parameter names under "mode-dependent sites". It should read:

site W_1 x = W_1_x; y = W_1_y; z = W_1_z; occ W = W_1_occ; beq beq_W 0.0
site W_2 x = W_2_x; y = W_2_y; z = W_2_z; occ W = W_2_occ; beq beq_W 0.0

site O_1 x = O_1_x; y = O_1_y; z = O_1_z; occ O = O_1_occ; beq beq_O 0.0
site O_2 x = O_2_x; y = O_2_y; z = O_2_z; occ O = O_2_occ; beq beq_O 0.0

After running the refinement for a few cycles, you should find an Rwp of ~ 10.8%. Look at wo3_850c_simulated_riet_topas0072.inp if you are unsure. Verify the background, peakshape and thermal parameters you now have are similar to those given above.

Part 2 - Working with real experimental data

Bonus File Needed: d8_03901_850c.xy

Repeat the tutorial above using real (and relatively low quality) lab X-ray data collected on the Durham_d9 instrument.

Use the technique you learned in Part 1 of the bonus to identify appropriate fixed background, peakshape and thermal parameters for the real X-ray dataset. You will need to include a sample height correction in your refinement and a more appropriate x_calculation_step:

x_calculation_step = Yobs_dx_at(Xo);

Using data between 10 and 90 degrees 2-theta should suffice (and will save you time!):

Add "Start_X 10 Finish_X 90" to the xdd (data) section.

Try to make a similar analysis of your results as in step 5 of the main method.




[Modified 31-Jul-2017 by John S.O. Evans. Pages checked for Google Chrome.]