Exhaustive Symmetry Searches - Single TOPAS INP File in P1

Files needed: wo3_850c_descent_p1.inp ; wo3_850c_simulated.xy; wo3_pm3m_parent.cif; isodistort_tree.py.; child_base.str; subgroup_tree.txt; wo3_850c_descent_p1_v6.inp.

Bonus file: d8_03901_850c.xy.

This tutorial lets you search through all the possible structures in a group-subgroup tree from a high-symmetry parent to a low-symmetry child base subgroup. It exemplifies the method in [1]. We use the same example as in a previous tutorial (tutorial_exhaustive_symmetry.htm) of automatically determining the symmetry and structure of WO3 at 850 C by testing 72 possible subgroups of the parent cubic Pm-3m structure.

The approach is similar/complentary to the separate tutorial tutorial_exhaustive_symmetry.htm, which uses python routines to perform Rietveld refinements using a series of different .str files produced by ISODISTORT. Here everything is done in a single TOPAS .INP file and there is no need for external python scripts to control the refinements. A simple python script is provided to help you set up your own input files.

The basic idea is that all the subgroups in the tree between the high symmetry parent and the low symmetry child base structure (child_base.str) will be supergroups of child_base.str. In child_base.str a certain set of irreps will be active (i.e. the mode amplitudes are allowed to refine freely in that space group). In each of the different supergroups of child_base.str, only a subset of the irreps are active. Refining only the subset of the possible modes in child_base.str is equivalent to performing the refinement in a higher symmetry space group.

We can then use TOPAS #define flags and #ifdef .... #endif conditional statements to specify which modes to refine. We can use the TOPAS #list language to perform a series of refinements with different mode sets turned on and off. In this example we perform refinements twice in each space group. In a first step we just anneal the unit cell parameters to try and fit the experimentally observed peak splittings by allowing the cell to vary by up to +/- 3%. In a second step we also refine the allowed mode amplitudes. Results from this second step are saved.

The tutorial provides you with a pre-written INP file to test and inspect. There is also a python script that will produce INP file sections in the format you need.

The tutorial was written for topas v7. There a couple of features of the INP file that won't work in v6; a v6-specific .inp is provided with notes on how it differs [num_runs can't be an equation; num_cycles doesn't work so you control on iters not cycles].

[1]. Lewis, J. W.; Payne, J. L.; Evans, I. R.; Stokes, H. T.; Campbell, B. J.; Evans, J. S.O., "An Exhaustive Symmetry Approach to Structure Determination: Phase Transitions in Bi2Sn2O7", Journal of the American Chemical Society 2016.

Files Provided

wo3_pm3m_parent.cif - the parent structure.

wo3_850c_simulated.xy - data set to use.

wo3_850c_descent_p1.inp - prewritten INP file to perform all refinements for v7 or wo3_850c_descent_p1_v6.inp for v6

isodistort_tree.py - Python script to produce INP sections. useage: "python isodistort_tree.py" or "python isodistort_tree.py -inputfile subgroup_tree.txt -childfile child_base.str"

child_base.str - the lowest symmetry child structure used as the base of the subgroup tree.

Results_summary_tutorial_p1.xlsx - excel file set up for plotting/analysing the results.

INP file structure

Open wo3_850c_descent_p1.inp and read through each section. The file is best viewed in jEdit with folding turned on so you can look at each of the 11 sections individually. Here's an explanation of each section:

1. The overall control information section just contains R-factors etc.

2. Section 2 does a bit of housekeeping. On the first run it makes a backup of the INP file in case anything goes wrong. It deletes any old results txt file found in the directory. It then writes a header line for the results file.

3. Section 3 uses the #list command to perform refinements in each of the 72 sub groups. A #define macro is used to tell TOPAS which subgroup to work in. Information about the subgroup number in the tree and the space group and basis is also read in as metadata. There are two entries for each space group. Ths means that TOPAS can do two different refinements in each space group: a coarse cell parameter search followed by simulated annealing of mode amplitudes. The line "#define test_single_sub_group" can be flagged in and out to either test a specific single subgroup or to test all 72.

4. This section tells TOPAS which parameters to refine in the first and second refinement (you could add more sections for more complex examples). Here, the mode amplitudes of active modes are all set to zero in the first step using the macro "prm !##label 0" to put an exclamation mark in front of each mode parameter name. In the second step the mode amplitudes are turned on using "prm ##label val", where there is no exclamation mark. Different cell annealing is used in each step (coarse then fine).

5. This section contains the instructions which tell TOPAS what to refine in each subgroup using a series of "#ifdef subgroup_1_Pm-3....#endif" sections. In each subgroup a list of allowed modes are defined. The free unit cell parameters are also defined. Each refinement actually uses the unit cell of child_base.str. For simplicity, the constraints on cell parameters are only partial. For example, for orthorhombic space groups all cell angles are fixed at 90 degrees. However for monoclinic no attempt is made to fix any angles at 90. This is to avoid problems like gamma being the non-90 degree angle in some subgroups. It's possible to do more here via isodistort_tree.py, but for now it's up to the user to check cell parameters at the end of the analysis.

6. This section contains the mode amplitudes. When modes are allowed/active in a given subgroup they are turned on/off using the mode_onoff macro defined in section 4. They can either be refined or set to zero in different steps. When a mode is not allowed in a given sugroup it is always set to zero. You can see which values are refined/fixed to zero at any stage by whether they are coloured red or blue in jEdit. Note that blue values are always zero in the refined model.

7. and 8. These are standard ISODISTORT sections defining how mode amplitudes feed into final structural parameters. You will find them in child_base.str, which was produced by ISODISTORT.

9. A standard TOPAS section defining refinement instructions for the xdd data set(s). Note the extra peak broadening section used during cell annealing to help convergence.

10. Structural information in normal TOPAS format. Unit cell parameters are set equal to the values of lpa, lpb, ec defined in Section 5. At the start of each run fixedvalues of a_start, b_start, etc are read and used as reference points for cell simulated annealing. Structural sites are given in standard ISODISTORT format and can be copied from child_base.str.

11. This section defines what is written to the output file. Only results from the second step are saved.

How to create the file from scratch

To create the file from scratch we use a combination of ISODISTORT and the python routine isodistort_tree.py. If you want to create your own example I suggest using the INP file above as a template.

In the tutorial
we explore all the possible subgroups between the parent cubic Pm-3m structure
and a P4 base child with a 2x2x2 subcell. We can generate a tree of possible
subgroups using ISODISTORT and save two key files. To do this:

i) Go to http://stokes.byu.edu/iso/isodistort.php.

ii) Next to "Import parent structure from a CIF structure file",
browse to the wo3_pm3m_parent.cif on your computer and click "OK".

iii) Under Method 3, select space group symmetry "75 P4 C4-1", enter
the basis {(1,1,0),(-1,1,0),(0,0,2)} row by row, and click "OK".

iv) Choose the first of two options: "75 P4, basis={(-1,1,0),(-1,-1,0),(0,0,2)},
origin=(0,0,0), s=4, i=48" and click "OK".

v) On the next page you can view the distortion modes available with the "view
distortion" tool if you wish. To obtain the subgroup tree, scroll to
the bottom of the page and check the "Generate TOPAS.STR output for subgroup
tree" box. Then return to the top of the page, check the "Subgroup
tree" radio button, and click "OK".

vi) After a few seconds you should find that 72 subgroups have been generated.
Note that subgroups 1 (the parent), 2 and 31 do not have any active distortions.
Click "Download zip file" at the bottom of the page to download
69 topas.str structure files of the remaining subgroups plus the parent.

vi) Select the text on the webpage listing the different subgroups and copy/paste
it to a file called subgroup_tree.txt.

vi) Save the child base .str file (in this case topas0072.str) as child_base.str.

The python script isodistort_tree.py will read information from subgroup_tree.txt and child_base.str. It processes this to create a file called topas.out which contains Sections 3, 5, 6 and 10 needed for the symmetry descent .INP file. Run it with:

python isodisort_tree.py

or

python isodistort_tree.py -inputfile subgroup_tree.txt -childbasefile childbase.str

The routine also writes a list of subgrou number, basis and maximal subgroups as comma delmited text. This can be used to check that all subgroups of a given space group give the same or better Rwp values at convergence in excel.

Note that you can probably save time in the cell annealing stage if child_base.str contains distorted cell parameters, but with angles reset to 90 degrees. It might be worth recreating Section 5 of the INP file after updating cell parameters in child_base.str to their distorted values.

Running the File and Analysing Results

You should be able to run all 72 subgroups in ~5 minutes on a resonable PC. If you want to check a single subgroup then comment in the line which says "#define test_single_sub_group".

Review the results by:

i) When the search is finished you'll find results_summary.txt
with a list of Rwp values and cells for each refinement..

ii) Open the Results_summary_tutorial_p1.xlsx
spreadsheet, which is a template that summarises your results. Paste the Rwp
values from results_summary.txt into this spreadsheet cells B2 to L73.You
can then sort the data according to "rwp_min" from smallest to largest
using the drop down list in column C. Then fill in the "Rank" column
D with the integers 1-72 in order -- just type 1,2 in the first two rows and
drag down to complete the list. To the right of the spreadsheet you will see
graphs and coloured "Maximal Subgroups".

iii) Examine the Rwp vs Subgroup plot (a) in the spreadsheet, which should
have three clusters: very poor (Rwp > 50%), poor (Rwp 25-30%) and "goodish"
(Rwp < 11%). We are interested in the goodish solutions with Rwp < 11%.

iv) Examine the Rwp vs Rank plots (b and c) in the spreadsheet. You should
see 16 goodish solutions with Rwp < 11%. If you look closely (plot c) there
are further "steps" amongst these solutions. Those on the lowest
plateau with Rwp < 10.9% have the best fit.

v) Examine the Rwp vs number-of-parameters plots (d and e) in the spreadsheet.
We want to choose a solution that is amongst the best performers in terms
of Rwp but which also has a relatively small number of free structural parameters.
Verify that the solution with 5 structural parameters on the lowest Rwp plateau
is subgroup #58. This is our winner.

vi) If you're unsure which subgroups really belong on the same plateau, you
can visually inspect the quality of each fit by fitting a single subgroup
in the INP file. Notice that subgroup #58 provides as good a fit to the structural
features of the data as any other. Any slightly lower Rwp of subgroups with
more parameters results from subtle improvements to fitting peakshape. (This
effect would be more notable when using real data).

vii) Look at the coloured "maximal subgroups" columns of the spreadsheet.
A maximal subgroup is coloured green when it ranks better than the subgroup
in question (as you would expect), and red otherwise. If you see red maximal
subgroups, you will likely find the subgroup indicated has very similar Rwp
to the subgroup in question and is only ranked slightly worse. This is not
a problem. However, if a maximal subgroup has a significantly worse Rwp than
its parent, you know that the annealing process did not explore the parameter
space effectively.

viii) You should look through the refined cell parameters in each subgroup
to make sure each has converged sensibly..

Finally, you might want to think a little about cell parameters. You'll see that the best solution (#58) is tetragonal and that a and b have refined to essentially identical values. You should check that you get an equivalent fit with a = b. You'll also see that some of the cubic sugroups (e.g. #30) give a better fit to the data than you might expect as the cells are allowed to distort from ideal values. isodistort_tree.py contains some commented out lines which would let you control cell parameter constraints.

Bonus Tutorial Parts for the Super Keen

Part 1 - Refining your own experimental, peakshape and thermal parameters

In the tutorial above the peakshape and thermal parameters were given to you. For real data you would have to find them yourself. Determining appropriate values using the daughter structure (topas0072.str) is also a good check that true structure lies in the subgroup tree: if a good fit to the data cannot be achieved with the daughter structure then no other subgroup in the tree will work. To do this check and determine peakshape and thermal parameters for the search, follow the steps below.

1. Work through the Durham Symmetry Mode Refinement Menus selecting data file wo3_850c_simulated.xy, instrument Durham_d9 and read ISODISTORT str topas0072.str. Then add a refineable peakshape to help optimise your cell parameters: "Strain_G(@, 1.1 val_on_continue = 1.1; min = 0.01; max = 2;)".

2. Set the tetragonal cell parameters to anneal by changing them to read:

prm dummy_a 5.31744 val_on_continue = Rand(0.9, 1.1)*Constant(Get(a));

prm dummy_b 5.31744 val_on_continue = Rand(0.9, 1.1)*Constant(Get(b));

prm dummy_c 7.52000 val_on_continue = Rand(0.9, 1.1)*Constant(Get(c));

a = dummy_a;

b = dummy_b;

c = dummy_c;

al 90.00000

be 90.00000

ga 90.00000

Now uncomment the "continue_after_convergence" keyword from the top of the file. Temporarily fix (add "!" in front of parameter name) TCHz and axial peakshape parameters and refine. After a few cycles you should find Rwp ~ 31.8%. You should now remove the annealing from the cell parameters (comment out or delete the "val_on_continue …" part) and remove the Strain_G peakshape you added.

Now refine the peakshape parameters that were fixed (by removing "!" before each parameter name). Also refine the modes under "mode definitions" and set them to anneal by adding "val_on_continue = Rand(-0.01, 0.01);" (or use a macro) to read:

prm a1 0.00000 min -2.00 max 2.00 val_on_continue = Rand(-0.01,
0.01);

prm a2 0.00000 min -2.00 max 2.00 val_on_continue = Rand(-0.01, 0.01);

prm a3 0.00000 min -2.00 max 2.00 val_on_continue = Rand(-0.01, 0.01);

…

Also refine thermal parameters for each atom type by adding parameter names under "mode-dependent sites". It should read:

site W_1 x = W_1_x; y = W_1_y; z = W_1_z; occ W = W_1_occ;
beq beq_W 0.0

site W_2 x = W_2_x; y = W_2_y; z = W_2_z; occ W = W_2_occ; beq beq_W 0.0

…

site O_1 x = O_1_x; y = O_1_y; z = O_1_z; occ O = O_1_occ; beq beq_O 0.0

site O_2 x = O_2_x; y = O_2_y; z = O_2_z; occ O = O_2_occ; beq beq_O 0.0

…

After running the refinement for a few cycles, you should find an Rwp of ~ 10.8%. Look at wo3_850c_simulated_riet_topas0072.inp if you are unsure. Verify the background, peakshape and thermal parameters you now have are similar to those given above.

Part 2 - Working with
real experimental data

Bonus File Needed: d8_03901_850c.xy

Repeat the tutorial above using real (and relatively low quality) lab X-ray data collected on the Durham_d9 instrument.

Use the technique you learned in Part 1 of the bonus to identify appropriate fixed background, peakshape and thermal parameters for the real X-ray dataset. You will need to include a sample height correction in your refinement and a more appropriate x_calculation_step:

x_calculation_step = Yobs_dx_at(Xo);

Using data between 10 and 90 degrees 2-theta should suffice (and will save you time!):

Add "Start_X 10 Finish_X 90" to the xdd (data) section.

Try to make a similar analysis of your results as in the main method.

[Modified 10-Apr-2021 by John S.O. Evans. Pages checked for Google Chrome.]