CCP4 Interface: Isomorphous Replacement Module

	CCP4i: Graphical User Interface
	Experimental Phasing Module

Merge Datasets

Scale and Analyse Datasets - SCALEIT and FHSCAL: Scale Datasets - Task Window Layout

Solution Files (HA files)

Prepare Data for HA Search - Revise, Ecalc, MTZ2various

Acorn - ab initio Phasing

SHELX - Heavy Atom Search

RANTAN - Direct Methods

Professs - NCS from HA

Oasis - SAD/SIR phasing

Generate Patterson Map: Excluding Large Intensity Differences; Generate Patterson Map - Task Window Layout

Real Space Patterson Search - RSPS

Run MLPHARE: Data Harvesting; Maps

This module contains the following tasks: Merge Datasets (CAD); Scale and Analyse Datasets; Prepare Data for HA Search; Acorn - ab initio Phasing; SHELX - Heavy Atom Search; RANTAN - Direct Methods; Professs - NCS from HA; Oasis - SAD/SIR phasing; Generate Patterson Map; Real Space Patterson Search; Run MLPHARE

Specialist Help is available on: ScaleChoose - choosing the right scaling program for your datasets

The layout of each task window, i.e. the number of folders present, and whether these folders are open or closed by default, depends on the choices made in the Protocol folder of the task (see Introduction). Although certain folders are closed by default, there are specific reasons why you should or may want to look at them. These reasons are described in the Task Window Layout sections below.

Merge Datasets (CAD)

See documentation in Reflection Data Utilities module.

Scale Datasets - SCALEIT and FHSCAL

For the scaling of derivative to native datasets, two CCP4 programs are available: SCALEIT and FHSCAL. The tutorial on isomorphous replacement by I. Tickle describes the strengths and weaknesses of those programs. Note that there is no unique solution to the problem of scaling together two different datasets. Various problems can arise from:

Scale Datasets with Anomalous Dispersion Data

The Scale Datasets task will run SCALEIT to scale together all the DPHn (the dispersive difference for the nth wavelength).

It will optionally do a cross-comparison of the anomalous data sets - this involves rerunning SCALEIT with the input:

LABIN FP = FPHn SIGFP = SIGFPHn FPH1 = FPHm SIGFPH1=SIGFPHm DPH1 = DPHm SIGDPH1= SIGDPHm

for all possible pairwise combinations of wavelengths n and m. From these runs, the cross-comparison Rfactor and normal probability for the acentric data are extracted.
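The pairwise reruns described above can be sketched as a small generator of LABIN lines. This is only an illustration: the column labels FPH1, FPH2, ... are hypothetical stand-ins for whatever labels your MTZ file uses, and the sketch assumes that each unordered pair of wavelengths (n < m) is compared once, which the text does not state explicitly.

```python
from itertools import combinations


def cross_comparison_labin(n_wavelengths):
    """Return one SCALEIT LABIN line per unordered wavelength pair (n, m).

    Assumption: columns are named FPHn, SIGFPHn, DPHn, SIGDPHn for
    wavelength n; real MTZ files may use different labels.
    """
    lines = []
    for n, m in combinations(range(1, n_wavelengths + 1), 2):
        lines.append(
            f"LABIN FP=FPH{n} SIGFP=SIGFPH{n} "
            f"FPH1=FPH{m} SIGFPH1=SIGFPH{m} "
            f"DPH1=DPH{m} SIGDPH1=SIGDPH{m}"
        )
    return lines


if __name__ == "__main__":
    # Three wavelengths give three pairs: (1,2), (1,3), (2,3).
    for line in cross_comparison_labin(3):
        print(line)
```

The number of reruns grows as N(N-1)/2 for N wavelengths under this assumption, so a four-wavelength experiment would need six SCALEIT runs.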

It is also optional to perform analysis of dispersive differences by rerunning SCALEIT with the input:

LABIN FP = FPH(+)n SIGFP = SIGFPH(+)n FPH1 = FPH(-)n SIGFPH1= SIGFPH(-)n

From this analysis, the normal probabilities for the acentric and centric data and the Rfactor are extracted. The input MTZ file must contain the FPH(+)n and FPH(-)n. If you do not have data in this form, you should run the mtzMADmod program, which converts DPHn to the appropriate form. This program is not interfaced. A better solution is to use the latest version of the TRUNCATE program, which retains the FPH(+)n and FPH(-)n on output.

The results of both these analyses are tabulated in a summary file called project_jobid_scaleit.summary.

Scale Datasets - Task Window Layout

In the Protocol folder of the Scale Datasets task, you can choose:

1. analysis only - use SCALEIT without refinement
2. scale refinement using SCALEIT - use SCALEIT refinement
3. scale refinement using FHSCAL (Kraut's method) - use FHSCAL refinement
4. FHSCAL scale refinement & SCALEIT analysis - in effect, a combination of options 1 and 3
5. apply input scale factors - use SCALEIT with externally determined scale factors - not usually used

Features to look out for in the Scale Datasets Task are:

Protocol option	Folder title	Importance	Comment
1	Analysis	Graphs of differences between datasets	Analysis against resolution always performed.
2	Refinement Parameters	Apply Wilson scaling	Final Wilson scaling (affects scale factor only) after least-squares scaling (scale and temperature factors). See also Wilson.
3	Fhscal Scaling Parameters		Perform Kraut scaling with FHSCAL. In extreme cases, namely if the high resolution limit of the native dataset is lower than that of (one of) the derivatives, certain reflections may not get output. See also Caveat in FHSCAL program documentation.
4	Analysis	Analysis of FHSCAL results	SCALEIT ANALYSE is performed after scaling using FHSCAL (see protocol options 1 and 3).
5	Input Scaling Factors		Externally determined scales applied and analysis performed. No refinement. See also SCALE.

See program documentation: SCALEIT, FHSCAL.

Solution Files

Heavy Atom (.ha) Files

Heavy atom (HA) files are short files which keep a record of the proposed heavy atom sites in a structure. They are analogous to the MR files of the Molecular Replacement module. The format of the file is similar to the ATOM input line for the MLPHARE heavy atom refinement program. There is one line per atom site, and the line is free format, beginning with the word ATOM:

ATOM atom_name x y z occupancy anomalous_occupancy BFAC B-factor

The interface to MLPHARE can use an HA file as input and HA files are output by:

PEAKMAX program: using the OUTPUT FRAC option
RANTAN task: the files are actually generated by the PEAKMAX program which searches the maps generated from RANTAN's output phases
Generate Patterson Map task: when the PEAKSEARCH option is on
Real Space Patterson Search (RSPS) task: when the site analysis option is used. The script for the task reads the RSPS program log file and extracts the sites information to HA files
MLPHARE task: after each refinement run the current refined coordinates are extracted from the log file to an HA file

HA files are generated with the default file name project_jobid_n.ha, where n = 1, 2, 3, ... If you select an HA file from the menu under the View Files from Job button, it will be displayed in an HA file viewer, which is similar to the MR file viewer and has some simple functionality for editing the file. Picking a line in the file puts a # character at the beginning of that line, which is then ignored on input to MLPHARE. A second pick removes the # character. A Change All button at the bottom of the viewer adds or removes the # from all ATOM lines. There is also an Edit Columns button, which presents options to set the atom name, occupancy, anomalous occupancy and B-factor for all the atoms in the file.
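As an illustration of the HA file layout and the # convention, the following is a minimal sketch of a reader written for this page; it is not part of CCP4i. The names HeavyAtomSite, parse_ha_text and toggle_comment are hypothetical, the numeric values in the sample are invented, and the sketch assumes that the fields appear in exactly the order shown in the ATOM line above.

```python
from dataclasses import dataclass


@dataclass
class HeavyAtomSite:
    name: str
    x: float
    y: float
    z: float
    occupancy: float
    anomalous_occupancy: float
    b_factor: float


def parse_ha_text(text):
    """Parse ATOM lines from HA file text, skipping lines that start with '#'."""
    sites = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # commented-out sites are ignored
        fields = line.split()
        if fields[0].upper() != "ATOM":
            continue  # assumption: other line types are not of interest here
        # Assumed layout: ATOM name x y z occ anom_occ BFAC b
        if len(fields) < 9 or fields[7].upper() != "BFAC":
            raise ValueError(f"unexpected ATOM line: {raw!r}")
        x, y, z, occ, anom_occ = (float(v) for v in fields[2:7])
        sites.append(
            HeavyAtomSite(fields[1], x, y, z, occ, anom_occ, float(fields[8]))
        )
    return sites


def toggle_comment(line):
    """Mimic one pick in the HA file viewer: add or remove a leading '#'."""
    return line[1:] if line.startswith("#") else "#" + line


if __name__ == "__main__":
    sample = (
        "ATOM HG 0.123 0.456 0.789 0.5 0.0 BFAC 20.0\n"
        "#ATOM HG 0.250 0.250 0.100 0.3 0.0 BFAC 25.0\n"
    )
    for site in parse_ha_text(sample):
        print(site)
```

Only the first site is returned for the sample, because the second line has been switched off with a # character.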

Prepare Data for HA Search - Revise, Ecalc, MTZ2various

You will need to run this task for the following cases:

Input Data	Phasing Method
MAD	Rantan, Shelx, RSPS, Anomalous Difference Patterson Maps
SAD	Rantan, Shelx
SIR	Rantan, Shelx

In the Prepare Data for HA Search task window you should only need to identify the type of your data and which phasing program you intend to run, and the interface will make the necessary conversions described below.

MAD data is rescaled by the Revise program to give an estimate of the normalised anomalous scattering magnitude (given the column label FM by Rantan but sometimes referred to as FA in the literature). The input data can be in the form of F(+) and F(-) for each wavelength or be anomalous differences Dano for each wavelength. The output FM can then be used in similar fashion to a single anomalous difference (Dano) or isomorphous difference (Diso). The theory behind this is described in the Revise program documentation.

Data conversion

Direct methods programs such as Shelx and Rantan usually work with data in the form of normalised intensities rather than the structure factors which are normally used in macromolecular crystallography. So structure factor data must be converted to normalised structure amplitudes for use in direct methods programs. The Shelx program has an internal procedure to do this conversion but data intended for the Rantan program must go through the Ecalc program which calculates normalised structure amplitudes (usually given the column label E).
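The idea of a normalised amplitude can be sketched as follows: within a resolution shell, each amplitude F is divided by the root mean square amplitude of that shell, so that the mean of E squared is 1. This is a deliberately simplified illustration and not the Ecalc algorithm; the function name, the choice of equal-width shells in 1/d^2 and the omission of any symmetry-related corrections are all assumptions made for this sketch.

```python
import math


def normalise_amplitudes(reflections, n_shells=10):
    """Return simplified E values for a list of (d_spacing, F) pairs.

    Assumption: E = F / sqrt(<F^2>) within equal-width shells of 1/d^2.
    A real program applies further corrections that are omitted here.
    """
    s2 = [1.0 / (d * d) for d, _ in reflections]
    lo, hi = min(s2), max(s2)
    width = (hi - lo) / n_shells or 1.0  # avoid zero width for a single shell
    shell_of = [min(int((v - lo) / width), n_shells - 1) for v in s2]

    # Accumulate the sum of F^2 and the reflection count per shell.
    sums = [0.0] * n_shells
    counts = [0] * n_shells
    for shell, (_, f) in zip(shell_of, reflections):
        sums[shell] += f * f
        counts[shell] += 1

    e_values = []
    for shell, (_, f) in zip(shell_of, reflections):
        mean_f2 = sums[shell] / counts[shell]
        e_values.append(f / math.sqrt(mean_f2) if mean_f2 > 0 else 0.0)
    return e_values


if __name__ == "__main__":
    data = [(2.0, 100.0), (2.1, 50.0), (4.0, 300.0), (4.2, 100.0)]
    print(normalise_amplitudes(data, n_shells=2))
```

Dividing by a shell average removes the overall fall-off of amplitude with resolution, which is why direct methods programs prefer E values to raw structure factor amplitudes.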

Rantan and all other CCP4 programs work with experimental data in MTZ file format but SHELX requires the data in an ascii format described in the Shelx documentation. The Prepare Data for HA Search task will use Mtz2various to convert an MTZ file to Shelx format.

See program documentation: Revise, MTZ2various, Ecalc.

Acorn - ab initio Phasing

Acorn can be used to phase when you have atomic resolution data, and a suitable starting point such as a known fragment (which in favourable cases can be a randomly-placed atom) or approximate phases. It can also be used to find heavy atom substructures at lower resolutions.

In the Protocol folder of the Acorn task, you can choose:

start from random atom (no prior knowledge) - use a large number of trials of a single atom, and refine the best phase sets obtained. This option can be used when you have no known fragments. It is most likely to work for metalloproteins, for small molecules, and when searching for sub-structures.
search and phase with starting coordinates - use a fragment of the structure as a search model in MR, followed by phase refinement.
phase from known starting coordinates - use a known fragment as a starting point for phase refinement.
phase from starting phases - phase refine only starting from externally supplied phases.

The program requires normalised structure factors. These can be supplied in the input MTZ file, or alternatively the task will run ECALC to generate them. The task may also require an input PDB file if a starting fragment is being supplied.

See program documentation: Acorn

SHELX - Heavy Atom Search

The SHELX program can be obtained from THE SHELX HOMEPAGE. The CCP4i interface is for Shelx-90. To ensure that CCP4i scripts can find the Shelx program, the full path name of the program needs to be entered in the Configure Interface window which is accessed from a button on the right hand side of the main window.

For more information on the SHELX program, see THE SHELX HOMEPAGE. This has references to various FAQs: The Shelx Homepage, section 7, and Thomas Schneider's FAQs. It also has a document on Macromolecular applications of SHELX, written for International Tables Vol. F.

RANTAN - Direct Methods

The RANTAN Direct Methods program can be applied to solving MAD data or isomorphous replacement data. The Interface will set the key input parameters appropriately for the type of data.

For isomorphous data, RANTAN works optimally with the input in the form of normalised amplitudes rather than structure factors so the Interface will usually run the ECALC program to convert SFs to normalised amplitudes. The Interface will alternatively allow input of either precalculated normalised amplitudes or normalised amplitudes and initial phases.

MAD data for RANTAN will be preprocessed by the REVISE program (see above) which generates estimates of FM which is the normalised anomalous scattering factor. The input to REVISE is the FP and FPH(+)n and FPH(-)n for dataset n. These data should have been scaled by the SCALEIT program. REVISE also needs to know the wavelength, f' and f'' for each wavelength.

See program documentation: RANTAN, ECALC, REVISE, SCALEIT.

Professs - NCS from HA

The Professs task will take heavy atom coordinates from a PDB file or a Heavy Atom (.ha) file and attempt to find NCS operators relating subsets of sites. Any operators found will be listed in the log file.

See program documentation: Professs

Oasis - SAD/SIR phasing

Oasis takes SAD or SIR data and uses direct methods to break the phase ambiguity. The program requires heavy atom sites, either from a Heavy Atom (.ha) file or entered manually. The task will optionally run DM to perform density modification after phasing.

See program documentation: Oasis

Generate Patterson Map

The Generate Patterson Map Task performs the following:

Run SCALEIT to find an optimal cutoff for excluding reflections with suspiciously large differences
Run FFT PATTERSON in default sectioning mode to get first direction of map sections
Run MAPMASK to resection output map, to produce all necessary Harker sections
Run PEAKMAX to search maps for peaks and write these to the "Peak coord" file and to an HA file (see above)
Plot Harker sections with NPO

Optionally:

The user can give the coordinates of points to be plotted on the Patterson map

The user can give the coordinates of putative heavy atom sites and the VECTORS program is run to determine the predicted cross-vectors which are then plotted on the Patterson map

Excluding Large Intensity Differences

Erroneously large intensity differences can affect a Patterson map disproportionately because the parameter used, the intensity, is the square of the structure factor, and the square of a large number is a very large number. The effect seen in the Patterson map is ridges.

It is therefore usually a good idea to exclude the reflections with very large differences: FPH - FP from the difference Patterson and FPH(+) - FPH(-) from the anomalous difference Patterson. By default the Interface will run the SCALEIT program to analyse the data and use the value of 4.1*RMS(FPH-FP), which is a reasonable first estimate of a suitable cutoff. It may be worthwhile to try different cutoff values and look at the resultant Patterson map; the value used can be set at the top of the Exclude Reflections folder. Excluding 'good' reflections tends to degrade the map, so the cutoff should not be set so low that it removes them. For very good data it may be unnecessary to exclude any data. The SCALEIT log file also has a table of Isomorphous and (if appropriate) Anomalous differences, which shows the number of reflections with given differences as a function of resolution shell.
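The default cutoff of 4.1 times the RMS difference can be illustrated with a short sketch. It assumes that the test is applied to the absolute value of FPH - FP for each reflection and that the RMS is taken over all supplied reflections; the function names are hypothetical and the code does not reproduce how SCALEIT itself derives or reports the value.

```python
import math


def difference_cutoff(fph, fp, factor=4.1):
    """Return factor * RMS(FPH - FP) over paired amplitude lists."""
    diffs = [a - b for a, b in zip(fph, fp)]
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return factor * rms


def exclude_large_differences(fph, fp, factor=4.1):
    """Keep only the pairs whose |FPH - FP| does not exceed the cutoff."""
    cutoff = difference_cutoff(fph, fp, factor)
    kept = [(a, b) for a, b in zip(fph, fp) if abs(a - b) <= cutoff]
    return kept, cutoff


if __name__ == "__main__":
    # Twenty well-behaved pairs and one suspicious outlier (invented values).
    fp = [100.0] * 21
    fph = [101.0] * 20 + [200.0]
    kept, cutoff = exclude_large_differences(fph, fp)
    print(f"cutoff = {cutoff:.2f}, kept {len(kept)} of {len(fp)} reflections")
```

Passing a different factor makes it easy to see how many reflections a tighter or looser cutoff would remove before regenerating the Patterson map.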

Generate Patterson Map - Task Window Layout

Features to look out for in the Generate Patterson Map Task are:

Protocol option	Folder title	Importance	Comment
difference Patterson	Exclude Reflections	Exclude reflections with erroneously large (intensity) differences between F1 and F2 (i.e. `FPH` and `FP`)	see Excluding Large Intensity Differences
anomalous difference Patterson	Exclude Reflections	Exclude reflections with erroneously large (intensity) differences between F1 and F2 (i.e. `FPH+` and `FPH-`)	see Excluding Large Intensity Differences

See program documentation: SCALEIT, FFT, MAPMASK, PEAKMAX, NPO, VECTORS, HAVECS.

Real Space Patterson Search - RSPS

This task runs the RSPS program to find heavy atom sites from a Patterson. It may also be used to test previously determined heavy atom coordinates.

Run MLPHARE

MLPHARE can be used to refine either isomorphous or anomalous data. Check the 'Use anomalous difference data' box at the top of the MLPHARE interface if appropriate. The initial default interface only provides for describing one derivative or wavelength; click on the Add Another Derivative button under the 'MTZ in' section to open space for additional data.

The minimal input then required is some initial heavy atom definitions in the folder Describe Derivatives & Refinement. For each derivative enter a name, and the name of the HA file containing the data for that derivative. Alternatively, enter the atoms explicitly by changing the Use data 'from file' menu option to Use data 'entered below' and then typing in the information. The Cut and Paste tool may be useful. For anomalous data you will need to enter the same HA file for each wavelength.

It is possible to edit the HA files 'on line' by clicking the View button on the file selection line. The HA file viewer has some simple editing tools but more complex changes may need to be done in an editor.

The output MTZ file contains columns PHIB_mlphare1, FOM_mlphare1, etc. If you use this file as input to another MLPHARE run, set a new unique column name extension; for instance, change the parameter 'Output label identifier' from mlphare1 to mlphare2. Each run of MLPHARE within the Interface also outputs one HA file for each derivative. These HA files can be used as input to the next MLPHARE run.
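Choosing an unused label identifier for a follow-up run can be sketched as below. The helper next_label_identifier is hypothetical and is not something the Interface provides; it assumes that existing output columns follow the PREFIX_mlphareN pattern shown above.

```python
import re


def next_label_identifier(existing_columns, stem="mlphare"):
    """Suggest the next unused identifier, e.g. 'mlphare2' after 'mlphare1'.

    Assumption: output columns are named like PHIB_mlphare1, FOM_mlphare1.
    """
    pattern = re.compile(rf".+_{re.escape(stem)}(\d+)")
    used = set()
    for column in existing_columns:
        match = pattern.fullmatch(column)
        if match:
            used.add(int(match.group(1)))
    return f"{stem}{max(used, default=0) + 1}"


if __name__ == "__main__":
    columns = ["FP", "SIGFP", "PHIB_mlphare1", "FOM_mlphare1"]
    print(next_label_identifier(columns))
```

Keeping the identifiers unique avoids two runs writing columns with the same names into one MTZ file.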

The SCALEIT documentation states: "MLPHARE has a built in weighting scheme which means that it doesn't do much harm to include less good data in phasing. After all the poor hkl should get low FOMs, and then DM can use the few reflections with reasonable phases to help in the phase extension procedure."

The MLPHARE program documentation has several helpful hints, e.g.: "NB: If an occupancy becomes near to 0.0 the coordinate shifts will possibly be meaningless", and a whole section of Notes on usage.

Suggested input numbers for Estimated Lack of Closure:

The program documentation suggests no input at all for the very first run.
The Interface has default 0.0 for all the numbers, even in the very first run.
Some people 'always' use a certain number (10% of F?!) in the very first run.

Data Harvesting

MLPHARE is one of the Data Harvesting programs. See Data Harvesting in CCP4i for implications for the Interface.

Maps

The MLPHARE interface has the option to output double difference maps which can be used to search for further heavy atoms. In this case the PEAKMAX program will also be run to list the peaks to a PDB file and to an HA file with the name project_jobid_label_peaks.ha where label is the MTZ column label of the derivative FPH. If you wish to do any other analysis on the map, it can be input to the 'Generate Patterson Map' task when the 'Run FFT ...' option at the top of the task window has been toggled off.

It is easiest to create maps by running the FFT task inside the Run Mlphare task. Do this by toggling on the option to 'Generate double difference maps files ...'.

In some cases it may be necessary to (re)create maps independently from the MLPHARE task. It is not possible to do this through the Create Task-Specific Maps task in the Map & Mask Utilities module. And only if you know exactly what you are doing should you attempt to do this through the Run FFT - Create Map task in the Map & Mask Utilities module.

See program documentation: MLPHARE, PEAKMAX, FFT.

See also MIRTutorial(Bath) (the HTML equivalent of $CDOC/Iso_repl_itickle_tut.bath.ps),
Isomorphous Replacement (Birkbeck),
LLNL - Bernhard Rupp's Crystallographic Web Applets (containing an applet which calculates expected anomalous dispersion ratios),
Chooch (a program for calculating Anomalous Scattering Factors from X-ray fluorescence data).