CCP4 Tutorial: Session 3

See also the accompanying document giving background information.

3a) Scaling and analysing datasets

The Problem

You now have a file containing native data for GerE, and MAD data for a selenomethionine derivative. First, we scale each wavelength of the MAD data to the native dataset, so that all data is on the same scale. At the same time, we analyse the MAD data to estimate the strength of the dispersive and anomalous signals.
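To see what "on the same scale" means in the simplest case, the following Python sketch fits one overall scale factor k between two lists of amplitudes by least squares. It is only an illustration under stated assumptions: the function name and the amplitudes are made up for this example, and SCALEIT's own scale refinement is more elaborate than a single factor.

```python
def overall_scale(f_ref, f_other):
    """Least-squares k that minimises sum((f_ref - k * f_other) ** 2)."""
    numerator = sum(a * b for a, b in zip(f_ref, f_other))
    denominator = sum(b * b for b in f_other)
    return numerator / denominator


# Toy amplitudes (hypothetical values, not GerE data).
f_nat = [100.0, 50.0, 80.0, 20.0]
f_der = [52.0, 24.0, 41.0, 9.0]

k = overall_scale(f_nat, f_der)
f_der_scaled = [k * f for f in f_der]  # derivative amplitudes on the native scale
print(f"scale factor k = {k:.3f}")
```

After scaling, differences between the two lists reflect real differences in the data rather than an arbitrary overall factor.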

Exercise

3.1 Select the Experimental Phasing module, and open the Scale and Analyse Datasets task window.

3.2 On the first line, enter a suitable job title such as

Job title Scaling GerE datasets.

3.3 On the second line, select

Do scale refinement using scaleit.

On the next 2 lines, select

Use anomalous difference data

and

Do cross-comparison of data sets and analyse dispersive differences

using the radiobuttons.

3.4 Select the input MTZ file

MTZ in TEST gere_MAD.mtz

(If you do not have this file from the previous session, take the file from the DATA directory.)

Now select the columns from the MTZ file. The first line has the native F_nat and SIGF_nat. Then select columns for the 4 wavelengths, using the button Add Derivative Data to add more columns. (It might be easier here to load the file $DATA/session3a.def which already has these parameters set.) You should end up with:

FP	F_nat	Sigma FP	SIGF_nat
FPH1	FSEinfl	SigFPH1	SIGFSEinfl
DPH1	DSEinfl	SigDPH1	SIGDSEinfl
FPH+1	F(+)SEinfl	SigFPH+1	SIGF(+)SEinfl
FPH-1	F(-)SEinfl	SigFPH-1	SIGF(-)SEinfl
FPH2	FSElrm	SigFPH2	SIGFSElrm
DPH2	DSElrm	SigDPH2	SIGDSElrm
FPH+2	F(+)SElrm	SigFPH+2	SIGF(+)SElrm
FPH-2	F(-)SElrm	SigFPH-2	SIGF(-)SElrm
FPH3	FSEpeak	SigFPH3	SIGFSEpeak
DPH3	DSEpeak	SigDPH3	SIGDSEpeak
FPH+3	F(+)SEpeak	SigFPH+3	SIGF(+)SEpeak
FPH-3	F(-)SEpeak	SigFPH-3	SIGF(-)SEpeak
FPH4	FSEhrm	SigFPH4	SIGFSEhrm
DPH4	DSEhrm	SigDPH4	SIGDSEhrm
FPH+4	F(+)SEhrm	SigFPH+4	SIGF(+)SEhrm
FPH-4	F(-)SEhrm	SigFPH-4	SIGF(-)SEhrm

Check that the output MTZ file is given as

Output MTZ TEST gere_MAD_scaleit1.mtz

3.5 You should not need to change anything else. Select Run -> Run Now.

3.6 When the job has finished, return to the main window, highlight the job in the Job List, and select View Files from Job -> View Log Graphs. This task outputs a large number of graphs for analysing the data, and we will just look at some of them.

3.7 We can gauge the strength of the dispersive differences by looking at the graphs Centric Normal probability v resolution and Acentric Normal probability v resolution ... for each pair of wavelengths, e.g. ... FP = FSElrm FPH = FSEinfl SIGFSEinfl DSEinfl SIGDSEinfl . For each graph, look at the line Gradient_on_reflection_prob.lt.0.9. Use the crosswires to estimate a rough value, e.g. for the low-remote against the inflection, the value is about 1.182 for centric data and 1.254 for acentric data.

The values can be summarised as (these values are contained in the file View Files from Job -> scaleit.summary):


 Table: Normal Probability for acentric data

 Normal Prob.   |  FSEpeak     FSElrm      FSEhrm
 ---------------------------------------------------
 FSEinfl        |   1.071       1.254       1.515

 Table: Normal Probability for centric data

 Normal Prob.   |  FSEpeak     FSElrm      FSEhrm
 ---------------------------------------------------
 FSEinfl        |   0.975       1.182       1.470

This shows that the difference in f' values is smallest from the inflection to the peak, and largest from the inflection to the high-wavelength remote (the inflection point has the most negative f').

3.8 We can gauge the strength of the anomalous differences by looking at the graph Acentric Normal probability v resolution ... for F(+) and F(-) of each wavelength, e.g. ... FP = F(+)SEinfl FPH = F(-)SEinfl SIGF(-)SEinfl . For each graph, look at the line Gradient_on_reflection_prob.lt.0.9, and use the crosswires to estimate a rough value.

The values can be summarised as:


 F(+)SEhrm v F(-)SEhrm    1.01  
 F(+)SEinfl v F(-)SEinfl  1.17  
 F(+)SEpeak v F(-)SEpeak  1.43  
 F(+)SElrm v F(-)SElrm    1.34

This shows that the high-wavelength remote has hardly any anomalous signal, i.e. a low value of f'' at this wavelength. The peak wavelength has the largest f'', while the other 2 wavelengths have intermediate values.

3b) Preparing datasets for finding heavy atoms

The Problem

You are going to use a direct methods approach for locating the Se sites. In this section, you will prepare the MAD data for use in the direct methods program RANTAN. This task runs REVISE for generating the normalised anomalous scattering magnitude FM, and then the program ECALC for calculating the corresponding normalised structure factor E.
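The idea behind a normalised structure factor can be sketched as follows: each amplitude is divided by the root-mean-square amplitude of its resolution shell, so that the mean of E squared is 1 in every shell. This is a minimal sketch, not the ECALC algorithm; the function name, the equal-count shell scheme and the amplitudes are assumptions made for illustration, and refinements such as symmetry-dependent weighting are ignored.

```python
import math


def normalise(reflections, n_shells):
    """reflections: list of (resolution_in_angstrom, F). Returns E values.

    E = F / sqrt(<F^2>), with the mean taken over equal-count resolution shells.
    """
    # Indices sorted from low resolution (large d) to high resolution (small d).
    order = sorted(range(len(reflections)),
                   key=lambda i: reflections[i][0], reverse=True)
    shell_size = math.ceil(len(order) / n_shells)
    e_values = [0.0] * len(reflections)
    for start in range(0, len(order), shell_size):
        shell = order[start:start + shell_size]
        mean_f2 = sum(reflections[i][1] ** 2 for i in shell) / len(shell)
        for i in shell:
            e_values[i] = reflections[i][1] / math.sqrt(mean_f2)
    return e_values


# Toy data (hypothetical): (resolution, amplitude)
toy = [(10.0, 120.0), (8.0, 90.0), (4.0, 40.0),
       (3.5, 60.0), (2.9, 15.0), (2.8, 25.0)]
e_vals = normalise(toy, n_shells=3)
print([round(e, 3) for e in e_vals])
```

Because the fall-off of intensity with resolution is divided out, strong E values at high resolution carry as much weight as strong ones at low resolution, which is what direct methods need.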

Exercise

3.20 Select the Experimental Phasing module, and open the Prepare Data for HA Search task window.

3.21 On the first line, enter a suitable job title such as

Job title Run revise for GerE data.

3.22 On the next line, select

Input MAD data as F+ F- and prepare data for ES (Rantan/Acorn)

3.23 Select the input MTZ file

MTZ in TEST gere_MAD_scaleit1.mtz

Now select the columns from the MTZ file:

FPH+1	F(+)SEinfl	SigFPH+1	SIGF(+)SEinfl
FPH-1	F(-)SEinfl	SigFPH-1	SIGF(-)SEinfl
FPH+2	F(+)SElrm	SigFPH+2	SIGF(+)SElrm
FPH-2	F(-)SElrm	SigFPH-2	SIGF(-)SElrm
FPH+3	F(+)SEpeak	SigFPH+3	SIGF(+)SEpeak
FPH-3	F(-)SEpeak	SigFPH-3	SIGF(-)SEpeak
FPH+4	F(+)SEhrm	SigFPH+4	SIGF(+)SEhrm
FPH-4	F(-)SEhrm	SigFPH-4	SIGF(-)SEhrm

Check that the output MTZ file is given as

Output MTZ TEST gere_MAD_prephadata1.mtz

3.24 Now fill in the section Anomalous Data as follows:

Data set 1 collected at wavelength	0.981	with estimated F'	-6.0	and F''	2.0
Data set 2 collected at wavelength	0.9	with estimated F'	-3.0	and F''	3.0
Data set 3 collected at wavelength	0.98	with estimated F'	-4.0	and F''	4.0
Data set 4 collected at wavelength	1.1	with estimated F'	-3.0	and F''	1.0

In fact, the wavelengths are only used as labels by the program. The important values are f' and f'', although the results are not very sensitive to the exact value. These values have been estimated from the known range of values of f' and f'' for Se, and the relative dispersive and anomalous differences estimated in the previous section.
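As a cross-check, the estimates entered in step 3.24 can be compared with the gradients read off in section 3a. The sketch below only uses numbers quoted in this tutorial; the dictionary and variable names are chosen for this example, and data sets 1 to 4 are taken to be infl, lrm, peak and hrm, following the column order above.

```python
# Estimated (f', f'') from step 3.24, keyed by wavelength label.
estimates = {
    "infl": (-6.0, 2.0),
    "lrm": (-3.0, 3.0),
    "peak": (-4.0, 4.0),
    "hrm": (-3.0, 1.0),
}

# Gradients read from the normal probability plots in section 3a.
anomalous_gradient = {"hrm": 1.01, "infl": 1.17, "peak": 1.43, "lrm": 1.34}
dispersive_gradient = {"peak": 1.071, "lrm": 1.254, "hrm": 1.515}  # acentric, vs infl

# Size of the f' difference of each wavelength relative to the inflection point.
delta_f_prime = {
    name: abs(f_prime - estimates["infl"][0])
    for name, (f_prime, _) in estimates.items()
    if name != "infl"
}

# Order wavelengths by estimated f'' and by observed anomalous gradient.
order_by_f_dprime = sorted(estimates, key=lambda n: estimates[n][1])
order_by_gradient = sorted(anomalous_gradient, key=anomalous_gradient.get)

print("delta f' vs infl:", delta_f_prime)
print("by f'':          ", order_by_f_dprime)
print("by anom gradient:", order_by_gradient)
```

The two orderings of the anomalous signal agree, and the peak has the smallest f' difference from the inflection, as in the dispersive table. The estimates give the two remotes the same f' difference even though their gradients differ, which is a reminder that these are rough values.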

3.25 You should not need to change anything else, so select Run -> Run Now.

3c) Find heavy atoms

The Problem

You have generated a column of E values which give a wavelength-independent measure of the anomalous scattering due to the Se sites. The Se sites can be found from the E values by Patterson methods, but here you will use a direct methods approach.

Exercise

3.40 Select the Experimental Phasing module, and open the Rantan - Direct Methods task window.

3.41 On the first line, enter a suitable job title such as

Job title Find Se sites for GerE.

3.42 On the second line, select

Set optimal Rantan parameters for isomorphous or anomalous difference data

and on the next line, select

generate map(s) and coordinate file listing peaks

3.43 Select the input MTZ file

MTZ in TEST gere_MAD_prephadata1.mtz

The rest of this section should be filled in automatically, and you do not need to change anything.

3.44 You should not need to change anything else, so select Run -> Run Now.

3.45 When the job has finished, view the output MTZ file by selecting in the main window View Files from Job -> gere_MAD_rantan1.mtz. The output file has 48 columns:


 * Column Labels :
 
 H K L F(+)SEinfl SIGF(+)SEinfl F(-)SEinfl SIGF(-)SEinfl mod_F(+)SEinfl
 mod_SIGF(+)SEinfl mod_F(-)SEinfl mod_SIGF(-)SEinfl F(+)SElrm SIGF(+)SElrm
 F(-)SElrm SIGF(-)SElrm mod_F(+)SElrm mod_SIGF(+)SElrm mod_F(-)SElrm
 mod_SIGF(-)SElrm F(+)SEpeak SIGF(+)SEpeak F(-)SEpeak SIGF(-)SEpeak
 mod_F(+)SEpeak mod_SIGF(+)SEpeak mod_F(-)SEpeak mod_SIGF(-)SEpeak F(+)SEhrm
 SIGF(+)SEhrm F(-)SEhrm SIGF(-)SEhrm mod_F(+)SEhrm mod_SIGF(+)SEhrm mod_F(-)SEhrm
 mod_SIGF(-)SEhrm FM SIGFM F E SIGE F2OR E2OR PHASE1 WT1 PHASE2 WT2 PHASE3 WT3

RANTAN generates and refines a large number of possible phase sets (default 500), but only outputs the best ones (default 3) to the output MTZ file. These phases and the corresponding weights are held in the last 6 columns.
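A quick way to check the column count and pick out the phase and weight columns is to split the label listing in Python. The labels are copied from the listing above; the variable names are just for this sketch.

```python
labels = """
H K L F(+)SEinfl SIGF(+)SEinfl F(-)SEinfl SIGF(-)SEinfl mod_F(+)SEinfl
mod_SIGF(+)SEinfl mod_F(-)SEinfl mod_SIGF(-)SEinfl F(+)SElrm SIGF(+)SElrm
F(-)SElrm SIGF(-)SElrm mod_F(+)SElrm mod_SIGF(+)SElrm mod_F(-)SElrm
mod_SIGF(-)SElrm F(+)SEpeak SIGF(+)SEpeak F(-)SEpeak SIGF(-)SEpeak
mod_F(+)SEpeak mod_SIGF(+)SEpeak mod_F(-)SEpeak mod_SIGF(-)SEpeak F(+)SEhrm
SIGF(+)SEhrm F(-)SEhrm SIGF(-)SEhrm mod_F(+)SEhrm mod_SIGF(+)SEhrm mod_F(-)SEhrm
mod_SIGF(-)SEhrm FM SIGFM F E SIGE F2OR E2OR PHASE1 WT1 PHASE2 WT2 PHASE3 WT3
""".split()

n_columns = len(labels)
phase_columns = labels[-6:]  # phases and weights for the 3 best phase sets
modified_columns = [name for name in labels if name.startswith("mod_")]

print(n_columns, phase_columns, len(modified_columns))
```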

3.46 From each of these phase sets, the task calculates a map and locates peaks, which may correspond to Se sites. These peaks are output in both orthogonal and fractional coordinates. Click on View Files from Job to reveal a list of output files. For each phase set, there will be a .pdb (orthogonal coordinates) and a .ha (fractional coordinates) file, for example TEST_5_1.pdb and TEST_5_1.ha for phase set 1. The default peak search produces approximately 15 peaks; we expect there to be 12 Se sites for this protein (2 each for 6 chains). (Note that RANTAN starts from random phase sets, so the results are not always the same.)

3d) Heavy atom refinement

The Problem

You now have 3 sets of possible Se sites. Heavy atom refinement and phasing is done using the program MLPHARE. The stages are:

Refine heavy atom (= Se) parameters (XYZ coordinates, B factor, real occupancy, anomalous occupancy).
Remove heavy sites that don't refine well. This usually means that the occupancy drops to a small or negative value.
Look for new heavy atom sites using Fourier difference maps.
When refinement is complete, generate the final phases.

To do this successfully, you need to run MLPHARE several times. The Se parameters are held in a .ha file, and the aim is to improve these as much as possible. Do not worry about the other output files until the last cycle, when you generate phases.

For the tutorial, we just do the 1st stage (exercise 3d) and the last stage (exercise 3e).

Exercise

3.60 Select the Experimental Phasing module, and open the Run Mlphare task window.

3.61 On the first line, enter a suitable job title such as

Job title Refining Se sites for GerE - set 1.

3.62 In the first section, select:

Use centric data only.

Leave everything else unselected.

3.63 Select the input MTZ file:

MTZ in TEST gere_MAD_scaleit1.mtz

Now select the columns from the MTZ file.

FP	FSEinfl	Sigma FP	SIGFSEinfl
FPH1	FSEhrm	SigFPH1	SIGFSEhrm

Check that the output MTZ file is given as

Output MTZ TEST gere_MAD_mlphare1.mtz

3.64 In the section Data Harvesting, leave as:

Do not create harvest file

3.65 In the section Key parameters, enter resolution limits (we exclude data which does not help phasing):

Resolution limit from 15.0 to 2.8 .

3.66 In the section Describe Derivatives & Refinement, enter a name for the derivative:

Phase with&Refine derivative infl to hrm .

On the next line, check you are refining (real) occupancy only:

Use isomorphous data to refine occupancy

Then select the file of heavy atom coordinates output by RANTAN. You can use your own file if you want, but it is recommended to use the prepared file in DATA:

HA in DATA rantan_set1.ha

3.67 Select Run -> Run Now.

3.68 Click on View Files from Job -> TEST_9_1.ha and look at the list of refined sites. The occupancies of sites 7 and 12 are now below 0:


ATOM7   Ano   0.244  0.156  0.961 -0.590 BFAC   20.000
ATREF X ALL Y ALL Z ALL OCC ALL AOCC ALL B ALL
ATOM12  Ano   0.216  0.274  0.077 -4.304 BFAC   20.000
ATREF X ALL Y ALL Z ALL OCC ALL AOCC ALL B ALL

and these sites should be deleted from the list. The easiest way to delete them is to click on their lines in the viewer window, which turns them into comment lines. Then click Save&Exit.
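If there are many sites, the same check can be scripted. The sketch below is an illustration only: it assumes the layout of the ATOM lines shown above (label, atom type, x, y, z, occupancy, BFAC, B), the function name is made up, and the ATOM1 line is a hypothetical well-behaved site added for contrast. It reports the sites to remove; it does not edit the file.

```python
def negative_occupancy_sites(ha_text):
    """Return (label, occupancy) for every ATOM line with occupancy below 0."""
    flagged = []
    for line in ha_text.splitlines():
        fields = line.split()
        # ATREF lines and blank lines are skipped; only ATOMn lines are checked.
        if len(fields) >= 6 and fields[0].startswith("ATOM"):
            occupancy = float(fields[5])
            if occupancy < 0.0:
                flagged.append((fields[0], occupancy))
    return flagged


sample = """\
ATOM1   Ano   0.100  0.200  0.300  1.250 BFAC   20.000
ATREF X ALL Y ALL Z ALL OCC ALL AOCC ALL B ALL
ATOM7   Ano   0.244  0.156  0.961 -0.590 BFAC   20.000
ATREF X ALL Y ALL Z ALL OCC ALL AOCC ALL B ALL
ATOM12  Ano   0.216  0.274  0.077 -4.304 BFAC   20.000
ATREF X ALL Y ALL Z ALL OCC ALL AOCC ALL B ALL
"""

flagged_sites = negative_occupancy_sites(sample)
print(flagged_sites)
```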

3.69 Return to the Run Mlphare task window. In the section Describe Derivatives & Refinement, add in XYZ refinement, and change the occupancy refinement to alternate occupancy and B factor:

Use isomorphous data to refine XYZ and alternate Occ & B

3.70 Update the heavy atom file:

HA in TEST TEST_9_1.ha

3.71 Select Run -> Run Now. The interface will ask you whether you want to overwrite gere_MAD_mlphare1.mtz. This is OK, so click Delete File .

3.72 When the job has finished, you can check the refined Se sites as before. There are several ways the refinement of the Se sites can be optimised:

Inspect difference Fourier maps for extra sites.
Repeat with FSEinfl as native, and FSEpeak or FSElrm as derivative.
Repeat with different set of sites from RANTAN.

You do not have time to do this, so we now skip to the final stage....

3e) Phasing

The Problem

We assume you have now found all the Se sites, and have refined their positions, real occupancies and B factors. A file is provided in DATA with the correct Se coordinates. To get the best phases, we now include all wavelengths together, and use the anomalous signal as well.

Exercise

3.80 In the Run Mlphare task window, enter a suitable job title such as:

Job title Refinement against all data.

3.81 In the first section, select:

Use anomalous difference data

and

Apply calculated scale to output Sfs

but make sure:

Use centric data only.

is de-selected and leave everything else unselected.

3.82 Select the input MTZ file:

MTZ in TEST gere_MAD_scaleit1.mtz

Now select the columns from the MTZ file.

FP	FSEinfl	Sigma FP	SIGFSEinfl
FPH1	FSEinfl	SigFPH1	SIGFSEinfl
DPH1	DSEinfl	SigDPH1	SIGDSEinfl
FPH2	FSElrm	SigFPH2	SIGFSElrm
DPH2	DSElrm	SigDPH2	SIGDSElrm
FPH3	FSEpeak	SigFPH3	SIGFSEpeak
DPH3	DSEpeak	SigDPH3	SIGDSEpeak
FPH4	FSEhrm	SigFPH4	SIGFSEhrm
DPH4	DSEhrm	SigDPH4	SIGDSEhrm

Check that the output MTZ file is given as

Output MTZ TEST gere_MAD_mlphare2.mtz

3.83 In the section Key parameters, enter resolution limits:

Resolution limit from 15.0 to 2.8 .

3.84 In the section Describe Derivatives & Refinement, Derivative Number 1, enter a name for the derivative:

Phase with&Refine derivative infl to infl .

On the next line, refine (anomalous) occupancy only, against anomalous data:

Use anomalous data to refine occupancy only

Then select the file of correct sites that has been provided:

HA in DATA nat_sul_ref.ha

3.85 Repeat 3.84 for the other 3 wavelengths.

3.86 Select Run -> Run Now.

3.87 When the job has finished, return to the main window, highlight the job in the Job List, and select View Files from Job -> View Log Graphs. Graphs are given for each wavelength, for both the last refinement cycle and the final phasing cycle. Look in particular at:

Lack of closure analysis .... / Phasing power ...., Lack of closure analysis .... / Cullis Rfactor ....: For good data, the phasing power should be greater than 1, and the Cullis Rfactor should be significantly less than one. The values for the different wavelengths ("derivatives") should correlate with the f' value (compared to that for FSEinfl).
Anomalous lack of closure analysis .... / Ano Cullis Rfactor .....: The anomalous Cullis Rfactor should be significantly less than one. The values for the different wavelengths ("derivatives") should correlate with the f'' value.
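To make these thresholds concrete, the sketch below computes both statistics from toy numbers using one commonly quoted form: phasing power as the r.m.s. calculated heavy-atom amplitude divided by the r.m.s. lack of closure, and Cullis R as the summed lack of closure divided by the summed isomorphous difference. The exact definitions used in the MLPHARE log may differ in detail; the function names and all values here are assumptions for illustration.

```python
import math


def rms(values):
    """Root-mean-square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def phasing_power(fh_calc, lack_of_closure):
    """r.m.s. heavy-atom amplitude over r.m.s. lack of closure."""
    return rms(fh_calc) / rms(lack_of_closure)


def cullis_r(lack_of_closure, iso_diff):
    """Summed |lack of closure| over summed |isomorphous difference|."""
    return sum(abs(e) for e in lack_of_closure) / sum(abs(d) for d in iso_diff)


# Toy values (hypothetical, not taken from the GerE log).
fh_calc = [30.0, 40.0, 20.0]
lack_of_closure = [15.0, 20.0, 10.0]
iso_diff = [40.0, 45.0, 30.0]

pp = phasing_power(fh_calc, lack_of_closure)
rc = cullis_r(lack_of_closure, iso_diff)
print(f"phasing power = {pp:.2f}, Cullis R = {rc:.2f}")
```

In this toy case the heavy-atom signal is twice the size of the error, so the phasing power is above 1 and the Cullis R is well below 1, which is the pattern described above for good data.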

In fact, the example data does not give very good statistics. However, the structure was solved by this method!

3f) Density Modification

The Problem

The phases output by MLPHARE can be used to generate an electron density map for the native data. However, the map is likely to be easier to interpret if density modification is performed first. (In fact, MLPHARE gives realistic Figures of Merit and therefore density modification usually works well.) Density modification (also known as Density Improvement) can be done using the program DM.

Exercise

3.100 Select the Density Improvement module, and open the Run DM task window.

3.101 On the first line, enter a suitable job title such as

Job title DM on MAD phases - first hand.

3.102 Select the input MTZ file:

MTZ in TEST gere_MAD_mlphare2.mtz

Now select the columns from the MTZ file.

FP	F_nat	SIGFP	SIGF_nat
PHIO	PHIB_mlphare1	Weight	FOM_mlphare1

3.103 Enter the solvent content as

Fraction solvent content 0.538 .

3.104 Everything else can be left as default, so Run -> Run Now.

3g) Testing the hand

The Problem

The procedure for locating the Se sites cannot distinguish between a particular set of sites and the same set of sites transformed through a point of inversion, i.e. it cannot distinguish the hand of the solution. Therefore, the previous phasing run should be repeated using the opposite hand.

Then we look at two things:

look at maps - one map should have a clearer solvent boundary than the other.
run DM - one hand should give marginally better statistics than the other.

But the main difference is whether or not you can build a model ....

Exercise

Re-run the previous 2 exercises, but use the following file of sites instead:

HA in DATA nat_sul_opp_ref.ha

which has the opposite hand.
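The tutorial supplies the opposite-hand file, so you do not need to generate it. For reference, the transformation itself can be sketched in Python: each site in fractional coordinates (x, y, z) is mapped to (-x, -y, -z) and wrapped back into the unit cell. The function name is made up for this example, and the test site is the ATOM7 position quoted in exercise 3d. Note that for some space groups changing the hand also requires changing the space group to its enantiomorph; this sketch does not deal with that.

```python
def invert_hand(sites):
    """Invert fractional coordinates through the origin, wrapped into [0, 1)."""
    return [tuple(round((-c) % 1.0, 3) for c in site) for site in sites]


sites = [(0.244, 0.156, 0.961)]
opposite = invert_hand(sites)
print(opposite)
```

Applying the function twice returns the original sites, as expected for an inversion.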