
# ARP_WATERS (CCP4: Supported Program)

## NAME

ARP_WATERS (ARP/wARP v5.0) - Automated Refinement Procedure for refining protein structures.

## SYNOPSIS

arp_waters XYZIN foo_in.brk MAPIN1 foo_2fofc.map MAPIN2 foo_fofc.map XYZOUT foo_out.brk
[Keyworded input]

# Automated Refinement Procedure

## DESCRIPTION

This CCP4 distribution is not the full distribution of the ARP/wARP suite, and includes only the programs arp_waters (which is actually version 5.0 of the arp_warp program), prepform, prepshel and t_shift, and the script arp_waters_plots.sh (renamed from arp_warp_plots.sh).

The complete ARP/wARP package contains additional automated scripts and alpha versions of new programs (for automated building of protein structures in electron density maps; see "Automated protein model building combined with iterative structure refinement" Perrakis, A., Morris, R.J.H. and Lamzin, V.S., Nature Struct. Biol. 6 (1999) 458-463), and is freely available to academic users from the ARP/wARP homepage, http://www.arp-warp.org. Industrial users are asked to contact the authors for a license agreement.

The version of ARP distributed by CCP4 also contains minor changes which enable the writing of "summary tags" into the program output - see the libhtml documentation for details of these tags (and how to suppress them!). Please note that these changes do not in any way affect the running of the program, and are purely cosmetic.

In addition, this version of ARP is substantially older than the current version distributed by EMBL, and is retained only for the purpose of adding waters (hence the change of name). Details of the current ARP/wARP suite (including how to get it) can be found at the ARP/wARP homepage, http://www.arp-warp.org/.

## Introduction

The Automated Refinement Procedure, ARP_WATERS, is a program package for protein structure refinement. It iteratively combines reciprocal-space structure factor refinement with updating of the model in real space. The real-space update attempts to mimic and automate what is typically a time-consuming model-rebuilding session at the graphics display. It is based on identifying and removing poorly defined atoms and adding potential new sites, using general shape properties of the electron density syntheses as well as stereochemical criteria.

ARP_WATERS (actually ARP/wARP version 5.0) can be used in the following ways:

1. Refinement of MR solutions
2. Improvement of MAD and M(S)IR(AS) phases
3. Averaging of multiple refinements
4. Automatic tracing of the density map and model building (not available in CCP4 version)
5. Building of the solvent structure
6. Ab initio structure determination for metalloproteins at very high resolution

For a more detailed description of the ARP see the references given below.

The ARP/wARP procedure requires the use of reciprocal space refinement, density map calculation and the ARP/wARP software itself. The least-squares minimisation can be done with the CCP4 programs PROTIN / REFMAC with an optional additional scaling (e.g. using RSTATS). Use of other programs for least-squares minimisation, e.g. SHELXL, requires additional conversion to the CCP4 format which is provided within the ARP_WATERS package. Density map calculations are carried out with the CCP4 programs FFT and MAPMASK.

## Author information

Users are requested to report any bugs or suggested changes to the authors.

Victor S. Lamzin
EMBL Hamburg Outstation, c/o DESY, Notkestrasse 85, 22603 Hamburg, Germany
Tel. +49-40-89902-121, Fax +49-40-89902-149, E-mail victor@embl-hamburg.de

Anastassis Perrakis
EMBL Grenoble Outstation, c/o ILL, Avenue des Martyrs, B.P. 156, 38042 Grenoble CEDEX 9, France
Tel. +33-476-207632, Fax +33-476-207199, E-mail perrakis@embl-grenoble.fr

## References

Any application of ARP_WATERS should refer to ARP/wARP version 5.0 and should cite the relevant publication(s) from the list below:

• ARP93 The original paper describing ARP
• ARP97 Elaborated analysis of the power and limitations of ARP
• wARP97 The original paper describing wARP
• ApARP96 An application of ARP to crystal structure refinement
• wARP98 Elaborated analysis of wARP and its application

1
V. S. Lamzin and K. S. Wilson.
Automated refinement of protein models.
Acta Cryst., D49:129-149, 1993.

2
V. S. Lamzin and K. S. Wilson.
Automated refinement for protein crystallography.
Methods in Enzymology, 277:269-305, 1997.

3
A. Perrakis, T. K. Sixma, K. S. Wilson, and V. S. Lamzin.
wARP: improvement and extension of crystallographic phases by weighted averaging of multiple refined dummy atomic models.
Acta Cryst., D53:448-455, 1997.

4
D. Pignol, C. Gaboriaud, J. C. Fontecilla-Camps, V. S. Lamzin, and K. S. Wilson.
How to escape from model bias with a high resolution native data set - structure determination of the PcpA-S6 subunit III.
Acta Cryst., D52:345-355, 1996.

5
E. J. van Asselt, A. Perrakis, K. H. Kalk, and V. S. Lamzin.
Accelerated X-ray structure elucidation of a 36 kDa muramidase/transglycosylase using wARP.
Acta Cryst., D54:58-73, 1998.

## Acknowledgements

The authors are especially grateful to:

• Keith S. Wilson (York, UK) one of the originators of the software;

• Zbyszek Dauter (Brookhaven, USA) and Richard Morris (EMBL Hamburg, Germany) for significant contributions to the software development;

• Eleanor Dodson (York, UK), Jozef Sevcik (Bratislava, SLO), Phil Evans (Cambridge, UK), Susanna Butterworth (York, UK), Titia Sixma (NKI Amsterdam, The Netherlands) and Erik van Asselt (Univ. Groningen, The Netherlands) for valuable suggestions.

## Using ARP_WATERS

### Applications

The areas of application of ARP_WATERS (actually ARP/wARP Version 5.0) include:

1. Refinement of MR solutions
If the initial model (a molecular replacement solution) needs substantial improvement, unrestrained xyzB reciprocal-space refinement can be carried out with ARP/wARP updating the whole model. The data should extend to 2.0 Å resolution or better. The output is a set of ARP atoms (the ARP model). The (3F_o-2F_c / 2mF_o-DF_c, alpha_c) map should be calculated from the ARP model and analysed carefully (yes, it's graphics time). The initial model or the ARP model is then rebuilt to fit this map. If the resolution is high enough and the initial model is not completely wrong, the ARP atoms are very often located at approximately the true protein atom positions, even after unrestrained refinement, so they can be used with some confidence as guides for rebuilding.

Note that in difficult cases approaches such as that described for application #4 may work better, even when starting from a molecular replacement solution.

2. Improvement of MIR(AS) phases
ARP/wARP can be used to build a protein-like model consisting of a set of non-connected atoms (free atoms model) into a MIR map. This model is then refined as described above for #1.

3. Averaging of multiple refinements
ARP/wARP can be used to prepare models and command scripts for several independent refinement runs as described for #1 and #2. The results are then combined so that each reflection is given a weighted average phase, alphawARP, and a figure of merit, FOMwARP. The results, especially at modest resolution, are better than those of a single ARP/wARP refinement. The (F_o, alphawARP, FOMwARP) map is then calculated and should be inspected. The data should extend to 2.3 Å resolution or better.

4. Automatic tracing of the density map and model building
This is not available as part of the CCP4 distribution of ARP/wARP. Please visit the ARP/wARP homepage at http://www.arp-warp.org to obtain the full distribution from the authors.

5. Building of the solvent structure
If the initial model is more or less correct (an R factor of about 30% or less) and essentially only the solvent needs to be improved, restrained (standard) reciprocal-space refinement is carried out with ARP/wARP performing automatic adjustment of the solvent structure. The data should extend to 2.5 Å resolution or better. The output is the protein model with the solvent molecules moved by symmetry operations to lie close around the protein. The (3F_o-2F_c / 2mF_o-DF_c, alpha_c) and (F_o-F_c / mF_o-DF_c, alpha_c) maps should be inspected.

6. Ab initio structure determination for metalloproteins
ARP/wARP was successfully applied to the small, 52-amino-acid protein rubredoxin, whose structure could be solved ab initio. The success was clearly due to the presence of the FeS4 cluster in the protein. The positions derived from the Patterson synthesis were used as the starting model, which gave an R factor of 53% at 0.92 Å resolution. The resulting ARP model gave an R factor of 16% and a map correlation of 90% with the map of the final model. A successful solution was subsequently also obtained with the X-ray data truncated to 1.6 Å.
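The resolution and R-factor guidelines quoted in the list above can be summarised in a small decision helper. This is a minimal Python sketch, not part of ARP/wARP: the function `candidate_applications` and the table `RESOLUTION_GUIDELINES` are hypothetical, only applications with an explicit resolution guideline are covered, and the thresholds are the rough values from the text rather than limits enforced by the program.

```python
# Hypothetical helper encoding the rough resolution guidelines quoted above.
# d_min is the high-resolution limit in Å (a smaller value means higher resolution).
RESOLUTION_GUIDELINES = {
    "refinement of MR solutions": 2.0,
    "averaging of multiple refinements": 2.3,
    "building of the solvent structure": 2.5,
}


def candidate_applications(d_min: float, r_factor_percent: float) -> list:
    """Return the applications whose quoted data requirements are met."""
    candidates = []
    for name, d_limit in RESOLUTION_GUIDELINES.items():
        if d_min > d_limit:
            continue  # data do not reach the suggested resolution
        if name == "building of the solvent structure" and r_factor_percent > 30.0:
            continue  # solvent building assumes a roughly correct model (R <= ~30%)
        candidates.append(name)
    return candidates
```

For example, `candidate_applications(2.2, 40.0)` suggests only averaging of multiple refinements: the data do not reach 2.0 Å, and an R factor of 40% is too high for solvent building alone.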

### Model and Data Requirements

Quality of initial model

As the ARP/wARP real-space update of the model is based on electron density maps calculated with model phases, the starting model for the refinement should be reasonable. The higher the resolution of the native data set, the poorer the starting model can be: if you have 1 Å data for a metalloprotein, a reasonable model is the metal itself.

Quality of X-ray data

The data should normally be of high resolution. Unrestrained xyzB refinement with ARP/wARP at lower resolution can lead to a poorer density map. The X-ray data should be complete, especially in the low-resolution range (5 Å and lower resolution). If the strong low-resolution data are systematically incomplete (e.g. because of missing or overloaded reflections), the density map is usually discontinuous and inconsistent with the model, even when the model is good. Because ARP/wARP updates the model on the basis of density maps, such discontinuities can lead to incorrect interpretation of the density and, as a result, to slow convergence or even uninterpretable maps.

In general, the number of X-ray reflections should be at least six times the number of atoms in the model.
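Before starting, the number of reflections can be estimated from the cell volume and resolution and compared with this rule of thumb. The sketch below assumes the standard approximation that the number of reciprocal lattice points within 1/d_min is (4/3)πV/d_min³, halved for Friedel pairs and divided by the number of symmetry operators (including lattice centring). It ignores incompleteness and systematic absences, so the observed count from the MTZ file is always the better number. The function names are hypothetical.

```python
import math


def estimate_unique_reflections(cell_volume: float, d_min: float, n_symops: int) -> float:
    """Rough count of unique reflections to resolution d_min (Å).

    Reciprocal lattice points inside a sphere of radius 1/d_min number about
    (4/3) * pi * V / d_min**3; dividing by 2 * n_symops accounts for Friedel
    pairs and the crystal symmetry (including any lattice centring).
    """
    n_total = 4.0 / 3.0 * math.pi * cell_volume / d_min ** 3
    return n_total / (2 * n_symops)


def enough_data(n_reflections: float, n_atoms: int, ratio: float = 6.0) -> bool:
    """Rule of thumb from the text: at least six reflections per model atom."""
    return n_reflections >= ratio * n_atoms
```

For a primitive orthorhombic crystal (e.g. P212121, 4 operators) one would call `estimate_unique_reflections(a * b * c, d_min, 4)` and pass the result, or better the actual number of measured reflections, to `enough_data` together with the number of model atoms.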

### Limitations

As ARP/wARP runs in conjunction with programs of the CCP4 suite all limitations of the latter remain. ARP/wARP itself is limited to:

1. The CCP4 conventions should be set up before running ARP/wARP
2. Density maps and reflection MTZ files in the CCP4 format
3. Maximum map section size is 400,000 points. The maximum number of map sections is 1,000. The maximum number of atoms in the extended real-space asymmetric unit is 250,000
4. Only acentric space groups (typical for proteins) and P1 are supported
5. ARP/wARP operates with coordinate files in the standard PDB format
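The size limits in item 3 can be checked before submitting a long job. This is a minimal sketch, assuming the map extent in grid points (columns, rows, sections) and the atom count in the extended asymmetric unit are already known, e.g. from the map header and the coordinate file. `MapGrid` and `limit_violations` are hypothetical names, and ARP/wARP performs its own checks at run time.

```python
from dataclasses import dataclass

# Limits quoted in the list above for the CCP4 build of ARP/wARP 5.0
MAX_POINTS_PER_SECTION = 400_000
MAX_SECTIONS = 1_000
MAX_ATOMS_EXTENDED_ASU = 250_000


@dataclass
class MapGrid:
    """Extent of a map in grid points: columns (fast), rows (medium), sections (slow)."""
    columns: int
    rows: int
    sections: int


def limit_violations(grid: MapGrid, n_atoms_extended_asu: int) -> list:
    """Return a description of every limit exceeded (empty list if none)."""
    problems = []
    points = grid.columns * grid.rows  # grid points in one section
    if points > MAX_POINTS_PER_SECTION:
        problems.append(f"section has {points} points (max {MAX_POINTS_PER_SECTION})")
    if grid.sections > MAX_SECTIONS:
        problems.append(f"{grid.sections} sections (max {MAX_SECTIONS})")
    if n_atoms_extended_asu > MAX_ATOMS_EXTENDED_ASU:
        problems.append(f"{n_atoms_extended_asu} atoms (max {MAX_ATOMS_EXTENDED_ASU})")
    return problems
```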

### Automated Scripts

The full distribution of ARP/wARP contains a number of automated scripts which are designed to help avoid mistakes and generally improve the user-friendliness of the programs. These scripts are not provided with the CCP4 distribution of ARP/wARP (which is in any case substantially older than the current release of ARP/wARP), so if you want to use them you will need to obtain the full distribution from the ARP/wARP homepage at http://www.arp-warp.org/.

### Supplementary Use of ARP_WATERS

After restrained refinement is complete, and before going to the graphics, it is worth knowing which parts of the model should be corrected. ARP_WATERS can be used for this purpose:

arp_waters XYZIN input.BRK MAPIN1 3Fo-2Fc.MAP XYZOUT temp << eof
MODE UPDATE ALLATOMS
CELL number number number number number number
SYMMETRY number/string
RESOLUTION number number
REMOVE ATOMS 50 CUTSIGMA 1.0
END
eof

The output of this job will contain a list of the 50 worst atoms (from ARP/wARP's point of view), i.e. those that agree least with the electron density. These atoms should be inspected first. The input MAPIN1 should be the (3F_o-2F_c / 2mF_o-DF_c, alpha_c) map.

### Updating Old Command Files

If you have a working command file from a previous release just change the ARP part to look like this:

arp_waters XYZIN input_coordinates MAPIN1 3Fo-2Fc_map_file \
MAPIN2 Fo-Fc_map_file_name XYZOUT output_coordinates << eof
MODE UPDATE ALLATOMS/WATERS
[CELL cell parameters]
[REFINE waters/allatoms]
SYMM spacegroup
RESOLUTION resmin resmax
FIND ATOMS number CHAIN string CUTSIGMA number/AUTO
REMOVE ATOMS number CUTSIGMA number [MERGE number] [KEEP ZEROOCC]
END
eof

## Keyworded input to ARP_WATERS

The ARP_WATERS input is keyworded. For example, the cell parameters are given with the keyword CELL followed by the six values, e.g. CELL 40.86 52.34 87.69 90 90 90

An input card may also contain subkeywords (see the examples below). The first keyword in a file MUST BE MODE and the last one MUST BE END. Other keywords may appear in any order, and so may subkeywords.

Different ARP/wARP modes require different input files and different keywords. Examples are given below. The slash symbol (/) separates alternative subkeywords. Only the first four characters of each keyword or subkeyword (except END) are needed to identify it.
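The ordering rules and the four-character abbreviation rule can be illustrated with a small checker. This is a hypothetical Python sketch of the rules as stated here, not ARP/wARP's actual input parser: it only looks at the first word of each card and does not validate subkeywords or their values.

```python
KEYWORDS = ("MODE", "CELL", "SYMMETRY", "RESOLUTION", "FIND", "REMOVE",
            "REFINE", "MIRBUILD", "SHAKEMODEL", "LABIN", "LABOUT", "END")


def identify_keyword(word: str) -> str:
    """Map a possibly abbreviated keyword to its full name.

    END must be given as is; every other keyword is identified by its first
    four characters, so SYMM and SYMMETRY are equivalent.
    """
    word = word.upper()
    if word == "END":
        return "END"
    for keyword in KEYWORDS[:-1]:  # every keyword except END
        if word[:4] == keyword[:4]:
            return keyword
    raise ValueError(f"unknown keyword: {word}")


def check_deck(deck: str) -> list:
    """Return the full keyword names in order, enforcing MODE first and END last."""
    cards = [line.split()[0] for line in deck.splitlines() if line.strip()]
    names = [identify_keyword(card) for card in cards]
    if not names or names[0] != "MODE":
        raise ValueError("the first keyword must be MODE")
    if names[-1] != "END" or names.count("END") != 1:
        raise ValueError("END must appear once, as the last keyword")
    return names
```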

The available keywords are:

MODE, CELL, SYMMETRY, RESOLUTION, FIND, REMOVE, REFINE, MIRBUILD, SHAKEMODEL, LABIN, LABOUT, END

## On-line help

The ARP/wARP input pre-processor gives warnings or error messages if something is wrong. These should be carefully checked. It is also advisable to check ARP/wARP input prior to submitting a long refinement job.

Here are a few examples of how the on-line commands can be used. To start just type 'arp_waters' and then the keyword you are interested in.

arp_waters
END

arp_waters
MODE
Keyword MODE must be followed by 1 field(s)

Expected format:

MODE update waters/allatoms
MODE mirbuild
MODE shakemodel light/allatoms
MODE reflaver

arp_waters
MODE UPDATE WATERS
Optional keywords:
CELL cell parameters
REFINE waters/allatoms

Required keywords:
SYMM spacegroup
RESOLUTION resmin resmax
FIND ATOMS number CHAIN string CUTSIGMA number/AUTO
and/or REMOVE ATOMS number CUTSIGMA number [MERGE number] [KEEP ZEROOCC]
END (must be the last keyword)

arp_waters
MODE UPDATE WATERS
CELL
An error message:

This Data Card in not understood
Keyword CELL must be followed by 6 field(s)

Expected format:

CELL a b c alpha beta gamma

arp_waters
MODE UPDATE WATERS
CELL 30 45 37 90 90 90 A
This Data Card in not understood
CELL 30 45 37 90 90 90 A
Cannot accept field shown by arrows:
CELL 30 45 37 90 90 90 ==>A<==

arp_waters
MODE UPDATE WATERS
CELL 30 45 37 90 90 90
SYMM 4
RESOLUTION 20 1.5
FIND ATOMS 10 CHAIN W CUTSIGMA 3.0
REMOVE ATOMS 10 CUTSIGMA 1.0
END
Asymmetric unit limits 1/1   1/2   1/1

Comments: Cell parameters 30.000 45.000 37.000 90.000 90.000 90.000
Comments: Remove 10 old atoms if below 1.0 sigma in MAPIN1
Comments: Analyse waters only for removal
- WARNING - This is not a standard use of ARP
- use of MERGE data card is advisable

Comments: Look for 10 new atoms in MAPIN2
Above threshold of 3.0 sigma
- WARNING - This is not a standard use of ARP
- use of CUTSIGMA AUTO option is recommended
- assuming that MAPIN2 is Fo-Fc map

Comments: New atoms will not be put closer than 2.30 to existing atoms
Comments: New atoms will be selected if there is N or O exists within 3.30
Comments: New atoms will not be put closer than 2.30 to each other
Comments: New atoms will have B-factors assigned on the basis of MAPIN2
- density hight as expected for resolution range 1.50 20.00
- MAPIN2 is assumed to be Fo-Fc map in absolute scale
Comments: New atoms will have chain name W

- No real space refinement will be made
- WARNING - This is not a standard use of ARP
- real space refinement of waters is advisable

ARP/wARP therefore accepts this input: because everything is formally correct, the program gives only comments and warnings. It will also make additional checks during the run.

## Monitoring and Troubleshooting

### Input Processing

ARP/wARP checks that the input cell parameters are identical to those in the coordinate and map file headers. It does not check whether the cell parameters make sense at all, i.e. it will accept CELL 67.1 82.2 79.9 102.2 98.9 100.3 together with SYMM P212121.
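Because ARP/wARP does not test whether the cell is compatible with the space group, a simple pre-check can catch the inconsistency in the example above. The sketch below is hypothetical. It assumes that the space group is given by its International Tables number, that monoclinic groups use a unique-axis-b setting, and that trigonal groups use hexagonal axes (rhombohedral settings are not handled). The tolerances are arbitrary.

```python
# Highest space-group number (International Tables) in each crystal system.
_SYSTEM_UPPER_LIMITS = [
    (2, "triclinic"), (15, "monoclinic"), (74, "orthorhombic"),
    (142, "tetragonal"), (167, "trigonal"), (194, "hexagonal"), (230, "cubic"),
]


def crystal_system(space_group_number: int) -> str:
    if not 1 <= space_group_number <= 230:
        raise ValueError("space group number must be between 1 and 230")
    return next(system for upper, system in _SYSTEM_UPPER_LIMITS
                if space_group_number <= upper)


def cell_consistent(cell, space_group_number, length_tol=0.01, angle_tol=0.01) -> bool:
    """True if (a, b, c, alpha, beta, gamma) obeys the metric of the crystal system."""
    a, b, c, alpha, beta, gamma = cell
    system = crystal_system(space_group_number)

    def close(x, y, tol):
        return abs(x - y) <= tol

    def right(angle):
        return close(angle, 90.0, angle_tol)

    if system == "triclinic":
        return True  # no metric constraints
    if system == "monoclinic":  # unique axis b assumed
        return right(alpha) and right(gamma)
    if system == "orthorhombic":
        return right(alpha) and right(beta) and right(gamma)
    if system == "tetragonal":
        return close(a, b, length_tol) and right(alpha) and right(beta) and right(gamma)
    if system in ("trigonal", "hexagonal"):  # hexagonal axes assumed
        return (close(a, b, length_tol) and right(alpha) and right(beta)
                and close(gamma, 120.0, angle_tol))
    # cubic
    return (close(a, b, length_tol) and close(b, c, length_tol)
            and right(alpha) and right(beta) and right(gamma))
```

With these assumptions, `cell_consistent((67.1, 82.2, 79.9, 102.2, 98.9, 100.3), 19)` returns False, flagging the triclinic-looking cell combined with P212121 (space group 19).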

ARP/wARP checks whether the orthogonalisation matrix derived from CELL is consistent with the matrix written at the top of the coordinate file.

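ARP/wARP will refuse a negative number of atoms to update, but does not check whether these numbers are too high, i.e. whether they are consistent with the limit of 0.08 × N/dmax³ quoted in the SHELXL section below (N being the number of atoms in the model and dmax the high-resolution limit).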
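The missing upper-bound check can be done before the run. This is a minimal sketch. It assumes that the guideline 0.08 × N/dmax³, quoted for the solvent-update step in the SHELXL section below, is also a sensible ceiling for the REMOVE and FIND ATOMS numbers here. The function names are hypothetical.

```python
def max_atoms_to_update(n_atoms: int, d_max: float) -> int:
    """Largest number of atoms to add or remove per cycle: 0.08 * N / d_max**3.

    n_atoms: current number of atoms in the model (N)
    d_max:   high-resolution limit of the data in Å
    """
    if n_atoms < 0 or d_max <= 0:
        raise ValueError("n_atoms must be >= 0 and d_max > 0")
    return int(0.08 * n_atoms / d_max ** 3)  # round down: the limit must not be exceeded


def update_request_ok(requested: int, n_atoms: int, d_max: float) -> bool:
    """ARP/wARP's own check (no negative numbers) plus the upper limit it skips."""
    return 0 <= requested <= max_atoms_to_update(n_atoms, d_max)
```

For a 2,000-atom model at 1.5 Å the ceiling is 47 atoms per cycle, so REMOVE ATOMS 50 would be flagged.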

ARP/wARP does not check whether the input MAPIN1 is indeed a (3F_o-2F_c / 2mF_o-DF_c, alpha_c) map or if MAPIN2 is really a (F_o-F_c / mF_o-DF_c, alpha_c) map.

ARP/wARP does not check the input coordinate file in terms of proper connectivity, residue and atom names, etc.

### Output

ARP/wARP outputs several useful quantities:

• the number of atoms merged;
• the number of atoms removed;
• the sphericity function, indicating whether atoms are well shaped (a value of about 0.05 to 0.10 is reasonable; the lower the better);
• the improvement of the sphericity function, if sphere-based real-space refinement is used;
• the statistically significant threshold in the difference density for the addition of new atoms (if FIND CUTSIGMA AUTO is given);
• the number of atoms added.

The AUTO option is an attempt to be objective in adding atoms. The actual number of atoms removed depends on both the REMOVE CUTSIGMA value and the requested number of atoms. If too few atoms are requested for removal while the structure is being reshuffled, too few new atoms will be found. If the requested number is too high (while still satisfying the limit on the number of atoms to update), more new atoms will be found. Convergence has been achieved when, in each cycle, ARP/wARP removes fewer than about 2-3 atoms (for a typical structure of 1,000 to 3,000 atoms), finds the same number of new ones, and the R factor no longer changes. There is no reason to run millions of cycles: refinement usually converges after 10 to 20 cycles, although if the density is still improving the number of cycles can be increased to 50 or even 100.
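The convergence rule of thumb in the paragraph above can be written as a small test over per-cycle statistics. The sketch assumes that the number of atoms removed and added and the R factor for each cycle have already been extracted from the log files (the extraction is not shown). The `Cycle` record, the three-cycle window and the 0.1% R tolerance are illustrative choices, not ARP/wARP parameters.

```python
from dataclasses import dataclass


@dataclass
class Cycle:
    """Statistics of one ARP/wARP cycle (hypothetical record)."""
    removed: int      # atoms removed in this cycle
    added: int        # new atoms found in this cycle
    r_factor: float   # crystallographic R factor in percent


def has_converged(cycles, window=3, max_removed=3, r_tol=0.1) -> bool:
    """Apply the rule of thumb over the last `window` cycles.

    Each of those cycles must remove fewer than `max_removed` atoms, add the
    same number it removed, and change R by no more than `r_tol` percent.
    """
    if len(cycles) < window + 1:
        return False  # need one earlier cycle to compare R against
    previous_r = cycles[-window - 1].r_factor
    for cycle in cycles[-window:]:
        if cycle.removed >= max_removed or cycle.added != cycle.removed:
            return False
        if abs(cycle.r_factor - previous_r) > r_tol:
            return False
        previous_r = cycle.r_factor
    return True
```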

### Viewing ARP_WATERS Log Files

It is important to monitor the ARP/wARP output, in particular the log files. All ARP log files can be converted into graphs for viewing with the CCP4 program xloggraph by running 'arp_waters_plots.sh log_file_name'.

### Checking Convergence

Several parameters can be used as convergence criteria. The first is map quality: a map with coefficients (3F_o-2F_c / 2mF_o-DF_c, alpha_c) is calculated from the last ARP model and inspected. The crystallographic R factor is also a reasonable quantity to monitor.

What to do if the R factor stays at values around 30%:
(Check with something like grep 'all_R' logs/1_arp_1.log.) If, for example, R dropped to 28-34% after 5 or 10 cycles and stayed there for the next 10 cycles without any tendency to drop further, you may be in trouble. Try changing from the Fast to the Slow protocol or vice versa, introducing phase restraints, changing advanced parameters, panicking, crying, etc.! We are working on more sensible suggestions all the time, so as a last resort contact us! Your feedback is needed and appreciated!
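The "stuck around 30%" symptom can also be tested automatically once the R factors of successive cycles have been collected, e.g. from the lines found by the grep command above. Parsing is left out because the log format is not described here. In this hypothetical sketch the 28-34% band and the 10-cycle window come from the text. The 0.5% minimum drop that counts as "a tendency to drop further" is an assumption.

```python
def r_factor_stalled(r_history, window=10, low=28.0, high=34.0, min_drop=0.5) -> bool:
    """True if R (in percent) has stayed within [low, high] for the last `window`
    cycles without falling by at least `min_drop` below its value at the start."""
    if window < 2:
        raise ValueError("window must be at least 2")
    if len(r_history) < window:
        return False  # not enough cycles to judge
    recent = r_history[-window:]
    stuck_in_band = all(low <= r <= high for r in recent)
    still_dropping = recent[0] - min(recent[1:]) >= min_drop
    return stuck_in_band and not still_dropping
```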

### Crashing Scripts

The CCP4 setup usually defines the MANPATH environment variable by appending to the existing MANPATH. In remote shells MANPATH may not be defined, and this crashes remote scripts! Copy the ccp4.setup file to your local directory, remove the setenv MANPATH line, and then set ccp4init to point to that copy.

Please also check (and change if necessary) the line setenv CCP4_OPEN NEW to setenv CCP4_OPEN UNKNOWN.

## Examples

A typical set of ARP/wARP commands for applications #1, 2, 5 and 6 (unrestrained or restrained refinement for MR, MIR, ab initio solutions or building of solvent structure) could look something like this:

arp_waters XYZIN input_coordinates MAPIN1 3Fo-2Fc_map_file \
MAPIN2 Fo-Fc_map_file XYZOUT output_coordinates << eof
MODE update allatoms/waters
[CELL cell parameters]
[REFINE waters/allatoms]
SYMM spacegroup
RESOLUTION resmin resmax
FIND atoms number chain string cutsigma number/auto
REMOVE atoms number cutsigma number [merge number] [keep zeroocc]
END
eof

The keywords FIND and REMOVE are each optional, but at least one of them must be given. Both MAPIN1 (3Fo-2Fc / 2mFo-DFc, alpha_c) and MAPIN2 (Fo-Fc / mFo-DFc, alpha_c) maps must be provided.
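When many command files have to be written, the keyword input for MODE UPDATE can be generated rather than edited by hand. This is a minimal Python sketch, assuming the card layout of the template above. `update_deck` is a hypothetical helper that enforces the FIND/REMOVE rule but does not cover REFINE, MERGE or KEEP ZEROOCC.

```python
from typing import Optional, Sequence, Tuple, Union


def update_deck(target: str, spacegroup: Union[int, str], resmin: float, resmax: float,
                find: Optional[Tuple[int, str, Union[float, str]]] = None,
                remove: Optional[Tuple[int, float]] = None,
                cell: Optional[Sequence[float]] = None) -> str:
    """Return MODE UPDATE keyword input (hypothetical helper).

    find   = (number of atoms, chain name, cutsigma or "AUTO")
    remove = (number of atoms, cutsigma)
    """
    target = target.upper()
    if target not in ("WATERS", "ALLATOMS"):
        raise ValueError("target must be WATERS or ALLATOMS")
    if find is None and remove is None:
        raise ValueError("at least one of FIND and REMOVE must be given")
    cards = [f"MODE UPDATE {target}"]  # MODE must be the first keyword
    if cell is not None:
        if len(cell) != 6:
            raise ValueError("CELL needs a, b, c, alpha, beta, gamma")
        cards.append("CELL " + " ".join(f"{value:g}" for value in cell))
    cards.append(f"SYMM {spacegroup}")
    cards.append(f"RESOLUTION {resmin:g} {resmax:g}")
    if find is not None:
        n_atoms, chain, cutsigma = find
        cut = cutsigma.upper() if isinstance(cutsigma, str) else f"{cutsigma:g}"
        cards.append(f"FIND ATOMS {n_atoms} CHAIN {chain} CUTSIGMA {cut}")
    if remove is not None:
        n_atoms, cutsigma = remove
        cards.append(f"REMOVE ATOMS {n_atoms} CUTSIGMA {cutsigma:g}")
    cards.append("END")  # END must be the last keyword
    return "\n".join(cards) + "\n"
```

The returned text can be supplied to arp_waters on standard input in place of the here-document.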

Another typical set of ARP/wARP commands, this time for application #2 (filling the MIR(AS) map with a set of pseudo protein atoms for further unrestrained refinement or multiple refinements):

arp_waters MAPIN2 Fo-Fc_map_file XYZOUT1/2/3 output_coordinates << eof
MODE mirbuild
CELL cell parameters
SYMM spacegroup
RESOLUTION resmin resmax
MIRBUILD atoms number models number
END
eof

Input MAPIN2 is the available starting map. Several models for multiple refinements are output to XYZOUT1/XYZOUT2/XYZOUT3.

Yet another typical set of ARP/wARP commands, now for application #3 (obtaining different independent models for multiple refinement):

arp_waters XYZIN input_file XYZOUT output_file << eof
MODE shakemodel light/allatoms
[CELL cell parameters]
SYMM spacegroup
SHAKEMODEL [ bexcl n1 ] [ breset n1 n2 ] [ randomise x ] [ shift x y z ]
END
eof

And another typical set of ARP/wARP commands, again for application #3 (averaging of multiple refinements of different independent models):

arp_waters HKLIN mul_ref_Fs HKLOUT nice_output << eof
MODE reflaver
RESOLUTION resmin resmax
LABIN input labels for FP  SIGFP  [FREE]  FCx  PHICx
LABOUT output labels for FCAVER  PHAVER  FOMAVER
END
eof
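The averaging step can be illustrated for a single reflection by a vector average of unit phase vectors. The direction of the mean vector gives the averaged phase, and its length gives a figure of merit between 0 (phases scattered) and 1 (phases identical). This is a sketch of that general idea only. The weights are supplied by the caller, which is an assumption, and the weighting scheme actually used by wARP (see the wARP97 reference) is not reproduced.

```python
import cmath
import math


def average_phases(phases_deg, weights=None):
    """Weighted vector average of the phases of one reflection.

    Returns (average phase in degrees in [0, 360), figure of merit in [0, 1]).
    """
    if not phases_deg:
        raise ValueError("at least one phase is required")
    if weights is None:
        weights = [1.0] * len(phases_deg)  # equal weights (assumption)
    if len(weights) != len(phases_deg):
        raise ValueError("one weight per phase is required")
    total = sum(weights)
    # Mean of the weighted unit vectors exp(i * phase)
    vector = sum(w * cmath.exp(1j * math.radians(p))
                 for w, p in zip(weights, phases_deg)) / total
    phase = math.degrees(cmath.phase(vector)) % 360.0
    return phase, abs(vector)
```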

### ARP_WATERS and SHELXL

SHELXL is part of the SHELX-97 program package and should be obtained directly from its author, George M. Sheldrick (Göttingen University), via the SHELX homepage.

The most common use of ARP/wARP with SHELXL (SHELX-97) is for restrained refinement with individual anisotropic atomic displacement parameters (as provided by SHELXL) combined with updating of the solvent structure by ARP/wARP. This application is limited by the fact that individual anisotropic displacement parameters can be refined only if the X-ray data extend beyond 1.5 Å, ideally approaching atomic resolution (1.2 Å).

There are currently no automated scripts for this application. An old-style command shell script is given in the $CEXAM/unix/non-runnable directory (arp_waters_shelx.com). The script includes iterative runs of the following programs:

1. SHELXL (SHELX-97) for restrained anisotropic refinement. Some recommendations for the shelx.ins file:
   • CGLS 2 (using more cycles within SHELXL lowers the ARP_WATERS contribution)
   • CELL, LATT/SYMM and SHEL should be consistent with cell, symm and resol in the script
   • WPDB -1
   • LIST 3
   • ISOR and CONN should include O1 > last, as the number of waters changes with each cycle
   See the SHELX-97 Manual for further details.
2. PREPFORM (ARP/wARP Suite) for conversion of SHELXL files
3. F2MTZ (CCP4) for conversion to the CCP4 MTZ format; column label assignments should be edited if necessary
4. CAD (CCP4) for sorting the MTZ file; column label assignments should be edited if necessary
5. FFT (CCP4) for map calculation; one map is calculated with coefficients 3Fo-2Fc, another with Fo-Fc; column label assignments should be edited if necessary
6. EXTEND (CCP4) for map extension
7. ARP_WATERS (ARP/wARP Suite) for solvent update; the maximum number of atoms to add and to remove should not exceed 0.08 × N/dmax³, where N is the current number of atoms in the model and dmax is the high-resolution limit
8. PREPSHEL (ARP/wARP Suite) for back conversion to SHELXL format

When writing a shell script, take care to define the following variables at the top of the file: name (root file name), last (starting file number), cycles (number of refinement cycles), count, title, resol (resolution limits), cell (cell parameters), grid (grid for map calculation), xyzlim (boundaries of the real-space asymmetric unit for ARP_WATERS), symm (space group number) and sfsg (space group for map calculation).

### Simple toxd example script found in $CEXAM/unix/runnable/

• arp_waters.exam (Example of finding waters.)

### Comprehensive example scripts found in $CEXAM/unix/non-runnable/

• arp_waters_refmac.com
• arp_waters_sfall.com
• arp_waters_shelx.com