VECREF (CCP4: Supported Program)
NAME
vecref - Vector-space refinement of heavy atom sites in isomorphous derivatives.
SYNOPSIS
vecref MAPIN foo_in.map ATOUT foo_out.dat
Vector-space refinement is an alternative to the standard reciprocal-space refinement of Busing & Levy (); instead of least-squares minimisation of the sum of weighted squared differences between observed and calculated structure factor amplitudes with respect to the atomic parameters, we minimise the same function of the observed and calculated heavy-atom difference Patterson function values.
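Written out, the two least-squares targets compare as follows. This is a sketch in notation chosen here, not taken from the program: w denotes weights (the weighting scheme is not specified in this document), h runs over reflections, u runs over Patterson sample points, and S is the set of sample points where the calculated Patterson density is significantly positive. Each function is minimised with respect to the atomic parameters (occupancies, coordinates and thermal parameters).

```latex
\Phi_{\mathrm{recip}} = \sum_{\mathbf{h}} w_{\mathbf{h}}
  \bigl( |F_{\mathrm{obs}}(\mathbf{h})| - |F_{\mathrm{calc}}(\mathbf{h})| \bigr)^{2},
\qquad
\Phi_{\mathrm{vec}} = \sum_{\mathbf{u} \in S} w_{\mathbf{u}}
  \bigl( P_{\mathrm{obs}}(\mathbf{u}) - P_{\mathrm{calc}}(\mathbf{u}) \bigr)^{2}
```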
Although the Patterson function is a complete representation in vector space of the set of structure factor amplitudes (provided it is sampled at a sufficiently fine interval), the two types of refinement are not equivalent. This is because the minimisation in vector space is only done for sample points where the calculated Patterson density is significantly positive, in other words near the interatomic vector peaks, whereas each structure factor contains information from all points in real space (except in special zones). In a complete crystal structure the atoms fill all the space (there are no holes), so vector-space refinement would offer no advantage; indeed it would run into problems due to considerable overlap of peaks in vector space.
For a difference structure the situation is reversed: most of the space is empty, so the probability of overlap is small; and since the heavy-atom structure factors represent mostly empty space, they are dominated by errors arising from taking a small difference between large quantities with small relative errors. As a result, reciprocal-space refinement of heavy-atom parameters is notoriously slow to converge and insensitive to misplaced sites and to errors in the starting values of the parameters, and refinement statistics usually give little indication of the correctness of the solution. (Typical conventional R-factors for reciprocal-space heavy-atom refinement are in the range 30% (rarely) to 70%, usually around 50%, compared with the random value for acentric data of 58.6%.)
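The 58.6% figure is Wilson's random R-factor for acentric data, 2 - sqrt(2). The sketch below checks it by Monte Carlo, assuming two unrelated sets of acentric amplitudes on the same scale, each drawn as the modulus of a complex Gaussian variable (the acentric Wilson distribution). It is an illustration only and is not part of VECREF.

```python
import math
import random


def random_acentric_r(n: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of R = sum|F1 - F2| / sum F1 for two independent
    sets of acentric amplitudes drawn from the same Wilson distribution."""
    rng = random.Random(seed)
    num = 0.0
    den = 0.0
    for _ in range(n):
        # An acentric amplitude is the modulus of a complex Gaussian variable.
        f1 = math.hypot(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        f2 = math.hypot(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        num += abs(f1 - f2)
        den += f1
    return num / den


analytic = 2.0 - math.sqrt(2.0)  # Wilson's random R for acentric data
print(f"analytic {analytic:.4f}, simulated {random_acentric_r():.4f}")
```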
The theoretical convergence radius for reciprocal-space refinement is dmin/4, i.e. 0.75Å at 3Å, but this is unlikely to be achieved in practice because of random errors; for vector-space refinement the theoretical convergence radius is the apparent atomic diameter, which is about dmin/sqrt(2), i.e. about 2.1Å at 3Å.
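The arithmetic behind these two figures, as a small Python sketch (d_min in Angstrom):

```python
import math


def convergence_radii(d_min: float) -> tuple[float, float]:
    """Theoretical convergence radii (Angstrom) for the two refinement methods."""
    reciprocal = d_min / 4.0          # reciprocal-space refinement
    vector = d_min / math.sqrt(2.0)   # vector space: apparent atomic diameter
    return reciprocal, vector


recip, vec = convergence_radii(3.0)
print(f"{recip:.2f} {vec:.2f}")  # → 0.75 2.12
```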
Vector-space refinement also has the considerable advantage over reciprocal-space refinement that the least-squares fit to the 3-dimensional data set can be performed using only isomorphous differences, so that heavy-atom derivatives for which no anomalous scattering data are available (or for which they are of dubious reliability) can be refined using information from all data, centric and acentric. In this case it is important that the derivative data are already scaled accurately against the native data, because the derivative scale factor cannot be refined in real space. The Kraut scaling technique () is recommended for this; see the documentation for program FHSCAL.
In vector space there is no procedure corresponding to the heavy-atom difference Fourier for finding minor sites. However, other options which use the Patterson, such as superposition methods, are available, and of course one can still calculate structure factors from the refined atomic parameters and compute a difference Fourier.
Free format using keywords. The following keywords may be used; only the leading 4 characters are significant and the order is immaterial:
ANISO, ATOM, BLIMITS, BREFINE, CYCLES, DAMP, END, GROUP, ORIGIN, RCUT, RESOLUTION, SCALE, SPACEGROUP, THRESHOLD, TITLE.
The keywords SPACEGROUP, RESOLUTION, CYCLES and ATOM are always required, the rest are optional, and assume default values if omitted.
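A minimal sketch of the two input rules just stated: only the leading 4 characters of a keyword are significant, and SPACEGROUP, RESOLUTION, CYCLES and ATOM must be present. This is hypothetical Python, not VECREF's own parser; case-insensitive matching and taking the keyword as the first whitespace-separated word of a line are assumptions.

```python
# Hypothetical illustration of the keyword rules; not the program's own parser.
KEYWORDS = ["ANISO", "ATOM", "BLIMITS", "BREFINE", "CYCLES", "DAMP", "END",
            "GROUP", "ORIGIN", "RCUT", "RESOLUTION", "SCALE", "SPACEGROUP",
            "THRESHOLD", "TITLE"]
REQUIRED = {"SPACEGROUP", "RESOLUTION", "CYCLES", "ATOM"}


def match_keyword(word: str) -> str:
    """Return the full keyword whose leading 4 characters match `word`."""
    key = word[:4].upper()  # assumption: matching ignores case
    for kw in KEYWORDS:
        if kw[:4] == key:
            return kw
    raise ValueError(f"unrecognised keyword: {word!r}")


def missing_required(lines: list[str]) -> set[str]:
    """Required keywords that do not start any non-blank input line."""
    seen = {match_keyword(line.split()[0]) for line in lines if line.strip()}
    return REQUIRED - seen


# Illustrative input lines; only the keywords are checked, not the arguments.
script = ["SPAC P212121", "RESO 3.0", "CYCL 5 5 10", "END"]
print(sorted(missing_required(script)))  # → ['ATOM']
```

The leading 4 characters of the 15 keywords are all distinct, which is what makes the abbreviation rule unambiguous.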
TITLE: Title (max 100 characters).
SPACEGROUP: Space group name (e.g. P212121) or number (e.g. 19).
RESOLUTION: Approximate minimum and maximum d-spacings (the high- and low-resolution cutoffs respectively, which may be given in either order) of the data used to compute the Patterson map. d_max may be omitted, in which case no low-resolution cutoff is assumed.
RCUT: Radius cutoff for Patterson peaks. The default radius is defined by the point where the density either becomes zero or reaches a minimum; this is usually satisfactory. If RCUT is positive, the radius is taken as RCUT times the RMS radius. Note that the RMS radius is typically 0.5-0.6 times the default radius, so there is no point setting RCUT above about 1.5. If RCUT is negative, the radius (in Angstrom) is taken as abs(RCUT).
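The three RCUT cases can be sketched as follows. This is hypothetical Python, assuming the peak is described by a radial density profile sampled at increasing radii and that the RMS radius is supplied by the caller (how VECREF computes it is not described here); treating RCUT = 0 like an omitted keyword is also an assumption.

```python
def default_radius(radii: list[float], density: list[float]) -> float:
    """First radius at which the radial peak profile becomes zero (or negative)
    or passes through a minimum."""
    for i in range(1, len(density)):
        if density[i] <= 0.0:
            return radii[i]
        if i + 1 < len(density) and density[i] < density[i + 1]:
            return radii[i]  # local minimum: the density starts to rise again
    return radii[-1]  # neither found within the sampled range


def cutoff_radius(rcut, radii, density, rms_radius):
    """Peak cutoff radius in Angstrom following the RCUT rules."""
    if not rcut:  # keyword omitted (None); treating 0 the same way is an assumption
        return default_radius(radii, density)
    if rcut > 0:
        return rcut * rms_radius  # RCUT times the RMS radius
    return abs(rcut)  # negative RCUT: the radius itself, in Angstrom


radii = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
density = [10.0, 8.0, 4.0, 1.0, -0.5, 0.2]
print(cutoff_radius(None, radii, density, rms_radius=1.1))  # → 2.0
print(cutoff_radius(-1.8, radii, density, rms_radius=1.1))  # → 1.8
```

With an RMS radius of 1.1 Å against a default radius of 2.0 Å (ratio 0.55), RCUT = 1.5 gives 1.65 Å, still inside the default radius, which is why larger values gain little.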
GROUP: Group symbol, any 2 characters used to represent the group scatterer, e.g. HI for HgI4-- (do not use a symbol which could be confused with an atomic symbol).
Atomic symbol, multiplicity (w1) and radius (r1) for the first type of atom in the group; e.g. if this is the central atom then w1 = 1 and r1 = 0.
Repeat for the second type of atom (optional).
CYCLES: The program will perform <ncyc1> cycles of refinement of the occupancy factors (usually 3 to 5, possibly up to 10 for difficult cases), followed by <ncyc2> cycles of refinement of the occupancies and coordinates (usually about 5), and finally <ncyc3> cycles of refinement of all variables, i.e. occupancies, coordinates and thermal parameters (usually 10-20).
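The staged refinement controlled by the three cycle counts can be pictured as a per-cycle list of the parameter classes being refined. This is a hypothetical Python illustration of the schedule described above, not code from VECREF.

```python
def refinement_schedule(ncyc1: int, ncyc2: int, ncyc3: int) -> list[tuple[str, ...]]:
    """Parameter classes refined in each cycle, in the order described above."""
    return ([("occupancy",)] * ncyc1
            + [("occupancy", "coordinates")] * ncyc2
            # "B": an overall B factor, or individual B's if BREFINE is given
            + [("occupancy", "coordinates", "B")] * ncyc3)


plan = refinement_schedule(4, 5, 15)
print(len(plan), plan[0], plan[-1])
# → 24 ('occupancy',) ('occupancy', 'coordinates', 'B')
```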
ORIGIN: Include the Patterson origin. It is usually excluded, except in the case of 1 atom in space group P1, when the origin is the only vector. See note 5.
SCALE: Scale factors for the coordinates; if these are fractional then scale(i) = 1 (this is the default). If they are in grid units (i.e. taken directly from a map) then scale(i) = number of grid units along each cell edge. (These grid units need not be the same as those used in the Patterson read by the program.)
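The effect of the scale factors is simply to divide each input coordinate by scale(i) to obtain a fraction of the cell edge. A hypothetical Python sketch, assuming that is the only transformation applied:

```python
def to_fractional(coords: tuple[float, float, float],
                  scale: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> tuple[float, ...]:
    """Divide each coordinate by its scale factor: (1, 1, 1) for fractional input
    (the default); grid units along each cell edge for map grid coordinates."""
    return tuple(c / s for c, s in zip(coords, scale))


# A site read off a hypothetical 96 x 120 x 48 map grid.
print(to_fractional((24, 30, 12), scale=(96, 120, 48)))  # → (0.25, 0.25, 0.25)
```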
ANISO: This is currently not operational.
BREFINE: Refine individual isotropic thermal parameters (ncyc3 must also be > 0). The default is to refine an overall B factor (when ncyc3 > 0). See note 4.
DAMP: Damping factor for occupancy and B-factor shifts when these are refined together. The default value is 0.25. Occupancies and B-factors are always highly correlated, so without damping the shifts tend to oscillate.
THRESHOLD: Only atoms with refined occupancies above this threshold times their estimated standard deviation are written to the ATOUT file. If the threshold is given as negative, the absolute value of the occupancy is tested, so that negative occupancies may also be written out. The default is 2 sigma.
BLIMITS: Only atoms with B factors within the specified limits are accepted on input and written to the ATOUT file. The defaults are 0 and 200.
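The THRESHOLD and BLIMITS tests applied when writing ATOUT can be sketched together. The `Site` record and function name are hypothetical, and inclusive B limits are an assumption; the same B test is also applied to atoms on input.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    occupancy: float
    occ_esd: float  # estimated standard deviation of the occupancy
    b: float        # isotropic B factor


def passes_output_filters(site: Site, threshold: float = 2.0,
                          b_limits: tuple[float, float] = (0.0, 200.0)) -> bool:
    """THRESHOLD and BLIMITS tests for writing a site to ATOUT
    (inclusive B limits are assumed)."""
    # A negative threshold means the absolute value of the occupancy is tested.
    occ = site.occupancy if threshold >= 0 else abs(site.occupancy)
    occ_ok = occ > abs(threshold) * site.occ_esd
    b_ok = b_limits[0] <= site.b <= b_limits[1]
    return occ_ok and b_ok


sites = [Site("HG1", 0.80, 0.05, 25.0), Site("HG2", 0.06, 0.05, 30.0),
         Site("HG3", -0.30, 0.05, 30.0), Site("HG4", 0.50, 0.05, 250.0)]
print([s.name for s in sites if passes_output_filters(s)])                  # → ['HG1']
print([s.name for s in sites if passes_output_filters(s, threshold=-2.0)])  # → ['HG1', 'HG3']
```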
END: Terminate input and start the calculation.
Logical names used:
The dynamic array dimension (VECREF_MAXPTS) is increased automatically if it is not large enough. However, the program runs faster if it is set large enough initially (e.g. setenv VECREF_MAXPTS 1000000).
The refined atomic parameters appear in a table under the following headings:
Atom Parameter Init Old Shift Change New Esd
Atoms are not allowed to shift from their initial position by more than twice the expected convergence radius (approximately dmin*sqrt(2)). Atoms whose calculated shift would take them beyond this limit have their occupancy set to zero, and the occupancy is then allowed to refine upwards only if the coordinate shifts move the atom back towards its initial position. Atoms with occupancy exactly zero are ignored on input; if this is not desired, change the zero occupancy to some small number (e.g. 0.001).
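A sketch of the shift-limit rule, assuming Cartesian coordinates in Angstrom; the function names are hypothetical and this is not the program's own code. Twice the convergence radius dmin/sqrt(2) gives the limit dmin*sqrt(2).

```python
import math


def limited_occupancy(initial, new, occupancy, d_min):
    """Occupancy after a coordinate shift: zero if the atom would move more than
    twice the convergence radius (2 * d_min/sqrt(2) = d_min*sqrt(2)) from its start."""
    limit = math.sqrt(2.0) * d_min
    return 0.0 if math.dist(initial, new) > limit else occupancy


def may_raise_occupancy(initial, old, new):
    """A zeroed atom may refine upwards only if the shift moves it back towards
    its initial position."""
    return math.dist(initial, new) < math.dist(initial, old)


start = (0.0, 0.0, 0.0)  # Cartesian coordinates in Angstrom (assumed)
print(limited_occupancy(start, (1.0, 1.0, 1.0), 0.6, d_min=3.0))  # → 0.6
print(limited_occupancy(start, (4.0, 2.0, 0.0), 0.6, d_min=3.0))  # → 0.0
print(may_raise_occupancy(start, (4.0, 2.0, 0.0), (3.0, 1.0, 0.0)))  # → True
```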
I have not yet had sufficient experience with the program to know what constitute good R-factors and correlation coefficients, or which statistic to place the greatest reliance on, though the R-factor seems to be more discriminating. The best result I have had so far is R = 17.5%, C = 0.976 (excluding the Patterson origin, and not refining B-factors) for the test data distributed with the program. Note that these are real data! (Acknowledgements to Dr. Simon Phillips, University of Leeds.)
The list of vectors, which appears after the occupancy refinement cycles and again at the end of the run, has the Patterson map coordinates in the same units as the atomic coordinates if these were supplied as map grid coordinates, or in the same units as the map supplied if the atomic coordinates were supplied as fractions of a unit cell. The columns labelled Pobs, Pcalc, and <Pobs>, <Pcalc> are approximate peak heights and mean peak values respectively. They are intended only as a guide to the fit; in particular the Pcalc values do not include a contribution from any overlapping peaks. (These are not the data used in the least squares, where the fit is done on a grid point basis, not on a peak basis.)
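This document does not define the R-factor and correlation coefficient that the program reports. The sketch below assumes the conventional forms evaluated over Patterson grid points, R = sum|Pobs - Pcalc| / sum|Pobs| and Pearson's correlation coefficient. VECREF's actual definitions, including any weighting or scaling, may differ.

```python
import math


def r_factor(p_obs: list[float], p_calc: list[float]) -> float:
    """Conventional R over grid points (assumed definition)."""
    num = sum(abs(o - c) for o, c in zip(p_obs, p_calc))
    den = sum(abs(o) for o in p_obs)
    return num / den


def correlation(p_obs: list[float], p_calc: list[float]) -> float:
    """Pearson correlation coefficient over grid points (assumed definition)."""
    n = len(p_obs)
    mo = sum(p_obs) / n
    mc = sum(p_calc) / n
    cov = sum((o - mo) * (c - mc) for o, c in zip(p_obs, p_calc))
    so = math.sqrt(sum((o - mo) ** 2 for o in p_obs))
    sc = math.sqrt(sum((c - mc) ** 2 for c in p_calc))
    return cov / (so * sc)


p_obs = [1.0, 2.0, 3.0, 4.0]
p_calc = [2.0, 4.0, 6.0, 8.0]  # right shape, wrong overall scale
print(r_factor(p_obs, p_calc), round(correlation(p_obs, p_calc), 6))  # → 1.0 1.0
```

The example shows that the two statistics respond differently: with these definitions the correlation coefficient ignores an overall scale error, whereas R does not.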
One or more of the following messages occur when errors are discovered in the input control data; the program continues to process the data but stops when all data has been read:
*** ERROR: Centring translation not integer.
*** ERROR: Identity position not found.
*** ERROR: Invalid number of atoms.
*** ERROR: Atom symbol not in scat. fact. list.
*** ERROR IN GETINP: Error(s) in input data.
The following conditions indicate that an array is not large enough for the problem and should be cured either by correcting the input data (in most cases by increasing the sampling interval of the Patterson), or by increasing the value assigned to the symbol specified in PARAMETER statements in the source code, taking care to modify all PARAMETER statements containing the symbol, and re-compiling and linking.
*** ERROR: Array bound check (MSECT).
*** ERROR: Array bound check (MRHO).
*** ERROR IN ADDAST: Array bound check (MCF).
*** ERROR IN PKLIST: Array bound check (MPL).
*** ERROR IN GENPTS: Array bound check (MPT).
The following conditions are likely to occur if either the symmetry specified is wrong, or the Patterson map is less than an asymmetric unit:
*** ERROR: Point not found.
*** ERROR IN GENPTS: No points.
The following errors occur when the standard map handling routine detects an error; this is likely to indicate something seriously wrong with the map file, like data corruption:
*** ERROR IN PKLIST: MGULP error.
*** ERROR IN REFCYC: MGULP error.
One important point often not clearly understood about this method of refinement is that the program does handle overlapping vectors correctly, provided first that the overlap is between known atoms and second that not all vectors arising from a pair of atoms overlap. This is because the calculated Patterson density is summed over all contributing atom pairs before being compared with the observed value. Vectors which overlap with those due to as yet unknown atoms will positively bias the occupancy, and will also affect the refined coordinates, but this is also true for the reciprocal-space method.
A simple Unix example script is found in $CEXAM/unix/runnable/
(A VMS version is found in $CEXAM/vms/vecref.com)
AUTHOR
Originator: Ian Tickle, Birkbeck College, London