WATNCS (CCP4: Supported Program)
NAME
watncs - Pick waters which follow NCS and sort them into NCS asymmetric units
In protein crystallographic refinement it is important to avoid false water molecules, which introduce noise into the electron density map. When non-crystallographic symmetry (NCS) exists in the crystal, many of the water molecules bound to the protein should follow the same NCS as their host protein. From all the water candidates found in difference Fourier maps, WATNCS picks those which follow the NCS operations. It can also sort them into NCS asymmetric units, so that NCS restraints can be applied to them in crystallographic refinement. In the first few cycles of adding water, this procedure is a powerful way to steer the refinement in the right direction.
WATNCS reads a coordinate file (in PDB format) and the NCS operations. Groups of atoms which satisfy all the NCS operations are written to a file with identical residue numbers but different chain names, so that refinement programs such as XPLOR or REFMAC can recognize them as NCS-equivalent atoms. The program can also write out atoms which only partially satisfy the NCS operations (for example, 2 of 3 operations).
UNIX example script found in $CEXAM/unix/runnable/
#
rm fort.1*
ln -s $SCR/junk1.pdb fort.11
ln -s $SCR/junk2.pdb fort.12
ln -s $SCR/junk3.pdb fort.13
ln -s $SCR/junk4.pdb fort.14
watncs << 'end'
pdb wat1wchk.pdb
out wat1wncs.pdb
mol refm2w.pdb
RELATE
-0.9999970 -4.2974250E-04 -2.4399513E-03
-1.4226431E-03 -0.7066850 0.7075269
-2.0283312E-03 0.7075282 0.7066822
0.4821968 9.3696594E-02 -0.1603031
RELATE
0.9999976 2.0712232E-03 7.7215023E-04
2.0715299E-03 -0.9999978 -3.9639443E-04
7.7132753E-04 3.9799302E-04 -0.9999996
0.3441153 -1.6441345E-03 -0.5252590
RELATE
-0.9999971 -2.3997077E-03 -2.4492259E-04
-1.5290348E-03 0.7091355 -0.7050705
1.8656463E-03 -0.7050681 -0.7091371
0.5303268 -0.2367973 -0.4432688
error 0.6
least 2
!CHAIN W U V X Y Z S T
group W A
group U B
group V E
group X F
number 61
num1 127
atom O1
residue HOH
occu 1.0
temp 30.
'end'
General: Each line which starts with "!" or "#" will be ignored. Available keywords are:
PDB
Input file name of the water molecules.
OUT
Output file name of the water molecules.
MOL
Input file name of the protein molecule. It is used when you want the chain names of the water molecules sorted according to the chain names of the protein molecules. If you have a 2-fold rotational NCS (or 222 symmetry), the protein file also helps to put each water molecule into the "right" asymmetric unit. See GROUP. Note: the file must NOT contain any water molecules.
RELATE
     (a11 a12 a13)      (tx)
X2 = (a21 a22 a23)*X1 + (ty)
     (a31 a32 a33)      (tz)
The NCS rotation and translation (the matrix must start on a new line after the keyword). The translation should be in orthogonal coordinates (assuming the PDB file is). The command can be repeated, once for each NCS operation. If you have coordinates of the protein molecules, the matrix can be obtained from the program FIT or O. If you use FIT, its output file MATRIX contains this matrix. If you use O, you can use lsq_exp to store the matrix under a name such as lsq_rt_atob, and then write it to a file with the command "write .lsq_rt_atob filename (3f10.6)". The matrix can then be taken from that file.
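As an illustration of the relation X2 = R*X1 + t, the following minimal Python sketch applies the first RELATE operator of the example script to a single position. It assumes that the nine rotation values are read row by row, followed by the three translation values; the function name apply_ncs and the water position are hypothetical and not part of WATNCS.

```python
def apply_ncs(rot, trans, x):
    """Return X2 = R * X1 + t for a 3x3 rotation (list of rows) and a 3-vector."""
    return tuple(
        sum(rot[i][j] * x[j] for j in range(3)) + trans[i]
        for i in range(3)
    )


# First RELATE operator from the example script.
# Assumption: the matrix is given row by row, then the translation.
ROT = [
    [-0.9999970, -4.2974250e-04, -2.4399513e-03],
    [-1.4226431e-03, -0.7066850, 0.7075269],
    [-2.0283312e-03, 0.7075282, 0.7066822],
]
TRANS = [0.4821968, 9.3696594e-02, -0.1603031]

# A rotation by angle theta has trace 1 + 2*cos(theta),
# so a trace of -1 corresponds to a 180 degree (2-fold) rotation.
trace = ROT[0][0] + ROT[1][1] + ROT[2][2]

water = (10.0, 5.0, -3.0)  # hypothetical water position in orthogonal coordinates
image = apply_ncs(ROT, TRANS, water)
print("trace of rotation:", trace)
print("NCS image of water:", image)
```

The trace check is only a quick sanity test that an operator describes the expected kind of rotation before it is pasted into the input.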
ERROR
Allowed error range for NCS-related water molecules. I recommend 2-3 times the RMS deviation of the protein superposition, or 1/3 of the data resolution.
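The recommendation above is simple arithmetic, sketched here in Python. The function names and the input values (an RMS of 0.25 and a resolution of 1.8) are hypothetical and chosen only for illustration; they are not taken from the example data set.

```python
def error_from_rms(rms_fit):
    """Return the (low, high) range of 2-3 times the RMS of the protein superposition."""
    return (2.0 * rms_fit, 3.0 * rms_fit)


def error_from_resolution(resolution):
    """Return 1/3 of the data resolution."""
    return resolution / 3.0


# Hypothetical values, in the same units as the coordinates.
low, high = error_from_rms(0.25)
print("from RMS fit: %.2f to %.2f" % (low, high))
print("from resolution: %.2f" % error_from_resolution(1.8))
```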
LEAST
Minimum number of NCS operations which the selected waters must follow. In the example file the NCS has 222 symmetry, which means there are 3 NCS operations. If 4 water molecules follow this 222 symmetry, the program considers all 3 NCS operations satisfied and writes out these 4 atoms with an identical residue number but different chain names (W, U, V and X in this case). If 1 of the atoms is missing, the program considers 2 operations satisfied. Since the example file has LEAST 2, the program still writes them out, but with another chain ID (Y) and non-identical residue numbers. If 2 of the 4 atoms are missing, the program considers only 1 NCS operation satisfied, which is less than the minimum of 2 required here, so they are not written to the output file. If you need them in the output too, change the command to "LEAST 1". The default is the number of NCS operations.
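The counting described above can be sketched in Python as follows. This is not the algorithm used inside WATNCS, only one possible way to count how many NCS operations a water satisfies: each operation is applied to a seed water, and the operation counts as satisfied if another unused water lies within the ERROR distance of the image. The function names and the idealized 222 operators and coordinates are hypothetical.

```python
import math


def apply_op(op, x):
    """Apply one NCS operation (rotation rows, translation) to a point."""
    rot, trans = op
    return tuple(
        sum(rot[i][j] * x[j] for j in range(3)) + trans[i] for i in range(3)
    )


def find_ncs_groups(waters, ops, error, least):
    """Return (satisfied_ops, member_indices) for groups meeting the LEAST limit."""
    used = set()
    groups = []
    for i, seed in enumerate(waters):
        if i in used:
            continue
        members = [i]
        for op in ops:
            target = apply_op(op, seed)
            best, best_d = None, error
            for j, w in enumerate(waters):
                if j in used or j in members:
                    continue
                d = math.dist(target, w)
                if d <= best_d:  # keep the closest water within the tolerance
                    best, best_d = j, d
            if best is not None:
                members.append(best)
        satisfied = len(members) - 1
        if satisfied >= least:
            used.update(members)
            groups.append((satisfied, members))
    return groups


# Idealized 222 symmetry: 2-fold axes along x, y and z, no translation.
ZERO = [0.0, 0.0, 0.0]
OPS = [
    ([[1, 0, 0], [0, -1, 0], [0, 0, -1]], ZERO),
    ([[-1, 0, 0], [0, 1, 0], [0, 0, -1]], ZERO),
    ([[-1, 0, 0], [0, -1, 0], [0, 0, 1]], ZERO),
]
WATERS = [
    (1.0, 2.0, 3.0), (1.0, -2.0, -3.1), (-1.0, 2.1, -3.0), (-1.1, -2.0, 3.0),  # complete set
    (4.0, 5.0, 6.0), (4.0, -5.0, -6.0), (-4.0, 5.0, -6.0),  # one member missing
    (7.0, 8.0, 9.0),  # no NCS mates
]

for satisfied, members in find_ncs_groups(WATERS, OPS, error=0.6, least=2):
    print(satisfied, "operations satisfied by waters", members)
```

With least=2 both the complete set and the set with one member missing are reported; with least=3 only the complete set remains, which mirrors the behaviour described for the example file.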
CHAIN
Chain names for the output waters. If you have 3 NCS operations, NCS restraints can be applied to the waters with the first 4 chain names. In fact, it is possible to apply NCS restraints to all the water molecules output by this program by looking at the log file and writing the commands for the refinement program carefully, although I believe most people only have the patience to use the restraints recommended by the program.
GROUP
If you have 2-fold symmetry, the program might put a water molecule into a "wrong" NCS asymmetric unit (giving it a wrong chain name). This is not a problem at all for NCS restraints. However, if you want to sort the water molecules onto the right protein molecules, you have to input the protein coordinates and tell the program which water chain name corresponds to which protein chain name. In the example file, output waters with chain W belong to protein A, and so on. See also MOL.
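One way to picture this sorting is the Python sketch below, which gives a water the chain name paired with the protein chain of its nearest protein atom. The nearest-atom criterion is an assumption made for illustration; this page does not state how WATNCS makes the decision. The pairs in GROUP are those of the example script, while the function name and coordinates are hypothetical.

```python
import math

# Protein chain -> water chain, as given by the GROUP lines of the example script.
GROUP = {"A": "W", "B": "U", "E": "V", "F": "X"}


def water_chain_for(water_xyz, protein_atoms, group=GROUP):
    """Return the water chain paired with the protein chain nearest to the water.

    protein_atoms is a list of (chain_name, (x, y, z)) tuples.
    Assumption: "belongs to" means "closest protein atom".
    """
    nearest_chain, _ = min(
        protein_atoms, key=lambda atom: math.dist(atom[1], water_xyz)
    )
    return group[nearest_chain]


# Hypothetical protein atoms, one per chain.
PROTEIN = [("A", (0.0, 0.0, 0.0)), ("B", (10.0, 0.0, 0.0))]

print(water_chain_for((1.0, 0.0, 0.0), PROTEIN))
print(water_chain_for((9.0, 0.0, 0.0), PROTEIN))
```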
NUMBER
For the water molecules to which NCS restraints apply, the output residue numbers start after this number.
NUM1
For the water molecules to which NCS restraints do not apply, the output residue numbers start after this number.
ATOM
If this command is present, the water atoms are output with the given atom name instead of the name from the input file.
RESIDUE
If this command is present, the water atoms are output with the given residue name instead of the name from the input file.
OCCU
If this command is present, the water atoms are output with the given occupancy instead of the value from the input file.
TEMP
If this command is present, the water atoms are output with the given B factor instead of the value from the input file.
Automatically adding waters to models with NCS
There are always real water molecules which do not follow the NCS, because of packing or other reasons. One has to run the above procedure without WATNCS at least once and check these water molecules interactively. However, this should be done in the last few cycles.
The automatic procedure cannot prevent waters from being added at sites which protein atoms should occupy (when protein atoms are misplaced at other sites). However, statistically most of the water molecules are correct and the procedure can improve the map significantly. Such problems can be shown up automatically by listing the atoms with negative values in the Fo-Fc map using the DIFLIST program.
It cannot be prevented that some compounds, such as citrate, are identified as waters. In this case, I think users can find the problem by checking only the protein atoms, without checking all the waters.
The advantage of the above automatic procedure is that it makes sure all the water molecules added in the first few cycles are real. The map is improved by real molecules and by NCS restraints. Later water addition is then based on better maps.