
FINDNCS (CCP4: Supported Program)

NAME

findncs - detect NCS operations automatically from heavy atom sites

SYNOPSIS

findncs
[Keyworded input]

DESCRIPTION

 ================================================
              F  I  N  D  N  C  S
 Looking for NCS relations from heavy atoms sites
            Version 1.1      Sep-30-1997         
 ================================================
Author:
Guoguang Lu
Department of Molecular Biophysics
Lund University
Box 124, 221 00, Lund, Sweden
E-mail:
Guoguang.Lu@mbfys.lu.se
Contents:
Introduction
Example command File
Key Word Commands
Conventions and Example output
Frequently Asked Questions
Acknowledgement

REFERENCES

  1. Lu,G. (1999) FINDNCS: A program to detect non-crystallographic symmetries in protein crystals from heavy atoms sites. J. Appl. Cryst. 32, 365

Introduction

FINDNCS is a program that automatically finds non-crystallographic symmetry (NCS) operations from heavy atom sites, in order to facilitate the use of averaging techniques in the MIR/MAD procedure. For each operation the program outputs a rotation matrix and translation vector, the RMS deviation, polar angles, screw distance, matching sites and other useful information. Optionally, it can also generate files so that the NCS operations can be displayed automatically by the O program.

The program requires at least 6 heavy atom sites for each NCS operation, i.e. at least 3 sites in each NCS asymmetric unit (which can be a protein monomer, dimer, trimer or even a higher oligomer). Once the coordinates of the sites have been input, they are extended by crystallographic symmetry and lattice repetition. The program then systematically searches for groups of sites that can be matched to another group of sites by an NCS operation. Several criteria are used to choose one NCS operation among crystallographically related ones:

  1. The number of matching sites is maximal
  2. The screw distance of the NCS operation is minimal
  3. The centers of the two groups of sites are closest
  4. The radius of the group of atoms is minimal
Once all the independent NCS operations have been found, they are ranked by these criteria.
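
The four criteria above can be pictured as a sort key. The following is only a minimal sketch: the manual does not say how the criteria are combined, so the strict lexicographic order used here is an assumption, and the class, field names and numbers are hypothetical illustrations rather than FINDNCS internals.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One candidate NCS operation (illustrative field names only)."""
    label: str
    n_match: int        # number of matching site pairs
    screw: float        # screw distance along the NCS axis
    center_sep: float   # distance between the centers of the two site groups
    radius: float       # mean distance of the joining sites from their center


def rank(candidates):
    # Assumed order: more matches first, then smaller |screw|,
    # then closer centers, then smaller radius.
    return sorted(
        candidates,
        key=lambda c: (-c.n_match, abs(c.screw), c.center_sep, c.radius),
    )


# Made-up candidates for illustration.
candidates = [
    Candidate("a", 5, -14.65, 3.0, 36.55),
    Candidate("b", 6, 0.00, 1.0, 54.58),
    Candidate("c", 6, 0.00, 1.0, 41.20),
]
ranked = [c.label for c in rank(candidates)]
print(ranked)
```

Candidates "b" and "c" tie on the first three criteria, so the smaller radius decides between them; the real program may weigh the criteria differently.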

In some cases the program can be run fully automatically and the suggestions in the output accepted: for example, when there are not too many sites (fewer than 20), the space group is simple (such as triclinic, monoclinic, R3 or P3), and/or the NCS relations are regular (such as a 222 tetramer or a 2-fold dimer). However, with high symmetry, many HA sites, or many protein molecules with irregular NCS relations, users have to follow the instructions in this manual to find the NCS step by step. Alternatively, users can define the search range (which should include the entire protein oligomer) themselves, using knowledge from a solvent-flattened map. Otherwise the run might take a long CPU time and/or the results might be too complicated to interpret. For details, please read the Key Word Commands and Frequently Asked Questions sections of this document.

Examples

UNIX example script found in $CEXAM/unix/runnable/

  • findncs.exam

Key Word Commands

    General: Each line which starts with "!" or "#" will be ignored. The possible keywords are:
    CELL, COMPOUND, DISP, ERROR, FRCLIM, FSITE, LIST, MAXNCS, MINMATCH, SITE, SPACEGROUP, SPHERE, XYZLIM

    CELL <a> <b> <c> <alpha> <beta> <gamma>

    Cell dimensions are necessary when the coordinates of the HA sites are fractional or when the HA sites have not been placed in the correct asymmetric unit. This command must be given before the SPACEGROUP, SITE and FSITE commands.

    SPAcegroup <NAME>

    Space group name or number, e.g. P212121 or 19. If this command is not present, the program will not apply crystallographic symmetry, so it only works properly when the HA sites have already been placed in the correct asymmetric unit.

    ERRor <error>

    Estimated error of the heavy atom sites. If the distance between two sites after an NCS operation is less than this value, the program considers that the two sites obey the NCS. I recommend starting with 1/2 of the resolution used in the difference Patterson/Fourier that solved the HA sites.
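
As an illustration of how such a tolerance acts, the sketch below applies a trial operation to one group of sites and counts how many land within the error of a site in the other group. It is a simplified stand-in, not the program's search routine; the function names and the toy coordinates are hypothetical, and the matrix is assumed to be given row by row.

```python
import math


def apply_op(rot, trans, xyz):
    """Return rot * xyz + trans for a 3x3 rotation (list of rows) and a 3-vector."""
    return tuple(
        sum(rot[i][j] * xyz[j] for j in range(3)) + trans[i] for i in range(3)
    )


def count_matches(rot, trans, group_a, group_b, error):
    """Count sites of group_a that fall within `error` of any site in group_b."""
    n = 0
    for site in group_a:
        moved = apply_op(rot, trans, site)
        if any(math.dist(moved, other) < error for other in group_b):
            n += 1
    return n


# Toy data: a 2-fold rotation about Z with no translation.
two_fold_z = [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0]]
shift = (0.0, 0.0, 0.0)
group_a = [(10.0, 2.0, 5.0), (4.0, -7.0, 1.0), (3.0, 3.0, 3.0)]
group_b = [(-10.4, -2.3, 5.2), (-4.0, 7.0, 1.0), (20.0, 20.0, 20.0)]

# 1.5 corresponds to half of a 3 A resolution, following the advice above.
n_matched = count_matches(two_fold_z, shift, group_a, group_b, error=1.5)
print(n_matched)
```

A larger error accepts more pairs but also more accidental matches, which is why the manual suggests tying it to the resolution.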

    MAXNCS <maxncs>

    Example: MAXNCS 20 [default: all the solutions]. Maximum number of NCS operations written to the final output files. If the command is not present, the program outputs all the ranked solutions. Usually (I think in more than 99% of cases), the solutions with the highest rank are the correct ones.

    COMPound <compound>

    Default compound name. Sites given after this command are assumed to come from this compound unless otherwise specified. The name lets the program distinguish HA sites from different compounds. If two heavy atoms (even from different compounds) bind to the same site on different NCS-related protein molecules but are given different compound names, the program will not use them to calculate NCS operations, and the opportunity to find the correct NCS might be missed. On the other hand, if two sites come from different compounds but are given the same name, this might produce a lot of false "noise", and it will take longer CPU time to find the correct solution.

    FSITE <fx> <fy> <fz> [<compound>]

    Fractional coordinates and compound name
    	example: fsite   0.755   0.282   0.146 
    	     or	 fsite   0.689   0.467   0.299   PCMB 
    
    If the compound name is not given, the program assumes it is the same as the default name from COMPOUND. The CELL command must be given before this command.

    SITE <x> <y> <z> [<compound>]

    Orthogonal coordinates and compound name
    	examples	site -13.752  16.271  27.267	
    
    If the compound name is not given, the program assumes it is the same as the default name from COMPOUND. See the convention for the orthogonal system. If the coordinates have already been placed in the correct asymmetric unit, the user does not have to give the cell dimensions and space group.
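
For readers who want to convert between FSITE and SITE coordinates by hand, here is a minimal sketch of the usual fractional-to-orthogonal conversion with X along a, Y in the ab plane and Z perpendicular to the ab plane, which is the convention described in this manual. The function name and the test cells are hypothetical; check a few values against your own tools before relying on it.

```python
import math


def frac_to_orth(cell, frac):
    """Convert fractional to orthogonal coordinates (X along a, Y in ab plane)."""
    a, b, c, alpha, beta, gamma = cell
    ca, cb, cg = (math.cos(math.radians(ang)) for ang in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # Unit-cell volume divided by a*b*c.
    v = math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)
    fx, fy, fz = frac
    x = a * fx + b * cg * fy + c * cb * fz
    y = b * sg * fy + c * (ca - cb * cg) / sg * fz
    z = c * v / sg * fz
    return (x, y, z)


# Orthorhombic cell: the conversion reduces to a simple scaling.
ortho = frac_to_orth((10.0, 20.0, 30.0, 90.0, 90.0, 90.0), (0.5, 0.5, 0.5))
# Monoclinic cell with beta = 120: the c axis tilts back along -X.
mono = frac_to_orth((10.0, 20.0, 10.0, 90.0, 120.0, 90.0), (0.0, 0.0, 1.0))
print(ortho, mono)
```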

    FRCLIM <xfrclow> <xfrchigh> <yfrclow> <yfrchigh> <zfrclow> <zfrchigh>

    Fractional xyz limit for searching ranges
    	example -1. 1.  -1. 1.  -1. 1.
    
    By default, the program extends the HA sites to 8 unit cells (XYZ from -1. to 1.) to make sure that the search range includes the entire protein molecule. However, with many crystallographic operations (as in cubic, hexagonal or tetragonal space groups), searching such a range is extremely time consuming and there is a high chance of "noise" (false NCS). In that case users have to find a smaller search range themselves (see FAQ 4.a). This command can be repeated. See also XYZLIM and SPHERE.
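
To make the idea of a fractional search range concrete, the sketch below generates every lattice copy of a site that falls inside given fractional limits. It covers lattice translations only (crystallographic symmetry operators would be applied beforehand), and it is an assumption-laden illustration, not the program's own routine; the function name is hypothetical and the program's handling of range boundaries may differ.

```python
import math
from itertools import product


def extend_sites(frac_sites, limits=(-1.0, 1.0, -1.0, 1.0, -1.0, 1.0)):
    """Return (site_id, (fx, fy, fz)) for each lattice copy inside `limits`."""
    lows, highs = limits[0::2], limits[1::2]
    copies = []
    for site_id, site in enumerate(frac_sites, start=1):
        # Bring the site into the reference cell [0, 1).
        base = [f - math.floor(f) for f in site]
        # Integer lattice shifts that keep each coordinate within its limits.
        shift_ranges = [
            range(math.ceil(lo - f), math.floor(hi - f) + 1)
            for f, lo, hi in zip(base, lows, highs)
        ]
        for shift in product(*shift_ranges):
            copies.append((site_id, tuple(f + n for f, n in zip(base, shift))))
    return copies


# The two FSITE examples from this manual, without symmetry operators.
sites = [(0.755, 0.282, 0.146), (0.689, 0.467, 0.299)]
copies = extend_sites(sites)
print(len(copies))
```

With the default limits each general-position site yields 8 copies, which is where the "8 unit cells" figure comes from; a tighter FRCLIM range reduces the count accordingly.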

    XYZLIM <xlow> <xhigh> <ylow> <yhigh> <zlow> <zhigh>

    Orthogonal xyz limit for searching ranges
         example -10 50  -100  40  -20  60
    
    This command can be repeated; see FRCLIM

    SPHEre <cenx> <ceny> <cenz> <radii>

    Search sphere. Sites located inside a sphere with the given center position and radius will be used for the NCS search. This command can be repeated. See FRCLIM.

    DISP <maxdisp> <axislength>

    <maxdisp> is the maximum number of sites for graphics display; <axislength> is the length of the axis displayed by the O program. This command generates PDB files and some files which can be used by the O program (Jones et al 1990) to display the results automatically on graphics. For details see FAQ 3.

    MINMATCH <minmatch>

    Minimum number of matching sites. If you have many sites or crystallographic symmetry operations, the program will find many false NCS operations. This command makes the output much cleaner and can also speed up the calculation.

    LIST <yes> | <no>

    If yes, some detailed information is output before the final solutions are found.
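
The keywords above can be assembled into an input file in any way that is convenient. As one possibility, the sketch below builds the keyworded text from Python data while respecting the stated ordering rule (CELL before SPACEGROUP and FSITE). The helper name, the cell dimensions and the compound name HG1 are made-up placeholders; only the keywords themselves and the two FSITE examples come from this manual.

```python
def build_findncs_input(cell, spacegroup, error, sites, compound=None, minmatch=None):
    """Return FINDNCS keyworded input as a single string."""
    # Lines starting with "!" are ignored by the program.
    lines = ["! illustrative FINDNCS input; all values are placeholders"]
    # CELL must be given before SPACEGROUP, SITE and FSITE.
    lines.append("CELL " + " ".join(f"{v:g}" for v in cell))
    lines.append(f"SPACEGROUP {spacegroup}")
    lines.append(f"ERROR {error:g}")
    if compound is not None:
        lines.append(f"COMPOUND {compound}")
    if minmatch is not None:
        lines.append(f"MINMATCH {minmatch}")
    for site in sites:
        fx, fy, fz, *name = site
        entry = f"FSITE {fx:.3f} {fy:.3f} {fz:.3f}"
        if name:
            # An explicit compound name overrides the COMPOUND default.
            entry += f" {name[0]}"
        lines.append(entry)
    return "\n".join(lines) + "\n"


text = build_findncs_input(
    cell=(65.2, 78.4, 92.1, 90.0, 104.5, 90.0),
    spacegroup="P21",
    error=1.5,
    sites=[(0.755, 0.282, 0.146), (0.689, 0.467, 0.299, "PCMB")],
    compound="HG1",
    minmatch=4,
)
print(text)
```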

    Conventions and example output

     The program uses the following convention for the orthogonal system
                                 X is in A direction
                                 Y is in AB plane
                                 Z is perpendicular to AB plane
    
     Each NCS operation can be described as an object
     rotating kappa degrees about a certain axis, then moving a
     screw distance along the direction of this axis
    
     In the following, for each possible NCS, the program provides:
     Matching pairs: number of pairs matched by the possible NCS; solutions
                      are ranked by this number
     Matching members: the ID number of each site (from input order before
                       crystallographic operation)
     RMS:  root mean square deviation of the superimposed sites
     Screw: the screw distance
     Radii: average distance between the center and all joining sites
     Polar angles:  they describe the NCS axis as follows
      psi is the angle between Z and the NCS axis
      phi is the angle between X and the projection of the NCS axis on the XY plane
      kappa is the rotation angle. 180=2-fold, 90=4-fold ... and so on
      the polar angle definition is the same as in the POLARRFN program
     Center: center position of all the joining sites; this
            can help to find out whether two axes intersect each other
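
The relation between the printed matrix and the kappa, psi, phi and screw values can be checked numerically. The sketch below does this for the NCS4 entry of the example output that follows. One assumption is made and should be treated with caution: the printed 3x3 block is transposed before use (as if written in the column order used by O), because that is the reading that reproduces the printed polar angles; the manual itself does not state the storage order. The function name is hypothetical.

```python
import math


def describe_operation(printed, trans):
    """Return (kappa, psi, phi, screw) in degrees and coordinate units."""
    # Assumption: the printed block is the transpose of the rotation matrix.
    rot = [[printed[j][i] for j in range(3)] for i in range(3)]
    trace = rot[0][0] + rot[1][1] + rot[2][2]
    cos_kappa = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    kappa = math.degrees(math.acos(cos_kappa))
    # Axis from the antisymmetric part; unreliable when kappa is 0 or exactly 180.
    axis = [rot[2][1] - rot[1][2], rot[0][2] - rot[2][0], rot[1][0] - rot[0][1]]
    norm = math.sqrt(sum(v * v for v in axis))
    axis = [v / norm for v in axis]
    psi = math.degrees(math.acos(axis[2]))            # angle between Z and the axis
    phi = math.degrees(math.atan2(axis[1], axis[0]))  # angle from X in the XY plane
    screw = sum(t * n for t, n in zip(trans, axis))   # translation along the axis
    return kappa, psi, phi, screw


# NCS4 from the example output: first three rows, then the translation row.
ncs4 = [
    [0.88325, 0.04691, 0.46656],
    [-0.05447, -0.97798, 0.20146],
    [0.46574, -0.20335, -0.86124],
]
ncs4_trans = [-38.24167, 57.20192, 92.85443]
kappa, psi, phi, screw = describe_operation(ncs4, ncs4_trans)
print(f"{kappa:.2f} {psi:.2f} {phi:.2f} {screw:.2f}")
```

For exact 2-folds (kappa = 180) the antisymmetric part vanishes, so a different route to the axis, for example from the diagonal elements, would be needed.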
    
    Example
     Maximium number of NCS operations to be output          10
     Space Group  >>> P21                      4
     Symmetric operation ----      Total:   2      Rotation:  2
               8 sites were read in
              16 sites after symmetry operations
             116 sites are extended to maximium cells
    
     Building a distance matrix......
     Looking for NCS matches......
    
     Total  285 NCS operations have been found
     Maxinium atom number           6
     generating unit cell frame for O...
    ----------------------------------------------------------------------
     NCS1    with matching pairs           6
      1  2  3  4  5  7
      4  3  2  1  7  5
     NCS matrix:
        -0.99584     0.08831     0.02261
         0.08831     0.87302     0.47961
         0.02261     0.47961    -0.87719
        65.24355   -19.55399    64.35049
    RMS: 0.929     Screw:    0.00     Radii:   41.20
    Polar angle:   75.65   87.30  180.00     &   104.35  -92.70 -180.00
    Center:   32.07  -21.44   29.19
    
    ----------------------------------------------------------------------
     NCS2    with matching pairs           6
      2  3  4  5  7  8
      3  2  8  7  5  4
     NCS matrix:
        -0.98548    -0.15782    -0.06262
        -0.15782     0.71543     0.68063
        -0.06262     0.68063    -0.72995
       -39.70554   -30.00050    66.40594
    RMS: 0.993     Screw:    0.00     Radii:   54.58
    Polar angle:  111.56  -84.74  180.00     &    68.44   95.26 -180.00
    Center:  -21.63    4.36   40.89
    
    ----------------------------------------------------------------------
     NCS3    with matching pairs           6
      2  3  4  5  6  7
      3  2  6  7  4  5
     NCS matrix:
        -0.99408     0.10813     0.01057
         0.10813     0.97522     0.19301
         0.01057     0.19301    -0.98114
        67.17181    -9.37817    58.34059
    RMS: 1.197     Screw:    0.00     Radii:   40.59
    Polar angle:   84.43   86.87  180.00     &    95.57  -93.13 -180.00
    Center:   32.69  -21.11   27.57
    
    ----------------------------------------------------------------------
     NCS4    with matching pairs           5
      2  4  6  7  8
      4  2  7  6  3
     NCS matrix:
         0.88325     0.04691     0.46656
        -0.05447    -0.97798     0.20146
         0.46574    -0.20335    -0.86124
       -38.24167    57.20192    92.85443
    RMS: 0.948     Screw:  -14.65     Radii:   36.55
    Polar angle:   75.94   -0.12  167.96     &   104.06  179.88 -167.96
    Center:  -22.16   21.90   50.91
    
    ----------------------------------------------------------------------
     NCS5    with matching pairs           5
      1  2  4  6  8
      8  4  2  7  1
     NCS matrix:
        -0.99553     0.01697    -0.09289
         0.01986    -0.92409    -0.38165
        -0.09232    -0.38179     0.91963
        51.18448   -21.65644    -2.32794
    RMS: 1.292     Screw:    0.48     Radii:   56.42
    Polar angle:  168.44   76.36  179.92     &    11.56 -103.64 -179.92
    Center:   25.93   -9.40   -8.48
    .....
    CPU time:     1 min 20.5 sec
    

    Frequently Asked Questions

    1) If I can find a real NCS operation from the heavy atom sites manually, is it certain that the program can find it too?
    Yes!

    2) Can I directly use the NCS matrix from FINDNCS in averaging programs?
    It depends on the arrangement of the protein oligomers. For example, if you only have a dimer AB in the crystallographic asymmetric unit (asu), the program can easily find the NCS from A to B (or B to A). However, if you have a tetramer ABCD, the program will in general give you the NCS matrices A_to_B, A_to_C, A_to_D, B_to_C, B_to_D and C_to_D, 6 matrices in total, while most averaging programs probably need only the first 3. If your tetramer has 222 symmetry, however, A_to_B and D_to_C should be the same matrix, and likewise A_to_C and B_to_D, and A_to_D and B_to_C. In this case the program outputs only 3 matrices, which you can use in the averaging program.

    Sometimes the program does not output the NCS you want. For example, if ABCD is the biological tetramer, the program may give the NCS A_to_B and A'_to_F', where A' and F' are crystallographically equivalent to A and F. You will notice this when you try to make the mask before averaging. In that case you can use SPHERE, XYZLIM or FRCLIM.

    Although FINDNCS does not always give the exact NCS operation required by averaging programs, it is still much faster to find the correct NCS using the program as a tool than to find it by hand.

    2.a) How should I analyse the output of FINDNCS?
    Look at the graphics using the output PDB files and O files, or work from the log output.
    For example, the following output shows a dimer with 2-fold symmetry: site 1 fits site 4, site 2 fits site 3, and so on after the 180 degree NCS operation. (Only in 2-fold symmetry does site 1 fit site 4 while site 4 also fits site 1 in the same NCS.)

    ----------------------------------------------------------------------
     NCS1    with matching pairs          12
      1  2  3  4  5  6  7  8  9 10 11 12
      4  3  2  1  8  7  6  5 12 11 10  9
     NCS matrix: 
        -1.00000    -0.00061     0.00025
        -0.00061     0.70840    -0.70581
         0.00025    -0.70581    -0.70840
         0.74407    -0.21214    -0.51413
    RMS: 0.091     Screw:    0.00     Radii:   30.48
    Polar angle:   67.55  -89.98  180.00     &   112.45   90.02 -180.00
    Center:    0.37   -0.01   -0.30
    If you find the following NCS operations at the same time,
    ----------------------------------------------------------------------
     NCS2    with matching pairs          12
      1  2  3  4  5  6  7  8  9 10 11 12
      2  1  4  3  6  5  8  7 10  9 12 11
     NCS matrix: 
    ....
    RMS: 0.210     Screw:    0.00     Radii:   30.48
    Polar angle:  157.55  -89.46  180.00     &    22.45   90.54 -180.00
    Center:    0.37   -0.01   -0.30
    
    ----------------------------------------------------------------------
     NCS3    with matching pairs          12
      1  2  3  4  5  6  7  8  9 10 11 12
      3  4  1  2  7  8  5  6 11 12  9 10
     NCS matrix: 
    ....
    RMS: 0.210     Screw:    0.00     Radii:   30.48
    Polar angle:   89.82    0.10  180.00     &    90.18 -179.90 -180.00
    Center:    0.37   -0.01   -0.30
    
    From the matching site numbers alone, you can see from the output that this is a perfect 222 NCS symmetry. Of course it is much easier to understand if you look at graphics using the PDB files (and O files if you use O).
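
The reasoning from site numbers can also be done mechanically. The sketch below takes the second "matching members" row of NCS1, NCS2 and NCS3 above as permutations of the sites and checks the two properties expected of 222 symmetry: each operation swaps sites in pairs (applying it twice returns every site to itself), and combining any two operations gives the third. The function names are hypothetical, and the check looks only at site numbering, not at the matrices.

```python
def compose(p, q):
    """Return the mapping 'apply q first, then p' for 1-based site lists."""
    return [p[q[i] - 1] for i in range(len(q))]


def looks_like_222(p1, p2, p3):
    identity = list(range(1, len(p1) + 1))
    # Each 2-fold applied twice must map every site back onto itself.
    two_folds = all(compose(p, p) == identity for p in (p1, p2, p3))
    # The product of any two 2-folds must equal the third.
    closed = (
        compose(p1, p2) == p3
        and compose(p2, p3) == p1
        and compose(p1, p3) == p2
    )
    return two_folds and closed


# Second row of the "matching members" for NCS1, NCS2 and NCS3 above.
ncs1 = [4, 3, 2, 1, 8, 7, 6, 5, 12, 11, 10, 9]
ncs2 = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11]
ncs3 = [3, 4, 1, 2, 7, 8, 5, 6, 11, 12, 9, 10]
is_222 = looks_like_222(ncs1, ncs2, ncs3)
print(is_222)
```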

    3) How do I display the results graphically?

    If the DISP command is present, the program generates some files for graphics display. First, it generates a PDB file ncsall.pdb which includes all the sites within the search range. After the NCS operations have been found, the program generates ncs1.pdb, ncs2.pdb, ... which include the sites joining the operations NCS1, NCS2, ... and so on. You can use any graphics program to display them.

    If you use O and/or RAVE:
    The program also generates a file called ncs.ofm which includes the NCS operators and vectors for use by the O program. It then makes an O macro file called oncs.mac (together with ncs1.mac, ncs2.mac, ...). If you run O in the same directory and type @oncs.mac, a group of commands appears in the menu bar (@ncs1.mac, @ncs2.mac, ...). To display the first NCS operation found by FINDNCS, click @ncs1.mac: O displays an axis, the sites which join the NCS in yellow, and the sites superimposed by this NCS. After running @oncs.mac, the NCS matrices are stored as
    .lsq_rt_ncs1 .lsq_rt_ncs2 ....
    If you have bones, you can display them as two objects (e.g. SKEL and SKEL1). To see how the 1st NCS works, type
    lsq_obj ncs1 SKEL
    to see how it superimposes onto SKEL1. If you have an electron density map called map1, you can use the command
    lsq_rt_obj .lsq_rt_ncs1 map1
    to superimpose this map. If you edit the file oncs.mac and copy .lsq_rt_ncs1 ... out to a new file, it can be used directly by the RAVE package.

    4) What should I do if the program takes an intolerable CPU time?
    The run time of the program is proportional to N*N*N*N*(N-1), where N is the total number of sites searched after crystallographic operations. If this number is more than 300, the calculation becomes very slow (10 hours on a DEC Alpha for 310 atoms, so about 300 hours for 620 atoms). If you use the automatic XYZ limits (8 unit cells, XYZ from -1 to 1), this number can be found in the following line:
    736 sites are extended to maximium cells

    In this case users have to use a smaller search range, defined by the commands FRCLIM, XYZLIM or SPHERE. (The number of sites inside the selected range can be found in the line: 148 sites are inside the selection ranges )
    Increasing MINMATCH is also a way to save a lot of CPU time when too many possible solutions are found. If the program outputs something like this
    Solution: 1000 Max match: 16
    i1,i2,i3,j1,j2,j3 1 83 91 57 115 67 CPU time: 103.0 s
    Solution: 2000 Max match: 16
    i1,i2,i3,j1,j2,j3 4 47 76 58 36 85 CPU time: 379.2 s
    You find that the maximum matching number is 16, so setting MINMATCH to 6 should not hurt anything.
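
The scaling rule quoted in this answer makes it easy to estimate in advance whether a run is practical. The sketch below rescales the one timing given in the text (10 hours for 310 sites on the author's machine) to other site counts; the function names are hypothetical, and absolute times on current hardware will of course be very different, so only the ratios are meaningful.

```python
def relative_cost(n_sites):
    """Cost model from the text: proportional to N*N*N*N*(N-1)."""
    return n_sites ** 4 * (n_sites - 1)


def estimate_hours(n_sites, ref_sites=310, ref_hours=10.0):
    """Scale the reference timing quoted in the text to another site count."""
    return ref_hours * relative_cost(n_sites) / relative_cost(ref_sites)


hours_620 = estimate_hours(620)   # doubling N costs roughly 32 times more
hours_148 = estimate_hours(148)   # site count from the selection-range example
print(f"{hours_620:.1f} h, {hours_148 * 60:.0f} min")
```

Doubling the number of sites gives a factor of about 32 (roughly 320 hours, in line with the "about 300 hours" quoted above), while halving the range from 310 to 148 sites brings the estimate down to a fraction of an hour.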

    4.a) How should I define the searching range?
    The idea of defining a search range is to include at least one entire protein oligomer so that the program does not miss the correct NCS. If a search range includes 8 crystallographic ASUs distributed evenly in the X, Y and Z directions, I think (not proved) an entire protein oligomer will be included in more than 95% of cases.

    It does not matter if the search range includes more than one entire protein oligomer. (The program should be able to recognize crystallographically equivalent NCS operations by finding that their joining sites are crystallographically equivalent too.) However, too large a search range means many sites to be searched and slows down the search. It is not very practical if there are more than 300 sites inside the range.

    There are several ways to decide the search range.

    • According to knowledge of the ASU in a particular space group.
      It is quite complicated to find a search range for a given space group that guarantees an entire protein oligomer is included. At this point, the author is not clear about this for all space groups.
      I only make recommendations for some individual cases:
      -- In triclinic, monoclinic, P3x and R3 space groups, users can use the default search range of the program.
      -- Space group P2x2x2x: (-0.5 1. -0.5 1. -0.5 1.)
      -- Space group P4x2x2: (0. 1. 0. 1. 0. 1.)
      Klas Anderson pointed out that the search should be performed according to the corresponding Cheshire group of the space group. This is tabulated by Hirschfeld, Acta Cryst. A (1968), and in Int. Tables of Cryst. (I have not checked this myself.) If anyone has more knowledge about this, please tell me.
    • Use the LIST yes and DISP options
      You can use the default search range first. "LIST yes" lists all the NCS operations the program finds. Since there are a lot of repetitions in the default search range, the correct one must appear after the program has run for a while (perhaps 10-60 minutes of CPU). If your log file is called findncs.log, you can type the command:
      "grep MATCH findncs.log | sort +6" (or "sort -k 7" with current versions of sort) and you will get something like
       MATCH #           3 with matching pairs           5
       MATCH #           4 with matching pairs           5
       MATCH #          15 with matching pairs           5
        ........
       MATCH #        1585 with matching pairs          12
       MATCH #        1569 with matching pairs          13
       MATCH #         681 with matching pairs          14
       MATCH #        1070 with matching pairs          14
      
      Now you know that the best match so far is MATCH 681. Open the file findncs.log, find MATCH 681, and you will find the NCS there. The atom file ncsall.pdb has been generated, so you can use graphics to see what this NCS looks like and choose the search range around the joining sites.
    • According to the bones of the electron density or the solvent flattening mask
      Even if you do not want to use averaging, you have to look at the electron density and bones and decide which crystallographic ASU you are going to use. This ASU should certainly include at least one protein oligomer. You do not have to move the heavy atom sites into this ASU. If you just express the range with XYZLIM, FRCLIM or SPHERE, the program should find the NCS very quickly.
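
Where grep and sort are not at hand, the "LIST yes" log can be scanned with a short script instead. The sketch below picks out the MATCH numbers with the most matching pairs; it assumes the log lines have exactly the form shown in the listing above, and the function name is hypothetical.

```python
import re

# Pattern based on the MATCH lines shown in the listing above.
MATCH_LINE = re.compile(r"MATCH #\s*(\d+)\s+with matching pairs\s+(\d+)")


def best_matches(log_text):
    """Return the MATCH numbers that have the largest number of matching pairs."""
    found = [(int(m.group(2)), int(m.group(1))) for m in MATCH_LINE.finditer(log_text)]
    if not found:
        return []
    top = max(pairs for pairs, _ in found)
    return [match_id for pairs, match_id in found if pairs == top]


sample_log = """\
 MATCH #           3 with matching pairs           5
 MATCH #           4 with matching pairs           5
 MATCH #          15 with matching pairs           5
 MATCH #        1585 with matching pairs          12
 MATCH #        1569 with matching pairs          13
 MATCH #         681 with matching pairs          14
 MATCH #        1070 with matching pairs          14
"""
best = best_matches(sample_log)
print(best)
```

On a real run, read the text with `open("findncs.log").read()` and pass it to the function; the file name is the one used in the example above.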

    Acknowledgement

    The author thanks Professors Gunter Schneider and Ylva Lindqvist for pointing out the possible impact and encouraging me to write the program, and Drs. Cristofer Enroth and Ylva Lindqvist for providing test examples.