FREERFLAG (CCP4: Supported Program)
NAME
freerflag - tags each reflection in an MTZ file with a flag for cross-validation
freerflag HKLIN foo.mtz HKLOUT foo_out.mtz
This program is part of the uniqueify script (see documentation for UNIQUE) and is used to tag each reflection with a flag. It is strongly recommended that the reflection list for tagging is generated by UNIQUE before FREERFLAG is run; uniqueify does both these steps automatically. The resulting reflection file will contain all possible (h k l) for the structure spacegroup and for the defined resolution. This file will be used for refinement and the tagged reflections used for the calculation of `Free R factors' (reference ).
This master list of FreeR assignments can then be transferred to any new data sets, or to isomorphous data sets such as substrate complexes. This is important if you plan to start refinement against new data using the previously refined model (as we all do!), or if you are combining different methods of refinement. In these cases it is essential to tag the SAME reflections.
This can be done by generating an mtz file with FreeR flags (the uniqueify script is recommended), then using the program MTZ2VARIOUS to convert it to any other (non-CCP4) format with the appropriate flag. These formats use different conventions to indicate the free and working sets:
Conversion from other (non-CCP4) formats requires the use of F2MTZ to convert the original file to an MTZ file, which can then be extended to fit the CCP4 convention. See the examples for XPLOR, CNS, SHELX or TNT input. FREERFLAG recognises the different conventions and automatically transforms the flags into the CCP4 convention (see table above).
By default the FreeR_flag for each reflection is 0, 1, 2 etc., distributed randomly and uniformly, reflection by reflection, so that each value occurs (on average) in the fraction of the data specified by the FREERFRAC keyword. Under the CCP4 convention, the free set is assigned FreeR_flag = 0, and the working set is assigned flags between 1 and (n-1), where n = 1/fraction.
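The assignment described above can be sketched in a few lines of Python. This is only an illustration of the flagging convention, not the code FREERFLAG itself uses: the function name `assign_free_flags`, the fixed seed and the use of Python's `random` module are all assumptions made for the example.

```python
import random

def assign_free_flags(n_reflections, fraction=0.05, seed=0):
    """Return one random flag per reflection, drawn uniformly from 0..n-1."""
    n_bins = int(1.0 / fraction)   # 20 bins for the default fraction of 0.05
    rng = random.Random(seed)      # fixed seed (assumption) so the sketch is repeatable
    return [rng.randrange(n_bins) for _ in range(n_reflections)]

flags = assign_free_flags(10000)

# CCP4 convention: flag 0 is the free set, flags 1..n-1 are the working set.
free_set = [i for i, flag in enumerate(flags) if flag == 0]
working_set = [i for i, flag in enumerate(flags) if flag != 0]

# The free fraction is close to, but not exactly, the requested 0.05.
print(len(free_set) / len(flags))
```

Because each reflection is flagged independently, the fraction actually flagged 0 only matches the requested fraction on average.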
(Note that it is no longer possible to generate flags under the old system, where the FREE percentage has the flag 0 and the rest of the data is flagged 1. The OLDFREE keyword, which used to allow this, is now obsolete.)
This means that it is possible to select different blocks of reflections for exclusion, using a preset `exclusion flag'. The selected value should be held constant throughout a complete refinement run. For density modification and other procedures which need full `cross validation' (reference ) it may be useful to be able to vary the FREE set. WARNING - do NOT change the selected set casually!
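As an illustration of selecting different blocks for exclusion, the sketch below splits a flag list on a chosen exclusion value and loops over every value in turn. The helper `split_on_flag` and the tiny flag list are hypothetical; real programs make this selection internally from the FreeR_flag column.

```python
def split_on_flag(flags, exclusion_flag=0):
    """Return (working, free) index lists for one choice of exclusion flag."""
    working = [i for i, flag in enumerate(flags) if flag != exclusion_flag]
    free = [i for i, flag in enumerate(flags) if flag == exclusion_flag]
    return working, free

# Toy data: eight reflections flagged 0..3 (i.e. a fraction of 0.25).
flags = [0, 3, 1, 2, 0, 1, 3, 2]
n_bins = max(flags) + 1

# Rotating the exclusion flag over all values leaves each reflection
# out exactly once, which is what full cross-validation requires.
times_free = [0] * len(flags)
for value in range(n_bins):
    working, free = split_on_flag(flags, value)
    for i in free:
        times_free[i] += 1
```

For an ordinary refinement run only one value would be used, and it would stay fixed throughout.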
If during any calculation (e.g. refinement, map calculation or agreement analysis) the program label assignment `FREE=FreeR_flag' is made, reflections which are flagged with the chosen value (default 0) are excluded from the calculation. For instance, during refinement this means that the agreement between their FP and the Fc is independent of the refinement procedure. The Free R factor calculated for these reflections is a useful indicator of the quality of the refinement, especially when there is a shortage of observations and the structure is underdetermined.
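To make the exclusion concrete, here is a minimal sketch of how a working R factor and a Free R factor could be computed from the same flag column, using the standard definition R = sum(| |Fo| - |Fc| |) / sum(|Fo|). The function names and the toy amplitudes are assumptions for illustration, and the scaling of Fc to Fo that a refinement program would apply is left out.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|); no scale factor is applied here."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator

def r_work_and_free(f_obs, f_calc, flags, free_value=0):
    """Split reflections on the free flag and return (R_work, R_free)."""
    work = [(fo, fc) for fo, fc, flag in zip(f_obs, f_calc, flags) if flag != free_value]
    free = [(fo, fc) for fo, fc, flag in zip(f_obs, f_calc, flags) if flag == free_value]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in free], [fc for _, fc in free])
    return r_work, r_free

# Toy amplitudes (made up for the example); flag 0 marks the free set.
f_obs = [100.0, 200.0, 50.0, 80.0]
f_calc = [90.0, 210.0, 60.0, 80.0]
flags = [0, 1, 2, 0]
r_work, r_free = r_work_and_free(f_obs, f_calc, flags)
```

Only the working reflections would enter the refinement target; the free reflections are used solely to compute the second number.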
Treatment of systematic absences
Systematically absent reflections (if present) are treated like other reflections, and are also assigned a freeR flag. This is different to the behaviour of previous versions of the program, where systematic absences were flagged as "missing" by the FreeR_flag.
All the possible keywords are optional, but if you wish to retain the existing freeR flags then COMPLETE must be given. Keywords are: FREERFRAC, SEED, COMPLETE, END
The OLDFREE keyword is now obsolete and has no function.
FREERFRAC <fraction>
A <fraction> of all reflections in the file is flagged with a given value (`indicator') in the FreeR_flag column. The indicators will range from 0 to int(1.0/<fraction>)-1. <fraction> defaults to 0.05 and therefore the indicators will range from 0 to 19.
SEED
By default, for a given job on a given machine, the random number generator produces the same list of "random" free-R flags each time the job is run. Since you would generally only produce one list of free-R flags for each project, this is not usually a problem. However, if you specify the keyword SEED, then the random number generator is seeded with the current time, and will produce a different list of free-R flags each time the job is run.
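The difference between the default behaviour and SEED can be illustrated with any pseudo-random generator. The sketch below uses Python's `random` module; the constant `FIXED_SEED` and both helper functions are hypothetical and say nothing about the generator FREERFLAG actually uses.

```python
import random
import time

FIXED_SEED = 12345  # hypothetical constant standing in for the default start state

def make_generator(use_clock=False):
    """Seed from the clock (like SEED) or from a fixed value (like the default)."""
    seed = time.time_ns() if use_clock else FIXED_SEED
    return random.Random(seed)

def draw_flags(rng, count, n_bins=20):
    """Draw `count` flags uniformly from 0..n_bins-1."""
    return [rng.randrange(n_bins) for _ in range(count)]

run_a = draw_flags(make_generator(), 1000)                # default: repeatable
run_b = draw_flags(make_generator(), 1000)                # identical to run_a
run_c = draw_flags(make_generator(use_clock=True), 1000)  # varies from run to run
```

Two default runs give the same list, which is why rerunning a job without SEED does not change an existing assignment.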
COMPLETE FREE=<column>
This option completes an existing list of FREE flags when the list of indices is extended. If a FREE value is present in the file in <column>, it is carried through to the output; if no FREE value is present in <column> for a given reflection, one is assigned using the standard random number generation.
The other keywords are ignored when COMPLETE is specified. The fraction of data per bin is taken from the highest value of the freeR flag. If the file has an old-style freeR (i.e. 0 or 1) then the output MTZ has the same format. The fraction of data flagged as free is then calculated from the existing reflections. This fraction may not be exactly the same as the one you used originally, because of statistical variations. See the example.
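The completion step can be pictured as follows. This is a rough sketch under stated assumptions: reflections are held in a dictionary keyed by (h, k, l), a missing flag is represented by `None`, and the function `complete_flags` is hypothetical. It covers only the new-style flags, where the number of bins is taken from the highest existing flag; the old-style 0/1 case is not handled.

```python
import random

def complete_flags(flags_by_hkl, seed=0):
    """Keep every existing flag and fill in the missing ones at random."""
    present = [flag for flag in flags_by_hkl.values() if flag is not None]
    n_bins = max(present) + 1      # bins inferred from the highest existing flag
    rng = random.Random(seed)      # fixed seed (assumption) for repeatability
    completed = {}
    for hkl, flag in flags_by_hkl.items():
        # Existing values are carried through; only gaps get a new random flag.
        completed[hkl] = flag if flag is not None else rng.randrange(n_bins)
    return completed

# Toy reflection list: three flagged reflections and two newly added ones.
existing = {
    (1, 0, 0): 0,
    (1, 1, 0): 19,
    (2, 0, 0): 7,
    (2, 1, 0): None,
    (2, 2, 0): None,
}
completed = complete_flags(existing)
```

The reflections that already carried a flag keep it, so the free set used in earlier refinement is preserved when the data are extended.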
FREERFLAG is normally run as part of the uniqueify script, examples of which are:
Examples of running FREERFLAG on its own can be found at: