CCP4 Interface: Data Reduction Module

	CCP4i: Graphical User Interface
	Data Reduction Module

MTZ Files

Import Scaled Denzo and d*TREK Data

Import Unscaled Data - Combat

Sort/Reindex MTZ Files - Sortmtz/Rebatch/Reindex

Scale Experimental Intensities - Scala: Scala - Task Window Layout; Scala - Datasets and Output Files

Treat Twinned Data - Detwin

Convert Intensities to Structure Factors - Truncate: Truncate - Task Window Layout

'Uniqueify' Data

This module contains the following tasks: Import Scaled Data; Import Unscaled Data; Sort/Reindex MTZ Files; Scale Experimental Intensities; Convert Intensities to SFs; Treat Twinned Data

Specialist Help is available on: Reindexing; Twinning

The layout of each task window, i.e. the number of folders present, and whether these folders are open or closed by default, depends on the choices made in the Protocol folder of the task (see Introduction). Although certain folders are closed by default, there are specific reasons why you should or may want to look at them. These reasons are described in the Task Window Layout sections below.

MTZ Files

Data that has not been scaled and merged is saved in a 'multi-record' MTZ file. The multi-record file contains several columns which do not appear in a standard MTZ file, for example M/ISYM and BATCH. The Scala program scales and merges the data and outputs a standard MTZ file. Data imported from Mosflm is already in the multi-record MTZ format, but any other imported unscaled data must first be converted to it. Imported scaled data is converted directly to standard MTZ format.
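The distinction between the two file types can be sketched as a check on column labels. This is a minimal illustration only: the function name and the example label lists are hypothetical, and the check relies solely on the two columns named above rather than on any CCP4 library call.

```python
# Columns named in the text as appearing only in multi-record (unmerged) files.
MULTI_RECORD_COLUMNS = {"M/ISYM", "BATCH"}


def is_multi_record(column_labels):
    """Return True if the labels include every multi-record-only column."""
    return MULTI_RECORD_COLUMNS.issubset(column_labels)


# Hypothetical label lists, for illustration only.
unmerged = ["H", "K", "L", "M/ISYM", "BATCH", "I", "SIGI"]
merged = ["H", "K", "L", "IMEAN", "SIGIMEAN"]

print(is_multi_record(unmerged))  # True
print(is_multi_record(merged))    # False
```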

Data columns within the standard MTZ file are labelled with a 'project name' and a 'dataset name'. The dataset name should be used to distinguish data from native or derivative structures within the MTZ file. When importing data or converting to the standard MTZ format you will be required to provide a dataset name for the data. You are strongly recommended to use this facility carefully, both to help keep your data organised and because some 'down-stream' programs such as SCALEIT (Scale Datasets in the Experimental Phasing module) require that input MTZ files have consistent dataset naming.

Import Scaled Denzo and d*TREK Data

Data from Denzo or d*TREK are usually already scaled. In this case, the files must be converted to standard MTZ format, structure factors generated from the intensities (see below), and the data passed through the 'uniqueify' process (see below). All these steps can be performed from the Import Scaled Data task interface.

See program documentation: SCALEPACK2MTZ, dTREK2MTZ.

Importing Unscaled Data

The most commonly imported unscaled data is probably from Mosflm, which is already in the multi-record MTZ format and so can be used directly in the Scale Experimental Intensities task. Other, less common, formats can be imported through this task and then used in Scale Experimental Intensities.

It is also possible to 'import' standard MTZ format files for conversion to the multi-record format.

See program documentation: Combat.

Sort/Reindex MTZ Files

There are some editing functions which you may need to apply to multi-record MTZ files prior to scaling. In particular, changing the space group or reindexing might be necessary if the initial reflection indexing is suspect. There is also an option to reset the batch numbers for sets of reflections. This should only be necessary if the batch numbers were recorded wrongly, or not at all, at the time of data collection, or if you suspect that some sets of reflections need special treatment by Scale Experimental Intensities and need identifying as a separate batch.
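As an illustration of what reindexing does to the data, the sketch below applies a 3x3 integer matrix to each (h, k, l) triple. The function name and the example operator are hypothetical, chosen only to show the arithmetic; this is not the input syntax of the REINDEX program.

```python
def reindex(hkl, matrix):
    """Apply a 3x3 integer reindexing matrix to one (h, k, l) triple.

    Each row of `matrix` gives one new index as a combination of the old ones.
    """
    return tuple(sum(row[i] * hkl[i] for i in range(3)) for row in matrix)


# Hypothetical operator: h' = k, k' = h, l' = -l.
SWAP_HK_NEGATE_L = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, -1],
]

reflections = [(1, 2, 3), (0, 4, -1)]
reindexed = [reindex(hkl, SWAP_HK_NEGATE_L) for hkl in reflections]
print(reindexed)  # [(2, 1, -3), (4, 0, 1)]
```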

Note that there is an option in the Reflection Data Utilities to reindex standard MTZ files.

See program documentation: Sortmtz, REINDEX, Rebatch.

Scale Experimental Intensities - Scala

The Interface for Scala is quite large. Many of the options are only needed if detailed optimisation of the scaling is required. For this, the program documentation of Scala gives numerous hints, which will be incorporated in the Task Window Layout section below.

By default, data which has been scaled and merged by Scala is then converted from intensities to structure factors (see below) and usually passed through the Uniqueify process (see below). The data is then in a standard MTZ format and suitable for input to molecular replacement or experimental phasing.

Scala is one of the Data Harvesting programs. See Data Harvesting in CCP4i.

Scala - Task Window Layout

Features to look out for in the Scala Task are:

Folder title	Importance	Comment
Protocol	Run Truncate	Request this to get structure factor amplitudes output in addition to the scaled intensities.
Protocol	Output a single MTZ file	Only available when running Truncate. If there are several output datasets then by default there will be one output file for each dataset (see below). This option runs Cad to collect all the data into a single output file.
Protocol	Ensure unique data & add Free R column	Runs the Uniqueify procedure. If there are multiple output files then the Free R assigned to the first file will be copied to the rest.
Convert to SFs & Wilson Plot	Use [...] as identifier to append to column labels	Only available when running Truncate. By default the output dataset name will be appended to the MTZ column labels output by Truncate. Alternatively the user can choose to set their own identifiers for each dataset.
Define Output Datasets		Lists the dataset definitions which are passed to the output file. The exact contents of this folder depend on the dataset information contained in the input file and on the particular mode in which Scala is being run (see below).

See program documentation: Scala

Scala - Datasets and Output Files

Scala deals with batch MTZ files based on the dataset information which is contained in the input file. In the default mode dataset information present in the input file is automatically carried through to the output file. If dataset information is absent then it is necessary to define it before running the task.

If the Scala job is split into several runs ('multi-runs') then it is possible to (re)assign the output of each run to an output dataset. In this case several output datasets can be defined and these are not dependent on the input datasets.

Scala will produce a separate output MTZ file for each output dataset. However, these can also be merged automatically into a single file after running Truncate on each to generate structure factor amplitudes. The table below summarises the output files from the Scala task based on the protocol used in running the task.

Number of Input Datasets	Multi-runs	Merge after Truncate	Output
None/One	No	N/A	Single MTZ file
None/One	Yes	No	One MTZ file for each output dataset
None/One	Yes	Yes	Single MTZ file containing all output datasets
Two or more	No	No	One MTZ file for each input dataset
Two or more	No	Yes	Single MTZ file containing all output datasets
Two or more	Yes	No	One MTZ file for each output dataset
Two or more	Yes	Yes	Single MTZ file containing all output datasets
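The table above can also be read as a small decision rule. The sketch below encodes it as a function; the function and parameter names are hypothetical and are not part of CCP4i, and the returned strings are simply the entries of the Output column.

```python
def scala_output_files(n_input_datasets, multi_runs, merge_after_truncate):
    """Describe the output of the Scala task, following the table above.

    n_input_datasets: number of datasets defined in the input file (0 if none).
    multi_runs: True if the job is split into several runs.
    merge_after_truncate: True if output files are merged after Truncate.
    """
    several_outputs = multi_runs or n_input_datasets >= 2
    if not several_outputs:
        # Merging is not applicable: there is only one output dataset.
        return "Single MTZ file"
    if merge_after_truncate:
        return "Single MTZ file containing all output datasets"
    if multi_runs:
        return "One MTZ file for each output dataset"
    return "One MTZ file for each input dataset"


print(scala_output_files(1, False, False))  # Single MTZ file
print(scala_output_files(3, False, False))  # One MTZ file for each input dataset
print(scala_output_files(3, True, True))    # Single MTZ file containing all output datasets
```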

Treat Twinned Data - Detwin

The Detwin program will either analyse the data to determine the twinning fraction or generate detwinned data.
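To give a feel for what generating detwinned data involves, the sketch below inverts the usual two-domain mixing model for a pair of twin-related observations. It assumes hemihedral twinning with a known twinning fraction alpha; the function name is hypothetical, and the code illustrates the algebra only, not how the Detwin program itself is implemented.

```python
def detwin_pair(j1, j2, alpha):
    """Recover untwinned intensities from a twin-related pair of observations.

    Assumes the observations are mixtures
        j1 = (1 - alpha) * i1 + alpha * i2
        j2 = alpha * i1 + (1 - alpha) * i2
    and inverts that 2x2 system. Undefined for alpha = 0.5 (perfect twin).
    """
    if not 0.0 <= alpha < 0.5:
        raise ValueError("alpha must satisfy 0 <= alpha < 0.5")
    denom = 1.0 - 2.0 * alpha
    i1 = ((1.0 - alpha) * j1 - alpha * j2) / denom
    i2 = ((1.0 - alpha) * j2 - alpha * j1) / denom
    return i1, i2


# Twin a made-up pair of true intensities, then recover them.
alpha = 0.2
true_i1, true_i2 = 100.0, 20.0
j1 = (1 - alpha) * true_i1 + alpha * true_i2
j2 = alpha * true_i1 + (1 - alpha) * true_i2
print(detwin_pair(j1, j2, alpha))
```

The denominator 1 - 2 * alpha shrinks as alpha approaches 0.5, so measurement errors are amplified for highly twinned data.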

See program documentation: Detwin

Convert Intensities to Structure Factors - Truncate

The program Truncate is used to obtain structure factor amplitudes from intensities. This conversion is performed by default when importing scaled data or running Scala (Scale Experimental Intensities). There is an explicit interface to the Truncate program which includes some less commonly used options. By default, if you use this task interface then the data will also be passed through the Uniqueify process (see below).

Truncate is one of the Data Harvesting programs. See Data Harvesting in CCP4i.

See program documentation: Truncate

'Uniqueify' Data

By default this process is applied to all data after it has been converted to structure factors in any of the tasks which import scaled data or scale the data. The process will ensure that all reflections are unique and will 'fill in' any missing reflections. It will add a column of FreeR indicators or optionally import the FreeR column from an existing MTZ file. The data can also optionally be extended to higher resolution. This is recommended if you expect to have data to higher resolution in future, so that a full FreeR set is set up from the start.
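The steps described above can be sketched as a toy function. All names here are hypothetical, the free flag is simplified to a boolean, and the 5% fraction is an assumed value for illustration; none of this reflects how the CCP4 programs actually store or assign FreeR indicators.

```python
import random


def uniqueify(observed, all_unique_hkl, free_fraction=0.05, seed=0):
    """Toy version of the steps described above.

    observed: dict mapping (h, k, l) -> amplitude for measured reflections.
    all_unique_hkl: every unique index expected to the chosen resolution.
    Returns a dict mapping (h, k, l) -> (amplitude or None, free_flag).
    """
    rng = random.Random(seed)
    result = {}
    for hkl in sorted(all_unique_hkl):
        amplitude = observed.get(hkl)            # None marks a filled-in reflection
        is_free = rng.random() < free_fraction   # a flag is drawn for every index
        result[hkl] = (amplitude, is_free)
    return result


# Made-up data: one expected reflection, (0, 0, 1), was not measured.
observed = {(1, 0, 0): 10.0, (0, 1, 0): 5.0}
full_list = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
complete = uniqueify(observed, full_list)
print(complete[(0, 0, 1)][0])  # None
```

Because a flag is drawn for every index in `all_unique_hkl`, passing a list that extends beyond the current resolution limit means reflections measured later already have a free flag assigned.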

If you are processing derivative data and you want to add it to the MTZ file for the native data, you should NOT run the Uniqueify options but use the Merge Datasets task in the Experimental Phasing module.