Creating a template and mask - SBC-Utrecht/PyTom GitHub Wiki

Creating a template

You can create a template from an atomic model (pdb/cif) or from an electron microscopy reconstruction (from the EMDB, for example). In our experience EM maps give the best results, so we advise that option. Template creation is run from the command line with the script create_template.py. Importantly, the script convolves the template with a CTF, which improves template matching performance.

From atomic model

Simulating a template from an atomic model consists of structure preparation, sampling the atomic coordinates to an electrostatic potential, convolving with the CTF, low-pass filtering, and down-sampling to the pixel size of the tomogram. All of these steps can be done within the create_template.py script. However, the structure preparation relies on Chimera being available on the command line (not ChimeraX!), so to be safe you can do this step yourself in Chimera/ChimeraX.

Structure preparation in Chimera

Load the structure in Chimera and run the following commands:

delete solvent (removes water molecules)

delete ions (removes ions; not always preferred, depends on whether there are ions in the surrounding solvent)

addh (adds hydrogen atoms)

Add biologically relevant symmetry (in case your structure has this), in Chimera:

sym group biomt

Or in ChimeraX:

sym #1 biomt (where #1 refers to the id of the loaded model in ChimeraX; this might differ)

Save the structure to a new file.

Structure preparation within the script

If you want to run the structure preparation within create_template.py, chimera needs to be available from the command line, and the --modify-structure option needs to be added.

Simulating the template

The command is akin to this:

create_template.py -f 4ug0.cif -d . --modify-structure --solvent-correction masking --solvent-density 0.93 -s 1.724 -b 8 -c -z 3 -v 200 --cut-first-zero -l 40 -x 28 (-g 0) (--cores 8)

-f start from an unmodified atomic structure file (cif format)
-d write to current folder
--modify-structure prepare the structure in chimera from the command line
--solvent-correction masking use masking to calculate the solvent exclusion around the protein; this attempts to simulate that the structure is not in a vacuum. Alternatively, the correction can be set to gaussian, which is less aggressive but also less accurate. With masking it is important that the provided pixel size (-s) of the simulation is smaller than 2A.
--solvent-density 0.93 set the solvent density to the default value of 0.93. For particles in a high-background environment (such as cytosol) you can increase the density to reduce the contrast of your template and attempt to mimic experimental conditions (but never higher than the protein density of 1.35). The feature is experimental, so the advice is to leave it at the default setting.
-s and -b together give the output pixel size of the template, in this case 1.724 * 8 = 13.79A. If you want to sample to 20A, you can give -s 2.5 -b 8 and the resulting size is 2.5 * 8 = 20A. If you want to increase the sampling in the simulation, you can always divide the pixel size by 2 and multiply the binning by 2; here we could set -s 0.862 -b 16. Runtime will also increase, of course.
-c convolute the template with a CTF
- -z specifies the defocus of the ctf, here 3 um
- -v the voltage of electron beam
- --cut-first-zero tells the script to cut the CTF after the first zero crossing; the CTF simulation generally becomes unreliable beyond the first zero crossing due to the defocus gradient at high tilt angles
- -a and --Cs can be used to set the amplitude contrast and spherical aberration for the CTF, but here they are left at their defaults of 0.07 and 2.7
- a Gaussian --decay for the CTF can also be specified (akin to the MTF of the detector), but the low-pass filter option (-l) can be used instead
-l tells the script to low-pass filter to 40A. Because the first zero crossing of the specified CTF is at 40A, we also tell the script to low-pass filter to this resolution. If not specified, the script will low-pass filter to the Nyquist resolution of the final box before down-sampling the pixels, which would here be 2 * (1.724 * 8) ~= 28A
-x sets the final box size of the template to 28 pixels; the box will be cubic (28 x 28 x 28 voxels). The box size should generally be chosen as tight as possible because it will better match particles close to the tomogram edge and also save some time when template matching is run over multiple processes. However, it also needs to be large enough to encompass the mask with smoothed edges completely.
-g can be used to specify a GPU to run the program on; this gives a good speed-up for sampling atoms, but for small pixel sizes the sampled box might become too large for GPU memory
--cores instead you can set more cpu cores to sample the electrostatic potential to larger boxes
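
The sampling arithmetic behind -s and -b can be checked with a few lines of Python. This is only a sketch of the calculation described above, not part of create_template.py; the function names are made up for illustration.

```python
def template_sampling(pixel_size, binning):
    """Return (output pixel size, Nyquist resolution), both in Angstrom."""
    output_pixel = pixel_size * binning   # value of -s times value of -b
    nyquist = 2 * output_pixel            # finest resolution the final box can represent
    return output_pixel, nyquist


def oversample(pixel_size, binning, factor=2):
    """Finer simulation grid with the same output pixel size."""
    return pixel_size / factor, binning * factor


output_pixel, nyquist = template_sampling(1.724, 8)
print(f"output pixel size: {output_pixel:.2f} A, Nyquist: {nyquist:.2f} A")

fine_s, fine_b = oversample(1.724, 8)
print(f"finer simulation: -s {fine_s} -b {fine_b}")
```

Halving -s while doubling -b leaves the output pixel size unchanged, which is why only the simulation accuracy and the runtime are affected.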

figure: Example of a human 80S ribosome template generated from pdb file 4ug0.

From EM map

Simulating from an EM map consists of convoluting with a CTF, low-pass filtering, and down-sampling to the pixel size of the tomogram.

I used an example file: emd_2938.map (a human 80S ribosome SPA reconstruction).

Files from the EMDB are often in the .map format. First make a copy of the file with the .mrc extension.

cp emd_2938.map emd_2938.mrc

Then read the header to find the voxel size of the map:

headerPyTom emd_2938.mrc

We get back that the reconstruction is sampled on 1.1A pixels.
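
If headerPyTom is not at hand, the voxel size can also be read directly from the file header. The sketch below assumes a little-endian file following the standard MRC/CCP4 header layout (grid sampling mx, my, mz at bytes 28-39 and cell dimensions in Angstrom at bytes 40-51); the function name is hypothetical and this is not how headerPyTom is implemented.

```python
import struct


def read_mrc_voxel_size(path):
    """Return the (x, y, z) voxel size in Angstrom from an MRC/CCP4 header.

    Assumes a little-endian file with the standard 1024-byte header.
    """
    with open(path, "rb") as handle:
        header = handle.read(1024)
    if len(header) < 1024:
        raise ValueError("file is too short to contain an MRC header")

    # Header words 8-10: sampling along x, y, z (int32).
    mx, my, mz = struct.unpack_from("<3i", header, 28)
    # Header words 11-13: unit cell dimensions in Angstrom (float32).
    cell_x, cell_y, cell_z = struct.unpack_from("<3f", header, 40)
    if 0 in (mx, my, mz):
        raise ValueError("header has zero sampling; voxel size is undefined")

    return cell_x / mx, cell_y / my, cell_z / mz


# Example call (file name taken from the text above):
# voxel_size = read_mrc_voxel_size("emd_2938.mrc")
```

The voxel size is the cell dimension divided by the number of sampling intervals along each axis; for most EMDB maps the three values are equal.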

Now you can create the template from this map. You need to know the approximate CTF parameters of the dataset and the pixel size of the tomograms that you want to search with the template. Run a command akin to the following:

create_template.py -f emd_2938.mrc -d . --map-spacing 1.1 -s 1.724 -b 8 -c -z 3 -v 200 --cut-first-zero -l 40 -x 28 (-g 0)

-f specifies the input file
-d the location to write the output, in this case ‘.’, i.e. the current directory
--map-spacing is the pixel size of the .mrc file, in this case 1.1A
-s and -b together give the output pixel size of the template, in this case 1.724 * 8 = 13.79A. If you want to sample to 20A, you can give -s 2.5 -b 8 and the resulting size is 2.5 * 8 = 20A.
- The script was written with integer binning in mind (although this is no longer strictly required). Specifying the initial pixel size (-s) still serves to oversample the CTF, which helps the simulation.
-c convolute the template with a CTF
- -z specifies the defocus of the ctf, here 3 um
- -v the voltage of electron beam
- --cut-first-zero tells the script to cut the CTF after the first zero crossing; the CTF simulation generally becomes unreliable beyond the first zero crossing due to the defocus gradient at high tilt angles
- -a and --Cs can be used to set the amplitude contrast and spherical aberration for the CTF, but here they are left at their defaults of 0.07 and 2.7
- a Gaussian --decay for the CTF can also be specified (akin to the MTF of the detector), but the low-pass filter option (-l) can be used instead
-l tells the script to low-pass filter to 40A. Because the first zero crossing of the specified CTF is at 40A, we also tell the script to low-pass filter to this resolution. If not specified, the script will low-pass filter to the Nyquist resolution of the final box before down-sampling the pixels, which would here be 2 * (1.724 * 8) ~= 28A
-x sets the final box size of the template to 28 pixels; the box will be cubic (28 x 28 x 28 voxels). The box size should generally be chosen as tight as possible because it will better match particles close to the tomogram edge and also save some time when template matching is run over multiple processes. However, it also needs to be large enough to encompass the mask with smoothed edges completely.
-g can be used to specify a gpu to run the program on
--cores multiple CPU cores won't affect the calculation from an EM map
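
When templates are needed for several datasets, it can help to assemble the command from variables. The helper below is a hypothetical convenience wrapper, not part of PyTom; it only uses the flags listed above and prints the command instead of running it.

```python
import shlex


def create_template_command(map_file, map_spacing, pixel_size, binning,
                            defocus_um, voltage_kv, lowpass, box_size,
                            out_dir=".", gpu=None):
    """Build the create_template.py argument list for an EM map."""
    args = [
        "create_template.py",
        "-f", map_file,
        "-d", out_dir,
        "--map-spacing", str(map_spacing),
        "-s", str(pixel_size),
        "-b", str(binning),
        "-c",                      # convolve with a CTF
        "-z", str(defocus_um),
        "-v", str(voltage_kv),
        "--cut-first-zero",
        "-l", str(lowpass),
        "-x", str(box_size),
    ]
    if gpu is not None:            # -g is optional
        args += ["-g", str(gpu)]
    return args


command = create_template_command("emd_2938.mrc", 1.1, 1.724, 8,
                                  defocus_um=3, voltage_kv=200,
                                  lowpass=40, box_size=28, gpu=0)
print(shlex.join(command))
```

The returned list can be passed to subprocess.run once the printed command looks right.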

Creating a mask

Spherical mask from the GUI

Navigate to single template matching tab: Particle Picking -> Template Matching -> Single -> press Template Matching check box

Set the following:

the box size should be equal along each dimension
the mask radius (in pixels) within the box. Generally you want to give some overhang relative to your particle diameter. For a ribosome with 290A diameter, the mask diameter could be around 330A. If the template is sampled on 13.79A pixels, this is 330/13.79 ~= 24 pixels => the mask radius should then be 12 pixels.
a smoothing around the mask, generally 0.1 or 0.05 of the mask radius

Here I gave the following settings: box size=28, radius=12, smooth=0.5

Press Create and choose a location to save the mask!

figure: Example of a spherical mask for template matching. box size=28, radius=12, smooth=0.5
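
The radius calculation used for the settings above can be written out as a short sketch. The function names and the 2-pixel margin reserved for the smoothed edge are assumptions made for illustration, not values taken from PyTom.

```python
def mask_radius_pixels(mask_diameter_angstrom, pixel_size_angstrom):
    """Convert a mask diameter in Angstrom to a radius in whole pixels."""
    return round(mask_diameter_angstrom / (2 * pixel_size_angstrom))


def fits_in_box(radius_px, box_size_px, margin_px=2):
    """Check that the mask plus a margin for the smooth edge fits in the box."""
    return 2 * (radius_px + margin_px) <= box_size_px


radius = mask_radius_pixels(330, 13.79)   # 330A mask on 13.79A pixels
print("mask radius in pixels:", radius)
print("fits in a 28-pixel box:", fits_in_box(radius, 28))
```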

Spherical or Ellipsoidal mask from the command line

The mask can also be generated from the command line which adds the option of creating an ellipsoidal instead of purely spherical mask. Run the create_mask.py script for this.

We can create a spherical mask identical to the GUI by running: create_mask.py -o mask.mrc -b 28 -r 12 -s 0.5.

For an ellipsoidal mask the box dimensions, radius, and smoothing can be set similar to the spherical mask in the GUI. However, here you need to play around with the --minor and --minor2 axis radii of the mask to get an ellipsoidal mask. For example:

create_mask.py -o ./mask.mrc -b 28 -r 12 --minor 6 --minor2 8 -s 0.5

The mask is probably not oriented correctly to your particle. To fix this, you can open the mask and template in chimera, rotate the mask correctly with the template, and finally use the vop resample command from Chimera to sample the mask on the grid (onGrid) of the template.
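
To get a feel for how the three radii shape the mask, a hard-edged ellipsoid can be drawn with NumPy. This is a minimal sketch and not the create_mask.py implementation: it has no edge smoothing, the box center is assumed to be at box_size // 2, and the assignment of the radii to the x, y and z axes is an assumption (which axes --minor and --minor2 act on in PyTom is not stated here).

```python
import numpy as np


def ellipsoid_mask(box_size, radius_x, radius_y, radius_z):
    """Binary ellipsoid in a cubic box, indexed as mask[z, y, x]."""
    center = box_size // 2
    # Voxel coordinates relative to the assumed box center.
    z, y, x = np.indices((box_size,) * 3) - center
    # Normalised squared distance: <= 1 inside the ellipsoid.
    distance = (x / radius_x) ** 2 + (y / radius_y) ** 2 + (z / radius_z) ** 2
    return (distance <= 1.0).astype(np.float32)


# Radii mirror the example values -r 12, --minor 6, --minor2 8.
mask = ellipsoid_mask(28, 12, 6, 8)
print("box shape:", mask.shape)
print("voxels inside the mask:", int(mask.sum()))
```

Setting all three radii to the same value reproduces a hard-edged sphere, which makes it easy to compare the enclosed volume against the ellipsoid.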