Creating a template and mask - SBC-Utrecht/PyTom GitHub Wiki
Creating a template
You can create a template from an atomic model (pdb/cif) or from an electron microscopy reconstruction (for example from the EMDB). In our experience EM maps give the best results, so we advise that option. Template creation is run from the command line with the script create_template.py. Importantly, the script convolves the template with a CTF, which improves template matching performance.
From atomic model
Simulating from an atomic model consists of structure preparation, sampling the atomic coordinates to an electrostatic potential, convolving with the CTF, low-pass filtering, and down-sampling to the pixel size of the tomogram. All of these steps can be done within the create_template.py script. However, the structure preparation relies on chimera being available on the command line (not chimerax!), so to be safe you can do this step yourself in chimera/chimerax.
Structure preparation in Chimera
Load the structure in chimera and run the following commands:
- delete solvent: remove water molecules
- delete ions: remove ions (not always preferred, depends on whether there are ions in the surrounding solvent)
- addh: add hydrogen atoms
Add biologically relevant symmetry (in case your structure has this), in Chimera:
sym group biomt
Or in ChimeraX:
sym #1 biomt (where #1 refers to the id of the loaded model in ChimeraX; this might differ)
Save the structure to a new file.
Structure preparation within the script
If you want to run the structure preparation within create_template.py, chimera needs to be available from the command line, and the --modify-structure option needs to be added.
Simulating the template
The command is akin to this:
create_template.py -f 4ug0.cif -d . --modify-structure --solvent-correction masking --solvent-density 0.93 -s 1.724 -b 8 -c -z 3 -v 200 --cut-first-zero -l 40 -x 28 (-g 0) (--cores 8)
- -f: start from an unmodified atomic structure file (cif format)
- -d: write to the current folder
- --modify-structure: prepare the structure in chimera from the command line
- --solvent-correction masking: use masking to calculate the solvent exclusion around the protein; this attempts to simulate that the structure is not in a vacuum. Alternatively, the correction can be set to gaussian, which is less aggressive but also less accurate. With masking it is important that the provided pixel size (-s) of the simulation is smaller than 2A.
- --solvent-density 0.93: set the solvent density to the default value of 0.93. For particles in a high-background environment (such as cytosol) you can increase the density to reduce the contrast of your template and attempt to mimic experimental conditions (but never higher than the protein density of 1.35). The feature is experimental, so the advice is to leave it at the default setting.
- -s and -b: together give the output pixel size of the template, in this case 1.724 * 8 = 13.79A. If you want to sample to 20A, you can give -s 2.5 -b 8 and the resulting size is 2.5 * 8 = 20A. If you want to increase the sampling in the simulation, you can always divide the pixel size by 2 and multiply the binning by 2; here we could set -s 0.862 -b 16. Runtime will of course also increase.
- -c: convolve the template with a CTF
- -z: specifies the defocus of the CTF, here 3 um
- -v: the voltage of the electron beam
- --cut-first-zero: tells the script to cut the CTF after the first zero crossing; the CTF simulation generally becomes unreliable beyond the first zero crossing due to the defocus gradient at high tilt angles
- (optional) -a and --Cs: can be used to set the amplitude contrast and spherical aberration for the CTF, but here they are left at their defaults of 0.07 and 2.7
- (optional) --decay: a Gaussian decay for the CTF can also be specified (akin to the MTF of the detector), but there is also a separate option for low-pass filtering (-l)
- -l: tells the script to low-pass filter to 40A. Because the first zero crossing of the specified CTF is at 40A, we also tell the script to low-pass filter to this resolution. If not specified, the script will low-pass filter to the Nyquist resolution of the final box before down-sampling the pixels, which would here be 2 * (1.724 * 8) = 27.6A, i.e. roughly 28A.
- -x: sets the final box size of the template to 28 pixels; the box will be cubic (28 x 28 x 28). The box size should generally be chosen as tight as possible because it will better match particles close to the tomogram edge and also saves some time when template matching is run over multiple processes. However, it also needs to be large enough to completely encompass the mask including its smoothed edges.
- -g: can be used to specify a GPU to run the program on. This gives a good speed-up for sampling atoms, but for small pixel sizes the sampled box might become too large for GPU memory.
- --cores: instead, you can set more CPU cores to sample the electrostatic potential in larger boxes
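The pixel-size arithmetic behind -s and -b can be checked with a few lines of Python. This is only a minimal sketch of the relations stated above; the function names are illustrative and not part of PyTom.

```python
def output_pixel_size(pixel_size, binning):
    """Pixel size of the final template: simulation pixel size times binning."""
    return pixel_size * binning


def nyquist_resolution(pixel_size, binning):
    """Nyquist resolution of the down-sampled template: twice its pixel size."""
    return 2 * output_pixel_size(pixel_size, binning)


def oversample(pixel_size, binning, factor=2):
    """Finer simulation sampling that keeps the same output pixel size."""
    return pixel_size / factor, binning * factor


print(round(output_pixel_size(1.724, 8), 2))   # 13.79
print(round(nyquist_resolution(1.724, 8), 1))  # 27.6
print(oversample(1.724, 8))                    # (0.862, 16)
```

The last line reproduces the -s 0.862 -b 16 variant: the simulation is sampled twice as finely while the template still ends up on 13.79A pixels.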
figure: Example of a human 80S ribosome template generated from pdb file 4ug0.
From EM map
Simulating from an EM map consists of convolving with a CTF, low-pass filtering, and down-sampling to the pixel size of the tomogram.
I used an example file: emd_2938.map (a human 80S ribosome SPA reconstruction).
Files from the emdb are often in the .map format. First make a copy of the file with the mrc extension.
cp emd_2938.map emd_2938.mrc
Then read the header to find the voxel size of the map:
headerPyTom emd_2938.mrc
We get back that the reconstruction is sampled on 1.1A pixels.
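If headerPyTom is not at hand, the voxel size can also be read straight from the file header. The sketch below assumes a little-endian file that follows the MRC2014 header layout (grid sampling MX/MY/MZ as 32-bit integers at byte 28, cell dimensions in A as 32-bit floats at byte 40). The function name is illustrative and not part of PyTom.

```python
import struct


def read_mrc_voxel_size(path):
    """Return the (x, y, z) voxel size in Angstrom from an MRC header.

    Assumes a little-endian MRC2014 header; big-endian files are not handled.
    """
    with open(path, "rb") as fh:
        header = fh.read(1024)  # the fixed-size main header
    if len(header) < 1024:
        raise ValueError("file is too short to contain an MRC header")
    # Words 8-10: grid sampling along x, y, z (int32), byte offset 28
    mx, my, mz = struct.unpack_from("<3i", header, 28)
    # Words 11-13: cell dimensions in Angstrom (float32), byte offset 40
    xlen, ylen, zlen = struct.unpack_from("<3f", header, 40)
    if min(mx, my, mz) <= 0:
        raise ValueError("header reports a non-positive grid sampling")
    # Voxel size is the cell length divided by the number of grid intervals
    return xlen / mx, ylen / my, zlen / mz
```

For the example file, read_mrc_voxel_size("emd_2938.mrc") should return values close to 1.1 on each axis if the header is consistent with the headerPyTom output.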
Now you can create the template from this map. You need to know the rough ctf parameters of the dataset and the pixel size of the tomograms that you want to search for the template. Run a command akin to the following:
create_template.py -f emd_2938.mrc -d . --map-spacing 1.1 -s 1.724 -b 8 -c -z 3 -v 200 --cut-first-zero -l 40 -x 28 (-g 0)
- -f: specifies the input file
- -d: the location to write the output, in this case '.', i.e. the current directory
- --map-spacing: the pixel size of the .mrc file, in this case 1.1A
- -s and -b: together give the output pixel size of the template, in this case 1.724 * 8 = 13.79A. If you want to sample to 20A, you can give -s 2.5 -b 8 and the resulting size is 2.5 * 8 = 20A.
- Note: the script was written with integer binning in mind (although this is no longer always the case). Specifying the initial pixel size (-s) still serves to oversample the CTF, which helps the simulation.
- -c: convolve the template with a CTF
- -z: specifies the defocus of the CTF, here 3 um
- -v: the voltage of the electron beam
- --cut-first-zero: tells the script to cut the CTF after the first zero crossing; the CTF simulation generally becomes unreliable beyond the first zero crossing due to the defocus gradient at high tilt angles
- (optional) -a and --Cs: can be used to set the amplitude contrast and spherical aberration for the CTF, but here they are left at their defaults of 0.07 and 2.7
- (optional) --decay: a Gaussian decay for the CTF can also be specified (akin to the MTF of the detector), but there is also a separate option for low-pass filtering (-l)
- -l: tells the script to low-pass filter to 40A. Because the first zero crossing of the specified CTF is at 40A, we also tell the script to low-pass filter to this resolution. If not specified, the script will low-pass filter to the Nyquist resolution of the final box before down-sampling the pixels, which would here be 2 * (1.724 * 8) = 27.6A, i.e. roughly 28A.
- -x: sets the final box size of the template to 28 pixels; the box will be cubic (28 x 28 x 28). The box size should generally be chosen as tight as possible because it will better match particles close to the tomogram edge and also saves some time when template matching is run over multiple processes. However, it also needs to be large enough to completely encompass the mask including its smoothed edges.
- -g: can be used to specify a GPU to run the program on
- --cores: multiple CPU cores won't affect the calculation from an EM map
Creating a mask
Spherical mask from the GUI
Navigate to single template matching tab: Particle Picking -> Template Matching -> Single -> press Template Matching check box
Set the following:
- the box size should be equal along each dimension
- the mask radius (in pixels) within the box. Generally you want to give some overhang relative to your particle diameter. For a ribosome with 290A diameter, the mask diameter could be around 330A. If the template is sampled on 13.79A pixels, this is 330/13.79 ~= 24 pixels, so the mask radius should be 12 pixels.
- a smoothing around the mask, generally 0.1 or 0.05 of the mask radius
Here I gave the following settings: box size=28, radius=12, smooth=0.5
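The radius calculation above can be sketched in Python as follows. The function names are illustrative and not part of PyTom, and the margin of two smoothing widths used in the box check is an assumption for illustration, not a documented PyTom rule.

```python
def mask_radius_px(mask_diameter_angstrom, pixel_size_angstrom):
    """Mask radius in whole pixels for a mask diameter given in Angstrom."""
    return round(mask_diameter_angstrom / pixel_size_angstrom / 2)


def mask_fits_box(box_size, radius, smooth, n_smooth=2):
    """Check that the radius plus a smoothing margin fits within half the box.

    n_smooth is an assumed safety margin, not a PyTom parameter.
    """
    return radius + n_smooth * smooth <= box_size / 2


radius = mask_radius_px(330, 1.724 * 8)
print(radius)                          # 12
print(mask_fits_box(28, radius, 0.5))  # True
```

With these numbers a box size of 28 leaves two pixels between the mask radius and the box edge, which is the room available for the smoothed edge.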
Press Create and choose a location to save the mask!
figure: Example of a spherical mask for template matching. box size=28, radius=12, smooth=0.5
Spherical or Ellipsoidal mask from the command line
The mask can also be generated from the command line which adds the option of creating an ellipsoidal instead of purely spherical mask. Run the create_mask.py script for this.
We can create a spherical mask identical to the one from the GUI by running:
create_mask.py -o mask.mrc -b 28 -r 12 -s 0.5
For an ellipsoidal mask, the box size, radius, and smoothing are set in the same way as for the spherical mask in the GUI. In addition, you need to adjust the --minor and --minor2 axis radii of the mask to get an ellipsoidal shape. For example:
create_mask.py -o ./mask.mrc -b 28 -r 12 --minor 6 --minor2 8 -s 0.5
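To get a feeling for what the three radii mean geometrically, here is a minimal hard-edged sketch of an ellipsoid test. It is not the create_mask.py implementation. It assumes that -r, --minor and --minor2 act as semi-axes along x, y and z (the axis order is an assumption), that the ellipsoid is centred at box_size // 2, and it leaves out the smoothing entirely.

```python
def inside_ellipsoid(x, y, z, box_size, radii):
    """True if voxel (x, y, z) lies inside an ellipsoid centred in the box.

    radii = (a, b, c) are the semi-axes in pixels along x, y and z
    (assumed axis order; the centre is assumed to be box_size // 2).
    """
    centre = box_size // 2
    a, b, c = radii
    dx, dy, dz = x - centre, y - centre, z - centre
    return (dx / a) ** 2 + (dy / b) ** 2 + (dz / c) ** 2 <= 1.0


def count_voxels(box_size, radii):
    """Number of voxels in a cubic box that fall inside the ellipsoid."""
    rng = range(box_size)
    return sum(inside_ellipsoid(x, y, z, box_size, radii)
               for x in rng for y in rng for z in rng)


sphere = count_voxels(28, (12, 12, 12))
ellipsoid = count_voxels(28, (12, 6, 8))
print(ellipsoid < sphere)  # True
```

Setting all three radii to the same value reduces the test to the spherical case, which is why the ellipsoid always covers fewer voxels than the sphere with the major radius.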
The resulting mask will probably not be oriented correctly relative to your particle. To fix this, open the mask and template in chimera, rotate the mask until it is aligned with the template, and finally use the vop resample command from Chimera to sample the mask on the grid (onGrid) of the template.