Data stream simulator - duaneloh/Dragonfly GitHub Wiki

This page describes the various programs involved in setting up a simulated reconstruction.

sim_setup

The convenience utility sim_setup.py streamlines the data stream simulator. By default, it simulates a single-particle diffraction data stream using the parameters in the config.ini file in the reconstruction directory. This utility calls the modules below in sequence to generate the data stream. The EMC reconstruction module should be started separately after this setup has finished.

For more information on sim_setup.py, run the command

./sim_setup.py -h

Here is a flowchart of how the data stream simulator works. It takes a PDB file and a config.ini file and produces everything the EMC reconstruction algorithm needs.

https://github.com/duaneloh/Dragonfly/blob/master/images/emc_sim.png

Here is a breakdown of the elements in the data stream simulator.


make_densities

Creates an electron density map from a PDB file.

Usage

Use the default config.ini file in the run directory:

./make_densities.py 

Use a custom config file:

./make_densities.py -c [path_to_custom_config.ini]

Get more help:

./make_densities.py -h

Input (specified in config.ini)

in_pdb_file = aux/4BED.pdb

scatt_dir = aux/henke_table (energy-dependent atomic scattering factors)

Output (specified in config.ini)

out_density_file = data/densityMap.bin

Data Format

densityMap.bin is a flattened binary record of the floating point numbers of a 3D cubic electron density distribution in real space.
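As a minimal sketch of reading this file back in Python, the function below assumes the values are raw float64 (NumPy's default) with no header, and infers the side length from the cube root of the element count. The dtype and the helper name read_cubic_volume are assumptions rather than documented details; the same approach should apply to intensities.bin, which is described the same way.

```python
import numpy as np


def read_cubic_volume(path, dtype=np.float64):
    """Read a flattened binary record of a cubic 3D volume.

    Assumption: the file holds raw values of `dtype` with no header,
    so the side length is the cube root of the number of values.
    """
    flat = np.fromfile(path, dtype=dtype)
    size = int(round(flat.size ** (1.0 / 3.0)))
    if size ** 3 != flat.size:
        raise ValueError(f"{flat.size} values do not form a cube")
    return flat.reshape(size, size, size)
```

For example, density = read_cubic_volume("data/densityMap.bin") would return a 3D array if these assumptions hold; if the file turns out to be single precision, pass dtype=np.float32 instead.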


make_intensities

Creates a 3D intensity map from the electron density.

Usage

Use the default config.ini file in the run directory:

./make_intensities.py 

Use a custom config file:

./make_intensities.py -c [path_to_custom_config.ini]

Get more help:

./make_intensities.py -h

Input (specified in config.ini)

Use the output density file from make_densities.py:

in_density_file = make_densities:::out_density_file
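The value make_densities:::out_density_file points to the out_density_file key of the make_densities section of the same config file. A minimal sketch of how such a reference could be resolved with Python's standard configparser module follows; get_param is a hypothetical helper, not part of Dragonfly, and the actual scripts may resolve references differently.

```python
import configparser


def get_param(config, section, key):
    """Return config[section][key], following 'other_section:::other_key' links.

    Hypothetical helper for illustration: a value such as
    'make_densities:::out_density_file' is replaced by the value stored
    under that section and key, repeatedly, until no link remains.
    """
    value = config.get(section, key)
    seen = {(section, key)}
    while ":::" in value:
        ref_section, ref_key = value.split(":::", 1)
        if (ref_section, ref_key) in seen:
            raise ValueError(f"circular reference at {ref_section}:::{ref_key}")
        seen.add((ref_section, ref_key))
        value = config.get(ref_section, ref_key)
    return value
```

Chaining references this way lets each module's output name be set once and reused by every later module.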

Output (specified in config.ini)

out_intensity_file = data/intensities.bin

Data Format

intensities.bin is a flattened binary record of the floating point numbers of a 3D cubic intensity distribution in reciprocal space.
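In the kinematic approximation, the diffracted intensity is the squared magnitude of the Fourier transform of the electron density. The sketch below illustrates that relationship with NumPy. The zero-padding (oversampling) factor, the centring with fftshift and the function name are assumptions; Dragonfly's make_intensities.py may differ in padding, scaling or other details.

```python
import numpy as np


def intensity_from_density(density, oversampling=2):
    """Return |F(q)|^2 on a cubic grid with the zero-frequency voxel at the centre.

    Sketch only: zero-pads the cubic density by an assumed oversampling
    factor, takes the 3D FFT and squares the magnitude.
    """
    n = density.shape[0]
    size = oversampling * n
    padded = np.zeros((size, size, size), dtype=np.float64)
    padded[:n, :n, :n] = density
    # fftshift moves the zero-frequency component to index size // 2.
    amplitudes = np.fft.fftshift(np.fft.fftn(padded))
    return np.abs(amplitudes) ** 2
```

Writing the result with intensity_from_density(density).tofile("data/intensities.bin") would produce a flattened float64 binary record of the kind described above.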


make_detector

This Python module generates the detector geometry file, which contains information about each detector pixel.

Usage

./make_detector.py <path_to_config_file>

Output (specified in config.ini)

[make_detector]

out_detector_file = data/det_sim.dat

Data Format

The detector geometry file is an ASCII (human-readable) file.

  • The first line contains a single integer giving the number of pixels.
  • From the second line on, there are 5 columns per line. Each line represents information about a pixel. The pixel index in the rest of this package refers to the line number in this file.
  • The first three columns give the 3D reciprocal space coordinates of the pixel relative to the 3D intensity model.
  • The fourth column is a multiplicative factor for solid angle and polarization (x or y direction) corrections for the pixel.
  • The fifth column is a mask value. There are three types of pixels:
    • 0: Good pixels. Assumed to have accurate photon information and will be used to determine the orientation.
    • 1: Non-orientationally relevant pixels. They have accurate photon information but are not used to determine the orientation. These are generally pixels at the corners of the detector, which sample regions of reciprocal space with very low multiplicity. They are excluded to avoid overfitting of orientations.
    • 2: Bad pixels. These pixels are completely ignored. In principle, they could be removed from the detector geometry file entirely; the mechanism exists for convenience, e.g. when one wants a space-filling detector shape.
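A minimal Python sketch of a parser for this format follows. It assumes whitespace-separated columns and returns the reciprocal-space coordinates, correction factors and mask values as NumPy arrays; read_detector is a hypothetical helper name.

```python
import numpy as np


def read_detector(path):
    """Parse an ASCII detector geometry file.

    Returns (qvals, corr, mask): an (N, 3) array of reciprocal-space
    coordinates, an (N,) array of correction factors and an (N,) mask.
    """
    with open(path) as f:
        # First line: number of pixels.
        num_pix = int(f.readline().split()[0])
        # Remaining lines: one pixel per line, 5 columns each.
        table = np.loadtxt(f, ndmin=2)
    if table.shape != (num_pix, 5):
        raise ValueError(f"expected {num_pix} rows of 5 columns, got {table.shape}")
    qvals = table[:, :3]
    corr = table[:, 3]
    mask = table[:, 4].astype(np.int8)
    return qvals, corr, mask
```

With the mask in hand, np.bincount(mask, minlength=3) gives the number of good, non-orientationally relevant and bad pixels, in that order.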

make_data

This module generates the photon data from a simulated electron density subject to realistic elastic X-ray scattering.

Usage

Use the default config.ini file in the run directory:

./make_data

Use a custom config file and a custom number of threads:

./make_data -c [path_to_custom_config.ini] -t [number of OpenMP threads]

Input (specified in config.ini)

Refinement level of the rotation group:

num_data = number of patterns

fluence = photons per µm^2

in_detector_file = make_detector:::out_detector_file
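To illustrate what simulating elastic scattering involves, the sketch below generates one frame: it picks a uniformly random orientation, looks up the intensity at each rotated pixel coordinate, multiplies by the pixel's correction factor and draws Poisson-distributed photon counts. This is not make_data's actual implementation. It uses a simple nearest-voxel lookup (make_data may interpolate), assumes the detector coordinates are voxel offsets from the centre of the intensity volume, and replaces the conversion from fluence (photons per µm^2) to expected counts with a hypothetical scale factor.

```python
import numpy as np


def random_rotation(rng):
    """Uniformly random rotation matrix built from a random unit quaternion."""
    w, x, y, z = rng.normal(size=4)
    norm = np.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / norm, x / norm, y / norm, z / norm
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def simulate_frame(intens, qvals, corr, mask, scale, rng):
    """Draw one frame of Poisson photon counts (sketch).

    intens: cubic 3D intensity with the zero-frequency voxel at size // 2.
    qvals, corr, mask: detector columns (qvals assumed in voxel units).
    scale: hypothetical stand-in for the fluence-dependent prefactor.
    """
    size = intens.shape[0]
    center = size // 2
    rot = random_rotation(rng)
    # Rotate the pixel coordinates and round to the nearest voxel.
    idx = np.rint(qvals @ rot.T).astype(int) + center
    inside = np.all((idx >= 0) & (idx < size), axis=1)
    mean = np.zeros(len(qvals))
    i, j, k = idx[inside].T
    mean[inside] = scale * corr[inside] * intens[i, j, k]
    counts = rng.poisson(mean)
    counts[mask == 2] = 0  # bad pixels are ignored
    return counts
```

Calling simulate_frame num_data times with the same generator would give a data set of independently oriented frames.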

Output (specified in config.ini)

out_photons_file = data/photons.emc

Data Format

photons.emc is a sparse binary file. Since many high-resolution SPI experiments record only a few photons per pattern, a sparse binary format is used to store the data: for each pattern, we store information only about the pixels that receive photons. Moreover, since most of the non-zero counts are ones, only the locations of the single-photon pixels are stored. For pixels receiving two or more photons, we store both the pixel location and the photon count.

The data in the photon file are arranged in six blocks (see figure below). Let S_o be the total number of single-photon events over all patterns (the sum of all numbers in the ones array), and let S_m be the total number of multiple-photon events (the sum of all numbers in the multi array).

  • Block 1: the 1024-byte header. It begins with two 4-byte chunks: a 32-bit integer giving the number of patterns (num_data) in the file, followed by a 32-bit integer giving the number of pixels in each pattern. The remaining 1016 bytes are currently empty (filled with zeros).
  • Block 2: num_data 32-bit integers giving the number of one-photon events in each pattern (ones).
  • Block 3: num_data 32-bit integers giving the number of multiple-photon events in each pattern (multi).
  • Block 4: S_o 32-bit integers giving the locations of the single-photon pixels.
  • Block 5: S_m 32-bit integers giving the locations of the multiple-photon pixels.
  • Block 6: S_m 32-bit integers giving the number of photons in each of those multiple-photon pixels.

https://github.com/duaneloh/Dragonfly/blob/master/images/dataFormat.png
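The sketch below writes and reads this layout with NumPy, converting between dense (num_data x num_pix) photon-count arrays and the six blocks. It assumes little-endian 32-bit integers and that the pixel locations in the fourth and fifth blocks are listed pattern by pattern, in pattern order; write_emc, read_emc and num_pix are hypothetical names.

```python
import numpy as np

HEADER_BYTES = 1024


def write_emc(path, frames):
    """Write dense photon counts (num_data x num_pix) in the sparse six-block format."""
    frames = np.asarray(frames)
    num_data, num_pix = frames.shape
    ones = (frames == 1).sum(axis=1).astype("<i4")
    multi = (frames > 1).sum(axis=1).astype("<i4")
    # np.nonzero walks rows in order, so locations stay grouped by pattern.
    place_ones = np.nonzero(frames == 1)[1].astype("<i4")
    rows, cols = np.nonzero(frames > 1)
    place_multi = cols.astype("<i4")
    count_multi = frames[rows, cols].astype("<i4")
    header = np.zeros(HEADER_BYTES // 4, dtype="<i4")  # zero-filled header
    header[:2] = num_data, num_pix
    with open(path, "wb") as f:
        for block in (header, ones, multi, place_ones, place_multi, count_multi):
            block.tofile(f)


def read_emc(path):
    """Read a sparse photon file back into a dense (num_data x num_pix) array."""
    with open(path, "rb") as f:
        num_data, num_pix = (int(v) for v in np.fromfile(f, dtype="<i4", count=2))
        f.seek(HEADER_BYTES)  # skip the unused remainder of the header
        ones = np.fromfile(f, dtype="<i4", count=num_data)
        multi = np.fromfile(f, dtype="<i4", count=num_data)
        place_ones = np.fromfile(f, dtype="<i4", count=int(ones.sum()))
        place_multi = np.fromfile(f, dtype="<i4", count=int(multi.sum()))
        count_multi = np.fromfile(f, dtype="<i4", count=int(multi.sum()))
    frames = np.zeros((num_data, num_pix), dtype=np.int32)
    frames[np.repeat(np.arange(num_data), ones), place_ones] = 1
    frames[np.repeat(np.arange(num_data), multi), place_multi] = count_multi
    return frames
```

Storing only locations for single-photon pixels is what keeps the file small: a pattern with a handful of photons costs a few 4-byte integers instead of one value per detector pixel.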