Config Output - GalSim-developers/GalSim GitHub Wiki

Output Field Attributes

The output field indicates where to write the output files and what kind of output format they should be. All output types use the following attributes to specify the location and number of output files, or aspects of how to build and write the output files.

  • file_name = str_value (default = '<config file root name>.fits') You would typically want to specify this explicitly, but if you do not, then if the configuration file is called my_test.yaml, the output file would be my_test.fits.
  • dir = str_value (default = '.') The directory in which to put the output file.
  • nfiles = int_value (default = 1) How many files to build. Note: if nfiles > 1, then file_name and/or dir should not be a simple string. Rather it should be some generated string that provides a different save location for each file. See the section below on setting str_value.
  • nproc = int_value (default = 1) Specify the number of processors to use when building files. If nproc <= 0, the code will try to determine the number of CPUs automatically and use that. If you are doing many files, it is more efficient to split up the processes at this level rather than when drawing the postage stamps (which is what image.nproc means).
  • skip = bool_value (default = False) Specify files to skip. This would normally be an evaluated boolean rather than simply True or False. e.g. To build only the fifth file, you could use skip : { type : Eval, str : 'file_num != 4' }, which may be useful during debugging if you are trying to diagnose a problem in one particular file.
  • noclobber = bool_value (default = False) Specify whether to skip building files that already exist. This may be useful if you are running close to the memory limit on your machine with multiprocessing. e.g. You could use nproc > 1 for a first run using multiprocessing, and then run again with nproc = 1 and noclobber = True to clean up any files that failed from insufficient memory during the multiprocessing run.
  • retry_io = int_value (default = 0) How many times to retry the write command if there is any kind of failure. Some systems have trouble with multiple concurrent writes to disk, so if you are doing a big parallel job, this can be helpful. If this is > 0, then after an IOError exception on the write command, the code will wait an increasing number of seconds (starting with 1 for the first failure), and then try again up to this many times.
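
The retry_io behavior described above can be sketched in plain Python. This is an illustration only, not GalSim's implementation: the helper name write_with_retry, the injectable sleep argument, and the exact wait schedule (1, 2, 3, ... seconds) are assumptions made for the example.

```python
import time


def write_with_retry(write, retry_io=0, sleep=time.sleep):
    """Call write(); on IOError, wait and retry up to retry_io times.

    Hypothetical helper illustrating the documented behavior.
    The wait grows by one second per failure (an assumed schedule).
    """
    for attempt in range(retry_io + 1):
        try:
            return write()
        except IOError:
            if attempt == retry_io:
                # No retries left, so let the error propagate.
                raise
            # Wait 1 second after the first failure, 2 after the second, ...
            sleep(attempt + 1)
```

Passing a different sleep callable (for example, a list's append method) makes it possible to check the wait schedule without actually pausing.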

Output Types

The default output type is 'Fits', which means to write a FITS file with the constructed image in the first HDU. But other types are possible, which are specified as usual with a type field. Other types may define additional allowed and/or required fields. The output types defined by GalSim are:

  • 'Fits' A simple FITS file. This is the default if type is not given.
  • 'MultiFits' A multi-extension FITS file.
    • nimages = int_value (default if using an input catalog and the image type is 'Single' is the number of entries in the input catalog; otherwise required) The number of HDU extensions on which to draw an image.
  • 'DataCube' A FITS data cube.
    • nimages = int_value (default if using an input catalog and the image type is 'Single' is the number of entries in the input catalog; otherwise required) The number of images in the data cube (i.e. the third dimension of the cube).
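
As a rough sketch, an output field combining a type with the location attributes might look like the following when the configuration is written as a Python dict rather than YAML. The specific values (directory name, file name pattern, counts) are made up for illustration, and a real configuration would also need the other top-level fields (e.g. gal and image) that this page does not cover.

```python
# Hypothetical output field: 5 multi-extension FITS files with 10 HDUs each.
config = {
    'output': {
        'type': 'MultiFits',
        'nimages': 10,        # number of HDU extensions per file
        'nfiles': 5,
        'dir': 'output',
        # With nfiles > 1 the file name must differ per file, so it is an
        # evaluated string that depends on file_num instead of a plain string.
        'file_name': {'type': 'Eval', 'str': "'sim_%02d.fits' % file_num"},
        'nproc': 4,
        'noclobber': True,    # skip files that already exist
        'retry_io': 3,        # retry failed writes up to 3 times
    },
}
```

The Eval string is ordinary Python, so for file_num = 4 the expression itself evaluates to 'sim_04.fits'.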

Custom Output Types

To define your own output type, you will need to write an importable Python module (typically a file in the current directory where you are running galsim, but it could also be something you have installed in your Python distro) with a class that will be used to build the output file.

The class should be a subclass of galsim.config.OutputBuilder, which is the class used for the default 'Fits' type. The base class has a number of methods, and you only need to override the ones for which you want different behavior than that of the 'Fits' type.

class CustomOutputBuilder(galsim.config.OutputBuilder):

    def getFilename(self, config, base, logger):
        """Get the file_name for the current file being worked on.

        Note that the base class defines a default extension = '.fits'.
        This can be overridden by subclasses by changing the default_ext property.

        @param config           The configuration dict for the output type.
        @param base             The base configuration dict.
        @param logger           If given, a logger object to log progress.

        @returns the filename to build.
        """
        # ... Determine the full file name typically from the file_name and dir parameters.
        return file_name

    def buildImages(self, config, base, file_num, image_num, obj_num, ignore, logger):
        """Build the images for output.

        In the base class, this function just calls BuildImage to build the single image to
        put in the output file.  So the returned list only has one item.

        @param config           The configuration dict for the output field.
        @param base             The base configuration dict.
        @param file_num         The current file_num.
        @param image_num        The current image_num.
        @param obj_num          The current obj_num.
        @param ignore           A list of parameters that are allowed to be in config that we can
                                ignore here.  i.e. it won't be an error if they are present.
        @param logger           If given, a logger object to log progress.

        @returns a list of the images built
        """
        # ... Build whatever main image or images need to go to the file.

        # To build a single image, you would typically call
        image = galsim.config.BuildImage(base, image_num, obj_num, logger=logger)
        return [image]

        # Or, to build many images at once (possibly in parallel), you could instead call
        #     nimages = self.getNImages(config, base, file_num)
        #     images = galsim.config.BuildImages(nimages, base, image_num, obj_num, logger=logger)
        #     return images

    def getNFiles(self, config, base):
        """Returns the number of files to be built.

        In the base class, this is just output.nfiles.

        @param config           The configuration dict for the output field.
        @param base             The base configuration dict.

        @returns the number of files to build.
        """
        # ... Determine how many files will be built for this output field.
        return n_files

    def getNImages(self, config, base, file_num):
        """Returns the number of images to be built for a given file_num.

        In the base class, we only build a single image, so it returns 1.

        @param config           The configuration dict for the output field.
        @param base             The base configuration dict.
        @param file_num         The current file number.

        @returns the number of images to build.
        """
        # ... Determine how many images will be built as part of this output file.
        return n_images


    def getNObjPerImage(self, config, base, file_num, image_num):
        """
        Get the number of objects that will be made for each image built as part of the file
        file_num, which starts at image number image_num, based on the information in the config
        dict.

        @param config           The configuration dict.
        @param base             The base configuration dict.
        @param file_num         The current file number.
        @param image_num        The current image number (the first one for this file).

        @returns a list of the number of objects in each image [ nobj0, nobj1, nobj2, ... ]
        """
        # Normally, this can be figured out by calling ImageBuilder.getNObj for each image.
        # But if you need to do something special, you can.
        nimages = self.getNImages(config, base, file_num)
        nobj = [ ... some calculation of nobj for each image ... for j in range(nimages) ]
        return nobj

    def canAddHdus(self):
        """Returns whether it is permissible to add extra HDUs to the end of the data list.

        In the base class, this returns True.
        """
        # Presumably, you would only bother to override this if you want to return False instead.
        return True

    def addExtraOutputHDUs(self, config, data, logger):
        """If appropriate, add any extra output items that go into HDUs to the data list.

        @param config           The configuration dict for the output field.
        @param data             The data to write.  Usually a list of images.
        @param logger           If given, a logger object to log progress.

        @returns data (possibly updated with additional items)
        """
        # Here is the default implementation, which is normally fine.
        # But the option exists to override this if you want.
        if self.canAddHdus():
            data = galsim.config.AddExtraOutputHDUs(config, data, logger)
        else:
            galsim.config.CheckNoExtraOutputHDUs(config, config['output']['type'], logger)
        return data

    def writeFile(self, data, file_name):
        """Write the data to a file.

        @param data             The data to write.  Usually a list of images returned by
                                buildImages, but possibly with extra HDUs tacked onto the end
                                from the extra output items.
        @param file_name        The file_name to write to.
        """
        # ... Do whatever is necessary to write the file.
        # The base class implementation for reference:
        galsim.fits.writeMulti(data,file_name)

    def writeExtraOutputs(self, config, data, logger):
        """If appropriate, write any extra output items that write their own files.

        @param config           The configuration dict for the output field.
        @param data             The data to write.  Usually a list of images.
        @param logger           If given, a logger object to log progress.
        """
        # If the extra outputs need some special handling, you might need to override this.
        # Again, the base class implementation for reference:
        galsim.config.WriteExtraOutputs(config, data, logger)

The base parameter is the original full configuration dict that is being used for running the simulation. The config parameter is the local portion of the full dict that defines the object being built, which would typically be base['output'].

Then, in the Python module, you need to register this builder with some type name, which will be the value of the type attribute that triggers the use of this Builder object.

galsim.config.RegisterOutputType('CustomOutput', CustomOutputBuilder())

Note that we register an instance of the class, not the class itself. This opens up the possibility of having multiple output types use the same class instantiated with different initialization parameters. This is not used by the GalSim output types, but there may be use cases where it would be useful for custom output types.
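
A minimal sketch of that idea follows: one builder class registered twice with different initialization parameters. The class name FlexibleOutputBuilder, its allow_extra_hdus parameter, and the two type names are hypothetical. The fallback OutputBuilder defined in the except branch is only a stand-in so the sketch can run where GalSim is not installed; it is not part of GalSim.

```python
try:
    import galsim
    OutputBuilder = galsim.config.OutputBuilder
except ImportError:
    galsim = None

    class OutputBuilder(object):
        """Stand-in base class (assumption) used only when GalSim is missing."""
        def canAddHdus(self):
            return True


class FlexibleOutputBuilder(OutputBuilder):
    """Hypothetical builder whose behavior depends on a constructor argument."""

    def __init__(self, allow_extra_hdus):
        self.allow_extra_hdus = allow_extra_hdus

    def canAddHdus(self):
        # Each registered instance answers according to its own setting.
        return self.allow_extra_hdus


with_hdus = FlexibleOutputBuilder(allow_extra_hdus=True)
without_hdus = FlexibleOutputBuilder(allow_extra_hdus=False)

if galsim is not None:
    # Two type names backed by the same class, configured differently.
    galsim.config.RegisterOutputType('FlexWithHdus', with_hdus)
    galsim.config.RegisterOutputType('FlexNoHdus', without_hdus)
```

All other methods are inherited, so both types otherwise behave like the default 'Fits' type.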

Finally, to use this custom type in your config file, you need to tell the config parser the name of the module to load at the start of processing. e.g. if this class is defined in the file my_custom_output.py, then you would use the following top-level modules field in the config file:

modules:
    - my_custom_output

This modules field is a list, so it can contain more than one module to load if you want. Then before processing anything, the code will execute the command import my_custom_output, which will read your file and execute the registration command to add the builder to the list of valid output types.

Extra Outputs

In addition to the fields for defining the main output file(s), there may also be fields specifying optional "extra" outputs. These may be extra files to be written, or sometimes extra HDUs to be added to the main FITS files. These extra output fields are dicts that may have a number of parameters defining how they should be built or where they should be written.

  • psf will output (typically) noiseless images of the PSF used for each galaxy.
    • file_name = str_value (either file_name or hdu is required) Write the psf image to a different file (in the same directory as the main image).
    • hdu = int_value (either file_name or hdu is required) Write the psf image to another hdu in the main file. (This option is only possible if type == 'Fits') Note: 0 means the primary HDU, the first extension is 1. The main image is always written in hdu 0.
    • dir = str_value (default = output.dir if that is provided, else '.') (Only relevant if file_name is provided.)
    • draw_method = str_value (default = 'auto') The same options are available as for the image.draw_method item, but now applying to the rendering of the psf images.
    • shift = pos_value (optional) A shift to apply to the PSF object. Special: if this is 'galaxy' then apply the same shift as was applied to the galaxy.
    • offset = pos_value (optional) An offset to apply when drawing the PSF object. Special: if this is 'galaxy' then apply the same offset as was applied when drawing the galaxy.
    • signal_to_noise = float_value (optional) If provided, noise will be added at the same level as the main image, and the flux will be rescaled to result in the provided signal-to-noise. The default is to use flux=1 and not add any noise.
  • weight will output the weight image (an inverse variance map of the noise properties).
    • file_name = str_value (either file_name or hdu is required) Write the weight image to a different file (in the same directory as the main image).
    • hdu = int_value (either file_name or hdu is required) Write the weight image to another hdu in the main file. (This option is only possible if type == 'Fits') Note: 0 means the primary HDU, the first extension is 1. The main image is always written in hdu 0.
    • dir = str_value (default = output.dir if that is provided, else '.') (Only relevant if file_name is provided.)
    • include_obj_var = bool_value (default = False) Normally, the object variance is not included as a component for the inverse variance map. If you would rather include it, set this to True.
  • badpix will output the bad-pixel mask image. This will be relevant when we eventually add the ability to add defects to the images. For now the bad-pixel mask will be all 0s.
    • file_name = str_value (either file_name or hdu is required) Write the bad pixel mask image to a different file (in the same directory as the main image).
    • hdu = int_value (either file_name or hdu is required) Write the bad pixel mask image to another hdu in the main file. (This option is only possible if type == 'Fits') Note: 0 means the primary HDU, the first extension is 1. The main image is always written in hdu 0.
    • dir = str_value (default = output.dir if that is provided, else '.') (Only relevant if file_name is provided.)
  • truth will output a truth catalog. Note: assuming you are using the galsim executable to process the config file, the config dict is really read in as an OrderedDict, so the columns in the output catalog will be in the same order as in the YAML file. If you are doing this manually and just use a regular Python dict for config, then the output columns will be in some arbitrary order.
    • file_name = str_value (either file_name or hdu is required) Write the truth catalog to a different file (in the same directory as the main image).
    • hdu = int_value (either file_name or hdu is required) Write the truth catalog to another hdu in the main file. (This option is only possible if type == 'Fits') Note: 0 means the primary HDU, the first extension is 1. The main image is always written in hdu 0.
    • dir = str_value (default = output.dir if that is provided, else '.') (Only relevant if file_name is provided.)
    • columns = dict (required) A dict connecting the names of the output columns to the values that should be output. The values can be specified in a few different ways:
      • A string indicating what current value in the config dict to use. e.g. 'gal.shear.g1' would grab the value of config['gal']['shear']['g1'] that was used for the current object.
      • A dict that should be evaluated in the usual way values are evaluated in the config processing. Caveat: Since we do not have a way to indicate what type the return value should be, this functionality is mostly limited to 'Eval' and 'Current' types, which is normally fine, since it would mostly be useful for just doing some extra processing to some current value.
      • An implicit Eval string starting with '$', typically using '@' values to get Current values. e.g. to output e1-style shapes for a Shear object that was built with (g1,g2), you could write '$(@gal.ellip).e1' and '$(@gal.ellip).e2'.
      • A straight value. Not usually very useful, but allowed. e.g. You might want your truth catalogs to have a consistent format, but some simulations may not define a particular value. You could just output -999 (or anything) for that column in those cases.
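
To make the options above concrete, here is a sketch of an output field with several extra outputs, written as a Python dict. The file names and column names are invented for the example, and the truth columns assume the configuration defines gal.shear and gal.ellip elsewhere. Regular dicts preserve insertion order in Python 3.7 and later, so the column order is kept as written.

```python
# Hypothetical output field with extra outputs attached to a 'Fits' file.
config = {
    'output': {
        'type': 'Fits',
        'file_name': 'sim.fits',
        # Extra HDUs in the main file; the main image itself is in hdu 0.
        'psf': {'hdu': 1},
        'weight': {'hdu': 2, 'include_obj_var': True},
        # A separate file in the same directory as the main image.
        'badpix': {'file_name': 'sim_badpix.fits'},
        'truth': {
            'file_name': 'sim_truth.dat',
            'columns': {
                # A string naming a current value in the config dict.
                'g1': 'gal.shear.g1',
                # An implicit Eval string using '@' for current values.
                'e1': '$(@gal.ellip).e1',
                # A dict evaluated in the usual way (here an Eval).
                'file_num': {'type': 'Eval', 'str': 'file_num'},
                # A straight value used as a placeholder column.
                'unused': -999,
            },
        },
    },
}
```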

Adding your own Extra Output Type

You can also add your own extra output type in a similar fashion as the other custom types that you can define. (cf. e.g. Custom Output Types) As usual, you would write a custom module that can be imported, which should contain a class for building and writing the extra output, register it with GalSim, and add the module to the modules field.

The class should be a subclass of galsim.config.ExtraOutputBuilder. You may override any of the following methods.

class CustomExtraOutputBuilder(galsim.config.ExtraOutputBuilder):
    def initialize(self, data, scratch, config, base, logger):
        """Do any initial setup for this builder at the start of a new output file.

        The base class implementation saves two work space items into self.data and self.scratch
        that can be used to safely communicate across multiple processes.

        @param data         An empty list of length nimages to use as work space.
        @param scratch      An empty dict that can be used as work space.
        @param config       The configuration field for this output object.
        @param base         The base configuration dict.
        @param logger       If given, a logger object to log progress. [default: None]
        """ 
        # Probably will want to start by using the base class function:
        super(CustomExtraOutputBuilder,self).initialize(data,scratch,config,base,logger)
        # ...  Do anything else required for initialization.

    def setupImage(self, config, base, logger):
        """Perform any necessary setup at the start of an image.

        This function will be called at the start of each image to allow for any setup that
        needs to happen at this point in the processing.

        @param config       The configuration field for this output object.
        @param base         The base configuration dict.
        @param logger       If given, a logger object to log progress. [default: None]
        """
        # ... Do any setup that is required at the start of an image.

    def processStamp(self, obj_num, config, base, logger):
        """Perform any necessary processing at the end of each stamp construction.

        This function will be called after each stamp is built, but before the noise is added,
        so the existing stamp image has the true surface brightness profile (unless photon shooting
        was used, in which case there will necessarily be noise from that process).

        Remember, these stamps may be processed out of order.  Saving data to the scratch dict
        is safe, even if multiprocessing is being used.

        @param obj_num      The object number
        @param config       The configuration field for this output object.
        @param base         The base configuration dict.
        @param logger       If given, a logger object to log progress. [default: None]
        """
        # ... Record any information you need at the end of each stamp processing.
        # Typically you would save things into self.scratch[obj_num]

    def processSkippedStamp(self, obj_num, config, base, logger):
        """Perform any necessary processing for stamps that were skipped in the normal processing.

        This function will be called for stamps that are not built because they were skipped
        for some reason.  Normally, you would not want to do anything for the extra outputs in
        these cases, but in case some module needs to do something in these cases as well, this
        method can be overridden.

        @param obj_num      The object number
        @param config       The configuration field for this output object.
        @param base         The base configuration dict.
        @param logger       If given, a logger object to log progress. [default: None]
        """
        # Do whatever special processing is needed for skipped stamps.

    def processImage(self, index, obj_nums, config, base, logger):
        """Perform any necessary processing at the end of each image construction.

        This function will be called after each full image is built.

        Remember, these images may be processed out of order.  But if using the default
        constructor, the data list is already set to be the correct size, so it is safe to
        access self.data[k], where k = base['image_num'] - base['start_image_num'] is the
        appropriate index to use for this image.

        @param index        The index in self.data to use for this image.  This isn't the image_num
                            (which can be accessed at base['image_num'] if needed), but rather
                            an index that starts at 0 for the first image being worked on and
                            goes up to nimages-1.
        @param obj_nums     The object numbers that were used for this image.
        @param config       The configuration field for this output object.
        @param base         The base configuration dict.
        @param logger       If given, a logger object to log progress. [default: None]
        """
        # ... Do whatever processing is required at the end of each image.
        # Typically you would write data into self.data[index]

    def finalize(self, config, base, main_data, logger):
        """Perform any final processing at the end of all the image processing.

        This function will be called after all images have been built.

        It returns some sort of final version of the object.  In the base class, it just returns
        self.data, but depending on the meaning of the output object, something else might be
        more appropriate.

        @param config       The configuration field for this output object.
        @param base         The base configuration dict.
        @param main_data    The main file data in case it is needed.
        @param logger       If given, a logger object to log progress. [default: None]

        @returns the final version of the object.
        """
        # ... Build the final output object using the stored values in self.scratch and self.data.
        # The default implementation just returns self.data.
        return final

    def writeFile(self, file_name, config, base, logger):
        """Write this output object to a file.

        The base class implementation is appropriate for the case that the result of finalize
        is a list of images to be written to a FITS file.

        @param file_name    The file to write to.
        @param config       The configuration field for this output object.
        @param base         The base configuration dict.
        @param logger       If given, a logger object to log progress. [default: None]
        """
        # ... Write the output object to a file.
        # The base class implementation for reference:
        galsim.fits.writeMulti(self.final_data, file_name)

    def writeHdu(self, config, base, logger):
        """Write the data to a FITS HDU with the data for this output object.

        The base class implementation is appropriate for the case that the result of finalize
        is a list of images of length 1 to be written to a FITS file.

        @param config       The configuration field for this output object.
        @param base         The base configuration dict.
        @param logger       If given, a logger object to log progress. [default: None]

        @returns an HDU with the output data.
        """
        # ... Write the output object to an HDU to be appended to the end of the main output.
        # The default implementation checks whether self.data has only a single item in it and,
        # if so, returns that.  If len(self.data) != 1, it raises an exception.
        return hdu  # or this may be an image rather than an HDU

Then, in the Python module, you need to register this builder with some type name, which will be the value of the attribute in the output field that triggers the use of this Builder object.

galsim.config.RegisterExtraOutputType('CustomExtraOutput', CustomExtraOutputBuilder())

Note that we register an instance of the class, not the class itself. This opens up the possibility of having multiple output types use the same class instantiated with different initialization parameters. This is not used by the GalSim output types, but there may be use cases where it would be useful for custom output types.
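
As a small illustration of how the scratch and data work spaces fit together, the sketch below counts how many stamps were actually built for each image. The class name StampCountBuilder and the type name 'stamp_count' are hypothetical, and the fallback ExtraOutputBuilder in the except branch is only a stand-in (mimicking the initialize behavior described in the docstring above) so the sketch can run where GalSim is not installed.

```python
try:
    import galsim
    ExtraOutputBuilder = galsim.config.ExtraOutputBuilder
except ImportError:
    galsim = None

    class ExtraOutputBuilder(object):
        """Stand-in base class (assumption) used only when GalSim is missing."""
        def initialize(self, data, scratch, config, base, logger):
            self.data = data
            self.scratch = scratch


class StampCountBuilder(ExtraOutputBuilder):
    """Hypothetical extra output: number of stamps built for each image."""

    def processStamp(self, obj_num, config, base, logger):
        # Stamps may be processed out of order, so key the record by obj_num.
        self.scratch[obj_num] = True

    def processImage(self, index, obj_nums, config, base, logger):
        # Skipped stamps never reach processStamp, so they are not counted.
        self.data[index] = sum(1 for n in obj_nums if n in self.scratch)

    def finalize(self, config, base, main_data, logger):
        # One count per image, in image order.
        return self.data


if galsim is not None:
    galsim.config.RegisterExtraOutputType('stamp_count', StampCountBuilder())
```

Since the final object here is a list of integers rather than a list of images, a real builder along these lines would also need to override writeFile to store the counts in a suitable format.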

Finally, to use this custom type in your config file, you need to tell the config parser the name of the module to load at the start of processing. e.g. if this class is defined in the file my_custom_output.py, then you would use the following top-level modules field in the config file:

modules:
    - my_custom_output

This modules field is a list, so it can contain more than one module to load if you want. Then before processing anything, the code will execute the command import my_custom_output, which will read your file and execute the registration command to add the builder to the list of valid output types.

Then you can use this as a valid output type:

output:
    type: CustomOutput
    ...

For an example of a custom output type, see meds.py in the galsim.des module, which is used by meds.yaml in the des examples directory. It may also be helpful to look at the GalSim implementation of the MultiFits and DataCube types.