Using spaxelsleuth with other data - hzovaro/spaxelsleuth GitHub Wiki
spaxelsleuth is designed to be modular such that it is straightforward to implement support for processing data from new sources.
`spaxelsleuth.io.io` is the module containing generic functions for creating and loading both the "metadata" and the "full" spaxelsleuth DataFrames. It contains the following functions:

`make_df(survey="mysurvey", ...)` creates and writes the full spaxelsleuth DataFrame for data from source `mysurvey`:

- The "metadata" DataFrame corresponding to `mysurvey` is loaded if such a function exists (see `make_metadata_df()` below).
- Under the hood, it calls `mysurvey.process_galaxies()` for each galaxy in `mysurvey` (running these across multiple cores to reduce computation time) and consolidates the output into a single DataFrame in which each row represents a pixel or spatial bin and each column represents an associated quantity, e.g. `v_gas`. See the `process_galaxies()` section below for the required inputs and outputs of this function.
- If a metadata DataFrame is present, it is merged with the full DataFrame on galaxy ID.
- `utils.add_columns()` is then called, which calculates additional data products (e.g. SFRs, extinctions and metallicities) based on the columns returned by `mysurvey.process_galaxies()` and adds them to the DataFrame. Data quality and S/N cuts are optionally applied in this step via flags that can be set by the user.
- The DataFrame is then saved to file in the output directory (specified in the configuration file as `settings["mysurvey"]["output_path"]`). The filename can be specified by the user as an input argument; otherwise, it follows the formula `mysurvey_<bin_type>_<ncomponents>-comp_<extcorr>_minSNR=<eline_SNR_min>_minANR=<eline_ANR_min>_<debug>_<df_fname_tag>.hd5`.
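As an illustration, here is how the filename formula might expand for one set of parameters. This is a sketch only: the exact `<extcorr>` token and the optional `<debug>`/`<df_fname_tag>` suffixes depend on spaxelsleuth's internals, so the values below are assumptions.

```python
# Hypothetical expansion of the DataFrame filename formula. The "extcorr"
# token is an assumed value for runs with extinction correction enabled;
# the optional <debug> and <df_fname_tag> suffixes are omitted here.
survey = "mysurvey"
bin_type = "default"
ncomponents = 1
extcorr = "extcorr"   # assumption: token used when correct_extinction=True
eline_SNR_min = 3
eline_ANR_min = 3

df_fname = (f"{survey}_{bin_type}_{ncomponents}-comp_{extcorr}"
            f"_minSNR={eline_SNR_min}_minANR={eline_ANR_min}.hd5")
```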
`load_df(survey="mysurvey", ...)` loads DataFrames created using `make_df()`. See the docstring for detailed information about the options and parameters that can be passed.

`make_metadata_df(survey="mysurvey", **kwargs)` calls `mysurvey.make_metadata_df(**kwargs)`.

`load_metadata_df(survey="mysurvey")` calls `mysurvey.load_metadata_df()`.

and so on.
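The metadata merge step described above can be pictured with a toy example (hypothetical column values; spaxelsleuth performs this merge internally):

```python
import pandas as pd

# Toy spaxel-level DataFrame: two galaxies, three spaxels in total.
df_spaxels = pd.DataFrame({
    "ID": [1, 1, 2],
    "HALPHA (total)": [1.5, 2.0, 0.7],
})

# Toy per-galaxy metadata DataFrame.
df_metadata = pd.DataFrame({"ID": [1, 2], "z": [0.01, 0.02]})

# Merging on the galaxy ID broadcasts each galaxy's metadata to all of
# its spaxel rows.
df = df_spaxels.merge(df_metadata, how="left", on="ID")
```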
In order to implement support for data from a new source (which we will call `mysurvey` in this example), carry out the following steps:

Add a new entry to your local configuration file called `mysurvey` with key/value pairs as follows:
```
"mysurvey": {
    "data_cube_path": "mysurvey/input/", // Path to the raw data cubes.
    "input_path": "mysurvey/input", // Path to other data products.
    "output_path": "mysurvey/output", // Path where spaxelsleuth DataFrames will be saved.
    "flux_units": 1e-16, // Flux/flux density units for the input data; e.g., 1e-16 corresponds to units of 10^-16 erg s^-1 cm^-2 (Å^-1).
    "ncomponents": [1, 2, 3], // Emission line fitting options, e.g. if your data set has emission line products from 1-, 2- and 3-component fits.
    "bin_types": ["default"], // Spatial binning scheme options. "default" refers to unbinned data.
    "eline_list": [ // List of emission lines that have been fitted to your data set.
        "OII3726", "OII3729", "HBETA", "OIII5007", "OI6300",
        "HALPHA", "NII6583", "SII6716", "SII6731"
    ],
    "sigma_inst_kms": 40 // Instrumental resolution (measured as the sigma of a Gaussian LSF).
},
```

`mysurvey` should contain the following functions:
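As a quick sanity check, you can verify that your new entry provides all of the keys listed above. In this sketch a plain dict stands in for spaxelsleuth's parsed `settings` object:

```python
# Plain-dict stand-in for the parsed configuration file; the values mirror
# the "mysurvey" entry above.
settings = {
    "mysurvey": {
        "data_cube_path": "mysurvey/input/",
        "input_path": "mysurvey/input",
        "output_path": "mysurvey/output",
        "flux_units": 1e-16,
        "ncomponents": [1, 2, 3],
        "bin_types": ["default"],
        "eline_list": ["OII3726", "OII3729", "HBETA", "OIII5007", "OI6300",
                       "HALPHA", "NII6583", "SII6716", "SII6731"],
        "sigma_inst_kms": 40,
    }
}

required_keys = {"data_cube_path", "input_path", "output_path", "flux_units",
                 "ncomponents", "bin_types", "eline_list", "sigma_inst_kms"}

# An empty set here means no required keys are missing.
missing = required_keys - settings["mysurvey"].keys()
```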
- `process_galaxies()` (required)
- `make_metadata_df()` (optional)
- `load_metadata_df()` (optional)
`process_galaxies()` takes a single input, `args`, which is a list that is expanded as follows:

```python
gg, gal, ncomponents, bin_type, df_metadata, kwargs = args
```

where

- `gg` is the index of the galaxy with ID `gal` in the list of galaxies processed by `make_df()`.
- `gal` is the galaxy ID, which must be of type `int` rather than `str`.
- `bin_type` is the binning scheme used (must be a valid entry in `settings["mysurvey"]["bin_types"]`).
- `ncomponents` is the number of kinematic components fitted to the emission lines (must be a valid entry in `settings["mysurvey"]["ncomponents"]`).
- `df_metadata` is the metadata DataFrame (if no such DataFrame is available, this parameter will be `None`).
- `kwargs` is a dict of keyword arguments that are passed as extra arguments to `make_df()`. For example, say your survey contains data products corresponding to two different ways of fitting the stellar kinematics. You could then pass a keyword argument such as `stekin_fit_moments=2` to `make_df()` and access it from within `process_galaxies()` as `kwargs["stekin_fit_moments"]`.

Note that you don't have to actually use any of these arguments if you don't want to. For example, if `mysurvey` only has data in a single binning scheme then the `bin_type` argument may be unused; similarly, there may be no need for you to access the information in `df_metadata`.
`process_galaxies()` must return a tuple of the form `(rows_arr, colnames)`, where

- `rows_arr` is a 2D array in which each row represents measurements from a single spaxel (or spatial bin) and each column represents a different quantity.
- `colnames` is a list of strings corresponding to the columns in `rows_arr`.
  - These must be in the same order, i.e. `colnames[0]` must correspond to the quantity in column `rows_arr[:, 0]`, and so on.
- The galaxy ID must be stored in a column named "ID".
- Column names must follow the naming conventions in the column descriptions page. For example, total Hα fluxes must be stored as "HALPHA (total)" and not "Halpha (total)" or "HALPHA (tot)", otherwise other spaxelsleuth functions will not work.
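The `(rows_arr, colnames)` contract can be illustrated with toy numbers (not real survey data; galaxy ID and fluxes below are made up):

```python
import numpy as np

# Three spaxels from one hypothetical galaxy with ID 12345.
gal = 12345
halpha = [1.5, 2.0, 0.7]            # "HALPHA (total)" value per spaxel
n_spaxels = len(halpha)

# One inner list per column, then transpose so that each ROW is a spaxel
# and each COLUMN is a quantity, matching the order of colnames.
rows_list = [halpha, [gal] * n_spaxels]
colnames = ["HALPHA (total)", "ID"]
rows_arr = np.array(rows_list).T    # shape (n_spaxels, n_columns)

# colnames[i] labels rows_arr[:, i].
```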
`make_metadata_df()` requires no inputs, although you can define input arguments if you wish; these must then be passed through `io.make_metadata_df()`. The function itself must return nothing, but it must save a metadata DataFrame to `settings["mysurvey"]["output_path"]`. Any filename can be used, with the caveat that the same filename must be used when loading it in `load_metadata_df()`.

`load_metadata_df()` takes no inputs and returns the metadata DataFrame. It is recommended that you raise a `FileNotFoundError` exception if the DataFrame cannot be found.
Below is a template you can use to get started. For now, this file must be called `mysurvey.py` and saved in the `spaxelsleuth/io/` folder.
```python
import logging
from pathlib import Path

import numpy as np
import pandas as pd

from spaxelsleuth.config import settings
from spaxelsleuth.utils.misc import _2d_map_to_1d_list
# etc.

logger = logging.getLogger(__name__)

# Paths
input_path = Path(settings["mysurvey"]["input_path"])
output_path = Path(settings["mysurvey"]["output_path"])
data_cube_path = Path(settings["mysurvey"]["data_cube_path"])


def process_galaxies(args):
    # Extract input arguments
    gg, gal, ncomponents, bin_type, df_metadata, kwargs = args

    # Get the x and y coordinates corresponding to measurements in your data.
    # For example, if you want to extract data points only from pixels in a
    # boolean mask (here `mask` is a 2D array of shape (ny, nx) computed from
    # your data):
    y_c_list, x_c_list = np.where(mask)

    # Calculate stuff and store it in a dict, such that the keys are the
    # column names and the values are 2D maps of each quantity.
    _2dmap_dict = {}
    _2dmap_dict["HALPHA (total)"] = halpha_total_flux_map
    _2dmap_dict["HALPHA error (total)"] = halpha_total_flux_map_err
    # etc.

    # Access keyword arguments passed to io.make_df() as follows:
    some_kwarg = kwargs["some_kwarg"]
    some_other_kwarg = kwargs["some_other_kwarg"]
    # etc.

    # Tips:
    # - Check out io.sami, io.hector, io.s7 and io.lzifu for examples of how
    #   to process data in different formats.
    # - You can make use of functions in utils, misc, etc. to automate some
    #   calculations. For instance, the D4000Å break strength can be
    #   calculated using continuum.compute_d4000().
    # - Most of the time, the input data will be in the form of 2D images.
    #   Use utils.misc._2d_map_to_1d_list() to convert 2D arrays into 1D
    #   arrays.

    # Convert 2D maps to 1D rows
    rows_list = []
    colnames = list(_2dmap_dict.keys())
    for colname in colnames:
        rows = _2d_map_to_1d_list(_2dmap_dict[colname], x_c_list, y_c_list, nx, ny)
        rows_list.append(rows)

    # Add the galaxy ID
    rows_list.append([gal] * len(x_c_list))
    colnames.append("ID")

    # Transpose so that each row represents a single pixel & each column a
    # measured quantity.
    rows_arr = np.array(rows_list).T

    logger.info(f"Finished processing galaxy {gal} ({gg})")
    return rows_arr, colnames
```
```python
def make_metadata_df(arg1, arg2):
    # Load a .csv file containing galaxy metadata
    df_metadata = pd.read_csv("some_metadata_file")

    # Do some calculations, e.g. compute the angular scale
    df_metadata["kpc per arcsec"] = df_metadata["D_A (Mpc)"] * 1e3 * np.pi / 180.0 / 3600.0

    # Use the input arguments
    if arg1 == "method 1":
        pass  # then compute some quantity using method 1
    elif arg1 == "method 2":
        pass  # then compute it using method 2, etc.

    # Rename some columns to ensure consistency with spaxelsleuth column
    # conventions, e.g. ensure that the galaxy names are stored in the "ID"
    # column.
    df_metadata = df_metadata.rename(columns={
        "galaxy_name": "ID",
    })

    # Tip: check out io.sami, io.hector, io.s7 and io.lzifu for examples of
    # how to calculate various galaxy metadata properties.

    # Save to file
    df_metadata.to_hdf(output_path / "mysurvey_metadata.hd5", key="metadata")
    return


def load_metadata_df():
    if not (output_path / "mysurvey_metadata.hd5").exists():
        raise FileNotFoundError(
            f"File {output_path / 'mysurvey_metadata.hd5'} not found. "
            "Did you remember to run make_metadata_df() first?"
        )
    df_metadata = pd.read_hdf(output_path / "mysurvey_metadata.hd5")
    return df_metadata
```

To use your newly created submodule for handling `mysurvey` data, do the following:
```python
from spaxelsleuth import load_user_config
load_user_config("/path/to/custom/config/file/.spaxelsleuthconfig.json")
from spaxelsleuth.io.io import make_df, load_df, make_metadata_df, load_metadata_df

nthreads = 10
ncomponents = 1
eline_SNR_min = 3
eline_ANR_min = 3
correct_extinction = True

# Create and load the metadata DataFrame. Note that you can pass arguments
# through to mysurvey.make_metadata_df() here.
make_metadata_df(survey="mysurvey", arg1="method 1", arg2=10)
df_metadata = load_metadata_df(survey="mysurvey")

# Create the DataFrame
make_df(survey="mysurvey",
        bin_type="default",
        ncomponents=ncomponents,
        eline_SNR_min=eline_SNR_min,
        eline_ANR_min=eline_ANR_min,
        correct_extinction=correct_extinction,
        metallicity_diagnostics=[
            "N2Ha_PP04",
        ],
        nthreads=nthreads,
        some_kwarg=999,
        some_other_kwarg=False,
)

# Load the DataFrame
df = load_df(
    survey="mysurvey",
    bin_type="default",
    ncomponents=ncomponents,
    eline_SNR_min=eline_SNR_min,
    eline_ANR_min=eline_ANR_min,
    correct_extinction=correct_extinction,
)
```

If you have created a submodule for a particular survey and think it might be useful for others, feel free to create a branch and make a pull request!