Phenotyping Script - KoellenspergerLab/MeXpose GitHub Wiki

Phenotyping Script

Introduction

This script analyzes single-cell data, particularly data from multiplexed imaging. It normalizes data, removes outliers, clusters cells, and produces visualizations such as histograms and heatmaps. It is designed to scale to large datasets.

Introduction
Getting Started
Command-Line Flags
Demo Commands

Getting Started

To run the script, call it by its path and point it at your data folder with the following command:

Linux/Mac

python path/to/analysis_script.py --folder path/to/folder

Replace path/to/analysis_script.py with the path of the script, and path/to/folder with the directory containing your .csv files.

Windows

python X:\path\to\analysis_script.py --folder X:\path\to\folder

Replace X:\path\to\analysis_script.py with the path of the script, and X:\path\to\folder with the directory containing your .csv files.

Important Notes

The folder specified should contain all the .csv files to be analyzed.
If a setup.csv file is provided, it must be correctly formatted to specify how each channel should be processed.

File Naming Scheme

Proper file naming is essential for the script to correctly identify and process data files, overlay images, and segmentation masks. Below are guidelines for each type of file:

Data Files

Data files should be in .csv format.
The file name should not contain any spaces or special characters.

Example: Sample_Data_1.csv, Experiment2.csv

Overlay Images

Overlay images should be in a format compatible with OpenCV (e.g., .png, .jpg, .tiff).
The file name should start with the same name as its corresponding data file, followed by a specific identifier for overlay images.

Example: For a data file named Sample_Data_1.csv, the corresponding overlay image could be named Sample_Data_1_overlay.png.

Segmentation Masks

Segmentation masks should be in .png format, compatible with OpenCV.
The file name should start with the same name as its corresponding data file, followed by a specific identifier for segmentation masks.

Example: For a data file named Sample_Data_1.csv, the corresponding segmentation mask could be named Sample_Data_1_mask.png.

By following these naming conventions, you help the script to automatically associate data files with their corresponding overlay images and segmentation masks, thereby streamlining the entire analysis process.
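
The association by file name can be pictured with a short Python sketch. This is not the script's own code: the `_overlay` and `_mask` identifiers are taken from the examples above and are assumptions, as are the function name `match_companion_files` and the list of image extensions.

```python
from pathlib import Path

# Assumed identifiers, copied from the naming examples; the script's real ones may differ.
OVERLAY_SUFFIX = "_overlay"
MASK_SUFFIX = "_mask"
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def match_companion_files(folder):
    """Map each data file stem to its overlay image and segmentation mask, if present."""
    folder = Path(folder)
    files = sorted(p for p in folder.iterdir() if p.is_file())
    matches = {}
    for data_file in sorted(folder.glob("*.csv")):
        if data_file.name == "setup.csv":
            continue  # configuration file, not a data file
        stem = data_file.stem
        # Overlay: same stem plus the overlay identifier, any supported image type.
        overlay = next(
            (p for p in files
             if p.stem == stem + OVERLAY_SUFFIX and p.suffix.lower() in IMAGE_EXTENSIONS),
            None,
        )
        # Mask: same stem plus the mask identifier, .png only.
        mask = next(
            (p for p in files
             if p.stem == stem + MASK_SUFFIX and p.suffix.lower() == ".png"),
            None,
        )
        matches[stem] = {"data": data_file, "overlay": overlay, "mask": mask}
    return matches
```

Calling `match_companion_files("path/to/folder")` returns one entry per data file, with `None` where an overlay or mask is missing, which makes naming mistakes easy to spot before a run.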

The `setup.csv` File

Overview

The setup.csv file is a configuration file that allows you to specify how each channel should be processed. This is particularly useful if you have multiple channels and want to apply different operations to them.

File Structure

channels: The name of the channel.
normalize: Numeric value indicating whether to normalize this channel.
isArea: Numeric value indicating whether this channel represents the volumetric area measure.
filter: Numeric value indicating whether to filter this channel for outliers.
histogram: Numeric value indicating whether to generate histograms for this channel.
heatmap: Numeric value indicating whether to generate heatmaps for this channel.
cluster: Numeric value indicating whether to include this channel in clustering.
cluster heatmap: Numeric value indicating whether to generate cluster heatmaps for this channel.

For numeric values, use 1 for True and 0 for False. This applies to all columns except channels.

Example

channels	normalize	isArea	filter	histogram	heatmap	cluster	cluster heatmap
Channel1	1	0	1	1	1	1	1
Channel2	1	0	0	0	0	1	0
Area_Channel	0	1	0	1	0	0	0

In this example:

Channel1 will be normalized, filtered for outliers, used for clustering, and used for histogram, heatmap, and cluster heatmap visualizations. It is not considered the Area channel.
Channel2 will be normalized but not filtered for outliers. It will be used for clustering, but no histogram, heatmap, or cluster heatmap will be generated for it.
Area_Channel will not be normalized or filtered for outliers. It is considered the Area channel and a histogram will be created for it, but it will not be used for clustering or for the heatmap and cluster heatmap visualizations.

Make sure to save this file in the same directory as your .csv data files or specify its path using the --setup_csv flag.
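
To illustrate how these 1/0 flags select channels, here is a minimal Python sketch that reads a setup.csv-style table and lists the channels enabled for each operation. It assumes a comma-delimited file with the column names listed above; the function name `read_setup` is hypothetical and this is not how the script itself necessarily parses the file.

```python
import csv
import io

FLAG_COLUMNS = [
    "normalize", "isArea", "filter", "histogram",
    "heatmap", "cluster", "cluster heatmap",
]


def read_setup(handle):
    """Return {column: [channels flagged with 1]} from a setup.csv-like file object."""
    reader = csv.DictReader(handle)
    selected = {column: [] for column in FLAG_COLUMNS}
    for row in reader:
        for column in FLAG_COLUMNS:
            if int(row[column]) == 1:
                selected[column].append(row["channels"])
    return selected


# The example table from above, written as comma-separated text (assumed delimiter).
example = io.StringIO(
    "channels,normalize,isArea,filter,histogram,heatmap,cluster,cluster heatmap\n"
    "Channel1,1,0,1,1,1,1,1\n"
    "Channel2,1,0,0,0,0,1,0\n"
    "Area_Channel,0,1,0,1,0,0,0\n"
)
setup = read_setup(example)
print(setup["cluster"])  # ['Channel1', 'Channel2']
```

For a file on disk, pass `open("path/to/setup.csv", newline="")` instead of the in-memory example.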

How to Use

To use the setup.csv file, place it in the directory specified by the --folder flag, or specify its path directly using the --setup_csv flag:

Linux/Mac

# Example usage with --setup_csv flag
python script_name.py --folder path/to/csv/files --setup_csv path/to/setup.csv

Windows

# Example usage with --setup_csv flag
python script_name.py --folder X:\path\to\csv\files --setup_csv X:\path\to\setup.csv

Command-Line Flags

Each flag serves a specific purpose and allows you to customize the behavior of the script. Below is a detailed explanation of each.

`--folder` or `-f`

Purpose: Specifies the directory containing the .csv files to be analyzed.
Example: --folder path/to/csv/files (Linux) or --folder path\to\csv\files (Windows)
Default: None

`--working_directory` or `-wd`

Purpose: Sets the directory where output files will be saved.
Example: --working_directory path/to/output (Linux) or --working_directory path\to\output (Windows)
Default: The folder containing the input files.

`--setup_csv`

Purpose: Path to the setup.csv file, which controls how each channel is processed.
Example: --setup_csv path/to/setup.csv (Linux) or --setup_csv path\to\setup.csv (Windows)
Default: None

`--pixel_size`

Purpose: Sets the pixel size for normalization.
Example: --pixel_size 1.0
Default: None (Normalization will be skipped)

`--outlier_filtering_method`

Purpose: Specifies the method for outlier filtering.
Options: percentiles, zscore
Example: --outlier_filtering_method percentiles
Default: None (Outlier filtering will be skipped)

`--n_std`

Purpose: Sets the number of standard deviations for Z-score filtering.
Example: --n_std 3
Default: 3
Note: Applicable only if --outlier_filtering_method is set to 'zscore'.

`--percentiles`

Purpose: Sets the lower and upper percentile values for percentile-based filtering.
Example: --percentiles 0.0,0.997
Default: 0.0,0.997
Note: Applicable only if --outlier_filtering_method is set to 'percentiles'.
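
The two filtering methods can be sketched in plain Python as follows. This is an illustration of the general techniques, not the script's implementation: it assumes the population standard deviation for the Z-score variant and linear interpolation for the percentile bounds, and the function names are hypothetical.

```python
import statistics


def filter_zscore(values, n_std=3.0):
    """Keep values within n_std standard deviations of the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population standard deviation
    if std == 0:
        return list(values)
    return [v for v in values if abs(v - mean) <= n_std * std]


def quantile(sorted_values, q):
    """Linearly interpolated quantile of pre-sorted data, with q in [0, 1]."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def filter_percentiles(values, lower=0.0, upper=0.997):
    """Keep values between the lower and upper quantile bounds (inclusive)."""
    ordered = sorted(values)
    low, high = quantile(ordered, lower), quantile(ordered, upper)
    return [v for v in values if low <= v <= high]
```

With the default bounds of 0.0,0.997, `filter_percentiles` keeps the lowest values and drops roughly the top 0.3% of cells for that channel.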

`--scaling_method`

Purpose: Specifies the method used to scale data for clustering.
Options: robust, minmax
Example: --scaling_method robust
Default: robust
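
As a rough illustration of the difference between the two options, the sketch below implements the common definitions of robust scaling (centre on the median, divide by the interquartile range) and min-max scaling (map to the range 0 to 1) in plain Python. Whether the script uses exactly these definitions is an assumption, and the function names are hypothetical.

```python
import statistics


def robust_scale(values):
    """Centre on the median and divide by the interquartile range (IQR)."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        return [v - median for v in values]  # avoid division by zero
    return [(v - median) / iqr for v in values]


def minmax_scale(values):
    """Map values linearly so the minimum becomes 0 and the maximum becomes 1."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


intensities = [1, 2, 3, 4, 100]
print(robust_scale(intensities))  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

In this example the single extreme value (100) compresses all other values towards 0 under min-max scaling, while robust scaling keeps the bulk of the data on a comparable scale.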

`--clustering_parameters`

Purpose: Sets the parameters used for Phenograph clustering.
Options: k, resolution_parameter, seed
Example: --clustering_parameters 30,1.0,42
Default: 30,1.0,42
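
The comma-separated value is read in the order k, resolution_parameter, seed. A minimal argparse sketch of how such a value could be parsed into typed numbers is shown below; the parser function is hypothetical and the script's actual parsing may differ.

```python
import argparse


def clustering_parameters(text):
    """Parse 'k,resolution_parameter,seed' into (int, float, int)."""
    parts = text.split(",")
    if len(parts) != 3:
        raise argparse.ArgumentTypeError("expected k,resolution_parameter,seed")
    try:
        return int(parts[0]), float(parts[1]), int(parts[2])
    except ValueError as exc:
        raise argparse.ArgumentTypeError(str(exc))


parser = argparse.ArgumentParser()
parser.add_argument(
    "--clustering_parameters",
    type=clustering_parameters,
    default=(30, 1.0, 42),  # documented default
)

args = parser.parse_args(["--clustering_parameters", "50,0.8,7"])
k, resolution_parameter, seed = args.clustering_parameters
```

Note that the three values must be given without spaces, since the shell would otherwise split them into separate arguments.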

`--no_cluster`

Purpose: Disables clustering when set.
Example: --no_cluster
Default: False (Clustering enabled)
Note: If set, cluster heatmaps will not be generated.

`--umap_parameters`

Purpose: Sets the parameters used for UMAP dimensionality reduction.
Options: n_neighbours, min_dist, random_state
Example: --umap_parameters 15,0.1,42
Default: 15,0.1,42

`--no_umap`

Purpose: Disables UMAP dimensionality reduction when set.
Example: --no_umap
Default: False (UMAP enabled)

`--aggregation_method`

Purpose: Specifies the method used to aggregate data for each cluster in the cluster heatmaps.
Options: median, mean
Example: --aggregation_method median
Default: median
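
Conceptually, each cell of a cluster heatmap is one aggregated value per cluster and channel. The following plain-Python sketch shows that aggregation under the assumption that every row carries a cluster label and one numeric value per channel; the function name and the row layout are hypothetical.

```python
import statistics
from collections import defaultdict


def aggregate_by_cluster(rows, channels, method="median"):
    """Aggregate each channel per cluster; rows are dicts with a 'cluster' key."""
    aggregate = {"median": statistics.median, "mean": statistics.fmean}[method]
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["cluster"]].append(row)
    return {
        cluster: {ch: aggregate([r[ch] for r in members]) for ch in channels}
        for cluster, members in sorted(grouped.items())
    }


cells = [
    {"cluster": 0, "Channel1": 1.0},
    {"cluster": 0, "Channel1": 2.0},
    {"cluster": 0, "Channel1": 9.0},
    {"cluster": 1, "Channel1": 4.0},
]
print(aggregate_by_cluster(cells, ["Channel1"], method="median"))
# {0: {'Channel1': 2.0}, 1: {'Channel1': 4.0}}
```

The median is less sensitive to a few very bright cells than the mean: for cluster 0 above, the mean would be 4.0 instead of 2.0.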

`--save_histograms`

Purpose: Enables the saving of histograms for the specified channels.
Example: --save_histograms
Default: False

`--save_processed_histograms`

Purpose: Enables the saving of histograms after data preprocessing.
Example: --save_processed_histograms
Default: False

`--save_processed_csv`

Purpose: Enables the saving of processed DataFrames as CSV files.
Example: --save_processed_csv
Default: False

`--save_individual_clusters`

Purpose: Enables the saving of individual CSV files for each cluster.
Example: --save_individual_clusters
Default: False

`--save_combined_clusters`

Purpose: Enables the saving of a combined CSV file for all clusters.
Example: --save_combined_clusters
Default: False

Demo Commands

For quick testing or demonstration purposes, here's how you can run the script to generate the following output:

size-normalised data
removal of the top 0.3% of cells
raw & processed histograms
heatmap overlays
processed CSVs

Note: You will need a setup.csv file for these commands.

Linux/Mac

python script_name.py \
  --folder /path/to/csv/files \
  --working_directory /path/to/output \
  --setup_csv /path/to/setup.csv \
  --pixel_size 1.0 \
  --outlier_filtering_method percentiles \
  --percentiles 0.0,0.997 \
  --no_cluster \
  --no_umap \
  --save_histograms \
  --save_processed_histograms \
  --save_processed_csv

Windows

python script_name.py ^
  --folder C:\path\to\csv\files ^
  --working_directory C:\path\to\output ^
  --setup_csv C:\path\to\setup.csv ^
  --pixel_size 1.0 ^
  --outlier_filtering_method percentiles ^
  --percentiles 0.0,0.997 ^
  --no_cluster ^
  --no_umap ^
  --save_histograms ^
  --save_processed_histograms ^
  --save_processed_csv

Replace placeholders like path/to/csv/files or C:\path\to\csv\files with your actual paths and adjust the values as needed.
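
If you prefer to launch the demo from Python, for example to avoid the different line-continuation characters on Linux/Mac and Windows, the same flags can be assembled as an argument list. This is a sketch only: `build_demo_command` and `run_demo` are hypothetical helpers, and script_name.py and all paths are placeholders as in the commands above.

```python
import subprocess
import sys


def build_demo_command(script, folder, output, setup_csv):
    """Return the demo command as an argument list suitable for subprocess.run."""
    return [
        sys.executable, str(script),
        "--folder", str(folder),
        "--working_directory", str(output),
        "--setup_csv", str(setup_csv),
        "--pixel_size", "1.0",
        "--outlier_filtering_method", "percentiles",
        "--percentiles", "0.0,0.997",
        "--no_cluster",
        "--no_umap",
        "--save_histograms",
        "--save_processed_histograms",
        "--save_processed_csv",
    ]


def run_demo(script, folder, output, setup_csv):
    """Run the demo and raise an error if the script exits with a failure code."""
    subprocess.run(build_demo_command(script, folder, output, setup_csv), check=True)


command = build_demo_command(
    "script_name.py", "path/to/csv/files", "path/to/output", "path/to/setup.csv"
)
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.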
