Summarizing multiple Wiggle files of the same condition (a.k.a. technical replicates) - Integrative-Transcriptomics/tss-prediction-comparison GitHub Wiki

Context

Because our app has to handle data sets from different conditions, each with multiple technical replicates, we need to be able to summarize data for our prediction.
Summarization can take place before the TSS prediction (summarizing technical replicates) and after it (summarizing the data generated by the prediction for different conditions).
Since our wiggle files are parsed into pandas DataFrames before any further processing, the following ideas are based on DataFrames.

Summarizing multiple technical replicates of one condition (before prediction)

median

  • create a new DataFrame D
  • the value at each position x in D is the median of the values at position x across the technical replicates
  • the median is robust against extreme values, so a single outlier replicate does not dominate the result
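
The median idea above can be sketched as follows. This is a minimal sketch, not the project's implementation: the column names `position` and `value`, the function name `summarize_replicates_median`, and the choice to treat positions missing from a replicate as 0 are all assumptions made for illustration.

```python
import pandas as pd

def summarize_replicates_median(replicates):
    """Return DataFrame D: per-position median over a list of replicate DataFrames.

    Assumes (hypothetically) that each replicate has the columns
    'position' and 'value'.
    """
    # Put the value column of every replicate side by side, aligned on position.
    table = pd.concat(
        [df.set_index("position")["value"] for df in replicates],
        axis=1,
        keys=list(range(len(replicates))),
    ).sort_index()
    # Assumption: a position missing from a replicate counts as 0.
    table = table.fillna(0)
    # Row-wise median across the replicates.
    median = table.median(axis=1)
    return median.rename("value").rename_axis("position").reset_index()

# Three toy replicates; the third has an extreme value at position 3.
rep1 = pd.DataFrame({"position": [1, 2, 3], "value": [10.0, 0.0, 5.0]})
rep2 = pd.DataFrame({"position": [1, 2, 3], "value": [12.0, 1.0, 6.0]})
rep3 = pd.DataFrame({"position": [1, 2, 3], "value": [11.0, 2.0, 500.0]})

summary = summarize_replicates_median([rep1, rep2, rep3])
print(summary)
```

In the toy data the outlier 500.0 at position 3 does not affect the result, which illustrates why the median is attractive for replicates.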

Summarizing different conditions (after prediction)

adding up values

  • create a new DataFrame D
  • the value at each position x in D is the sum of the values at position x across all conditions
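
A possible way to add up values across conditions is shown below. It is only a sketch under assumptions: the prediction output is assumed to have the columns `position` and `value`, the function name `sum_conditions` is hypothetical, and positions that appear in only some conditions are assumed to contribute 0 for the others.

```python
from functools import reduce

import pandas as pd

def sum_conditions(conditions):
    """Return DataFrame D: per-position sum over a list of condition DataFrames."""
    # One Series per condition, indexed by position (assumed column names).
    series = [df.set_index("position")["value"] for df in conditions]
    # Series.add aligns on the index; fill_value=0 keeps positions that are
    # missing in one of the two operands instead of turning them into NaN.
    total = reduce(lambda a, b: a.add(b, fill_value=0), series)
    return total.sort_index().rename("value").rename_axis("position").reset_index()

cond_a = pd.DataFrame({"position": [100, 250], "value": [3.0, 1.0]})
cond_b = pd.DataFrame({"position": [100, 400], "value": [2.0, 4.0]})

summed = sum_conditions([cond_a, cond_b])
print(summed)
```

Note that summing favors positions supported by many conditions, so the result grows with the number of conditions.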

median

  • create a new DataFrame D
  • the value at each position x in D is the median of the values at position x across all conditions
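
For the median across conditions, the handling of positions that are absent in some conditions changes the result, so the sketch below makes that choice explicit. The column names `position` and `value`, the function name `median_conditions`, and the `missing_as_zero` switch are assumptions for illustration only.

```python
import pandas as pd

def median_conditions(conditions, missing_as_zero=True):
    """Return DataFrame D: per-position median over a dict {name: DataFrame}."""
    # One column per condition, aligned on position (assumed column names).
    table = pd.concat(
        {name: df.set_index("position")["value"] for name, df in conditions.items()},
        axis=1,
    ).sort_index()
    if missing_as_zero:
        # Treat a position missing from a condition as value 0.
        table = table.fillna(0)
    # Without the fill, DataFrame.median skips NaN entries by default.
    median = table.median(axis=1)
    return median.rename("value").rename_axis("position").reset_index()

conditions = {
    "a": pd.DataFrame({"position": [100, 250], "value": [4.0, 2.0]}),
    "b": pd.DataFrame({"position": [100], "value": [6.0]}),
    "c": pd.DataFrame({"position": [100, 250], "value": [8.0, 10.0]}),
}

with_zero = median_conditions(conditions, missing_as_zero=True)
skip_missing = median_conditions(conditions, missing_as_zero=False)
print(with_zero)
print(skip_missing)
```

At position 250, condition "b" has no entry: counting it as 0 gives a median of 2.0, while skipping it gives 6.0. Which behavior is appropriate depends on what a missing position means in the prediction output.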

max-value

  • create a new DataFrame D
  • the value at each position x in D is the maximum of the values at position x across all conditions
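
The max-value variant can be sketched with a group-by on the stacked data. As before, the column names `position` and `value` and the function name `max_conditions` are assumptions, not part of the project.

```python
import pandas as pd

def max_conditions(conditions):
    """Return DataFrame D: per-position maximum over a list of condition DataFrames."""
    # Stack all conditions into one long table (assumed columns: position, value).
    stacked = pd.concat(conditions, ignore_index=True)
    # Group rows that share a position and keep the largest value of each group.
    return stacked.groupby("position", as_index=False)["value"].max()

cond_a = pd.DataFrame({"position": [100, 250], "value": [3.0, 1.0]})
cond_b = pd.DataFrame({"position": [100, 400], "value": [2.0, 4.0]})

maxima = max_conditions([cond_a, cond_b])
print(maxima)
```

With this approach a position only needs to appear in one condition to be kept, and no fill value is required as long as the values are non-negative.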