# echidna meeting - 2015-04-15
## Session 1: Introduction and current status of echidna
### Brief overview of echidna
Ashley gave a walk-through of the code on GitHub, highlighting the main parts:
- Core data structure
- Creating spectra from [ntuples](https://github.com/snoplusuk/echidna/blob/master/echidna/scripts/dump_spectra_ntuple.py) and writing to hdf5s using the `store` method
- Limit setting:
  - Chi-squared calculations
  - Limit-setting [algorithm](https://github.com/snoplusuk/echidna/blob/master/echidna/limit/limit_setting.py#L201)
  - "Book-keeping" class `LimitConfig`
  - `SystAnalyser` class for offline analysis
  - Example limit-setting scripts
- Documentation
- Unit tests
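For orientation, the sketch below shows a generic binned Poisson-likelihood chi-squared of the kind used in such limit setting. This is the textbook Baker-Cousins form, not necessarily echidna's exact implementation:

```python
import numpy

def poisson_chi_squared(observed, expected):
    """Baker-Cousins Poisson likelihood chi-squared between binned spectra.

    A generic sketch for orientation; echidna's own chi-squared module
    may use different conventions. Assumes expected > 0 in every bin.
    """
    observed = numpy.asarray(observed, dtype=float)
    expected = numpy.asarray(expected, dtype=float)
    chi_squared = 2.0 * numpy.sum(expected - observed)
    filled = observed > 0  # the O*ln(O/E) term vanishes for empty bins
    chi_squared += 2.0 * numpy.sum(
        observed[filled] * numpy.log(observed[filled] / expected[filled]))
    return chi_squared
```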
### Comments/questions
- Jack: does the answer you get depend on the order the backgrounds are included?
  - TODO (Ashley): check if we get the same limit reversing the order B8 and 2nu2B are added
- Jeanne: at the moment you have to edit parameters within the code; seems easy to make a mistake
  - Should move more towards a config file/database for parameters
- Matt: do you ever need to scale up?
  - When you first create a `Spectra` instance, you supply the number of simulated events that spectrum should represent (i.e. if creating from ntuple(s), how many events were simulated by rat in producing those ntuples). Then when scaling, just input the number of simulated events you would like the `Spectra` to represent now and it is scaled accordingly. Currently you can input any number of events, including a number larger than that used to create the `Spectra`. Could add a warning here? (See the sketch after this list.)
- Matt: complained that the documentation section comes before the instructions on how to run echidna in the README
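The scaling behaviour discussed above, as a minimal sketch. The class layout and attribute names here are illustrative, not echidna's actual `Spectra` implementation, and include the suggested warning:

```python
import numpy

class Spectra(object):
    """Illustrative sketch only -- not echidna's actual class."""

    def __init__(self, name, num_decays):
        self._name = name
        self._num_simulated = float(num_decays)  # events simulated by rat
        self._num_decays = float(num_decays)     # events now represented
        self._data = numpy.zeros(100)            # binned spectrum

    def scale(self, num_decays):
        """Rescale the spectrum to represent `num_decays` events."""
        if num_decays > self._num_simulated:
            # The suggested warning: scaling beyond simulated statistics.
            print("Warning: scaling above the %d simulated events"
                  % self._num_simulated)
        self._data *= num_decays / self._num_decays
        self._num_decays = float(num_decays)
```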
### A few recent updates
See #56 for the most relevant updates, made by James and Ashley for the collaboration meeting and IOP.
- TODO (Ashley): check `decay.py`; looks like an older version with g_a still included
### Current status
#### Open pull requests
- #45 - root crashes when using the `--help` option. Ashley was assigned
  - TODO (Ashley): review PR #45
- #51 - non-graphical option for batch-farm running. Evelina was assigned
  - TODO (Evelina): review PR #51
- #56 - changes from the collaboration meeting and IOP. Evelina was assigned
  - TODO (Evelina): review PR #56
### Current goals/milestones
- James: Josh is keen to look at higher light yields --> echidna can do that through smearing
- James: AV position --> have we optimised the position of the FV correctly? --> higher backgrounds from the HD ropes
  - Follows on from work that James S was doing
- Jeanne: if you move the FV, note that the systematics are not given as a function of R
### Solar signal fitting (Stefanie)
- Log-likelihood fit in solar region
- Could use log-likelihood calculation already in place
- More of a fit than limit-setting
- TODO (Stefanie): write a simple fitting script with a few backgrounds
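As a possible starting point for that script, a minimal extended binned log-likelihood for one signal plus floating backgrounds. All names, the two-background model and the optimiser choice here are hypothetical:

```python
import numpy
from scipy.optimize import minimize

def negative_log_likelihood(norms, signal_pdf, background_pdfs, observed):
    """Extended binned Poisson negative log-likelihood.

    `norms` is [n_signal, n_bkg1, ...]; all PDFs are binned, unit-area
    arrays. Drops the constant ln(observed!) term. Assumes the total
    expectation is positive in every bin.
    """
    expected = norms[0] * signal_pdf
    for norm, pdf in zip(norms[1:], background_pdfs):
        expected = expected + norm * pdf
    return numpy.sum(expected - observed * numpy.log(expected))

# Hypothetical usage with toy unit-area spectra b8_pdf, bi210_pdf and
# observed counts `data`:
# result = minimize(negative_log_likelihood, x0=[100.0, 1000.0],
#                   args=(b8_pdf, [bi210_pdf], data), method="Nelder-Mead")
```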
### Timing information
Current problem --> need to come up with a solution.
- James: just applying different weights --> need to separate this out from the ntuple code
- Evelina: will the file size be larger with timing weights applied?
  - Would be slightly larger but not noticeably; not the main motivation for the change
### Energy resolution fitting
- Evelina: lots of loops; want to fit each background separately, with the other backgrounds (Bi, Po210 and 2nu) --> internal lines
- Pile-up can affect the 2nu shape --> could develop one technique for both analyses
- Other energy systematics too, not just smearing
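For reference, the smearing mentioned here (and in the light-yield point under current goals) is typically a Gaussian convolution whose width is set by photon-counting statistics. A sketch under that assumption, with an illustrative light yield, not echidna's actual smearing code:

```python
import numpy

def smear_energies(energies, light_yield=200.0, rng=None):
    """Gaussian-smear an array of true energies (MeV).

    Assumes NHit = light_yield * E with Poisson statistics, giving
    sigma_E = sqrt(E / light_yield). Both the functional form and the
    default light yield (NHit/MeV) are illustrative. A higher light
    yield gives a narrower resolution, as in Josh's suggestion.
    """
    rng = rng or numpy.random.RandomState()
    energies = numpy.asarray(energies, dtype=float)
    sigmas = numpy.sqrt(energies / light_yield)
    return rng.normal(energies, sigmas)
```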
## Session 2: SNO+ sig-ex and other analyses --> where can echidna help?
### SNO+ sig-ex: current thoughts
Jack's report:
- Understanding what people typically do; the task has really been to work out what already exists
- Probably looking more at likelihood
- Just playing around with toy models for the moment
  - e.g. only B8 background with a signal --> 2D likelihood space (see the toy sketch after this list)
- Check Andy's document (docdb-2266), which explains where these likelihood formulae come from
- Andy's code looks at the likelihood space to find the most likely parameters for the PDF --> doesn't look at the parameter space around the minima --> possible improvement
- Josh suggested as a first step to fit in both energy and PSD
- Correlated systematics --> some sort of interface where you could mix two highly correlated parameters
  - e.g. if energy and radius were correlated you could form some new energy-radius parameter
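A self-contained toy of that 2D likelihood space, including a look at the region around the minimum; the shapes, binning and normalisations are all made up:

```python
import numpy

rng = numpy.random.RandomState(42)

# Toy unit-area PDFs: a narrow signal peak on a falling B8-like spectrum.
centres = numpy.linspace(2.0, 4.0, 50)
signal_pdf = numpy.exp(-0.5 * ((centres - 2.5) / 0.1) ** 2)
signal_pdf /= signal_pdf.sum()
b8_pdf = numpy.exp(-centres)
b8_pdf /= b8_pdf.sum()
data = rng.poisson(50.0 * signal_pdf + 1000.0 * b8_pdf)

def nll(n_sig, n_b8):
    expected = n_sig * signal_pdf + n_b8 * b8_pdf
    return numpy.sum(expected - data * numpy.log(expected))

# Scan the 2D likelihood space in the two normalisations.
sig_norms = numpy.linspace(0.0, 200.0, 81)
b8_norms = numpy.linspace(500.0, 1500.0, 81)
grid = numpy.array([[nll(s, b) for b in b8_norms] for s in sig_norms])

# Best fit, plus the Delta(NLL) < 0.5 region -- the "parameter space
# around the minima" noted above as a possible improvement.
i, j = numpy.unravel_index(numpy.argmin(grid), grid.shape)
print("best fit: n_sig=%.1f, n_b8=%.1f" % (sig_norms[i], b8_norms[j]))
one_sigma_region = grid - grid[i, j] < 0.5
```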
### Path ahead for sig-ex and echidna - how can echidna help?
A few things to think about and check in echidna, to make sure it is suitable for potentially including a likelihood fit:
- Jack: how does timing scale as you add in more backgrounds?
  - Roughly exponential; need to investigate further
- Jeanne: parallelisation and optimisation
- James: currently limit-setting tries to do too much --> should compartmentalise more
  - Various different options to try
  - Could experiment looking at other ROIs, sidebands
  - Could also look at what happens if you swap the order of 2nu and B8 smearing in x, y, z
- Matt: side-band fit outside the FV (~4 m)
- Matt: have you looked at low-background, zero-bin effects?
### Goals for echidna
- Jeanne: two main types of goal
  - Thesis goals: Ashley and James' thesis analyses are quite well defined
  - SNO+ goals: how can we use echidna to make competitive analysis software --> how does this fit in with Jack's goals?
- UK perspective --> good to have software that new students can quickly get involved in
- Also want a main SNO+ analysis --> at least two rigorous analysis frameworks to cross-check each other
- Robustness to biases: not sensitive to binning choices, order of parameters etc.
**Next meeting:** mid to end May
## Session 3: Key issues for next echidna release
### Timing model
- Jeanne's suggestion --> remove timing from the histogram and just use an analytic function for the appropriate timing model
  - Multiply by the analytic function each time you return the number of events
- TODO (Evelina): assigned to implement these analytic functions #28
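A minimal sketch of that suggestion; the functional form (simple exponential decay over the livetime) and all names are illustrative, pending the actual models in #28:

```python
import math

def timing_weight(livetime, half_life):
    """Fraction of a background's decays falling within `livetime`
    (same units as `half_life`) -- one possible analytic timing model.
    For half_life >> livetime this tends to livetime * ln(2) / half_life.
    """
    return 1.0 - math.exp(-math.log(2.0) * livetime / half_life)

# Rather than carrying a time axis in the histogram, multiply by the
# analytic function whenever the number of events is returned, e.g.:
# n_events = spectrum_integral * timing_weight(livetime, half_life)
```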
### Analysis framework
- Jeanne: iRODS server
  - TODO (Jeanne): email Francesca/Alex about setting up iRODS for SNO+ #57
### Optimisation
- First goal should be benchmarking with higher dimensionality, to see what we need to aim for
- TODO (James): remove recursive file calling for backgrounds you don't want to float
- TODO (Ashley and James): benchmarking --> do we actually need to optimise?
- Then we can look at different lines of optimisation:
  - Improving the actual algorithms --> more sophisticated than grid search
  - Parallelisation
- Re-visit next time
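On the benchmarking point: a grid search evaluates every combination of floated parameters, so the cost grows as points-per-parameter to the power of the number of floated backgrounds, which is the roughly exponential timing scaling noted in Session 2. A quick illustration (numbers arbitrary):

```python
n_points = 100  # values tried per floated background (arbitrary)
for n_dims in range(1, 6):
    evaluations = n_points ** n_dims
    print("%d floated backgrounds -> %.0e evaluations" % (n_dims, evaluations))
# 1 -> 1e+02 ... 5 -> 1e+10: why algorithms more sophisticated than grid
# search, and/or parallelisation, may be needed.
```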
### Data structure
- List of possible parameters:
  - L^{Cosmo}
  - R --> should bin in (R/R_{AV})^3; also, do we want to store (x, y, z)?
  - Alphaness, alpha PID
  - Directionality
- Change to a paradigm where we dynamically assign the variables we want to store on reading from the ntuple (see the sketch after this list)
  - Provide a config when reading from ntuples
  - Store the config file along with the spectra in the hdf5
  - Don't want to change the DS once it is saved to hdf5
- TODO (DR MATT MOTTRAM): assigned to start looking into this #58
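One possible shape for the config-driven paradigm; the parameter names, binning and `fill` hook are placeholders for discussion, not a design:

```python
import numpy

# Hypothetical config listing the variables to histogram when reading an
# ntuple; could be stored in the hdf5 alongside the data so each spectrum
# is self-describing. Order fixes the axis order of the array.
config = [
    ("energy", {"bins": 100, "low": 0.0, "high": 10.0}),
    ("radius_cubed", {"bins": 50, "low": 0.0, "high": 1.0}),  # (R/R_AV)^3
]

data = numpy.zeros(tuple(par["bins"] for _, par in config))

def fill(event):
    """Increment the bin matching each configured parameter of `event`
    (a mapping from parameter name to value)."""
    indices = []
    for name, par in config:
        width = (par["high"] - par["low"]) / par["bins"]
        indices.append(int((event[name] - par["low"]) / width))
    data[tuple(indices)] += 1.0

# e.g. fill({"energy": 2.5, "radius_cubed": 0.4})
```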
## AOB
- Ice cream!!!
## Action items summary
- TODO (Ashley): check if we get the same limit reversing the order B8 and 2nu2B are added
- TODO (Ashley): check `decay.py`; looks like an older version with g_a still included
- TODO (Ashley): review PR #45
- TODO (Evelina): review PR #51
- TODO (Evelina): review PR #56
- TODO (Stefanie): write a simple fitting script with a few backgrounds
- TODO (Evelina): implement the analytic timing functions #28
- TODO (Jeanne): email Francesca/Alex about setting up iRODS for SNO+ #57
- TODO (James): remove recursive file calling for backgrounds you don't want to float
- TODO (Ashley and James): benchmarking --> do we actually need to optimise?
- TODO (DR MATT MOTTRAM): start looking into the data structure changes #58