Possible research questions - NicMcPhee/MICS-2014-GraphDB-EC GitHub Wiki
Nic: I started brainstorming some questions that I thought I'd share both for feedback and to see if they spark any additional ideas. When I talk about the "root-parent line" I'm talking about the line (and it is a line as opposed to a tree or graph) from an individual back through time following just root parent, reproduction, and elitism. When I talk about the winning or successful line, I'm probably talking about the root-parent line back from the (or a) best (minimal fitness) individual in the final generation.
These are pretty terse notes, so feel free to be confused (& ask questions) about any of them :-)
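To make the definitions above concrete, here is a minimal Python sketch of walking a root-parent line and picking a winning line. The record layout (`gen`, `root_parent`, `fitness`) and the ids are made up for illustration; they are not the schema of the actual database. The sketch assumes every individual after the initial generation records exactly one root parent.

```python
# Hypothetical toy population: id -> record. Field names are assumptions,
# not the schema used in the real graph database.
population = {
    "a0": {"gen": 0, "root_parent": None, "fitness": 9.0},
    "a1": {"gen": 1, "root_parent": "a0", "fitness": 7.5},
    "b1": {"gen": 1, "root_parent": "a0", "fitness": 8.0},
    "a2": {"gen": 2, "root_parent": "a1", "fitness": 4.0},
    "b2": {"gen": 2, "root_parent": "b1", "fitness": 6.0},
}


def root_parent_line(population, individual_id):
    """Return ids from individual_id back to its initial-generation ancestor."""
    line = []
    current = individual_id
    while current is not None:
        line.append(current)
        current = population[current]["root_parent"]
    return line


def winning_line(population, final_gen):
    """Root-parent line back from a best (minimal fitness) final individual."""
    finals = [i for i, rec in population.items() if rec["gen"] == final_gen]
    best = min(finals, key=lambda i: population[i]["fitness"])
    return root_parent_line(population, best)


print(winning_line(population, 2))
```

Because each individual has at most one root parent, the result is always a simple list (a line), never a tree.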
Find out how many individuals in the initial generation have any root-parent descendants in the final generation. I thought this would be an easy query, but I can't get it to run, apparently because of resource limits. I think we need to do some optimization/tuning.
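One possible way to cut the cost: since each individual has a single root parent, it may be cheaper to walk backward from each final-generation individual and collect the initial-generation individuals reached, rather than searching forward from the initial generation. Whether this helps inside Neo4j is untested. The sketch below only shows the idea on a toy in-memory dictionary, and all names and data in it are hypothetical.

```python
# Hypothetical toy data: id -> root parent id (None for the initial generation)
root_parent = {
    "a0": None, "b0": None, "c0": None,
    "a1": "a0", "b1": "a0", "c1": "b0",
    "a2": "a1", "b2": "a1", "c2": "c1",
}
# Hypothetical toy data: id -> generation number
generation = {
    "a0": 0, "b0": 0, "c0": 0,
    "a1": 1, "b1": 1, "c1": 1,
    "a2": 2, "b2": 2, "c2": 2,
}


def founders_with_survivors(root_parent, generation, final_gen):
    """Initial-generation ids that have a root-parent descendant in final_gen."""
    founders = set()
    for individual, gen in generation.items():
        if gen != final_gen:
            continue
        # Follow the single root-parent link back until it runs out.
        current = individual
        while root_parent[current] is not None:
            current = root_parent[current]
        founders.add(current)
    return founders


founders = founders_with_survivors(root_parent, generation, 2)
print(len(founders), "of", sum(1 for g in generation.values() if g == 0))
```

The work here is bounded by (final population size) x (number of generations), since each backward walk visits one individual per generation.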
Find the average generation of ancestors created with crossover (XO) versus the average generation of ancestors created via something other than XO. (Is XO more successful earlier in the run along the winning line?)
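Once a winning line has been pulled out of the database, the comparison itself is a small computation. A sketch, assuming the line is available as (generation, operator) pairs; the operator labels and the data are invented for illustration.

```python
from statistics import mean

# Hypothetical winning line: (generation, operator that created the individual)
winning_line = [
    (1, "crossover"),
    (2, "crossover"),
    (3, "crossover"),
    (4, "mutation"),
    (6, "reproduction"),
]


def mean_generation_by_operator(line, operator="crossover"):
    """Return (mean generation for `operator`, mean generation for the rest)."""
    with_op = [gen for gen, op in line if op == operator]
    without_op = [gen for gen, op in line if op != operator]
    return (
        mean(with_op) if with_op else None,
        mean(without_op) if without_op else None,
    )


xo_mean, other_mean = mean_generation_by_operator(winning_line)
print(xo_mean, other_mean)
```

A lower mean for crossover than for the other operators would be consistent with XO contributing earlier along the winning line, though a single run would not be enough to conclude anything.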
Plot fitness over time along the root-parent line. Ditto for the non-root-parent line. (I'd expect fitness to be generally increasing on the root-parent line, but maybe less so on the non-root-parent line?) This is an easy query for a single end-of-run individual. It would be nice to have the lines for every individual in the final generation and overlay them to see how similar they are. I'm not sure how to easily do that in Neo4j/Cypher, however.
Plot size over time along the root-parent line. Ditto for the non-root-parent line. (Not at all sure what happens here.)
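Both plotting items above (fitness and size along the root-parent line) reduce to the same step: build one (generation, value) series per final-generation individual, then overlay the series. If the overlay is awkward in Cypher, one option is to export the individuals and do it client-side. A sketch under that assumption, with hypothetical field names (`gen`, `root_parent`, `fitness`, `size`) and toy data:

```python
# Hypothetical exported records: id -> fields. Names are assumptions.
population = {
    "a0": {"gen": 0, "root_parent": None, "fitness": 9.0, "size": 5},
    "a1": {"gen": 1, "root_parent": "a0", "fitness": 7.5, "size": 9},
    "a2": {"gen": 2, "root_parent": "a1", "fitness": 4.0, "size": 14},
    "b2": {"gen": 2, "root_parent": "a1", "fitness": 6.0, "size": 11},
}


def line_series(population, individual_id, field):
    """(generation, value) pairs along the root-parent line, oldest first."""
    series = []
    current = individual_id
    while current is not None:
        record = population[current]
        series.append((record["gen"], record[field]))
        current = record["root_parent"]
    series.reverse()
    return series


def overlay(population, final_gen, field):
    """One series per final-generation individual, keyed by its id."""
    return {
        ind: line_series(population, ind, field)
        for ind, record in population.items()
        if record["gen"] == final_gen
    }


for ind, series in overlay(population, 2, "fitness").items():
    print(ind, series)
```

Each series can be handed to any plotting library as one line on a shared axis. Lines that share ancestors will coincide over the shared part, which should make the similarity visible.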
Plot average generation of mutations and average generation of successful mutations, i.e., mutations on a winning line.
Compute correlations between fitness of individuals and root versus non-root parents and maybe grandparents. Maybe something similar for siblings and cousins.
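For the parent/child correlations, a plain Pearson coefficient is one possible starting point (a rank correlation might turn out to be a better fit if fitness values are heavily skewed). A sketch assuming each child has been exported as a (child fitness, root-parent fitness, non-root-parent fitness) triple; the triples here are invented.

```python
from math import sqrt

# Hypothetical triples: (child, root parent, non-root parent) fitness values
families = [
    (4.0, 5.0, 9.0),
    (6.0, 7.0, 3.0),
    (3.0, 4.0, 8.0),
    (8.0, 9.0, 2.0),
]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (None if undefined)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # one side is constant, so the correlation is undefined
    return cov / sqrt(var_x * var_y)


child = [c for c, _, _ in families]
r_root = pearson(child, [r for _, r, _ in families])
r_non_root = pearson(child, [n for _, _, n in families])
print(r_root, r_non_root)
```

The same function would apply to grandparents, siblings, or cousins once the corresponding pairs have been extracted.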
If we do this, we might want to compare mutation as well. Average crossover point along the successful line. I think what we really care about is the level of the XO point, but that will require some computation to figure out.
Correlation between crossover point and fitness improvement. Again, what we probably want is the level.
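Getting from a crossover point to its level depends on how trees and crossover points are stored, which these notes don't specify. As one illustration, suppose a tree is a nested tuple `(label, child, child, ...)` with bare values as leaves, and the crossover point is a preorder node index. Both of these are assumptions. Under them, the level can be found with a single traversal:

```python
def depth_of_preorder_index(tree, target):
    """Depth (root = 0) of the node at preorder index `target`.

    Assumes a tree is a tuple (label, child, child, ...) and that anything
    that is not a tuple is a leaf. This representation is hypothetical.
    """
    stack = [(tree, 0)]
    index = 0
    while stack:
        node, depth = stack.pop()
        if index == target:
            return depth
        index += 1
        if isinstance(node, tuple):
            # Push children in reverse so the leftmost child is visited first.
            for child in reversed(node[1:]):
                stack.append((child, depth + 1))
    raise IndexError("preorder index out of range")


# Preorder numbering: 0 is "+", 1 is "*", 2 is "x", 3 is 2.0, 4 is "y"
example = ("+", ("*", "x", 2.0), "y")
print([depth_of_preorder_index(example, i) for i in range(5)])
```

With levels in hand, the correlation with fitness improvement is an ordinary two-column computation over (level, parent fitness minus child fitness) pairs.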
Use floating point constants as genetic markers. Where did a given constant come from? When was it introduced to the winning line?
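A sketch of the "when was it introduced" part, assuming the winning line has been exported newest first as (generation, set of constants in the individual) pairs, and assuming constants are copied unchanged between generations so exact equality identifies them. Both assumptions and the data are hypothetical.

```python
# Hypothetical winning line, newest first: (generation, constants present)
winning_line = [
    (4, {3.14, 0.5}),
    (3, {3.14, 0.5}),
    (2, {3.14}),
    (1, {0.25}),
    (0, {0.25}),
]


def introduction_generation(line, constant):
    """Earliest generation of the unbroken run ending at the final individual.

    Returns None if the constant is not in the final individual at all.
    """
    introduced = None
    for gen, constants in line:
        if constant not in constants:
            break
        introduced = gen
    return introduced


print(introduction_generation(winning_line, 3.14))
print(introduction_generation(winning_line, 0.5))
```

Answering "where did it come from" would then mean looking at the other parent (or the mutation) of the individual at the returned generation.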
Compare the winning lineage to the worst lineage. Where did things go wrong? Look at the most recent common ancestor, where the two ancestries split, and at what changed afterward to produce the worst line.
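Finding the split point between two root-parent lines is straightforward once the parent links are available: collect the ancestors of one individual, then walk back from the other until hitting one of them. A sketch on hypothetical toy data (the ids and the dictionary layout are assumptions):

```python
# Hypothetical toy data: id -> root parent id (None for the initial generation)
root_parent = {
    "r0": None, "s0": None,
    "r1": "r0", "s1": "s0",
    "best2": "r1", "worst2": "r1",
    "best3": "best2", "worst3": "worst2",
}


def most_recent_common_ancestor(root_parent, first, second):
    """Most recent individual on both root-parent lines, or None if disjoint."""
    ancestors_of_first = set()
    current = first
    while current is not None:
        ancestors_of_first.add(current)
        current = root_parent[current]

    current = second
    while current is not None:
        if current in ancestors_of_first:
            return current
        current = root_parent[current]
    return None


print(most_recent_common_ancestor(root_parent, "best3", "worst3"))
```

The children of the returned ancestor on each line are where the two lineages first differ, so those are the individuals to inspect for what changed.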