Cypher query sheet - NicMcPhee/MICS-2014-GraphDB-EC GitHub Wiki
We should collect together queries that we've found useful (or found didn't work).
Indexing
Creating an index on generations
create index on :Individual(generation);
Creating an index on fitness
Unfortunately this doesn't seem to generate an index, and I'm not sure why. Maybe we can only index on discrete values? That's a bummer, because it would be nice to be able to index on fitnesses.
create index on :Individual(penalizedFitness);
What's really weird is that it looks like we can generate an index on non-penalized fitness, as this works:
create index on :Individual(fitness);
OK, some more digging suggests that it had automagically created an index on fitness. When I run
:schema ls
which lists all the available indices, it lists fitness. So it either created it "for free" when I started up Neo4j, or my attempt at creating it worked and didn't return something useful. Weird.
Winners and their ancestry
Finding the "winning" fitness
This finds the best value of penalizedFitness over the entire run. You could use fitness to get the unpenalized fitness.
start n=node(*)
return min(n.penalizedFitness);
Find a winning individual
start n=node(*)
return n
order by n.penalizedFitness
limit 1;
Finding all winning individuals
This finds all the individuals that have the best (i.e., minimal) penalized fitness.
match (n:Individual) with min(n.penalizedFitness) as mf match (m:Individual {penalizedFitness: mf}) return m;
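The two-stage shape of this query (aggregate first, then match against the aggregate) is easy to try outside Neo4j. Here's a minimal Python sketch of the same idea; the individuals list and its values are made up for illustration and are not from a real run.

```python
# Hypothetical stand-in for the :Individual nodes; all values are made up.
individuals = [
    {"id": 1, "generation": 99, "penalizedFitness": 6.15},
    {"id": 2, "generation": 100, "penalizedFitness": 6.13},
    {"id": 3, "generation": 100, "penalizedFitness": 6.13},
    {"id": 4, "generation": 100, "penalizedFitness": 7.40},
]

# Stage 1: corresponds to `with min(n.penalizedFitness) as mf`.
mf = min(ind["penalizedFitness"] for ind in individuals)

# Stage 2: corresponds to `match (m:Individual {penalizedFitness: mf})`.
# Like the Cypher version, ties are found by exact equality with the minimum.
winners = [ind for ind in individuals if ind["penalizedFitness"] == mf]

print([w["id"] for w in winners])
```

As in the query, every individual tied for the minimum comes back, not just the first one found.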
Find the root ancestry branch from an individual
This finds the root ancestry path from individual 99000 to the initial generation.
start n=node(99000)
match ps = (n)<-[r:ELITISM|PARENTOF|ROOT_XOOF|MUTANTOF*]-(p)
return distinct ps;
Get the fitnesses on (non)root ancestry branch from an Individual
This gives the sequence of all the fitnesses along the root ancestry branch from individual 99000 to the initial generation. Replacing ROOT_XOOF with NONROOT_XOOF does the same, but along the non-root ancestry branch.
start n=node(99000)
match ps = (n)<-[:ELITISM|PARENTOF|ROOT_XOOF|MUTANTOF*]-(p) where p.generation=1
return extract(x in nodes(ps) | x.fitness);
Find all the copying events in the winning sequence
This finds the root parent path from node 99000 to the starting generation and then isolates (using filter) the edges that were either elitism or reproduction.
start n=node(99000)
match ps = (n)<-[r:ELITISM|PARENTOF|ROOT_XOOF|MUTANTOF*]-(p)
return filter(x in relationships(ps) where type(x) = "ELITISM" OR type(x) = "PARENTOF");
Finding maximal paths of copying on winning sequence
This monster finds all the maximal sub-sequences of the root path from node 99000 that consist entirely of elitism or reproduction. The big MATCH clause finds:
- A path from n to an anonymous node (say x) consisting of just root parent steps (elitism, reproduction, mutation, and root parent XO).
- A single step from x to p consisting of either mutation or root parent XO (i.e., not elitism or reproduction).
- A path from p to q of just elitism and reproduction; this is the path we're after.
- A single step from q to some anonymous node (say y) consisting of either mutation or root parent XO (i.e., not elitism or reproduction).
The two single steps are there to "bookend" the desired path, ensuring that we have a maximal path of elitism or reproduction.
start n=node(99000)
match (n)<-[:ELITISM|PARENTOF|ROOT_XOOF|MUTANTOF*]-()<-[:MUTANTOF|ROOT_XOOF]-(p)<-[r:ELITISM|PARENTOF*]-(q)<-[:MUTANTOF|ROOT_XOOF]-()
with p, q, length(r) as l, (p.generation+q.generation)/2.0 as g
order by g
return distinct p.fitness, l, q.generation, p.generation;
This generates a table such as
p.fitness l q.generation p.generation
39.99795368655481 1 3 4
35.25790411796866 1 6 7
39.99795368655481 2 8 10
39.99795368655481 1 12 13
31.849136929437357 1 21 22
19.933053716734456 1 30 31
17.5540300257814 5 34 39
16.795738702936962 2 41 43
15.710449230613747 1 48 49
15.485009361166156 1 50 51
9.811316962843701 2 54 56
8.968491179434844 2 61 63
6.777071487628525 4 65 69
6.795367268011395 1 76 77
6.617312802151567 1 78 79
6.133201810958635 7 80 87
6.160256037512394 4 88 92
6.146584251064052 1 93 94
The third row from the bottom tells us that there is a chain of 8 generations (from 80 to 87) where every individual had fitness 6.133 and that the individual in generation 80 was the ancestor of each of the other 7 via some combination of elitism and reproduction, i.e., those 7 were all exact copies of the generation 80 individual.
My hypothesis is that the longest sequences will tend to be towards the end of a run, but I haven't really tested that. That's sorta kinda true if you squint in this example, but even that's kinda fuzzy, and I suspect the behavior might be quite different across multiple runs.
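The "bookending" trick is easier to see on a flat list. Here's a rough Python sketch, assuming the root path has already been pulled out as a list of edge types ordered from node n back towards the first generation. The function name and the sample list are made up for illustration. It keeps only runs of copying edges that have a non-copying edge on both sides, and (matching my reading of the pattern, where the leading variable-length piece needs at least one edge) it skips a run whose near-side bookend is the very first edge.

```python
from itertools import groupby

COPY_TYPES = {"ELITISM", "PARENTOF"}


def maximal_copy_runs(edge_types):
    """Return (start_index, length) for each bookended run of copying edges.

    edge_types is the list of relationship types along the root path,
    starting with the edge into n and working back in time.
    """
    runs = []
    index = 0
    # groupby splits the list into maximal stretches of copy / non-copy edges.
    for is_copy, group in groupby(edge_types, key=lambda t: t in COPY_TYPES):
        length = len(list(group))
        start, end = index, index + length
        index = end
        # start >= 2: at least one edge, then a non-copy bookend, precede the run.
        # end < len(...): a non-copy bookend follows the run.
        if is_copy and start >= 2 and end < len(edge_types):
            runs.append((start, length))
    return runs


# Made-up edge sequence, purely for illustration.
sample = ["ROOT_XOOF", "MUTANTOF", "ELITISM", "PARENTOF",
          "ROOT_XOOF", "ELITISM", "MUTANTOF", "PARENTOF"]
print(maximal_copy_runs(sample))
```

The trailing PARENTOF in the sample is dropped because nothing follows it to close the run, which is the same reason the query needs the second single step.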
Finding the common shared ancestor
This is quite fast and illustrates how we probably want to be using collections.
Finds all ancestors in generation 52 (100-48) of any individual in the final generation.
match (n:Individual {generation: 100})
match ps = (p)-[:ELITISM|PARENTOF|MUTANTOF|ROOT_XOOF*..48]->(n)
with collect(ps) as paths, max(length(ps)) as maxLength
with filter(path in paths where length(path)=maxLength) as longest
return distinct [ pt in longest | nodes(pt)[0] ];
Finds the set of all common ancestor nodes.
match (n:Individual {generation: 100})
match ps = (p)-[:ELITISM|PARENTOF|MUTANTOF|ROOT_XOOF*]->(n)
with collect(ps) as paths, max(length(ps)) as maxLength
with filter(path in paths where length(path)=maxLength) as longest
return reduce(s = nodes(longest[0]), pt in longest | filter(nd in nodes(pt) where nd in s));
Finds the most recent common ancestor.
match (n:Individual {generation: 100})
match ps = (p)-[:ELITISM|PARENTOF|MUTANTOF|ROOT_XOOF*..100]->(n)
with collect(ps) as paths, max(length(ps)) as maxLength
with filter(path in paths where length(path)=maxLength) as longest
with reduce(s = nodes(longest[0]), pt in longest | filter(nd in nodes(pt) where nd in s)) as pre
return last(pre);
This is an alternative to the previous query that I think will work (but I haven't tested it yet) and might be slightly more efficient? - Nic
match (n:Individual {generation: 100})
match (p:Individual {generation: 1})
match ps = (p)-[:ELITISM|PARENTOF|MUTANTOF|ROOT_XOOF*]->(n)
with collect(ps) as longest
with reduce(s = nodes(longest[0]), pt in longest | filter(nd in nodes(pt) where nd in s)) as pre
return last(pre);
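The reduce/filter combination in these queries is just a running intersection that keeps path order. Here's a small Python sketch of that fold, assuming each path is a list of node ids ordered from the oldest ancestor to the final-generation individual. The ids and the function names are made up for illustration.

```python
from functools import reduce


def common_nodes(paths):
    """Fold over the paths, keeping only nodes seen on every path so far.

    Mirrors reduce(s = nodes(longest[0]), pt in longest |
                   filter(nd in nodes(pt) where nd in s)).
    """
    return reduce(
        lambda shared, path: [nd for nd in path if nd in shared],
        paths,
        list(paths[0]),
    )


def most_recent_common_ancestor(paths):
    """Last shared node in path order, or None if nothing is shared."""
    shared = common_nodes(paths)
    return shared[-1] if shared else None


# Made-up paths from generation 1 to three different final individuals.
paths = [
    ["a", "b", "c", "d1"],
    ["a", "b", "c", "d2"],
    ["a", "b", "e", "d3"],
]
print(common_nodes(paths))
print(most_recent_common_ancestor(paths))
```

Because the paths run oldest to newest, the last surviving node is the most recent one shared by everybody, which is why the query ends with last(pre).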
Impact of root parents vs. non-root parents
How often are root parents more fit than non-root parents?
This finds all the nodes on the root parent path to a winner, and then, for each Individual on that chain that is the result of XO, computes the non-root parent's penalized fitness minus the root parent's penalized fitness, returning the generation and this difference. Since lower fitness is better, a positive difference means the root parent was the more fit of the two.
start n=node(99000)
match (n)<-[:ELITISM|PARENTOF|ROOT_XOOF|MUTANTOF*]-(p)
match (p)<-[:ROOT_XOOF]-(r)
match (p)<-[:NONROOT_XOOF]-(s)
with p, r, s, s.penalizedFitness - r.penalizedFitness as diff
return p.generation, diff order by diff;
This returns something like the following, with p.generation in the first column and diff in the second:
2 -828.2125358185428
17 -6.442315176347453
24 -4.866373746564939
14 -3.44373890228956
15 -1.1434684157875452
12 -0.6294014981797531
74 -0.18160621232085106
11 -0.09999999999999432
5 0.240000000000002
33 0.42316293097401925
29 1.1976680139671814
76 1.5771595918735866
73 2.0053961671315577
93 2.2963387877962216
47 2.3330829124719656
32 2.367694487543506
78 2.451013321800727
95 2.536112724033611
65 3.0982596926559953
64 3.290127066211806
26 3.7418479997854064
70 4.360405872449269
8 4.560049568586152
46 4.592193210704526
57 5.039008891081306
20 5.163206708353698
52 5.29135735087857
50 5.481737474259397
44 5.714852689575078
45 5.794053970599624
30 5.991421851057115
59 6.179128106675442
75 6.804410617916949
48 7.088313174438644
25 7.135902914044095
40 7.155726723126623
18 7.314053319398269
60 7.327222529409134
53 7.81161863532895
34 8.084012483892337
19 8.308816757117448
58 8.759843646057597
54 10.066608209943368
61 10.30189177816499
16 11.393364815677668
27 14.458555832046073
80 17.898070226432402
21 18.817497028470385
71 23.05633432861173
3 23.109040805748045
88 40.58281084808121
72 70.9177598905392
6 77.99318948920876
41 80.2531035412967
It's clear that the root parent was almost always more fit than the non-root parent (the difference is positive in 46 of the 54 rows above), and the exceptions tended to be early. (It also turns out that the XO point for those early exceptions tended to be high in the tree; it's not hard to extend this query to include that info.)
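If you'd rather do the counting outside the Neo4j console, the same calculation is only a few lines of Python. This is a sketch on made-up numbers; the xo_events records and the function name are assumptions, not data from the run above.

```python
# Hypothetical records, one per XO event on the root path:
# (generation, root parent penalized fitness, non-root parent penalized fitness)
xo_events = [
    (41, 16.75, 97.0),
    (2, 900.0, 72.0),
    (33, 17.5, 18.0),
    (5, 40.0, 40.25),
]


def parent_diffs(events):
    """Return (generation, non-root minus root) pairs, smallest difference first."""
    diffs = [(gen, nonroot - root) for gen, root, nonroot in events]
    return sorted(diffs, key=lambda row: row[1])


diffs = parent_diffs(xo_events)

# Lower fitness is better, so a positive difference means the root parent was fitter.
root_fitter = sum(1 for _, d in diffs if d > 0)

print(diffs)
print(root_fitter, "of", len(diffs), "events had the fitter parent as root")
```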
This variant actually graphs that path, with the XO events included:
start n=node(99000)
match ps = (n)<-[:ELITISM|PARENTOF|ROOT_XOOF|MUTANTOF*..25]-(p)
match (p)<-[:ROOT_XOOF]-(r)
match (p)<-[:NONROOT_XOOF]-(s)
with ps, p, r, s, s.penalizedFitness - r.penalizedFitness as diff
return ps, p, r, s, diff order by diff;
Who has descendants?
Finding initial individuals with root descendants at end
match (n:Individual {generation: 100})
match ps = ()-[:ELITISM|PARENTOF|MUTANTOF|ROOT_XOOF*99]->(n)
with collect(distinct nodes(ps)[0]) as starts
return starts;
This works nicely, even on the 10K runs (finishing in under 2 secs). The 100 on the first line and the 99 on the second line need to be the number of the last generation and one less than that, respectively (i.e., the number of edges on a path from the first generation to the last).
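The fixed-length path is doing the real work here: walking back exactly (last generation - 1) root-parent steps from each final individual lands you in the first generation. Here's a rough Python sketch of that walk, assuming each individual has at most one root parent and that the root-parent links are held in a plain dict. The dict, the ids, and the function name are all made up for illustration.

```python
def initial_ancestors(root_parent, final_ids, steps):
    """Follow root-parent links exactly `steps` times from each final individual.

    root_parent maps a child id to its root parent id (assumed unique).
    Returns the set of distinct individuals reached.
    """
    starts = set()
    for node in final_ids:
        for _ in range(steps):
            node = root_parent.get(node)
            if node is None:
                break  # chain ended early, so there is no ancestor that far back
        if node is not None:
            starts.add(node)
    return starts


# Made-up three-generation run, so steps = 3 - 1 = 2.
root_parent = {
    "g3a": "g2a", "g3b": "g2a", "g3c": "g2b",
    "g2a": "g1a", "g2b": "g1b",
}
print(sorted(initial_ancestors(root_parent, ["g3a", "g3b", "g3c"], 2)))
```

Using a set plays the same role as collect(distinct ...) in the query: several final individuals can share one initial ancestor.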