2024‐07‐12 Meeting Notes - TheEvergreenStateCollege/bioinformatics GitHub Wiki
Paul and Taylor
Talking with Ukkonen, it's not generally true that two identical subtrees (connected by a suffix link) will never deviate from each other, so they can't in general be replaced with a single subtree / reference.
So this is not as straightforward an optimization as we were planning for database persistence.
Special case where this is true: two suffix-linked nodes that have only leaves as children (one-deep) and will only ever have leaves in the future. (Can one of them ever get new grandchildren added and not the other one?)
Self-referential definition: not a useful compression idea.
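A minimal sketch of what checking the special case could look like, assuming a hypothetical in-memory `TreeNode` shape (children keyed by edge label, optional `suffixLink`); none of these names come from the project. It can only inspect the tree as it stands now, which is the problem noted above: the "will only ever have leaves" half of the condition cannot be checked without already knowing the future shape of the tree.

```typescript
// Hypothetical in-memory node shape, for illustration only.
interface TreeNode {
  children: { [edgeLabel: string]: TreeNode };
  suffixLink?: TreeNode;
}

function makeNode(): TreeNode {
  return { children: {} };
}

function isLeaf(node: TreeNode): boolean {
  return Object.keys(node.children).length === 0;
}

// True when the node has children and every one of them is a leaf ("one-deep").
function isOneDeep(node: TreeNode): boolean {
  const labels = Object.keys(node.children);
  if (labels.length === 0) return false;
  return labels.every((label) => isLeaf(node.children[label]));
}

// Checks the special case against the current tree only: both nodes are
// one-deep and carry the same set of edge labels. It says nothing about
// whether they stay that way after further extensions.
function looksShareableNow(node: TreeNode): boolean {
  const target = node.suffixLink;
  if (!target) return false;
  if (!isOneDeep(node) || !isOneDeep(target)) return false;
  const labels = Object.keys(node.children);
  if (labels.length !== Object.keys(target.children).length) return false;
  return labels.every((label) =>
    Object.prototype.hasOwnProperty.call(target.children, label)
  );
}

// Two suffix-linked nodes that currently share the same leaf children.
const linkTarget = makeNode();
linkTarget.children["na$"] = makeNode();
linkTarget.children["$"] = makeNode();

const linkSource = makeNode();
linkSource.children["na$"] = makeNode();
linkSource.children["$"] = makeNode();
linkSource.suffixLink = linkTarget;

const shareableNow = looksShareableNow(linkSource); // true for this snapshot
```

If either node later gains a grandchild, the same call returns false, so a stored shared reference would have to be split apart again.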
Example: we should only extend / append string fragments that belong to a "leaf edge".
Example Schema
model Edge {
  id         Int                  @id @default(autoincrement())
  parentId   Int
  childId    Int
  edgeString EdgeStringFragment[]
}
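model EdgeStringFragment {
  id             Int     @id @default(autoincrement())
  isLeafEdge     Boolean
  stringFragment String
  // Back-reference to the owning Edge; Prisma needs both sides of a relation declared
  edge           Edge    @relation(fields: [edgeId], references: [id])
  edgeId         Int
}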
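To illustrate the leaf-edge rule against this schema, here is a rough sketch that works on plain in-memory objects shaped like `EdgeStringFragment` (no database or Prisma client involved). The helper `extendLeafFragments` is hypothetical, not something from the project: it appends new text only to fragments flagged `isLeafEdge` and leaves internal-edge fragments untouched.

```typescript
// In-memory mirror of the EdgeStringFragment model's scalar fields.
interface EdgeStringFragment {
  id: number;
  isLeafEdge: boolean;
  stringFragment: string;
}

// Hypothetical helper: returns a new array in which only leaf-edge
// fragments have `appended` added to the end of their string.
function extendLeafFragments(
  fragments: EdgeStringFragment[],
  appended: string
): EdgeStringFragment[] {
  return fragments.map((f) =>
    f.isLeafEdge
      ? { id: f.id, isLeafEdge: true, stringFragment: f.stringFragment + appended }
      : f
  );
}

const fragmentsBefore: EdgeStringFragment[] = [
  { id: 1, isLeafEdge: false, stringFragment: "an" },
  { id: 2, isLeafEdge: true, stringFragment: "ana" },
];

// Only fragment 2 (the leaf edge) grows; fragment 1 is returned unchanged.
const fragmentsAfter = extendLeafFragments(fragmentsBefore, "$");
```

Returning new objects rather than mutating in place keeps the sketch easy to test; a persisted version would presumably translate this into an update filtered on `isLeafEdge`.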