2024-07-12 Meeting Notes - TheEvergreenStateCollege/bioinformatics GitHub Wiki

Paul and Taylor

Discussing Ukkonen's algorithm: it is not generally true that two identical subtrees (connected by a suffix link) will never deviate from each other, so they cannot in general be replaced with a single shared subtree / reference.

So this is not as straightforward an optimization for database persistence as we had planned.
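To make the concern concrete, here is a small Python sketch. It uses a suffix *trie* instead of a compressed suffix tree, which is a simplifying assumption: in a trie, the subtree below the node spelling `p` is the set of all `w` such that `p + w` occurs in the text, and the suffix link goes from the node for `c + alpha` to the node for `alpha`. The helper names are hypothetical. The linked subtree is always a subset (if `c + alpha + w` occurs, so does `alpha + w`), so equality holds only until the text grows in a way that adds paths to one side alone.

```python
def subtree(text: str, node: str) -> set[str]:
    """All paths below `node` in the suffix trie of `text`:
    every w such that node + w occurs in text (node must be non-empty)."""
    paths = set()
    start = text.find(node)
    while start != -1:
        tail = text[start + len(node):]
        # Every prefix of what follows this occurrence is a path below the node.
        for end in range(len(tail) + 1):
            paths.add(tail[:end])
        start = text.find(node, start + 1)  # overlapping occurrences count too
    return paths


def linked_subtrees_equal(text: str, node: str) -> bool:
    # Suffix link: the node spelling c + alpha points to the node spelling alpha.
    return subtree(text, node) == subtree(text, node[1:])


# Growing the text, as in Ukkonen's online construction:
print(linked_subtrees_equal("xaxa", "xa"))    # True: identical for now
print(linked_subtrees_equal("xaxaab", "xa"))  # False: "b" follows "a" but never "xa"
```

After `"ab"` is appended, the node for `"a"` gains the path `"b"` while the node for `"xa"` does not, so a single shared subtree would have been wrong.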

Special case where this is true: two suffix-linked nodes whose children are all leaves (one level deep) and that will only ever have leaf children in the future. (Open question: can one of them ever gain new grandchildren while the other does not?)

That special case is a self-referential definition, so it is not a useful compression idea.

Takeaway: we should only ever extend / append to string fragments that belong to a "leaf edge".
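One way to get "only leaf edges grow" for free is the global-end trick used in Ukkonen's algorithm: store each edge label as a (start, end) index pair into the text and give every leaf edge an open end, so appending a character lengthens every leaf label at once while internal edges keep a fixed end. The Python sketch below assumes that representation; `EdgeLabel` and `split` are hypothetical names, not part of our schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EdgeLabel:
    """An edge label stored as indices into the text, not as a copied string."""
    start: int
    # None marks a leaf edge: its label always runs to the current end of the text.
    end: Optional[int] = None

    @property
    def is_leaf_edge(self) -> bool:
        return self.end is None

    def label(self, text: str) -> str:
        return text[self.start:] if self.end is None else text[self.start:self.end]


def split(edge: EdgeLabel, offset: int) -> tuple[EdgeLabel, EdgeLabel]:
    """Split an edge `offset` characters in, as happens when a new branch
    appears mid-edge. The upper part becomes a fixed-length internal edge;
    the lower part keeps the original end, so a leaf edge stays a leaf edge."""
    upper = EdgeLabel(edge.start, edge.start + offset)
    lower = EdgeLabel(edge.start + offset, edge.end)
    return upper, lower


text = "xax"
upper, lower = split(EdgeLabel(0), 1)  # internal edge "x" over leaf edge "ax"
text += "b"                            # one new phase: append a character
print(upper.label(text), lower.label(text))  # x axb
```

Only the leaf edge picked up the new character; the internal edge created by the split never changes again.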

Example Schema

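model Edge {
  id         Int                  @id @default(autoincrement())
  parentId   Int
  childId    Int
  edgeString EdgeStringFragment[]
}

model EdgeStringFragment {
  id             Int     @id @default(autoincrement())
  isLeafEdge     Boolean
  stringFragment String
  // Back-relation required by Prisma for the one-to-many Edge.edgeString field
  edgeId         Int
  edge           Edge    @relation(fields: [edgeId], references: [id])
}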
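For comparison with the schema, here is a minimal in-memory Python sketch of the append rule applied to records shaped like the Edge / EdgeStringFragment models. It assumes (a guess, not something settled in this meeting) that an edge's label is its fragments concatenated in id order and that only the last fragment of a leaf edge is ever appended to. `append_char` is a hypothetical helper and covers only the leaf-extension part of a phase, not node splits. Field names stay camelCase to match the schema.

```python
from dataclasses import dataclass, field


@dataclass
class EdgeStringFragment:
    id: int
    isLeafEdge: bool
    stringFragment: str


@dataclass
class Edge:
    id: int
    parentId: int
    childId: int
    edgeString: list[EdgeStringFragment] = field(default_factory=list)

    def label(self) -> str:
        # Assumption: the edge label is the fragments concatenated in id order.
        ordered = sorted(self.edgeString, key=lambda f: f.id)
        return "".join(f.stringFragment for f in ordered)


def append_char(edges: list[Edge], ch: str) -> int:
    """Append `ch` to the last fragment of every leaf edge; fragments of
    internal edges are never modified. Returns how many fragments changed."""
    updated = 0
    for edge in edges:
        if not edge.edgeString:
            continue
        last = max(edge.edgeString, key=lambda f: f.id)
        if last.isLeafEdge:
            last.stringFragment += ch
            updated += 1
    return updated


edges = [
    Edge(1, parentId=0, childId=1, edgeString=[EdgeStringFragment(1, False, "x")]),
    Edge(2, parentId=1, childId=2, edgeString=[EdgeStringFragment(2, True, "ax")]),
]
updated = append_char(edges, "b")
print(updated, [e.label() for e in edges])  # 1 ['x', 'axb']
```

Appending eagerly like this costs one write per leaf edge per character; storing an open end index for leaf edges instead of the characters themselves would avoid those writes.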