Thomas' rough TODO list - WormBase/db-prototypes GitHub Wiki

Dead code

   src/clj/mongoweb*               ; prototype web widgets using mongodb
   src/clj/pseudoace
            import.clj         ;; NEARLY dead.  "importer" function still used for now
            feature_schema.clj ;; First attempt at a feature schema, use locatable-schema instead
            feature_loader.clj ;; Use locatable-import instead
            import_mongo.clj   ;; Prototype mongodb importer
            smallace_metadata.clj   ;; "Hand-written" metadata for smallace.  Dead.
            server.clj         ;; saceserver emulator, could be revived but needs work
   src/clj/web/
            widgets.clj        ;; Old pure-Clojure web widgets, use "rest" package instead.
            orthologs.clj      ;; Old stand-alone ortholog-query tool (just a demo really)

At some point, it might be worth stripping this out. If so, it's probably also worth looking at and re-organizing the top level web-code routing in web.core.

Importer

The acedb <--> Datomic mapping is essentially complete and seems to be standing up to general use. There are a few things that are worth keeping an eye on once people start curating data, principally relating to AceDB model lines with multiple "variables", e.g.:

              DB_info Database ?Database ^database ?Database_field ^field UNIQUE ?Text ^accession
  • In AceDB, it's not possible to specify a later variable if an earlier one is missing, e.g. you can't specify an accession without a field. Datomic can't express this kind of constraint, and currently it's possible to create data which can't sensibly be re-exported to AceDB. Could add constraints in the curation tool if this became an issue.

  • Likewise, there's no way to represent UNIQUE constraints on some, but not all, variables. Again, could be enforced in the curation tool if it became a problem.

  • Per @Paul-Davis, it would be useful to have an optional mode where warnings are given for tag-without-value situations (which currently get silently ignored).

  • The importer does not deal well with malformed ACE timestamps. The following EDN file is produced:

    -rw-r--r-- 1 gwilliams opsworks 128 Oct 10 00:02 2009-10-29_17:20.edn.gz

    instead of the normal file format:

    -rw-r--r-- 1 gwilliams opsworks 293139 Oct 10 00:19 2009-10-29.edn.gz

    because the ACE timestamp is "2009-10-29_17:20", with no "_tsuser" part at the end.
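
One way to make the file naming robust to a timestamp like the one above is to key only on the leading date. The sketch below is in Python rather than the importer's Clojure. The names `edn_bucket` and `has_user` and the regular expression are assumptions for illustration, not the importer's actual logic, and it assumes well-formed timestamps look like `YYYY-MM-DD_HH:MM:SS_user`.

```python
import re

# Assumed shape of an ACE timestamp: date, optional time, optional user.
# The pattern and function names are illustrative, not taken from the importer.
_TS = re.compile(r"^(\d{4}-\d{2}-\d{2})(?:_(\d{2}:\d{2}(?::\d{2})?))?(?:_(.+))?$")

def edn_bucket(timestamp):
    """Return the date part that would name the EDN file, or None if unparseable."""
    m = _TS.match(timestamp)
    return m.group(1) if m else None

def has_user(timestamp):
    """True only when a trailing user part (e.g. "_tsuser") is present."""
    m = _TS.match(timestamp)
    return bool(m and m.group(3))

# A truncated timestamp still lands in the per-day bucket.
print(edn_bucket("2009-10-29_17:20"))
print(has_user("2009-10-29_17:20"))
```

With this approach the truncated timestamp would be grouped with the rest of that day's data, and `has_user` could be used to log a warning rather than silently producing a separate file.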

Other importer stuff.

  • Possible: Support the REPEAT tag. Only used from ?Grid and ?Position_matrix. I've ignored this and written an import-custom method for ?Position_matrix instead. Could do the same for ?Grid, but I'm not sure how important that class really is. Some temptation just to drop it...

  • Possible: better integration of the ?Feature_data and ?Homol_data importers (see locatable-import.clj). Is this really helpful, or should some/all of this data be moved outside of the core database (bigBeds or similar)?

  • Improve use of Importer TempIDs with an eye to reducing the amount of index-thrashing which occurs during the log-playback phase of full-db imports. In particular, probably avoid using SQUUIDs.

Tree-viewer

The front-end side of this could benefit from a general cleanup -- wasn't really sure where this was going when I started!

Colonnade

  • Currently this isn't suitable for public consumption because it relies on POSTing Datalog to the server and thus offers a vector for arbitrary Datalog injection. Thus, it requires a logged-in user even when TRACE_REQUIRE_LOGIN is unset.

  • May be better to shift the query generation server side instead. This wouldn't actually be very hard because it's written in Clojure(script). Support for .cljc files in current tooling (wasn't available when I started) makes this easier.

  • Navigation around components: if you follow a link to the component, it may be better to default to giving the first positional property of the component, rather than the component entity itself. Then do something analogous to "Right from" in tablemaker if you want to access other positional properties or evidence.
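
To illustrate the server-side query generation idea: the client would send only a description of the columns it wants, and the server would build the Datalog itself from a whitelist. This is a minimal sketch in Python, not Colonnade's code. The function `build_query`, the whitelist contents, and the single-entity query shape are all assumptions made for the example.

```python
# Hypothetical whitelist of attributes the public UI may ask for.
ALLOWED_ATTRS = {":gene/id", ":gene/remark"}

def build_query(attrs):
    """Build a Datalog query string from a list of whitelisted attribute names."""
    if not attrs:
        raise ValueError("at least one attribute is required")
    for attr in attrs:
        # Anything outside the whitelist is rejected, including raw Datalog.
        if attr not in ALLOWED_ATTRS:
            raise ValueError("attribute not allowed: %r" % (attr,))
    find_vars = " ".join("?v%d" % i for i in range(len(attrs)))
    clauses = " ".join("[?e %s ?v%d]" % (a, i) for i, a in enumerate(attrs))
    return "[:find %s :where %s]" % (find_vars, clauses)

print(build_query([":gene/id", ":gene/remark"]))
```

Because the client never supplies query text, only names checked against the whitelist, the injection vector goes away and the login requirement could in principle be relaxed.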

Curation forms

  • Forms for Features and Variations

  • More scripts.

  • In .ace patching interface, need to support temporary names for __ALLOCATE__d objects (TD done, see Curation Forms).

  • Better search interface!

Web API code

  • Currently mostly only gene stuff -- extend to other classes!

  • Possibly worth giving a bit more thought to how the References widget works. I think we should be running most/all of this on top of Datomic rather than delegating to Xapian.

  • For widgets with many large fields, it may be worth populating fields concurrently.

  • Alternatively, possibly switch to using field-granularity rather than widget-granularity HTTP requests (would buy concurrency "for free").
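
The concurrent-population idea can be sketched as follows. This is Python for illustration only, and `populate_widget` and the field functions are hypothetical names, not part of the existing web API code. In Clojure the same shape could be had with `future` or `pmap`.

```python
from concurrent.futures import ThreadPoolExecutor

def populate_widget(field_fns, entity_id, max_workers=4):
    """Run each field function concurrently and collect results by field name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every field first so the slow ones overlap.
        futures = {name: pool.submit(fn, entity_id) for name, fn in field_fns.items()}
        # result() blocks per field and re-raises any exception from that field.
        return {name: fut.result() for name, fut in futures.items()}

# Hypothetical field functions standing in for real database queries.
fields = {
    "name": lambda gene_id: "name-of-" + gene_id,
    "remarks": lambda gene_id: ["remark for " + gene_id],
}
print(populate_widget(fields, "WBGene00003020"))
```

The total latency of a widget then approaches that of its slowest field rather than the sum of all fields, which is the same benefit field-granularity HTTP requests would give on the client side.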

Nameserver interface - reading in ACE data

Just checking through the REST server access points - they all appear to work except the Nameserver interface.

Nameserver interface http://db.wormbase.org:8240/curate/gene/new

The 'Patch DB' option to load in ACE data:

Gene : WBGene00003020 Remark "This is a test remark"

gives:

  java.util.concurrent.ExecutionException
  java.lang.IllegalArgumentException: :db.error/wrong-type-for-attribute Value [:gene/id "WBGene00003020"] :gene/remark This is a test remark is not a valid :uuid for attribute :importer/temp

but the old 'curation.wormbase.org:8130/curate/gene/new' version with the same ACE data works with no errors.

Gary wonders if this is the result of a change introduced in the code when Thomas was sorting out the XREFs just before he left, resulting in a difference between the server Gary set up and the one that had been running for a few weeks on the 'curation' machine?

Nameserver reading ACE in with the old code running on the 'curation.wormbase.org' machine

Testing reading ACE data on curation.wormbase.org:8130 worked, but the Remark data looks a little weird. It has

{:db/id 17592213098624, importer/temp "[:gene/id \"WBGene00003020\"] :gene/remark This is a test remark", :gene.remark/text "This is a test remark"}

when you see it in Colonnade, compared to normal (pre-existing) Remark text data, which looks like:

{:db/id 936783913206741, :gene.remark/text "[120315 mt3 pad] This gene was previously the uncloned version of lin-38"}

Is the ':db/txInstant' set to the current date?

When the acedb import is running, the sorted EDN files produced from the ACE files have a datestamp that is used to set ':db/txInstant' when reading in each line of data. Just a thought, but is the ':db/txInstant' ever set to the current date again when the data is all read in, so that new transactions will have the correct date?

General stuff

  • Would be good to have a little admin interface for managing user data!

  • It may be better to move user information into its own DB. Makes rebuilding the main DB easier, and avoids the possibility of people using the curation forms to transact new user data. This should be a very straightforward change. Only issue is it will be necessary to store WBPerson IDs for curators, rather than links to the relevant Person entity.

  • Long term: "sandbox" mode allowing transaction data to be previewed/edited further before committing. Potentially possible using datomic.api/with to produce DBs that preview the transaction data (but some bookkeeping needed to make this work well with editing tools).
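
The bookkeeping mentioned for the sandbox idea can be sketched with a toy model. This is only an analogy in Python: the dict-based database, the `with_tx` function, and the `Sandbox` class are all hypothetical and do not use or model the real datomic.api/with signature. The point is just that pending edits are kept as a list and replayed on top of an unchanged base value to produce a preview.

```python
def with_tx(db, tx):
    """Return a new db value with tx applied; the input db is left untouched."""
    new_db = {eid: dict(attrs) for eid, attrs in db.items()}
    for op, eid, attr, value in tx:
        if op == "add":
            new_db.setdefault(eid, {})[attr] = value
        elif op == "retract":
            new_db.get(eid, {}).pop(attr, None)
        else:
            raise ValueError("unknown op: %r" % (op,))
    return new_db

class Sandbox:
    """Bookkeeping for pending edits stacked on top of a base db value."""

    def __init__(self, base):
        self.base = base
        self.pending = []

    def stage(self, tx):
        """Record a transaction without applying it to the base."""
        self.pending.append(tx)

    def preview(self):
        """Replay all pending transactions on top of the base value."""
        db = self.base
        for tx in self.pending:
            db = with_tx(db, tx)
        return db

base = {"WBGene00003020": {}}
sandbox = Sandbox(base)
sandbox.stage([("add", "WBGene00003020", "remark", "This is a test remark")])
print(sandbox.preview())
print(base)
```

An editing tool built this way would read from `preview()` and only transact the accumulated `pending` list for real once the curator commits.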