Useful utility procedures - WormBase/db-prototypes GitHub Wiki

Useful utility procedures

These are utility procedures for various housekeeping operations on the Datomic databases. They will probably seem obvious once we are more adept at using Datomic, but for now they should be useful.

Checking on names in a database

Using a Groovy shell:

    % cd datomic-free-0.9.5206/
    % bin/groovysh
    > import datomic.Peer
    > Peer.getDatabaseNames("datomic:free://localhost:4334/*")
    ===> ("wb248-imp1" "ga3" "ga2" "wb249-imp1")

Using lein repl:

    % bin/repl
    (require '[datomic.api :as d])
    (d/get-database-names "datomic:free://localhost:4334/*")

Dump a database

    # The transactor must be running.
    # Shut down any unused repls and Groovy shells first; they use up memory,
    # and otherwise you may run out of memory.
    # Make the dump directory and set it to be group writeable:
    % sudo mkdir /mnt/data/datomic-dumps/database_name_dump
    % sudo chgrp opsworks /mnt/data/datomic-dumps/database_name_dump
    % sudo chmod g+w /mnt/data/datomic-dumps/database_name_dump
    # Run backup-db using the same amount of memory as you used for the transactor:
    % bin/datomic -Xmx4g -Xms4g backup-db "datomic:free://localhost:4334/name_of_database" "file:/mnt/data/datomic-dumps/database_name_dump"
    # It complains a bit about not being able to write to the
    # datomic-free*/log/2015-09-01.log file if you do not have write permission, but seems to work.
    # The database dump is a directory containing binary files. It is not
    # worth trying to gzip these binary files.
    # You can back up the same database at different points in time to a
    # single backup URI. Only the changed data is written, so repeated
    # (differential) backups to the same URI are substantially faster.
    # This approach is strongly recommended, especially for large databases.
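The backup command above can be wrapped in a small shell helper so that the database URI and dump directory are always built the same way. This is only a minimal sketch and not part of the original procedure: the function name `datomic_backup_cmd` and the `DUMP_ROOT` and `DATOMIC_URI_BASE` variables are hypothetical, and the helper only prints the `bin/datomic backup-db` command so that it can be checked before it is run.

```shell
#!/bin/sh
# Hypothetical helper: print the backup-db command for one database.
# DUMP_ROOT and DATOMIC_URI_BASE are assumptions; adjust them to your site.
DUMP_ROOT="${DUMP_ROOT:-/mnt/data/datomic-dumps}"
DATOMIC_URI_BASE="${DATOMIC_URI_BASE:-datomic:free://localhost:4334}"

datomic_backup_cmd() {
    db_name="$1"
    if [ -z "$db_name" ]; then
        echo "usage: datomic_backup_cmd <database-name>" >&2
        return 1
    fi
    # Uses the same heap settings as the transactor, as noted above.
    printf 'bin/datomic -Xmx4g -Xms4g backup-db "%s/%s" "file:%s/%s_dump"\n' \
        "$DATOMIC_URI_BASE" "$db_name" "$DUMP_ROOT" "$db_name"
}
```

For example, `datomic_backup_cmd geneace` prints the command for the `geneace` database. Once the printed command looks right, it can be run from the Datomic install directory with `eval "$(datomic_backup_cmd geneace)"`.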

Restore a database from a dump

    # If you wish to restore to a fresh data file, then remove the old
    # 'data' directory (as pointed to by the transactor config file) and
    # restart the transactor:
    #   Kill the transactor (use 'ps -eF | grep datomic.launcher' to find the PID of the transactor)
    #   Delete or Move away the directory holding the storage file: '/mnt/data/datomic-free-0.9.5130/data'
    #   Start the transactor again:
    %   cd /mnt/data/datomic-free-0.9.5130
    %   export XMX=-Xmx4G
    %   sudo  bin/transactor -Xmx4G -Xms4G config/transactor.properties &
    # The transactor must be running.
    % bin/datomic -Xmx4g -Xms4g restore-db "file:/mnt/data/datomic-dumps/database_name_dump" "datomic:free://localhost:4334/name_of_database"
    # Restore can be used to rename a database.
    # You cannot restore a single database to two different URIs (two
    # names) within the same storage, i.e. you cannot make a copy of a
    # database with a different name in the same storage file.
    # You must kill and restart peers (repls) and transactors after a restore. 
    # Time: ~5 mins to read geneace into an existing storage file

Delete a database

If there is only one database in the storage file, simply kill the transactor process, repls, etc., delete the storage data file, and then restart the transactor.

To delete a specific database amongst many in the storage file

If you have several databases in one storage file and wish to delete one of them, proceed as follows. The data storage file will not get any smaller, but the space within it becomes available for re-use.

    % cd datomic-free-0.9.5206/
    % bin/repl
    (require '[datomic.api :as d])
    (def uri "datomic:free://localhost:4334/name-of-database")
    (d/delete-database uri)

Move a database to another machine

    # Dump the database to a directory structure (see above).
    # Transfer the dump directory structure to the target machine (e.g. the 'acedb' machine).
    # On the source machine:
    % tar cf database_dump.tar database_dump/* # Time: 20 mins for geneace
    # On the target machine:
    % scp -r [email protected]:/mnt/data/datomic-dumps/database_dump.tar $wormbase/tmp/  # Time: 3 mins for geneace
    % tar xf database_dump.tar # Time: 15 mins for geneace
    # Restore the dumped database (see above).
    # Transactor must be running on the 'acedb' machine.
    % bin/datomic -Xmx4g -Xms4g restore-db "file:/nfs/panda/ensemblgenomes/wormbase/tmp/database_dump" "datomic:free://localhost:4334/name_of_database"
    # Time: 15 mins for geneace
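Because the dump is a directory of many binary files, it may be worth confirming that the archive unpacked completely before running the restore. The following is a minimal sketch assuming a `tar` that supports `-C` (GNU and BSD tar both do); the function names `pack_dump`, `unpack_dump` and `count_files` are hypothetical and not part of the original procedure.

```shell
#!/bin/sh
# Hypothetical helpers for moving a dump directory between machines.

# Pack the directory <parent>/<name> into <parent>/<name>.tar.
# Archiving the directory itself, rather than 'name/*', also picks up
# any dot-files inside it.
pack_dump() {
    parent="$1"; name="$2"
    tar cf "$parent/$name.tar" -C "$parent" "$name"
}

# Unpack a tar file into an existing destination directory.
unpack_dump() {
    tarfile="$1"; dest="$2"
    tar xf "$tarfile" -C "$dest"
}

# Count the regular files under a directory, for a before/after comparison.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}
```

Running `count_files` on the dump directory on the source machine and again on the unpacked copy on the target machine should give the same number; a mismatch suggests an incomplete transfer or unpack.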

Start a transactor

The transactor must be running for you to do most database operations.

Starting the transactor on 'hopi'

    % screen -S transactor -h 10000    # start a screen or tmux session for the transactor to run in
    % cd /mnt/data/datomic-free-0.9.5130
    # If the config file doesn't exist, create it from the sample template:
    % sudo cp config/samples/free-transactor-template.properties config/transactor.properties
    % sudo chgrp opsworks config/transactor.properties
    % sudo chmod g+w config/transactor.properties
    # Edit the config file to contain the 'Recommended settings for -Xmx4g production usage' values.
    # 4 GB should be OK for the full database. If not, try 6 GB.
    % export XMX=-Xmx4G
    % sudo  bin/transactor -Xmx4G -Xms4G config/transactor.properties &
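After editing the config file it can be useful to check which memory settings are actually active, since the template keeps alternative values as commented-out lines. The sketch below is an illustration only: the function names `get_prop` and `show_memory_settings` are hypothetical, and the three key names are assumed to be the memory settings listed under the 'Recommended settings' headings in the template, so check them against your own copy.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a transactor properties file.

# Print the value of an active (uncommented) key=value line.
# If the key appears more than once, the last occurrence is printed.
get_prop() {
    prop_file="$1"; prop_key="$2"
    sed -n "s/^[[:space:]]*${prop_key}[[:space:]]*=[[:space:]]*//p" "$prop_file" | tail -n 1
}

# Print the memory-related settings; a key that is commented out or
# absent is reported as "(not set)". The key names are an assumption.
show_memory_settings() {
    settings_file="$1"
    for key in memory-index-threshold memory-index-max object-cache-max; do
        value="$(get_prop "$settings_file" "$key")"
        [ -n "$value" ] || value="(not set)"
        printf '%s=%s\n' "$key" "$value"
    done
}
```

Calling `show_memory_settings config/transactor.properties` from the Datomic install directory prints one line per key, which can then be compared with the values under the '-Xmx4g' heading in the template.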

Starting the transactor on the 'acedb' machine

    % screen -S transactor -h 10000    # start a screen or tmux session for the transactor to run in
    % become wormbase
    % cd $WORM_PACKAGES/datomic-free
    % setenv XMX '-Xmx4G'
    % bin/transactor -Xmx4G -Xms4G config/samples/free-transactor-template.properties &

Dump out ACE class data from a datomic database

This dumps the specified classes to a named file.

    % cd ~/datomic/db/pseudoace
    % /mnt/data/bin/lein/repl
      (require '[datomic.api :as d])
      (use 'pseudoace.utils)
      (use 'wb.acedump)                         ;; use the 'pseudoace/src/clj/wb/acedump.clj' package
      (def uri "datomic:free://localhost:4334/geneace")
      (def con (d/connect uri))
      (with-outfile "file-to-send-stuff.txt"    ;; file to redirect STDOUT to
        (println "")                            ;; ACE files should start with a blank line for easy file concatenation
        (dump-class (d/db con) "Gene")          ;; dump out the class "Gene" to STDOUT
        (dump-class (d/db con) "Variation"))    ;; and dump class "Variation" to STDOUT

Error messages

If the transactor is not running, you will see a message like this in repl:

    ConnectException Connection refused  java.net.PlainSocketImpl.socketConnect (PlainSocketImpl.java:-2)

If you start 'lein repl' and are not in the db/pseudoace directory of the github repository, you will see a message like this:

    nREPL server started on port 45166 on host 127.0.0.1 - nrepl://127.0.0.1:45166
    Exception in thread "nREPL-worker-0" java.lang.NoSuchMethodError: clojure.tools.nrepl.StdOutBuffer.length()I
            at clojure.tools.nrepl.middleware.session$session_out$fn__7630.doInvoke(session.clj:43)
            at clojure.lang.RestFn.invoke(RestFn.java:460)
            at clojure.tools.nrepl.middleware.session.proxy$java.io.Writer$ff19274a.write(Unknown Source)
    [continues on for another 5 screens of text]

Dumping the user database

Dump all "user" entities in a form that can be easily transacted into another database.

     (require '[datomic.api :as d :refer (q db touch entity)])
     (use 'pseudoace.utils)

     (def uri "datomic:free://localhost:4334/name-of-database")
     (def con (d/connect uri))
     (def ddb (db con))

     (->> (q '[:find [?u ...]
               :where [?u :user/name _]]
             ddb)
          (map (fn [u]
                 (as-> (entity ddb u) $
                       (touch $)
                       (into {} $)
                       ;; give each user a fresh tempid and turn the person
                       ;; reference into a lookup ref for the target database
                       (vassoc $
                               :db/id (d/tempid :db.part/user)
                               :user/wbperson
                               (if-let [p (:user/wbperson $)]
                                 [:person/id (:person/id p)])))))
          (pr-str)
          (spit "user-data.edn"))

To read this .edn file into the target database:

    % /mnt/data/bin/lein/repl
      (require '[datomic.api :as d])
      (def uri "datomic:free://localhost:4334/name-of-database")
      (def con (d/connect uri))
      (def users-data (read-string (slurp "user-data.edn")))
      @(d/transact con users-data)