Contributing to Variant Store - GenomicsDB/GenomicsSampleAPIs GitHub Wiki

#Variant Contribution Notes

The following page describes best practices for contributing to the Variant Store. This is a work in progress.

Alembic: migration management for MetaDB

The database migrations for MetaDB are created, managed, and invoked with alembic. The GenomicsSampleAPIs/metadb/alembic directory is an initialized alembic environment that contains the most current migration management scripts, read more about the organization of this directory here.

Quick Start Instructions:

Procedure: In alembic.ini, set sqlalchemy.url to your DB location. See Setting Up Alembic for the First Time for further details.

Run:

alembic revision --autogenerate -m "<message>"

Note: autogenerate will catch most of the simple things - adding/dropping columns, indexes, etc. The alembic management script will still need to be reviewed and touched up. Refer below for more information.

Alembic - The Long Way:

Setting Up Alembic for the First Time

This step will require that you are running inside the virtual environment, as specified in the Setup and Initialization section.

Set your PYTHONPATH to include the Genomics Sample APIs modules: export PYTHONPATH=$PYTHONPATH:/path/to/GenomicsSampleAPIs/repo/
Create a PostGres database: createdb <db_name>
Tell PostGres to use triggers: createlang plpgsql <db_name>
Copy alembic.ini.example to alembic.ini
Edit the line sqlalchemy.url = driver... in alembic.ini. If you have issues, see the alembic reference docuemntation.

for example:

sqlalchemy.url = postgresql+psycopg2://localhost/<db_name>

In the GenomicsSampleAPIs repo (where alembic.ini is) run alembic upgrade head

Adding a New Table

To add a new table make sure you are at the root of the GenomicsSampleAPIs repo where the alembic.ini file lives. Run the following command where <name> is the name of table you wish to create:

alembic revision -m "create <name> table"

If you see: Generating ... the command execute successfully.

This generates a new file in alembic/versions the begins with a revision id that will look something like 43aacdccadc1_create_<name>_table.py. This file contains a skeleton which allows you to specify the commands that should take place in the database upgrade, and a place to specify a database downgrade. For each command in upgrade, a command should exist in downgrade which will remove this from the database. The following standard for defining the creation of a table will look something like:

...
def upgrade():
    op.create_table(
      '<name>',
      sa.Column('id', sa.BigInteger, primary_key=True)
      ...

    )

def downgrade():
    op.drop_table('<name>')

For a basic example a completely defined table, see the individual table. For naming conventions, take a look at some existing management scripts. Note that this definition is very similar to the SQLAlchemy model definition with respect to two things:

MetaDB alembic naming conventions for fields use underscores, whereas SQLAlchemy uses CamelCase.
Relationships are not defined within alembic, all relationships are defined on the SQLAlchemy definition. See alembic workspace definition and sqlalchemy workspace definition as an example.

One the migration script is defined, make sure you are in the root directory of the GenomicsSampleAPIs repo and run alembic upgrade head. If successful, you should see something that looks like:

INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade  -> 4db88f2382fb, create <name> table

Tables with Foreign Keys

To create a table with a foreign key, the table the foreign key references needs to exist. If the table exists, follow the directions above. If the table does not exist, you will need to create the table the needs to exist and then add a column specifying the foreign key relationship. Generally, if you need to introduce a dependency on a table that is created downstream of an existing table, then the dependency will have to be added in a revision after both tables exist.

Run the revision command: `alembic revision -m "create <new_table_name> table"
Define the new table as specified above.
Run the revision command: `alembic reivision -m "define fk from <new_table_name>"
Add a column in the upgrade, and drop a column in the downgrade:

... def upgrade(): op.add_column( '', sa.Column('<new_table_name>_id', sa.BigInteger, sa.ForeignKey('<new_table_name>.id'), nullable=False) )

def downgrade(): op.drop_column('', '<new_table_name>_id')

Note: These management scripts define the migrations for the MetaDB. The models defined in GenomicsSampleAPIs/metadb/models still have to be defined. If you make a table in a revision script, a new model needs to be created. If you add or drop a column in a revision script, this change needs to be reflected in the model.

Writing and Running Tests

pip install pytest pytest-cov coverage pytest-flask

gunzip test/data/test*.gz

createdb metadb
alembic upgrade head

psql test < test/data/test.db

/*assuming tiledb is in your path*/

tiledb_define_array \
-w /path/to/your/workspace/ \
-A test \
-a END,REF,ALT,QUAL,FILTER,BaseQRankSum,ClippingRankSum,MQRankSum,ReadPosRankSum,DP,MQ,MQ0,DP_FORMAT,MIN_DP,GQ,SB,AD,PL,AF,AN,AC,GT,PS \
-d samples,position \
-D 0,7,0,4000000000 \
-t int64,char:var,char:var,float,int:var,float,float,float,float,int,float,int,int,int,int,int:4,int:var,int:var,float:var,int,int:var,int:var,int,int64 \
-o column-major \
-c 1000 \
-s 8

tiledb_load_csv -w /path/to/your/workspace/ -A test -p data/test/test.csv

/* alter workspace and array in metadb */
psql metadb
metadb=> update workspace set name = '/path/to/your/workspace' where id = 1;
metadb=> update db_array set name = 'test' where id = 1; 
\q