Modifying the database schema - aiidateam/aiida-core GitHub Wiki

Testing the migrations from v1.6.5 -> v2.0.0

This is a temporary section with instructions for developers to have them test the database migrations that will be released with v2.0. The instructions below hopefully make it as easy as possible to test this.

Preparing environment

  • Checkout the latest develop branch: git checkout develop && git pull
  • Install latest dependencies: pip install -U -e .[tests,pre-commit]
  • Run verdi status: this will update your configuration to the latest schema version

Setting up profile

  • Create a clone of the PostgreSQL database you want to migrate

    • Login as the postgres user: sudo su - postgres
    • Load the postgres program: psql
    • If it is already loaded in postgres, you can clone it in psql directly: CREATE DATABASE aiida_clone WITH TEMPLATE aiida_original_db OWNER aiida; Make sure to change the names of the databases and the owner of course. IMPORTANT: if you are setting a new database owner, you'll need also to assign permissions for each table. Check here to know how to do it.
    • If the database is on another machine and you want to test the migration on your workstation.
      • Go to the remote machine and dump the database: pg_dump --no-privileges --no-owner -h localhost -d aiida_original_db -U aiida -W > aiida_original_db.psql
      • Copy over the aiida_original_db.psql file to your workstation
      • Create a new database in psql: CREATE DATABASE aiida_clone OWNER aiida;
      • Load the database dump: psql -h localhost -d aiida_original_db -U aiida -W < aiida_original_db.psql
  • Check statistics of the database (this information should be kept for reporting):

    • Note whether it is Django or SqlAlchemy. If you don't know, run SELECT * FROM alembic_version; in psql. If it returns a value, it is SqlAlchemy, if it errors with ERROR: relation "alembic_version" does not exist it is Django
    • Get database node count: SELECT count(*) FROM db_dbnode;
    • Get database size: SELECT pg_size_pretty(pg_database_size('aiida_clone'));
    • Get database revision:
      • For SqlAlchemy: SELECT * FROM alembic_version;
      • For Django: SELECT name FROM django_migrations WHERE app = 'db' ORDER BY id DESC LIMIT 1;
  • Create a clone of the repository (Note: this is only necessary if your database revision is below a certain revision; the migrations above it will not affect the repository, including the repository migration itself, as it will leave the original repo intact and simply write the new disk object store in parallel.)

    • Django: if you have revision 0027 or above, there is no need to clone the repo
    • SqlAlchemy: if your revision is in the following list, there is no need to clone the repo: ['1de112340b16', '1de112340b17', '1de112340b18', '34a831f4286d', '535039300e4a', '1feaea71bd5a', '7536a82b2cc4', '0edcdd5a30f0', 'bf591f31dd12', '118349c10896', '91b573400be5', '7b38a9e783e7', 'e734dd5e50d7', 'e797afa09270', '26d561acd560', '07fac78e6209', 'de2eaf6978b4', '1830c8430131', '1b8ed3425af9', '3d6190594e19', '5a49629f0d45', '5ddd24e52864', 'd254fdfed416', '61fc0913fae9', 'ce56d84bcc35']
  • Create a profile with the correct database and repository configured

    • Easiest is to open config.json and clone an entry and simply update the name of the database and the location of the repository
    • For the storage.backend key you should simply put "psql_dos" regardless whether the database has a Django or SqlAlchemy schema. The migrations will automatically detect this from the database itself.
    • IMPORTANT: if the database has an old schema version (see the point above) you should have made a clone of the repository and you should make sure that the storage.config.repository_uri key points to the correct path
  • Before running any migration, you should be able to run verdi -p PROFILE status and verdi -p PROFILE storage version

Running the migration

  • Make sure the daemon is not running
  • Run time verdi -p aiida-profile storage migrate -f. IMPORTANT do not forget the time in front. We would like to gather this information to get an idea of how long the migrations typically take.
  • Copy the log messages from the migrations printed to stdout.

Checks after the migration

Note: You can use verdi devel run-sql "SQL_TEXT", to run any SQL against the profile's database

  • Rerun the statistics database size and node count in psql:
    • SELECT count(*) FROM db_dbnode;
    • SELECT pg_size_pretty(pg_database_size('aiida_clone'));
  • Run verdi status and check that storage connection is green
  • Run verdi storage info --statistics and check the outcome is as expected
  • Open verdi shell and do some tests: queries, opening repository files of some nodes etc.

Reporting

For each database for which you test the migration, please report the following:

  • Database backend (Django or SqlAlchemy)
  • Starting revision
  • Node count before migration
  • Node count after migration
  • Database size before migration
  • Database size after migration
  • Time taken for the actual migration
  • Messages printed to stdout by the migrations
  • Any errors you encountered or problems you noticed afterwards when manually inspecting the data
  • Output of the following command: verdi devel run-sql "SELECT pt.tablename AS TableName, t.indexname AS IndexName, pc.reltuples, pg_size_pretty(pg_relation_size(quote_ident(pt.tablename)::text)), pg_size_pretty(pg_relation_size(quote_ident(t.indexrelname)::text)), t.idx_scan FROM pg_tables AS pt LEFT OUTER JOIN pg_class AS pc ON pt.tablename=pc.relname LEFT OUTER JOIN (SELECT pc.relname AS TableName, pc2.relname AS IndexName, psai.idx_scan, psai.indexrelname FROM pg_index AS pi JOIN pg_class AS pc ON pc.oid = pi.indrelid JOIN pg_class AS pc2 ON pc2.oid = pi.indexrelid JOIN pg_stat_all_indexes AS psai ON pi.indexrelid = psai.indexrelid )AS T ON pt.tablename = T.TableName WHERE pt.schemaname='public';"

Modifying the database schema

Django

The Django database schema can be found in aiida.backends.djsite.db.models <https://github.com/aiidateam/aiida-core/tree/develop/aiida/backends/djsite/db/models.py>_. If you need to change the database schema follow these steps:

  1. Make all the necessary changes to aiida.backends.djsite.db.models <https://github.com/aiidateam/aiida-core/tree/develop/aiida/backends/djsite/db/models.py>_.

  2. Create a new migration file by running::

    $ python aiida/backends/djsite/manage.py makemigrations
    

    This will create the migration file in aiida/backends/djsite/db/migrations whose name begins with a number followed by some description. If the description is not appropriate then change it to something better but retain the number.

  3. Open the generated file and make the following changes:

    .. code-block:: python

    from aiida.backends.djsite.db.migrations import upgrade_schema_version
    ...
    REVISION = # choose an appropriate version number (hint: higher than the last migration!)
    DOWN_REVISION = # the revision number of the previous migration
    ...
    class Migration(migrations.Migration):
        ...
        operations = [
            ...
            upgrade_schema_version(REVISION, DOWN_REVISION)
        ]
    
  4. The migration file now contains some migrations steps that were generated automatically. Please make sure that they are correct. Also, if you want to add some changes that affect the content of the database -- you should do it "manually" by adding some sql commands that will run directly on your database:

    .. code-block:: python

    forward_sql = [
        """UPDATE db_dbgroup SET type_string = 'auto.import' WHERE type_string = 'aiida.import';""",
        """UPDATE db_dbgroup SET type_string = 'auto.run' WHERE type_string = 'autogroup.run';""",
    ]
    
    reverse_sql = [
        """UPDATE db_dbgroup SET type_string = 'aiida.import' WHERE type_string = 'auto.import';""",
        """UPDATE db_dbgroup SET type_string = 'autogroup.run' WHERE type_string = 'auto.run';""",
    ]
    ...
    
    operations = [
        ...
        migrations.RunSQL(
            sql='\n'.join(forward_sql),
            reverse_sql='\n'.join(reverse_sql)),
        upgrade_schema_version(REVISION, DOWN_REVISION),
        ...
    ]
    

    As you can see here, you should not only provide the SQL commands to upgrade your database, but also the commands to revert these changes in case you want to perform a downgrade (see: sql=forward_sql, reverse_sql=reverse_sql)

  5. Change the LATEST_MIGRATION variable in aiida/backends/djsite/db/migrations/__init__.py to the name of your migration file:

    .. code-block:: python

    LATEST_MIGRATION = '0003_my_db_update'
    

    This allows AiiDA to get the version number from your migration and make sure the database and the code are in sync.

  6. Migrate your database to the new version using verdi and specifying the correct profile::

    $ verdi -p {profile} database migrate
    

    In case you want to (and, most probably, you should) test the downgrade operation, please check the list of available versions of the database::

    $ python aiida/backends/djsite/manage.py showmigrations db
    

    The output will look something like the following::

    db
    [X] 0001_initial
    [X] 0002_db_state_change
    [X] 0003_add_link_type
    [X] 0004_add_daemon_and_uuid_indices
    [X] 0005_add_cmtime_indices
    [X] 0006_delete_dbpath
    [X] 0007_update_linktypes
    [X] 0008_code_hidden_to_extra
    [X] 0009_base_data_plugin_type_string
    [X] 0010_process_type
    [X] 0011_delete_kombu_tables
    [X] 0012_drop_dblock
    [X] 0013_django_1_8
    [X] 0014_add_node_uuid_unique_constraint
    [X] 0015_invalidating_node_hash
    [X] 0016_code_sub_class_of_data
    [X] 0017_drop_dbcalcstate
    [X] 0018_django_1_11
    

    Choose the previous migration step and migrate to it::

    $ python aiida/backends/djsite/manage.py migrate db 0017_drop_dbcalcstate
    

    Check that both upgrade and downgrade changes are successful and if yes, go to the next step.

  7. Add tests for your migrations to the aiida-core/aiida/backends/djsite/db/subtests/migrations module.

| Note | Such a test can only be applied to the migration of the database content. | For example, you can not test modifications of the database column names.

SQLAlchemy

The SQLAlchemy database schema can be found in aiida.backends.sqlalchemy.models <https://github.com/aiidateam/aiida-core/tree/develop/aiida/backends/sqlalchemy/models>_. If you need to change the database schema follow these steps:

  1. Make all the necessary changes to the model than you would like to modify located in the aiida/backends/sqlalchemy/models directory.

  2. Create a new migration file by going to aiida/backends/sqlalchemy and executing::

    $ python aiida/backends/sqlalchemy/manage.py revision "This is the description for the next revision"
    

    This will create a new migration file in aiida/backends/sqlalchemy/migrations/versions whose names begins with an automatically generated hash and the provided message for this new migration. Modify the migration message to accurately describe the purpose of the migration.

  3. Have a look at the generated migration file and ensure the migration is correct. The file should contain automatically generated hashes that point to the previous and to the current revision:

    .. code-block:: python

    revision = 'e72ad251bcdb'
    down_revision = 'b8b23ddefad4'
    

    Also upgrade() and downgrade() function definitions should be present in the file:

    .. code-block:: python

    def upgrade():
        # some upgrade operations
    
    def downgrade():
        # some downgrade operations
    

    If you want to add some changes that affect the content of the database -- you should do it "manually" by adding some SQL commands that will run directly on your database. See the following example and adapt it for your needs:

    .. code-block:: python

    from sqlalchemy.sql import text
    
    forward_sql = [
        """UPDATE db_dbgroup SET type_string = 'auto.import' WHERE type_string = 'aiida.import';""",
        """UPDATE db_dbgroup SET type_string = 'auto.run' WHERE type_string = 'autogroup.run';""",
    ]
    
    reverse_sql = [
        """UPDATE db_dbgroup SET type_string = 'aiida.import' WHERE type_string = 'auto.import';""",
        """UPDATE db_dbgroup SET type_string = 'autogroup.run' WHERE type_string = 'auto.run';""",
    ]
    
    def upgrade():
        conn = op.get_bind()
        statement = text('\n'.join(forward_sql))
        conn.execute(statement)
    
    def downgrade():
        conn = op.get_bind()
        statement = text('\n'.join(reverse_sql))
        conn.execute(statement)
    

    If you want to learn more about the migration operations, you can have a look at the Alembic documentation <https://alembic.sqlalchemy.org/en/latest/>_.

  4. Migrate your database to the new version using verdi and specifying the correct profile::

    $ verdi -p {profile} database migrate
    
  5. Add tests for your migrations to aiida-core/aiida/backends/sqlalchemy/tests/test_migrations.py

Overview of manage.py commands

The alembic_manage.py provides several options to control your SQLAlchemy migrations. By executing::

$ python aiida/backends/sqlalchemy/manage.py --help

you will get a full list of the available arguments that you can pass and commands. Briefly, the available commands are:

  • upgrade: Upgrade to the later version.
  • downgrade: Downgrade the version of your database.
  • history: Lists the available migrations in chronological order.
  • current: Displays the current version of the database.
  • revision: Creates a new migration file based on the model changes.

Debugging Alembic

Alembic migrations should work automatically and migrate your database to the latest version. However, if you were using SQLAlchemy before we introduced Alembic, you may get a message like to following during the first migration::

sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) relation
"db_dbuser" already exists [SQL: '\nCREATE TABLE db_dbuser (\n\tid SERIAL
NOT NULL, \n\temail VARCHAR(254), \n\tpassword VARCHAR(128),
\n\tis_superuser BOOLEAN NOT NULL, \n\tfirst_name VARCHAR(254),
\n\tlast_name VARCHAR(254), \n\tinstitution VARCHAR(254), \n\tis_staff
BOOLEAN, \n\tis_active BOOLEAN, \n\tlast_login TIMESTAMP WITH TIME ZONE,
\n\tdate_joined TIMESTAMP WITH TIME ZONE, \n\tCONSTRAINT db_dbuser_pkey
PRIMARY KEY (id)\n)\n\n']

In this case, you should manually create the Alembic table in your database and add a line with the database version number. To do so, use psql to connect to the desired database::

$ psql aiidadb_sqla

where you should replace aiidadb_sqla with the name of the database that you would like to modify. Then, execute the following commands:

.. code-block:: sql

CREATE TABLE alembic_version (version_num character varying(32) not null, PRIMARY KEY(version_num));
INSERT INTO alembic_version VALUES ('e15ef2630a1b');
GRANT ALL ON alembic_version TO aiida;