Predict size of database growth - dcm4che/dcm4chee-arc-light GitHub Wiki

Problem

The size of database growth depends on the structure of receiving studies. Studies containing many DICOM objects (e.g. CT/MR studies with legacy single-frame CT/MR Images) correlate with more database records and therefore a larger database growth per study - but a lesser database growth per archived DICOM object - compared to study with fewer DICOM objects (e.g. DX studies).

On purging database records for individual DICOM objects, the reduction of the database size is greater for studies containing many DICOM objects. Purging database records relies on and requires metadata of objects of individual series to be stored on a storage system external from the database.

Solution

Send a representative set of studies to a test installation of the archive and gauge the database size before and after purging of database records for individual DICOM objects.

Instructions

  1. Install a dockerized version of the archive according Run minimum set of archive services on a single host.

  2. Determine initial database size

    Invoke psql - PostgreSQL interactive terminal in the database container connecting to the archive database:

    $ docker exec -it db-container-name psql pacsdb pacs
    psql (10.0)
    Type "help" for help.

    List size of archive database:

    pacsdb=# \l+ pacsdb
                                                   List of databases
      Name  |  Owner   | Encoding |  Collate   |   Ctype    | Access privileges | Size  | Tablespace | Description 
    --------+----------+----------+------------+------------+-------------------+-------+------------+-------------
     pacsdb | postgres | UTF8     | en_US.utf8 | en_US.utf8 |                   | 10 MB | pg_default | 
    (1 row)
  3. Send representative set of studies to the archive

    E.g.:

    $ storescu -c DCM4CHEE@localhost:11112 ~/testdata/MESA-storage-A_11_5_0
    $ storescu -c DCM4CHEE@localhost:11112 ~/testdata/MESA-storage-B_10_11_0
    pacsdb=# select count(*) from patient;
     count 
    -------
        62
    (1 row)
    pacsdb=# select count(*) from study;
     count 
    -------
        68
    (1 row)
    pacsdb=# select count(*) from series;
     count 
    -------
       118
    (1 row)
    pacsdb=# select count(*) from instance;
     count 
    -------
      1462
    (1 row)
  4. Garbage-collect database and list size

    pacsdb=# vacuum full;
    VACUUM
    pacsdb=# \l+ pacsdb
                                                   List of databases
      Name  |  Owner   | Encoding |  Collate   |   Ctype    | Access privileges | Size  | Tablespace | Description 
    --------+----------+----------+------------+------------+-------------------+-------+------------+-------------
     pacsdb | postgres | UTF8     | en_US.utf8 | en_US.utf8 |                   | 13 MB | pg_default | 
    (1 row)

    The difference to the initial database size is the actual size of the database growth by the stored studies. E.g.: 13 MB - 10 MB = 3 MB.

  5. Create metadata objects for individual series

    Clone Archive Device child Storage Descriptor fs1:

    Change Storage ID and Storage URI of cloned Storage Descriptor to series-metadata and file:///storage/series-metadata/:

    Set Archive Device attributes Series Metadata Storage ID to series-metadata and Update Series Metadata Polling Interval to 1 minute:

    Trigger creation of Series Metadata by

    pacsdb=# update series set metadata_update_time=current_timestamp;
    UPDATE 118

    Verify if the creation metadata for all series is completed by

    pacsdb=# select count(*) from series where metadata_update_time is not null;
     count 
    -------
         0
    (1 row)

    Verify size of created metadata files

    $ du -s --si /var/local/dcm4chee-arc/storage/series-metadata
    3,3M	/var/local/dcm4chee-arc/storage/series-metadata
  6. Purge Instance records

    Set Archive Device attribute Purge Instance Records Polling Interval to 1 minute:

    Trigger purging of instance records by

    pacsdb=# update series set inst_purge_time=current_timestamp;
    UPDATE 118

    Verify if the purging of instance records is completed by

    pacsdb=# select count(*) from instance;
     count 
    -------
         0
    (1 row)
  7. Garbage-collect database and list size

    pacsdb=# vacuum full;
    VACUUM
    pacsdb=# \l+ pacsdb
                                               List of databases
      Name  |  Owner   | Encoding |  Collate   |   Ctype    | Access privileges | Size  | Tablespace | Description 
    --------+----------+----------+------------+------------+-------------------+-------+------------+-------------
     pacsdb | postgres | UTF8     | en_US.utf8 | en_US.utf8 |                   | 11 MB | pg_default | 
    (1 row)

    So the purging of the instance records reduced the database growth by the stored studies to 11 MB - 10 MB = 1 MB.

⚠️ **GitHub.com Fallback** ⚠️