Cookbook: General - folio-org/folio-analytics GitHub Wiki

General

In this cookbook you will find general introductions to how to query data from Metadb using SQL.

In the wiki of folio-analytics you can find various cookbooks. Each cookbook explain specific tips how you can querying data in their area.

Table of Contents

1. FOLIO system layers

FOLIO has multiple system layers.

Layer
UI Toolkit
Apps
OKAPI
System Layer

The system layer is interesting for querying data via SQL directly from the FOLIO database. The actual FOLIO database is located in this layer. The other layers above provide interfaces to interact with applications. Metadb uses the OKAPI layer to communicate with the FOLIO system.

[top]

2. What is Metadb?

Software that creates and update a Postgres database with data from FOLIO. It is an automated data flow through the messaging service Kafka (OKAPI layer), which means Metadb has live data. A transformation is carried out that makes the data relationally available and creates different types of tables.

Documentation: https://metadb.dev/doc/

[top]

3. Types of tables in the database from Metadb

Please note that detailed documentation can be found on https://metadb.dev/doc/.

3.1. Derived tables

These kind of tables are created by the community and located in the Folio Analytics repository. The SQL statements to create the derived tables are downloaded from the repository and executed by Metadb. Derived tables behave similarly to materialized views. That means, separate tables are created that merge data from different tables and provide it as a single table. This reduces the complexity of queries and can improve performance.

[top]

3.2. Extracted tables

Metadb creates an own database which contains the data from the FOLIO database. To get the data, Metadb use the Kafka messaging system. After Metadb received the data, a transformation process begin and creates different kind of "extracted tables". However, the respective tables only exist in Metadb if data is also available for them.

[top]

3.2.1. Main tables

Tables with two underscores at the end of the name.

Example: patrongroup__

These tables contain the current data sets and the history of all previous states (historical data).

[top]

3.2.2. Current tables

Tables without two underscores at the end of the name.

Example: patrongroup

These tables only contain the current records.

[top]

3.2.3. Main transformed tables

Tables with 2 underscores and a t and two underscores at the end.

Example: patrongroup__t__

Metadb generates transformed tables to make certain attributes relationally searchable, such as JSON objects. These tables here are similar to the main tables, only they are transformed.

[top]

3.2.4. Current transformed tables

Tables with 2 underscores and a t at the end.

Example: patrongroup__t

These are like the main transformed tables, but only contain the current data records.

[top]

4. Extract data from JSON objects

Sometimes it is necessary to extract data from a JSON object. JSON objects can be nested simple or complex. A good practice to extract the data via SQL is to use the predefined functions of Postgres. There is also the option to use arrow notation, but this can easily get confusing.

Example of a JSON object:

{
  "id": "84cfc15b-942d-4028-bf26-e5075dbbbf0a",
  "total": 100.0,
  "metadata": {
    "createdDate": "2023-05-09T08:38:32.555Z",
    "updatedDate": "2023-05-10T06:11:48.297Z",
    "createdByUserId": "67adb68e-232f-4cdf-8bce-faf09093071a",
    "updatedByUserId": "67adb68e-232f-4cdf-8bce-faf09093071a"
  },
  "poLineId": "86921b61-51fd-4d71-af41-b4fd9365bef5",
  "quantity": 2,
  "subTotal": 100.0,
  "invoiceId": "14bb8978-a386-4cb3-9e6d-b15ef41b4ce4",
  "adjustments": [],
  "description": "Expense_Class_Id",
  "accountNumber": "111",
  "adjustmentsTotal": 0.0,
  "referenceNumbers": [],
  "fundDistributions": [
    {
      "code": "med_textbook_2022",
      "value": 50.0,
      "fundId": "8a66888e-06d3-4fa0-a3b7-cc7287905b54",
      "expenseClassId": "1bcc3247-99bf-4dca-9b0f-7bc51a2998c2",
      "distributionType": "percentage"
    },
    {
      "code": "med_textbook_2022",
      "value": 50.0,
      "fundId": "8a66888e-06d3-4fa0-a3b7-cc7287905b54",
      "expenseClassId": "5b5ebe3a-cf8b-4f16-a880-46873ef21388",
      "distributionType": "percentage"
    }
  ],
  "invoiceLineNumber": "1",
  "invoiceLineStatus": "Approved",
  "releaseEncumbrance": true
}

As you can see, this example contains multiple levels in the JSON object. Let's look at some examples of extraction.

First level:

SELECT 
    jsonb_extract_path_text(invoice_lines.jsonb, 'poLineId') :: uuid AS poline_id 
FROM 
    folio_invoice.invoice_lines

Secound level:

SELECT 
    jsonb_extract_path_text(invoice_lines.jsonb, 'metadata', 'createdDate') :: timestamptz AS metadata__created_date 
FROM 
    folio_invoice.invoice_lines

Arrays:

SELECT 
    invoice_lines.id AS invoice_line_id,
    jsonb_extract_path_text(invoice_lines.jsonb, 'total')::numeric(19,4) AS invoice_line_total,
    jsonb_extract_path_text(jsonb_array_elements(jsonb_extract_path(invoice_lines.jsonb, 'fundDistributions')), 'fundId')::uuid AS invoice_line_fund_id,
    jsonb_extract_path_text(jsonb_array_elements(jsonb_extract_path(invoice_lines.jsonb, 'fundDistributions')), 'value')::numeric(19,4) AS invoice_line_distribution_value,
    jsonb_extract_path_text(jsonb_array_elements(jsonb_extract_path(invoice_lines.jsonb, 'fundDistributions')), 'distributionType') AS invoice_line_distribution_type,
    jsonb_extract_path_text(jsonb_array_elements(jsonb_extract_path(invoice_lines.jsonb, 'fundDistributions')), 'expenseClassId') :: UUID AS invoice_line_distribution_expense_class_id
FROM 
    folio_invoice.invoice_lines

Arrays usually occur where a field in the input masks in FOLIO can be duplicated with a click in order to enter multiple values. As an example, the addresses for a user.

Using the functions of Postgres, the data can be extracted. It is also possible to nest the functions within each other.

You can also use a CROSS JOIN.

Example:

SELECT
    inst.id AS instance_id,
    jsonb_extract_path_text(inst.jsonb, 'hrid') AS instance_hrid,
    jsonb_extract_path_text(ident.jsonb, 'identifierTypeId')::uuid AS identifier_type_id,
    idtype.name AS identifier_type_name,
    jsonb_extract_path_text(ident.jsonb, 'value') AS identifier,
    ident.ordinality AS identifier_ordinality
FROM
    folio_inventory.instance AS inst
    CROSS JOIN LATERAL jsonb_array_elements(jsonb_extract_path(inst.jsonb, 'identifiers')) WITH ORDINALITY AS ident (jsonb)
    LEFT JOIN folio_inventory.identifier_type__t AS idtype ON jsonb_extract_path_text(ident.jsonb, 'identifierTypeId')::uuid = idtype.id

[top]

5. Using the extracted transformed tables in Metadb

Extracted transformed tables contain the data from the FOLIO database in a relationally form. That means, you can querying the data in a usual way without functions.

Example:

SELECT
    id,
    total,
    account_number,
    adjustments_total,
    quantity,
    invoice_id,
    invoice_line_number,
    sub_total,
    invoice_line_status,
    po_line_id,
    description,
    release_encumbrance,
    subscription_start,
    subscription_end
FROM 
    folio_invoice.invoice_lines__t

Please note the different types of extracted tables.

[top]

6. Using CTE to prepare data to use them in queries

CTEs are Common Table Expressions, which are temporary result sets used within a query.

Typically you do not need CTE to query data in Metadb. However, sometimes it may be necessary to use this technique. An example would be if you wanted to extract UUIDs from an array of a JSON object and use them in the WHERE clause to link to other tables. These UUIDs must be extracted beforehand because you cannot compare data sets with individual values ​​in the WHERE clause.

Example:

/*
 * CTE Start
 */
WITH invoice_lines_fund_distribution AS (
    SELECT 
        invoice_lines.id AS invoice_line_id,
        jsonb_extract_path_text(invoice_lines.jsonb, 'total')::numeric(19,4) AS invoice_line_total,
        jsonb_extract_path_text(jsonb_array_elements(jsonb_extract_path(invoice_lines.jsonb, 'fundDistributions')), 'fundId')::uuid AS invoice_line_fund_id,
        jsonb_extract_path_text(jsonb_array_elements(jsonb_extract_path(invoice_lines.jsonb, 'fundDistributions')), 'value')::numeric(19,4) AS invoice_line_distribution_value,
        jsonb_extract_path_text(jsonb_array_elements(jsonb_extract_path(invoice_lines.jsonb, 'fundDistributions')), 'distributionType') AS invoice_line_distribution_type,
        jsonb_extract_path_text(jsonb_array_elements(jsonb_extract_path(invoice_lines.jsonb, 'fundDistributions')), 'expenseClassId') :: UUID AS invoice_line_distribution_expense_class_id
    FROM 
        folio_invoice.invoice_lines
),
invoice_vendors AS (
    SELECT
        invoices__t.id AS invoice_id,
        organizations__t.name AS invoice_vendor_name
    FROM 
        folio_invoice.invoices__t
        INNER JOIN folio_organizations.organizations__t ON organizations__t.id = invoices__t.vendor_id 
),
finance AS (
    SELECT
        fiscal_year__t.id AS fiscal_year_id,
        fiscal_year__t.code AS fiscal_year,
        ledger__t.id AS ledger_id,
        ledger__t.name AS ledger_name,
        budget__t.id AS budget_id,
        budget__t.name AS budget_name,
        fund__t.id AS fund_id,
        fund__t.code AS fund_code
    FROM            
        folio_finance.fiscal_year__t
        LEFT JOIN folio_finance.budget__t ON budget__t.fiscal_year_id = fiscal_year__t.id
        LEFT JOIN folio_finance.fund__t ON fund__t.id = budget__t.fund_id
        LEFT JOIN folio_finance.ledger__t ON ledger__t.id = fund__t.ledger_id
),
transactions AS (
    SELECT
        transaction__t.id AS transaction_id,
        transaction__t.from_fund_id AS from_fund_id,
        transaction__t.amount AS transaction_amount,
        transaction__t.currency AS transaction_currency,
        transaction__t.fiscal_year_id,
        transaction__t.source_invoice_line_id AS source_invoice_line_id,
        transaction__t.expense_class_id AS transaction_expense_class_id,
        expense_class__t.code AS expense_class_code,
        expense_class__t.name AS expense_class_name,
        expense_class__t.external_account_number_ext
    FROM
        folio_finance.transaction__t
        LEFT JOIN folio_finance.expense_class__t ON expense_class__t.id = transaction__t.expense_class_id
)
/*
 * CTE End
 */
/*
 * Primary Query
 */
SELECT
    invoices__t.id AS invoice_id,
    invoices__t.status AS invoice_status,
    invoices__t.folio_invoice_no AS folio_invoice_number,
    invoices__t.vendor_id,
    invoice_vendors.invoice_vendor_name AS vendor_name,
    invoices__t.vendor_invoice_no AS vendor_invoice_number,
    invoices__t.invoice_date,
    invoice_lines__t.id AS invoice_line_id,
    invoice_lines__t.total AS invoice_line_total,
    invoices__t.currency AS invoice_currency,
    invoice_lines_fund_distribution.invoice_line_distribution_value,
    invoice_lines_fund_distribution.invoice_line_distribution_type,
    invoice_lines_fund_distribution.invoice_line_fund_id,
    inv_line_fund.code AS invoice_line_fund_code,
    inv_line_expense_class.id AS invoice_line_expense_class_id,
    inv_line_expense_class.code AS invoice_line_expense_class_code,
    inv_line_expense_class.name AS invoice_line_expense_class_name,
    inv_line_expense_class.external_account_number_ext,
    invoices__t.exchange_rate,
    transactions.transaction_id,
    transactions.transaction_amount,
    transactions.transaction_currency,
    finance.fund_id AS transaction_fund_id,
    finance.fund_code AS transaction_fund_code,
    finance.fiscal_year_id,
    finance.fiscal_year,
    finance.ledger_id,
    finance.ledger_name,
    finance.budget_id,
    finance.budget_name,        
    transactions.transaction_expense_class_id,
    transactions.expense_class_code AS transactions_expense_class_code,
    transactions.expense_class_name AS transactions_expense_class_name,
    transactions.external_account_number_ext AS transactions_external_account_number_ext
FROM
    folio_invoice.invoices__t
    LEFT JOIN folio_invoice.invoice_lines__t ON invoice_lines__t.invoice_id = invoices__t.id
    LEFT JOIN invoice_lines_fund_distribution ON invoice_lines_fund_distribution.invoice_line_id = invoice_lines__t.id
    LEFT JOIN folio_finance.expense_class__t AS inv_line_expense_class ON inv_line_expense_class.id = invoice_lines_fund_distribution.invoice_line_distribution_expense_class_id
    LEFT JOIN folio_finance.fund__t AS inv_line_fund ON inv_line_fund.id = invoice_lines_fund_distribution.invoice_line_fund_id
    LEFT JOIN invoice_vendors ON invoice_vendors.invoice_id = invoices__t.id    
    LEFT JOIN transactions ON transactions.source_invoice_line_id = invoice_lines__t.id
        AND transactions.from_fund_id = invoice_lines_fund_distribution.invoice_line_fund_id
    LEFT JOIN finance ON finance.fund_id = transactions.from_fund_id
        AND finance.fiscal_year_id = transactions.fiscal_year_id
WHERE 
    invoice_lines_fund_distribution.invoice_line_distribution_expense_class_id IS NULL
UNION 
SELECT
    invoices__t.id AS invoice_id,
    invoices__t.status AS invoice_status,
    invoices__t.folio_invoice_no AS folio_invoice_number,
    invoices__t.vendor_id,
    invoice_vendors.invoice_vendor_name AS vendor_name,
    invoices__t.vendor_invoice_no AS vendor_invoice_number,
    invoices__t.invoice_date,
    invoice_lines__t.id AS invoice_line_id,
    invoice_lines__t.total AS invoice_line_total,
    invoices__t.currency AS invoice_currency,
    invoice_lines_fund_distribution.invoice_line_distribution_value,
    invoice_lines_fund_distribution.invoice_line_distribution_type,
    invoice_lines_fund_distribution.invoice_line_fund_id,
    inv_line_fund.code AS invoice_line_fund_code,
    inv_line_expense_class.id AS invoice_line_expense_class_id,
    inv_line_expense_class.code AS invoice_line_expense_class_code,
    inv_line_expense_class.name AS invoice_line_expense_class_name,
    inv_line_expense_class.external_account_number_ext,
    invoices__t.exchange_rate,
    transactions.transaction_id,
    transactions.transaction_amount,
    transactions.transaction_currency,
    finance.fund_id AS transaction_fund_id,
    finance.fund_code AS transaction_fund_code,
    finance.fiscal_year_id,
    finance.fiscal_year,
    finance.ledger_id,
    finance.ledger_name,
    finance.budget_id,
    finance.budget_name,
    transactions.transaction_expense_class_id,
    transactions.expense_class_code AS transactions_expense_class_code,
    transactions.expense_class_name AS transactions_expense_class_name,
    transactions.external_account_number_ext AS transactions_external_account_number_ext
FROM
    folio_invoice.invoices__t
    LEFT JOIN folio_invoice.invoice_lines__t ON invoice_lines__t.invoice_id = invoices__t.id
    LEFT JOIN invoice_lines_fund_distribution ON invoice_lines_fund_distribution.invoice_line_id = invoice_lines__t.id
    LEFT JOIN folio_finance.expense_class__t AS inv_line_expense_class ON inv_line_expense_class.id = invoice_lines_fund_distribution.invoice_line_distribution_expense_class_id
    LEFT JOIN folio_finance.fund__t AS inv_line_fund ON inv_line_fund.id = invoice_lines_fund_distribution.invoice_line_fund_id
    LEFT JOIN invoice_vendors ON invoice_vendors.invoice_id = invoices__t.id    
    LEFT JOIN transactions ON transactions.source_invoice_line_id = invoice_lines__t.id
        AND transactions.from_fund_id = invoice_lines_fund_distribution.invoice_line_fund_id
        AND transactions.transaction_expense_class_id = invoice_lines_fund_distribution.invoice_line_distribution_expense_class_id
    LEFT JOIN finance ON finance.fund_id = transactions.from_fund_id
        AND finance.fiscal_year_id = transactions.fiscal_year_id
WHERE 
    invoice_lines_fund_distribution.invoice_line_distribution_expense_class_id IS NOT NULL

As you can see, the CTEs are also used here so that the data from the respective CTE can be used multiple times.

[top]

7. 0 vs NULL

A NULL should not be confused with a value of 0. A NULL indicates a lack of a value, which is not the same thing as a zero value. For example, consider the question "How many books does Adam own?" The answer may be 0 (we know that he owns none) or NULL (we do not know how many he owns). Source: https://en.wikipedia.org/wiki/Null_(SQL)

For calculations or comparisons, it may be necessary to assign a default value to NULL values. Postgres offers the predefined function COALESCE() for this. The function takes two arguments: 1. The value that needs to be checked if it is NULL and 2. The value that should replace the NULL value.

Example:

SELECT 
    jsonb_extract_path_text(loan.jsonb, 'itemId') :: UUID AS item_id,
    COALESCE(COUNT(id), 0) AS loans
FROM 
    folio_circulation.loan
GROUP BY 
    item_id

In this example, we replace NULL values ​​with 0 in the column loans. If an item has never been borrowed, it is assigned the value 0. Otherwise there would be a NULL value there. This is not good if NULL values ​​occur during calculations and comparisons.

[top]

8. Non-Roman character

Most databases nowadays store data in UTF format. In this case you can use these symbols directly in your query.

Example:

SELECT 
    sa_name AS 中國條約
FROM 
    folio_agreements.subscription_agreement sa 
WHERE 
    sa_name = '中國條約'

In order to be able to save this query, the file must be saved in UTF format, otherwise the symbols will not be encoded correctly.

[top]

8.1. Search and replace symbols

You can also search and replace symbols. The trick is to convert the symbols into a uniform system and thus enable the basis for a better quering. Here it is advisable to convert the symbols into a HEX value. Each symbol has a HEX value. PSQL offers functions for this. PSQL offers the simplest option with the data type bytea.

Example:

sa_name::bytea

If you look at attribute value in DBeaver you will get the view in a hexviewer.

Sample output:

E4 B8 AD E5 9C 8B E6 A2 9D E7 B4 84 20 5B 53 74 65 66 61 6E 5D

Search symbols

Example:

Find all agreements that have non-Roman characters in their names

SELECT 
    * 
FROM 
(
SELECT 
    sa_name, 
    /*
     * 1. delete all latin script symbols with the functions translate() and replace()
     * 2. look at the HEX value at the beginning. If they start with a letter, then you have a title with non-latin symbols
     */
    length(REPLACE(REPLACE(REPLACE(regexp_matches(encode(REPLACE(TRANSLATE(sa_name, 'abcdefghijklmnopqrstuvwxyzäöü+()- ,#+[]ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜß', '@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@'), '@', '')::bytea, 'hex'), '[a-f]*')::varchar, '{', ''), '}', ''), '"', '')) AS len
FROM 
    folio_agreements.subscription_agreement sa
) AS t
WHERE 
    t.len > 0

Replace symbols

Example:

Replace the space with another symbol. Space = 20 , the other symbol = E4B8AD (Tables: https://www.utf8-chartable.de/unicode-utf8-table.pl)

convert_from(decode(encode(concat('\x', replace(encode('A B', 'hex'), '20', 'E4B8AD'))::bytea, 'escape'), 'escape'), 'UTF-8')::varchar

[top]