Cookbook: General - folio-org/folio-analytics GitHub Wiki
General
In this cookbook you will find general introductions to how to query data from Metadb using SQL.
In the wiki of folio-analytics you can find various cookbooks. Each cookbook explain specific tips how you can querying data in their area.
Table of Contents
- 1. FOLIO system layers
- 2. What is Metadb?
- 3. Types of tables in the database from Metadb
- 4. Extract data from JSON objects
- 5. Using the extracted transformed tables in Metadb
- 6. Using CTE to prepare data to use them in queries
- 7. 0 vs NULL
- 8. Non-Roman character
1. FOLIO system layers
FOLIO has multiple system layers.
Layer |
---|
UI Toolkit |
Apps |
OKAPI |
System Layer |
The system layer is interesting for querying data via SQL directly from the FOLIO database. The actual FOLIO database is located in this layer. The other layers above provide interfaces to interact with applications. Metadb uses the OKAPI layer to communicate with the FOLIO system.
[top]
2. What is Metadb?
Software that creates and update a Postgres database with data from FOLIO. It is an automated data flow through the messaging service Kafka (OKAPI layer), which means Metadb has live data. A transformation is carried out that makes the data relationally available and creates different types of tables.
Documentation: https://metadb.dev/doc/
[top]
3. Types of tables in the database from Metadb
Please note that detailed documentation can be found on https://metadb.dev/doc/.
3.1. Derived tables
These kind of tables are created by the community and located in the Folio Analytics repository. The SQL statements to create the derived tables are downloaded from the repository and executed by Metadb. Derived tables behave similarly to materialized views. That means, separate tables are created that merge data from different tables and provide it as a single table. This reduces the complexity of queries and can improve performance.
[top]
3.2. Extracted tables
Metadb creates an own database which contains the data from the FOLIO database. To get the data, Metadb use the Kafka messaging system. After Metadb received the data, a transformation process begin and creates different kind of "extracted tables". However, the respective tables only exist in Metadb if data is also available for them.
[top]
3.2.1. Main tables
Tables with two underscores at the end of the name.
Example: patrongroup__
These tables contain the current data sets and the history of all previous states (historical data).
[top]
3.2.2. Current tables
Tables without two underscores at the end of the name.
Example: patrongroup
These tables only contain the current records.
[top]
3.2.3. Main transformed tables
Tables with 2 underscores and a t and two underscores at the end.
Example: patrongroup__t__
Metadb generates transformed tables to make certain attributes relationally searchable, such as JSON objects. These tables here are similar to the main tables, only they are transformed.
[top]
3.2.4. Current transformed tables
Tables with 2 underscores and a t at the end.
Example: patrongroup__t
These are like the main transformed tables, but only contain the current data records.
[top]
4. Extract data from JSON objects
Sometimes it is necessary to extract data from a JSON object. JSON objects can be nested simple or complex. A good practice to extract the data via SQL is to use the predefined functions of Postgres. There is also the option to use arrow notation, but this can easily get confusing.
Example of a JSON object:
{
"id": "84cfc15b-942d-4028-bf26-e5075dbbbf0a",
"total": 100.0,
"metadata": {
"createdDate": "2023-05-09T08:38:32.555Z",
"updatedDate": "2023-05-10T06:11:48.297Z",
"createdByUserId": "67adb68e-232f-4cdf-8bce-faf09093071a",
"updatedByUserId": "67adb68e-232f-4cdf-8bce-faf09093071a"
},
"poLineId": "86921b61-51fd-4d71-af41-b4fd9365bef5",
"quantity": 2,
"subTotal": 100.0,
"invoiceId": "14bb8978-a386-4cb3-9e6d-b15ef41b4ce4",
"adjustments": [],
"description": "Expense_Class_Id",
"accountNumber": "111",
"adjustmentsTotal": 0.0,
"referenceNumbers": [],
"fundDistributions": [
{
"code": "med_textbook_2022",
"value": 50.0,
"fundId": "8a66888e-06d3-4fa0-a3b7-cc7287905b54",
"expenseClassId": "1bcc3247-99bf-4dca-9b0f-7bc51a2998c2",
"distributionType": "percentage"
},
{
"code": "med_textbook_2022",
"value": 50.0,
"fundId": "8a66888e-06d3-4fa0-a3b7-cc7287905b54",
"expenseClassId": "5b5ebe3a-cf8b-4f16-a880-46873ef21388",
"distributionType": "percentage"
}
],
"invoiceLineNumber": "1",
"invoiceLineStatus": "Approved",
"releaseEncumbrance": true
}
As you can see, this example contains multiple levels in the JSON object. Let's look at some examples of extraction.
First level:
SELECT
jsonb_extract_path_text(invoice_lines.jsonb, 'poLineId') :: uuid AS poline_id
FROM
folio_invoice.invoice_lines
Secound level:
SELECT
jsonb_extract_path_text(invoice_lines.jsonb, 'metadata', 'createdDate') :: timestamptz AS metadata__created_date
FROM
folio_invoice.invoice_lines
Arrays:
SELECT
invoice_lines.id AS invoice_line_id,
jsonb_extract_path_text(invoice_lines.jsonb, 'total')::numeric(19,4) AS invoice_line_total,
jsonb_extract_path_text(jsonb_array_elements(jsonb_extract_path(invoice_lines.jsonb, 'fundDistributions')), 'fundId')::uuid AS invoice_line_fund_id,
jsonb_extract_path_text(jsonb_array_elements(jsonb_extract_path(invoice_lines.jsonb, 'fundDistributions')), 'value')::numeric(19,4) AS invoice_line_distribution_value,
jsonb_extract_path_text(jsonb_array_elements(jsonb_extract_path(invoice_lines.jsonb, 'fundDistributions')), 'distributionType') AS invoice_line_distribution_type,
jsonb_extract_path_text(jsonb_array_elements(jsonb_extract_path(invoice_lines.jsonb, 'fundDistributions')), 'expenseClassId') :: UUID AS invoice_line_distribution_expense_class_id
FROM
folio_invoice.invoice_lines
Arrays usually occur where a field in the input masks in FOLIO can be duplicated with a click in order to enter multiple values. As an example, the addresses for a user.
Using the functions of Postgres, the data can be extracted. It is also possible to nest the functions within each other.
You can also use a CROSS JOIN.
Example:
SELECT
inst.id AS instance_id,
jsonb_extract_path_text(inst.jsonb, 'hrid') AS instance_hrid,
jsonb_extract_path_text(ident.jsonb, 'identifierTypeId')::uuid AS identifier_type_id,
idtype.name AS identifier_type_name,
jsonb_extract_path_text(ident.jsonb, 'value') AS identifier,
ident.ordinality AS identifier_ordinality
FROM
folio_inventory.instance AS inst
CROSS JOIN LATERAL jsonb_array_elements(jsonb_extract_path(inst.jsonb, 'identifiers')) WITH ORDINALITY AS ident (jsonb)
LEFT JOIN folio_inventory.identifier_type__t AS idtype ON jsonb_extract_path_text(ident.jsonb, 'identifierTypeId')::uuid = idtype.id
[top]
5. Using the extracted transformed tables in Metadb
Extracted transformed tables contain the data from the FOLIO database in a relationally form. That means, you can querying the data in a usual way without functions.
Example:
SELECT
id,
total,
account_number,
adjustments_total,
quantity,
invoice_id,
invoice_line_number,
sub_total,
invoice_line_status,
po_line_id,
description,
release_encumbrance,
subscription_start,
subscription_end
FROM
folio_invoice.invoice_lines__t
Please note the different types of extracted tables.
[top]
6. Using CTE to prepare data to use them in queries
CTEs are Common Table Expressions, which are temporary result sets used within a query.
Typically you do not need CTE to query data in Metadb. However, sometimes it may be necessary to use this technique. An example would be if you wanted to extract UUIDs from an array of a JSON object and use them in the WHERE clause to link to other tables. These UUIDs must be extracted beforehand because you cannot compare data sets with individual values in the WHERE clause.
Example:
/*
* CTE Start
*/
WITH invoice_lines_fund_distribution AS (
SELECT
invoice_lines.id AS invoice_line_id,
jsonb_extract_path_text(invoice_lines.jsonb, 'total')::numeric(19,4) AS invoice_line_total,
jsonb_extract_path_text(jsonb_array_elements(jsonb_extract_path(invoice_lines.jsonb, 'fundDistributions')), 'fundId')::uuid AS invoice_line_fund_id,
jsonb_extract_path_text(jsonb_array_elements(jsonb_extract_path(invoice_lines.jsonb, 'fundDistributions')), 'value')::numeric(19,4) AS invoice_line_distribution_value,
jsonb_extract_path_text(jsonb_array_elements(jsonb_extract_path(invoice_lines.jsonb, 'fundDistributions')), 'distributionType') AS invoice_line_distribution_type,
jsonb_extract_path_text(jsonb_array_elements(jsonb_extract_path(invoice_lines.jsonb, 'fundDistributions')), 'expenseClassId') :: UUID AS invoice_line_distribution_expense_class_id
FROM
folio_invoice.invoice_lines
),
invoice_vendors AS (
SELECT
invoices__t.id AS invoice_id,
organizations__t.name AS invoice_vendor_name
FROM
folio_invoice.invoices__t
INNER JOIN folio_organizations.organizations__t ON organizations__t.id = invoices__t.vendor_id
),
finance AS (
SELECT
fiscal_year__t.id AS fiscal_year_id,
fiscal_year__t.code AS fiscal_year,
ledger__t.id AS ledger_id,
ledger__t.name AS ledger_name,
budget__t.id AS budget_id,
budget__t.name AS budget_name,
fund__t.id AS fund_id,
fund__t.code AS fund_code
FROM
folio_finance.fiscal_year__t
LEFT JOIN folio_finance.budget__t ON budget__t.fiscal_year_id = fiscal_year__t.id
LEFT JOIN folio_finance.fund__t ON fund__t.id = budget__t.fund_id
LEFT JOIN folio_finance.ledger__t ON ledger__t.id = fund__t.ledger_id
),
transactions AS (
SELECT
transaction__t.id AS transaction_id,
transaction__t.from_fund_id AS from_fund_id,
transaction__t.amount AS transaction_amount,
transaction__t.currency AS transaction_currency,
transaction__t.fiscal_year_id,
transaction__t.source_invoice_line_id AS source_invoice_line_id,
transaction__t.expense_class_id AS transaction_expense_class_id,
expense_class__t.code AS expense_class_code,
expense_class__t.name AS expense_class_name,
expense_class__t.external_account_number_ext
FROM
folio_finance.transaction__t
LEFT JOIN folio_finance.expense_class__t ON expense_class__t.id = transaction__t.expense_class_id
)
/*
* CTE End
*/
/*
* Primary Query
*/
SELECT
invoices__t.id AS invoice_id,
invoices__t.status AS invoice_status,
invoices__t.folio_invoice_no AS folio_invoice_number,
invoices__t.vendor_id,
invoice_vendors.invoice_vendor_name AS vendor_name,
invoices__t.vendor_invoice_no AS vendor_invoice_number,
invoices__t.invoice_date,
invoice_lines__t.id AS invoice_line_id,
invoice_lines__t.total AS invoice_line_total,
invoices__t.currency AS invoice_currency,
invoice_lines_fund_distribution.invoice_line_distribution_value,
invoice_lines_fund_distribution.invoice_line_distribution_type,
invoice_lines_fund_distribution.invoice_line_fund_id,
inv_line_fund.code AS invoice_line_fund_code,
inv_line_expense_class.id AS invoice_line_expense_class_id,
inv_line_expense_class.code AS invoice_line_expense_class_code,
inv_line_expense_class.name AS invoice_line_expense_class_name,
inv_line_expense_class.external_account_number_ext,
invoices__t.exchange_rate,
transactions.transaction_id,
transactions.transaction_amount,
transactions.transaction_currency,
finance.fund_id AS transaction_fund_id,
finance.fund_code AS transaction_fund_code,
finance.fiscal_year_id,
finance.fiscal_year,
finance.ledger_id,
finance.ledger_name,
finance.budget_id,
finance.budget_name,
transactions.transaction_expense_class_id,
transactions.expense_class_code AS transactions_expense_class_code,
transactions.expense_class_name AS transactions_expense_class_name,
transactions.external_account_number_ext AS transactions_external_account_number_ext
FROM
folio_invoice.invoices__t
LEFT JOIN folio_invoice.invoice_lines__t ON invoice_lines__t.invoice_id = invoices__t.id
LEFT JOIN invoice_lines_fund_distribution ON invoice_lines_fund_distribution.invoice_line_id = invoice_lines__t.id
LEFT JOIN folio_finance.expense_class__t AS inv_line_expense_class ON inv_line_expense_class.id = invoice_lines_fund_distribution.invoice_line_distribution_expense_class_id
LEFT JOIN folio_finance.fund__t AS inv_line_fund ON inv_line_fund.id = invoice_lines_fund_distribution.invoice_line_fund_id
LEFT JOIN invoice_vendors ON invoice_vendors.invoice_id = invoices__t.id
LEFT JOIN transactions ON transactions.source_invoice_line_id = invoice_lines__t.id
AND transactions.from_fund_id = invoice_lines_fund_distribution.invoice_line_fund_id
LEFT JOIN finance ON finance.fund_id = transactions.from_fund_id
AND finance.fiscal_year_id = transactions.fiscal_year_id
WHERE
invoice_lines_fund_distribution.invoice_line_distribution_expense_class_id IS NULL
UNION
SELECT
invoices__t.id AS invoice_id,
invoices__t.status AS invoice_status,
invoices__t.folio_invoice_no AS folio_invoice_number,
invoices__t.vendor_id,
invoice_vendors.invoice_vendor_name AS vendor_name,
invoices__t.vendor_invoice_no AS vendor_invoice_number,
invoices__t.invoice_date,
invoice_lines__t.id AS invoice_line_id,
invoice_lines__t.total AS invoice_line_total,
invoices__t.currency AS invoice_currency,
invoice_lines_fund_distribution.invoice_line_distribution_value,
invoice_lines_fund_distribution.invoice_line_distribution_type,
invoice_lines_fund_distribution.invoice_line_fund_id,
inv_line_fund.code AS invoice_line_fund_code,
inv_line_expense_class.id AS invoice_line_expense_class_id,
inv_line_expense_class.code AS invoice_line_expense_class_code,
inv_line_expense_class.name AS invoice_line_expense_class_name,
inv_line_expense_class.external_account_number_ext,
invoices__t.exchange_rate,
transactions.transaction_id,
transactions.transaction_amount,
transactions.transaction_currency,
finance.fund_id AS transaction_fund_id,
finance.fund_code AS transaction_fund_code,
finance.fiscal_year_id,
finance.fiscal_year,
finance.ledger_id,
finance.ledger_name,
finance.budget_id,
finance.budget_name,
transactions.transaction_expense_class_id,
transactions.expense_class_code AS transactions_expense_class_code,
transactions.expense_class_name AS transactions_expense_class_name,
transactions.external_account_number_ext AS transactions_external_account_number_ext
FROM
folio_invoice.invoices__t
LEFT JOIN folio_invoice.invoice_lines__t ON invoice_lines__t.invoice_id = invoices__t.id
LEFT JOIN invoice_lines_fund_distribution ON invoice_lines_fund_distribution.invoice_line_id = invoice_lines__t.id
LEFT JOIN folio_finance.expense_class__t AS inv_line_expense_class ON inv_line_expense_class.id = invoice_lines_fund_distribution.invoice_line_distribution_expense_class_id
LEFT JOIN folio_finance.fund__t AS inv_line_fund ON inv_line_fund.id = invoice_lines_fund_distribution.invoice_line_fund_id
LEFT JOIN invoice_vendors ON invoice_vendors.invoice_id = invoices__t.id
LEFT JOIN transactions ON transactions.source_invoice_line_id = invoice_lines__t.id
AND transactions.from_fund_id = invoice_lines_fund_distribution.invoice_line_fund_id
AND transactions.transaction_expense_class_id = invoice_lines_fund_distribution.invoice_line_distribution_expense_class_id
LEFT JOIN finance ON finance.fund_id = transactions.from_fund_id
AND finance.fiscal_year_id = transactions.fiscal_year_id
WHERE
invoice_lines_fund_distribution.invoice_line_distribution_expense_class_id IS NOT NULL
As you can see, the CTEs are also used here so that the data from the respective CTE can be used multiple times.
[top]
7. 0 vs NULL
A NULL
should not be confused with a value of 0
. A NULL
indicates a lack of a value, which is not the same thing as a zero value.
For example, consider the question "How many books does Adam own?" The answer may be 0
(we know that he owns none) or NULL
(we do not know how many he owns). Source: https://en.wikipedia.org/wiki/Null_(SQL)
For calculations or comparisons, it may be necessary to assign a default value to NULL values. Postgres offers the predefined function COALESCE()
for this. The function takes two arguments: 1. The value that needs to be checked if it is NULL
and 2. The value that should replace the NULL value
.
Example:
SELECT
jsonb_extract_path_text(loan.jsonb, 'itemId') :: UUID AS item_id,
COALESCE(COUNT(id), 0) AS loans
FROM
folio_circulation.loan
GROUP BY
item_id
In this example, we replace NULL values with 0 in the column loans. If an item has never been borrowed, it is assigned the value 0. Otherwise there would be a NULL value there. This is not good if NULL values occur during calculations and comparisons.
[top]
8. Non-Roman character
Most databases nowadays store data in UTF format. In this case you can use these symbols directly in your query.
Example:
SELECT
sa_name AS 中國條約
FROM
folio_agreements.subscription_agreement sa
WHERE
sa_name = '中國條約'
In order to be able to save this query, the file must be saved in UTF format, otherwise the symbols will not be encoded correctly.
[top]
8.1. Search and replace symbols
You can also search and replace symbols. The trick is to convert the symbols into a uniform system and thus enable the basis for a better quering. Here it is advisable to convert the symbols into a HEX value. Each symbol has a HEX value. PSQL offers functions for this. PSQL offers the simplest option with the data type bytea.
Example:
sa_name::bytea
If you look at attribute value in DBeaver you will get the view in a hexviewer.
Sample output:
E4 B8 AD E5 9C 8B E6 A2 9D E7 B4 84 20 5B 53 74 65 66 61 6E 5D
Search symbols
Example:
Find all agreements that have non-Roman characters in their names
SELECT
*
FROM
(
SELECT
sa_name,
/*
* 1. delete all latin script symbols with the functions translate() and replace()
* 2. look at the HEX value at the beginning. If they start with a letter, then you have a title with non-latin symbols
*/
length(REPLACE(REPLACE(REPLACE(regexp_matches(encode(REPLACE(TRANSLATE(sa_name, 'abcdefghijklmnopqrstuvwxyzäöü+()- ,#+[]ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜß', '@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@'), '@', '')::bytea, 'hex'), '[a-f]*')::varchar, '{', ''), '}', ''), '"', '')) AS len
FROM
folio_agreements.subscription_agreement sa
) AS t
WHERE
t.len > 0
Replace symbols
Example:
Replace the space with another symbol. Space = 20 , the other symbol = E4B8AD (Tables: https://www.utf8-chartable.de/unicode-utf8-table.pl)
convert_from(decode(encode(concat('\x', replace(encode('A B', 'hex'), '20', 'E4B8AD'))::bytea, 'escape'), 'escape'), 'UTF-8')::varchar
[top]