SQLite mySQL MSSQL Postgres - sgml/signature GitHub Wiki

Keywords

Canonical Sources

pgpedia.info

postgis.net

postgresql.org

Vector Search

SQLite Adapter

Component Role in Adapter Pattern Loop-Free Benefit
SQLite DB Target interface (uniform queryable structure) Declarative access replaces imperative iteration
SQL Queries Adapted operations (SELECT, JOIN, etc.) Express logic without loops; engine handles traversal
Language Bindings Adapters (Python, Go, Rust, etc.) Abstracts away iteration; exposes query results as native types
Schema Design Interface contract Enforces structure; avoids ad hoc traversal
Views & Indexes Optimization overlays Precomputed logic paths; avoids recomputation via loops

UUIDs

Unicode and ASCII

databases_supporting_u_and_char39:
  postgresql:
    supports_U&: true
    supports_CHAR39: true
    notes: "Fully supports Unicode escapes (U&'It\u0027s a great day!') and ASCII escapes (CHAR(39))."

  ibm_db2:
    supports_U&: true
    supports_CHAR39: true
    notes: "Supports both U& Unicode escapes and CHAR(39) for single quote handling."

  teradata:
    supports_U&: true
    supports_CHAR39: true
    notes: "Provides native Unicode shorthand notation and ASCII character escapes."

Fuzzy Matching

Date Diffing

SQLite

Static Site Search

  • Lunr.js: "Simple & Fast Client-Side Search"
  • Jekyll Search: "Static Site Search for Jekyll"
  • Hugo-Lunr: "Prebuilt Search for Hugo Sites"

Embedded Search in Apps

  • SQLite Full-Text Search (FTS5): "Local Search Engine for Mobile Apps"
  • MeiliSearch: "High-Speed Search API for Apps"
  • Typesense: "Lightning-Fast Search for Mobile & Web"

Postgres

MTLS

Release Notes

Extensions

Syntax

Design

Maintenance

PostgREST

Approach:
  1. Using PostgREST:
     - Converts PostgreSQL into a RESTful API.
     - Automatically generates endpoints based on database schema.
     - Supports authentication, role-based access, and query filtering.
     - Example: `postgrest postgres://user:password@host:port/dbname`

  2. Direct SQL Execution via HTTP:
     - Use PostgreSQL’s built-in `pg_http` extension to handle HTTP requests.
     - Allows executing SQL queries directly via HTTP.
     - Example: `SELECT * FROM pg_http_get('https://api.example.com/data');`

  3. JSONB-Based API:
     - Store structured JSON data in PostgreSQL using `JSONB`.
     - Query JSON data efficiently with `jsonb_path_query()`.
     - Example: `SELECT jsonb_path_query(data, '$.users[*].name') FROM api_store;`

  4. PL/pgSQL Functions as API Handlers:
     - Define stored procedures that act as API endpoints.
     - Example: `CREATE FUNCTION get_user(id INT) RETURNS JSON AS $$ SELECT row_to_json(u) FROM users u WHERE u.id = id $$ LANGUAGE SQL;`

  5. WebSockets via PostgreSQL LISTEN/NOTIFY:
     - Use PostgreSQL events pub/sub system for real-time updates.
     - Example: `LISTEN new_event; NOTIFY new_event, 'New data available';`

Pros/Cons

PostgresML

Self-Joins

Functions

JSON/JSONB/HBase/XML

Reliability

Password Encryption

testgres

Backup Verification

Psql

pgModeler

Migration and Restoring

pg_dump / pg_restore

Change Data Capture

  • Manually replicate schema: Use SCT, pg_dump --schema-only, or equivalent.
  • Copy procedural code (if needed): Functions, triggers, SPs via separate scripting.
  • Use DMS to migrate data: Configure endpoints, replication task, mapping rules.

Table of Contents Creation, Editing, and Reuse

import re
import subprocess

class PgRestoreManager:
    def __init__(self, dump_file, database, regex_pattern):
        self.dump_file = dump_file
        self.database = database
        self.regex_pattern = re.compile(regex_pattern, re.IGNORECASE)
        self.toc_file = "toc.txt"
        self.filtered_toc_file = "filtered_toc.txt"

    def generate_toc(self):
        """Creates a table of contents from the dump file."""
        subprocess.run(["pg_restore", "-l", self.dump_file], stdout=open(self.toc_file, "w"))

    def filter_toc(self):
        """Edits the TOC file, removing lines matching the regex pattern."""
        with open(self.toc_file, "r") as infile, open(self.filtered_toc_file, "w") as outfile:
            for line in infile:
                if not self.regex_pattern.search(line):
                    outfile.write(line)

    def restore_filtered(self):
        """Restores only the remaining tables from the modified TOC file."""
        subprocess.run(["pg_restore", "-L", self.filtered_toc_file, "-d", self.database, self.dump_file])

    def execute(self):
        """Runs all steps sequentially."""
        self.generate_toc()
        self.filter_toc()
        self.restore_filtered()

# Example Usage
if __name__ == "__main__":
    manager = PgRestoreManager(
        dump_file="my_dump_file.dump",
        database="my_database",
        regex_pattern=r"unwanted_table_name"
    )
    manager.execute()

Decoupling from RDS

pg_dump_exclusion:
  methods:
    - name: "Exclude Event Triggers"
      command: "pg_dump --no-event-triggers -U postgres -d mydb > dump.sql"
      description: "Prevents event triggers from being included in the dump."
      postgres_version: "PostgreSQL 17+"
    
    - name: "Exclude RDSAdmin Schema"
      command: "pg_dump --exclude-schema=rdsadmin -U postgres -d mydb > dump.sql"
      description: "Excludes objects within the Amazon RDS-managed 'rdsadmin' schema."
    
    - name: "Filter TOC Entries During Restore"
      command: "pg_restore -L <(pg_restore -l dump.sql | grep -v 'EVENT TRIGGER' | grep -v 'rdsadmin') -U postgres -d newdb"
      description: "Filters out event triggers and 'rdsadmin' schema references during restoration."
    
  additional_resources:
    - name: "PostgreSQL pg_dump Documentation"
      url: "https://www.postgresql.org/docs/current/app-pgdump.html"

Best Practices

Safe Subsetting Strategies:
  1. Use `pg_dump` with selective table exports:
     - `pg_dump -d originaldb -t table1 -t table2 -f subset.dump`
     - Dumps only specified tables instead of the full database.

  2. Restore only selected tables:
     - `pg_restore -d newdb --data-only subset.dump`
     - Ensures only data is restored without modifying schema.

  3. Use `pgsubset` for controlled subsetting:
     - A tool designed for making a consistent copy of a subset of a database.
     - Supports row filtering and data transformation to redact sensitive data.
     - More details [here](https://github.com/jackc/pgsubset).

  4. Create a new database using a template:
     - `createdb newdb -T originaldb`
     - Copies the structure but requires ensuring the original database is idle.

  5. Use `COPY` for manual extraction:
     - `COPY table1 TO 'table1.csv' WITH CSV HEADER;`
     - Allows exporting specific tables and re-importing them selectively.

  6. Apply row-level filtering:
     - `pg_dump -d originaldb --where="created_at > '2024-01-01'" -f filtered.dump`
     - Ensures only relevant rows are included in the subset.

Process

migration_workflow:
  - step: "Perform Initial Migration"
    actions:
      - name: "Export Schema and Data"
        tool: "pg_dump"
        command: "pg_dump -h source-db.example.com -U source_user -d source_db > dump.sql"
        description: "Use pg_dump to export schema and data from the source database."
      - name: "Import Schema and Data"
        tool: "pg_restore"
        command: "pg_restore -h target-db.example.com -U target_user -d target_db < dump.sql"
        description: "Use pg_restore to import the dump into the target database."

  - step: "Set Up AWS DMS for Continuous Replication"
    actions:
      - name: "Create Replication Instance"
        tool: "AWS DMS"
        api_call: "create_replication_instance"
        description: "Set up a replication instance using AWS DMS."
      - name: "Create Source Endpoint"
        tool: "AWS DMS"
        api_call: "create_endpoint"
        description: "Define the source database endpoint in AWS DMS."
      - name: "Create Target Endpoint"
        tool: "AWS DMS"
        api_call: "create_endpoint"
        description: "Define the target database endpoint in AWS DMS."
      - name: "Configure Replication Task"
        tool: "AWS DMS"
        api_call: "create_replication_task"
        description: "Set up a replication task in AWS DMS for Change Data Capture (CDC)."
      - name: "Start Replication Task"
        tool: "AWS DMS"
        api_call: "start_replication_task"
        description: "Start the DMS task to replicate changes continuously."

  - step: "Monitor and Validate Migration"
    actions:
      - name: "Monitor Task Progress"
        tool: "AWS DMS"
        api_call: "describe_replication_tasks"
        description: "Use AWS DMS to monitor the status of the migration task."
      - name: "Validate Target Database"
        tool: "Manual Testing"
        description: "Ensure data and schema consistency in the target database."

Host-Based Authentication (hba)

postgresql.conf

PostGIS

GUIs

RegExp

# UUID4 RegExp: wrap with paren on the outside to deferences the first index. Indices start at one

(REGEXP_MATCH(foo, '[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}'))[1] 

Copying tables from one database to another

import os

# Define the table name to be copied and the source/target databases
table_name = 'your_table_name'
source_db = 'foo'
target_db = 'bar'

# Define the SQL commands
sql_commands = f"""
\\c {source_db}
COPY {table_name} TO STDOUT WITH CSV HEADER;

\\c {target_db}
COPY {table_name} FROM STDIN WITH CSV HEADER;
"""

# Write the SQL commands to a file
with open('pgplsql.sql', 'w') as file:
    file.write(sql_commands)

# Execute the SQL file as a separate process
os.system('psql -f pgplsql.sql')

Splitting DataBase Tables

steps:
  - step: Create a Partitioned Table
    details: >
      First, create a partitioned table that will be split by month. 
      You can use **range partitioning** based on a date column.
    code: |
      CREATE TABLE sales (
          id SERIAL PRIMARY KEY,
          sale_date DATE NOT NULL,
          amount NUMERIC(10, 2) NOT NULL
      ) PARTITION BY RANGE (sale_date);
  - step: Create Partitions for Each Month
    details: >
      Next, create partitions for each month. You can automate this process using a function.
    code: |
      CREATE OR REPLACE FUNCTION create_monthly_partitions() RETURNS VOID AS $$
      DECLARE
          month_start DATE;
          month_end DATE;
      BEGIN
          month_start := DATE_TRUNC('month', CURRENT_DATE);
          month_end := month_start + INTERVAL '1 MONTH' - INTERVAL '1 DAY';

          EXECUTE 'CREATE TABLE sales_' || TO_CHAR(month_start, 'YYYY_MM') || ' PARTITION OF sales FOR VALUES FROM (''' || month_start || ''') TO (''' || month_end || ''')';
      END;
      $$ LANGUAGE plpgsql;
  - step: Schedule the Function to Run Monthly
    details: >
      You can use a **cron job** or a **scheduler** like pg_cron to run this function at the start of each month.
  - step: Create a New Database at the Start of Each Month
    details: >
      To create a new database at the start of each month, you can use a similar approach with a function and a trigger.
    code: |
      CREATE OR REPLACE FUNCTION create_new_database() RETURNS VOID AS $$
      BEGIN
          EXECUTE 'CREATE DATABASE db_' || TO_CHAR(CURRENT_DATE, 'YYYY_MM');
      END;
      $$ LANGUAGE plpgsql;

      CREATE OR REPLACE FUNCTION create_database_trigger() RETURNS TRIGGER AS $$
      BEGIN
          IF (NEW.sale_date IS NOT NULL) THEN
              PERFORM create_new_database();
          END IF;
          RETURN NEW;
      END;
      $$ LANGUAGE plpgsql;

      CREATE TRIGGER create_database_trigger
      AFTER INSERT ON sales
      FOR EACH ROW EXECUTE FUNCTION create_database_trigger();
  - step: Schedule the Trigger Function
    details: >
      You can also schedule the trigger function to run at the start of each month using a scheduler.
  - summary: >
      By following these steps, you can automate the process of splitting your table into monthly partitions and creating a new database at the start of each month. This approach ensures that your data is organized efficiently and that new databases are created regularly.

ANSI SQL

Triggers

Pros and Cons

validation_methods:
  postgres_trigger:
    description: "Enforces JSON schema validation via a PostgreSQL trigger function before INSERT/UPDATE operations at the database level."
    pros:
      - "Centralized Data Integrity: All data, regardless of the source, is uniformly validated."
      - "Last Line of Defense: Prevents invalid data from ever being stored in the database."
      - "Uniform Enforcement: Every application or interface writing to the database is subject to the same validation rules."
    cons:
      - "Maintenance Overhead: Changing validation rules requires modifying database trigger code, which can be less accessible."
      - "Limited Flexibility: Hard to implement complex, context-dependent business logic directly in SQL/PLpgSQL."
      - "User Experience: Errors raised by triggers often result in technical error messages that are not user-friendly."
      - "Debugging Complexity: Triggers can obscure the source of errors, making troubleshooting more challenging."

  middleware_pydantic:
    description: "Uses middleware (e.g., with Pydantic) to validate data before it reaches the database."
    pros:
      - "Developer-Friendly: Provides clear, human-readable error messages and leverages type annotations."
      - "Early Error Catching: Data is validated before any database transaction, reducing unnecessary database operations."
      - "Flexibility and Context: Allows for dynamic, complex validation logic based on context and business rules."
      - "Integration with Business Logic: Validation is integrated with other application processes in one place."
    cons:
      - "Not Foolproof: If data bypasses the middleware (e.g., via direct database access), invalid data might get stored."
      - "Duplication Risk: Maintaining synchronization between middleware and database validations can be challenging."
      - "Single Point of Entry: Only effective if every write operation is funneled through the middleware."

  client_side_vue:
    description: "Performs validation on the client side using a Vue mixin to provide real-time feedback and improve the user experience."
    pros:
      - "Immediate Feedback: Users receive direct, on-the-fly notifications for errors, which enhances usability."
      - "Reduced Server Load: Filters out obvious invalid data before sending requests to the server."
      - "Seamless Integration: Easily tied into UI components to enforce aesthetic and interaction standards."
    cons:
      - "Security Limitations: Client-side code can be bypassed or manipulated; it should never be the sole layer of validation."
      - "Duplication of Logic: Maintaining consistent validation rules across client, server, and database can be complex."
      - "Inconsistency Risks: It is challenging to ensure that client-side validations perfectly mirror backend logic."
      - "Limited Scope: Focuses on user experience rather than acting as a robust safeguard for data integrity."

Summary:

  • A hybrid approach is often recommended where client-side
  • validation improves usability, middleware like Pydantic ensures clarity
  • in business logic and early error detection, and database triggers
  • act as the ultimate fail-safe for data integrity.

Comparison

Caveats

The return value of a row-level trigger fired AFTER or a statement-level trigger fired BEFORE or AFTER is always ignored; it might as well be null. However, any of these types of triggers might still abort the entire operation by raising an error.

This example trigger ensures that any time a row is inserted or updated in the table, the current user name and time are stamped into the row. And it checks that an employee's name is given and that the salary is a positive value.

CREATE TABLE emp (
    empname           text,
    salary            integer,
    last_date         timestamp,
    last_user         text
);

CREATE FUNCTION emp_stamp() RETURNS trigger AS $emp_stamp$
    BEGIN
        -- Check that empname and salary are given
        IF NEW.empname IS NULL THEN
            RAISE EXCEPTION 'empname cannot be null';
        END IF;
        IF NEW.salary IS NULL THEN
            RAISE EXCEPTION '% cannot have null salary', NEW.empname;
        END IF;

        -- Who works for us when they must pay for it?
        IF NEW.salary < 0 THEN
            RAISE EXCEPTION '% cannot have a negative salary', NEW.empname;
        END IF;

        -- Remember who changed the payroll when
        NEW.last_date := current_timestamp;
        NEW.last_user := current_user;
        RETURN NEW;
    END;
$emp_stamp$ LANGUAGE plpgsql;

CREATE TRIGGER emp_stamp BEFORE INSERT OR UPDATE ON emp
    FOR EACH ROW EXECUTE FUNCTION emp_stamp();

Statement-level BEFORE triggers naturally fire before the statement starts to do anything, while statement-level AFTER triggers fire at the very end of the statement. These types of triggers may be defined on tables, views, or foreign tables. Row-level BEFORE triggers fire immediately before a particular row is operated on, while row-level AFTER triggers fire at the end of the statement (but before any statement-level AFTER triggers). These types of triggers may only be defined on tables and foreign tables, not views. INSTEAD OF triggers may only be defined on views, and only at row level; they fire immediately as each row in the view is identified as needing to be operated on.

The execution of an AFTER trigger can be deferred to the end of the transaction, rather than the end of the statement, if it was defined as a constraint trigger. In all cases, a trigger is executed as part of the same transaction as the statement that triggered it, so if either the statement or the trigger causes an error, the effects of both will be rolled back.

Trigger Code Generation

trigger_abstraction_strategy:
  goal: "Cross-database trigger logic using SQL DSL + Adapter Pattern"
  components:
    - TriggerDSL:
        description: >
          A declarative interface to define trigger behavior in a neutral format.
        example:
          on_event: INSERT
          table: users
          condition: "NEW.email IS NOT NULL"
          action:
            - log: "User created with email"
            - call_function: "notify_admin"

    - AdapterPattern:
        description: >
          Each RDBMS (PostgreSQL, MySQL, SQL Server) has its own adapter that translates the DSL into native SQL.
        adapters:
          - PostgresAdapter:
              generates: PL/pgSQL trigger + function
          - MySQLAdapter:
              generates: SQL trigger with BEGIN...END block
          - SQLServerAdapter:
              generates: T-SQL trigger with inserted/deleted pseudo-tables

    - DSL_Engine:
        description: >
          Core engine that parses DSL and delegates to the appropriate adapter based on target DB.
        features:
          - Syntax validation
          - Adapter dispatch
          - Optional dry-run for previewing SQL

  benefits:
    - Write-once trigger logic
    - Centralized business rules
    - Easier testing and CI/CD integration

  tradeoffs:
    - Requires custom tooling or framework
    - Limited to common trigger features across RDBMS
    - May not support advanced DB-specific features (e.g., INSTEAD OF triggers)

  real_world_tools:
    - jOOQ (Java DSL for SQL, can be extended for triggers)
    - Knex.js (JS query builder with plugin potential)
    - SQLAlchemy (Python ORM with event hooks)
    - Prisma (JS/TS ORM with middleware, not native triggers)

Lock Checking Table

-- PostgreSQL - Check for blocking queries
SELECT 
    blocked_locks.pid AS blocked_pid,
    blocked_activity.usename AS blocked_user,
    blocking_locks.pid AS blocking_pid,
    blocking_activity.usename AS blocking_user,
    blocked_activity.query AS blocked_statement,
    blocking_activity.query AS blocking_statement
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktype
    AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
    AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
    AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;

Permission Checking Table

Trigger

CREATE TRIGGER authorize_access
BEFORE INSERT OR UPDATE ON sensitive_table
FOR EACH ROW
BEGIN
    IF NOT (SELECT user_id FROM user_permissions WHERE user_id = NEW.user_id AND object_id = NEW.object_id) THEN
        SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Unauthorized access';
    END IF;
END;

Procedure

CREATE PROCEDURE authorize_access(IN user_id INT, IN object_id INT)
BEGIN
    IF NOT (SELECT user_id FROM user_permissions WHERE user_id = user_id AND object_id = object_id) THEN
        SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Unauthorized access';
    END IF;
END;

SQLAlchemy

Postgres

Postgres Extensions

Profiling

import subprocess
import sys

def run_psql_profiler(pgbouncer_instance):
    command = [
        'psql',
        '-h', pgbouncer_instance['host'],
        '-p', str(pgbouncer_instance['port']),
        '-U', pgbouncer_instance['user'],
        '-d', 'pgbouncer',
        '-c', 'SHOW POOLS;'
    ]

    result = subprocess.run(command, capture_output=True, text=True, env={"PGPASSWORD": pgbouncer_instance['password']})

    if result.returncode == 0:
        print("PSQL Profiler Output:")
        print(result.stdout)
    else:
        print("Error running PSQL Profiler:")
        print(result.stderr)

if __name__ == "__main__":
    if len(sys.argv) != 5:
        print("Usage: python script.py <host> <port> <user> <password>")
        sys.exit(1)

    pgbouncer_instance = {
        'host': sys.argv[1],
        'port': int(sys.argv[2]),
        'user': sys.argv[3],
        'password': sys.argv[4]
    }

    run_psql_profiler(pgbouncer_instance)

PGLoader

PGBouncer

A Tale of Two PgBouncers

Once upon a time, PgBouncer lived a quiet life on traditional Linux systems. It had a cozy home in /etc/pgbouncer, where its configuration file (pgbouncer.ini) and its list of trusted users (userlist.txt) sat side by side like old friends. Logs were tucked away in /var/log/pgbouncer, and its heartbeat—the PID file—rested in /var/run/pgbouncer, letting the system know it was alive and well.

System administrators would gently edit its config files by hand, restart it with care, and monitor its logs like tending a garden. Everything was predictable, rooted in the filesystem, and guided by systemd or init scripts.

Then came the age of containers.

In this new world, PgBouncer found itself inside a lightweight box—a Docker container. Its home was no longer fixed; sometimes it lived in /etc/pgbouncer, other times in /app/pgbouncer, depending on who built the box. Instead of handwritten config files, it often received instructions through environment variables like DB_USER, POOL_MODE, or MAX_CLIENT_CONN, whispered to it at startup.

There were no logs in /var/log, no PID file to mark its presence. Instead, it relied on entrypoint scripts and ephemeral mounts. Its configuration might be injected from the outside, like a care package delivered at launch. It did not mind—this new life was fast, portable, and easy to replicate.

But it never forgot its roots.

Summary of the Journey
  • Traditional PgBouncer lives in stable directories, with hand-edited configs and system-level logging.
  • Docker PgBouncer adapts to its container, using environment variables, mounted configs, and minimal footprints.
  • One is like a well-tended garden; the other, a nimble traveler with a backpack full of runtime options.

Updating PgBouncer After Restoring a PostgreSQL Dump

Step Command Description
1. Update Configuration nano /etc/pgbouncer/pgbouncer.ini Modify pgbouncer.ini to point to the new PostgreSQL instance.
2. Refresh User Authentication echo '"myuser" "newpassword"' > /etc/pgbouncer/userlist.txt Ensure authentication matches the new PostgreSQL instance.
3. Reload PgBouncer pgbouncer -R Apply changes without restarting.
4. Restart PgBouncer systemctl restart pgbouncer Ensures new settings take effect.
5. Verify Connection psql -h pgbouncer_host -p 6432 -U myuser -d mydb Confirms PgBouncer is correctly routing connections.

References

PGBouncer Copy/Paste

import psycopg2

# Define the connection parameters
pgbouncer_host = 'localhost'
pgbouncer_port = 6432
pgbouncer_user = 'pgbouncer'
pgbouncer_password = 'your_password'

# Connect to PgBouncer
conn = psycopg2.connect(
    host=pgbouncer_host,
    port=pgbouncer_port,
    user=pgbouncer_user,
    password=pgbouncer_password,
    dbname='pgbouncer'
)

# Create a cursor object
cur = conn.cursor()

# Query the runtime settings
cur.execute("SHOW PGBouncerSettings;")
settings = cur.fetchall()

# Print the settings
for setting in settings:
    print(setting)

# Close the cursor and connection
cur.close()
conn.close()

JSONB

## Aliases
Output aliases cannot be used in the WHERE part. You have two choices: subquery or duplicate the definition.
```
select jsonb_array_elements((ARRAY(select jsonb_array_elements(msg->'root') ele ))[2]::jsonb) filterin
from js
where jsonb_array_elements((ARRAY(select jsonb_array_elements(msg->'root') ele ))[2]::jsonb)->>'cid'='CID1';
```
or
```
SELECT filterin FROM 
(select jsonb_array_elements((ARRAY(select jsonb_array_elements(msg->'root') ele ))[2]::jsonb) filterin
from js) data
WHERE filterin->>'cid'='CID1';
```

Malformed JSONB

JSONB columns used to store metadata can be malformed JSON. Use regexp replacements to parse it. For example, this parses any value that begins with a capital letter, such as "'Foo'" which regexp_match returns as "{'Foo'}"

select uuid, id, (regexp_replace( regexp_match(meta::text, '\u0027[A-Z][^\u0027]+.')::text, '[^A-Za-z\s]+', '','g' ) ) as name
from accounts
where uuid = 'fdfdfa-3r3434-sdfaf-334343'

SQLite

Architecture

Flat File Interop

Restoration

Usage

Funding

Internals

DB2

MySQL

MSSQL

PostgreSQL

Triggers

Abstraction

Implicit/Explicit Optimization

Implementing a Graph Database

Replication

https://kb.promise.com/thread/what-s-the-difference-between-asynchronous-and-synchronous-mirrors/ https://library.netapp.com/ecmdocs/ECMP12404965/html/GUID-B513B031-D524-4E0D-8FB1-3984C9D9FA20.html https://www.brianstorti.com/replication/ http://www.cattell.net/datastores/Datastores.pdf http://www.cs.sjsu.edu/faculty/kim/nosql/contents/presentation/slides/ScalableSQLandNoSQLDataStores.pdf

Pandas

Serverless

Excel

XSLT / XQuery Integration

Write Ahead Log / Change Data Capture

NoSQL / NewSQL

Data Seeding

GitLab Case Study Data

Aspect Description
Use Case Populating GitLab's PostgreSQL database with core data (roles, users, settings) at setup and in CI/CD environments.
Deployment Context Applies to both self-managed installations and GitLab.com cloud deployments; ensures consistency and scalability across environments.
CI/CD Integration Seeded databases are persisted using artifacts; enables reproducible testing and faster pipelines.
Multi-Tenant Strategy During tenant provisioning, baseline data is seeded per namespace, supporting GitLab’s SaaS architecture.
Migration Use Seeding used in post-deployment migrations to safely initialize or backfill new tables and settings.
Scalability Goal Ensures idempotent, environment-agnostic initialization for horizontal scaling and automation.
Implementation Tools Ruby seed files, ActiveRecord (Rails ORM), database tasks via Rake and post-deploy jobs.
Relevant Resources Database Dev Guidelines, Architecture Handbook, CI seeding example

MMA

select
    (regexp_replace(regexp_replace('2021-5-9', '(-)([0-9]$)', '\10\2', 'g'), '(-)([0-9]-)', '\10\2', 'g'))

Write Ahead Log

+----------------+     +-----------------+     +-----------------+
|                |     |                 |     |                 |
|  Transaction   |     |  Write-Ahead    |     |  Database Disk  |
|    (Begin)     | --> |      Log        | --> |     Storage     |
|                |     |                 |     |                 |
+----------------+     +-----------------+     +-----------------+
       |                     |                       |
       |                     |                       |
+----------------+     +-----------------+     +-----------------+
|                |     |                 |     |                 |
|  Transaction   |     |  Write-Ahead    |     |  Database Disk  |
|    (Commit)    | --> |      Log        | --> |     Storage     |
|                |     |                 |     |                 |
+----------------+     +-----------------+     +-----------------+

Process:

  • A transaction is first recorded in memory
  • Any CRUD mutations are written to the Write-Ahead Log (WAL) from memory next
  • Finally, the transaction is committed to the database storage, and the changes are flushed from the WAL

If a crash occurs, all committed transactions can be replayed from the WAL to bring the database back to a consistent state

Comparison

,Teradata,Oracle
Use Case,"Large-scale data warehousing and analytics","Wide range of applications including transaction processing, data warehousing, and enterprise applications"
Architecture,"Massively Parallel Processing (MPP)","Shared-nothing"

Equivalents

PostgreSQL_Tools_to_Oracle_Equivalents:
  - PostgreSQL_Tool: pg_stat
    Oracle_Equivalent: Oracle Enterprise Manager (OEM)
    URL: https://www.oracle.com/enterprise-manager/
  - PostgreSQL_Tool: pg_repack
    Oracle_Equivalent: Oracle Real-Time Database (RTD)
    URL: https://www.oracle.com/database/technologies/
  - PostgreSQL_Tool: Write-ahead Log (WAL)
    Oracle_Equivalent: Redo Logs
    URL: https://docs.oracle.com/en/database/oracle/oracle-database/19/admin/managing-the-redo-log.html
  - PostgreSQL_Tool: psql
    Oracle_Equivalent: SQL*Plus
    URL: https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/SQL-Plus-User-s-Guide-and-Reference.html
  - PostgreSQL_Tool: pg_dump
    Oracle_Equivalent: Data Pump Export (expdp)
    URL: https://docs.oracle.com/en/database/oracle/oracle-database/19/sutil/oracle-data-pump.html
  - PostgreSQL_Tool: pg_restore
    Oracle_Equivalent: Data Pump Import (impdp)
    URL: https://docs.oracle.com/en/database/oracle/oracle-database/19/sutil/oracle-data-pump.html
  - PostgreSQL_Tool: pgbouncer
    Oracle_Equivalent: Oracle Connection Pooling
    URL: https://docs.oracle.com/en/database/oracle/oracle-database/19/jjdbc/data-sources-and-URLs.html

change data capture

Feature,PostgreSQL,Oracle,Teradata,MySQL
WAL Purpose,"Ensures data integrity and durability","Provides atomicity and durability","Ensures data consistency and recovery","Ensures data durability and recovery"
WAL Mechanism,"Records changes before applying them","Logs changes before applying them","Uses a combination of logging and checkpointing","Logs changes before applying them"
Checkpointing,"Periodic checkpoints to flush WAL to disk","Periodic checkpoints to flush WAL to disk","Frequent checkpoints to ensure data consistency","Periodic checkpoints to flush WAL to disk"
Recovery,"Redo and undo operations for crash recovery","Redo and undo operations for crash recovery","Redo operations for crash recovery","Redo operations for crash recovery"
Performance Impact,"Reduced disk writes due to sequential logging","Reduced disk writes due to sequential logging","Optimized for large-scale data warehousing","Reduced disk writes due to sequential logging"
Archiving,"Supports continuous archiving and point-in-time recovery","Supports continuous archiving and point-in-time recovery","Supports data replication and recovery","Supports point-in-time recovery"
Configurability,"Various WAL levels (minimal, replica, logical)","Configurable logging levels and checkpoint intervals","Configurable logging and checkpoint settings","Configurable logging and checkpoint settings"

Database Schema Documentation

SchemaSpyWrapper:
  GUI:
    - Primarily CLI-based with a wrapper for easier interaction
    - Generates static HTML reports with ER diagrams
    - Requires manual configuration for visual enhancements
  Accessibility:
    - Web-based output for easy sharing
    - Limited interactive features compared to full GUI tools
    - Requires technical knowledge for setup and customization
  URLs:
    Download: [SchemaSpy Download](https://github.com/schemaspy/schemaspy/releases)
    API Docs: [SchemaSpy API Docs](https://github.com/schemaspy/schemaspy)
    User Docs: [SchemaSpy Documentation](https://schemaspy.org/)
  License: LGPL-3.0
  First Release Date: 2010
  Latest Release Date: July 21, 2023

DbSchema:
  GUI:
    - Fully graphical interface with drag-and-drop schema editing
    - Interactive ER diagrams with real-time updates
    - Built-in SQL editor and query builder
  Accessibility:
    - Supports virtual foreign keys for better schema visualization
    - Works across Windows, Mac, and Linux
    - Provides offline schema documentation for portability
  URLs:
    Download: [DbSchema Download](https://bing.com/search?q=DbSchema+open+source+license+first+release+date+latest+release+date)
    API Docs: [DbSchema API Docs](https://dbschema.com/blog/release-notes/release-notes9/)
    User Docs: [DbSchema Documentation](https://github.com/gabfl/dbschema/blob/main/LICENSE)
  License: MIT
  First Release Date: 2017
  Latest Release Date: April 28, 2025

Postgres Administration

                 +------------------------------+
                 | PostgreSQL Administrator (Superuser) |
                 +------------------------------+
                           |
                 +-----------------+
                 | PostgreSQL Server Instance |
                 +-----------------+
                           |
      +------------+------------+------------+
      |            |            |            |
+------------+ +------------+ +------------+
| Database A | | Database B | | Database C |
+------------+ +------------+ +------------+
      |            |            |
+------------+ +------------+ +------------+
|   Schemas  | |   Schemas  | |   Schemas  |
+------------+ +------------+ +------------+
      |            |            |
+------------+ +------------+ +------------+
|   Tables   | |   Tables   | |   Tables   |
+------------+ +------------+ +------------+
      |            |            |
+------------+ +------------+ +------------+
|   Columns  | |   Columns  | |   Columns  |
+------------+ +------------+ +------------+
      |            |            |
+------------+ +------------+ +------------+
|   Tuples   | |   Tuples   | |   Tuples   |
+------------+ +------------+ +------------+

Additional Elements:
  - Roles: Manage user permissions across databases.
  - System Catalogs: Store metadata for each database.
  - Extensions: Enhance database functionality.
  - Functions: Define computations and triggers.
  - Indexes: Optimize query performance.

  - **Role Definitions**:
    - Default Roles:
      - `pg_read_all_data`: Can read all database contents.
      - `pg_write_all_data`: Can write to all tables and schemas.
      - `pg_monitor`: Grants monitoring capabilities.
      - `pg_backup`: Provides backup-related privileges.
      - `pg_admin`: Allows administrative actions without being superuser.
  
  - **System Catalog Definitions**:
    - `pg_database`: Stores information about databases.
    - `pg_namespace`: Manages schemas.
    - `pg_tables`: Lists tables within a database.
    - `pg_roles`: Defines user roles and permissions.
    - `pg_indexes`: Stores index metadata.
    - `pg_proc`: Keeps function definitions.
    - `pg_attribute`: Holds column metadata.
    - `pg_stat_activity`: Monitors active connections.

Database Client Session Information

What It Shows PostgreSQL MySQL / MariaDB SQL Server Oracle
IP Address inet_client_addr() HOST() or processlist CONNECTIONPROPERTY('client_net_address') SYS_CONTEXT('USERENV', 'IP_ADDRESS')
Computer Name inet_client_hostname() HOST() HOST_NAME() SYS_CONTEXT('USERENV', 'HOST')
Process ID(s) pg_backend_pid() or pg_stat_activity.pid Id from SHOW PROCESSLIST @@SPID or session_id from system views V$SESSION.process or V$PROCESS.spid
Session Info SET / SHOW / current_setting() SET SESSION SESSION_CONTEXT() DBMS_SESSION.SET_CONTEXT
Length of Session now() - backend_start from pg_stat_activity TIME column from processlist DATEDIFF(MINUTE, login_time, GETDATE()) SYSDATE - LOGON_TIME from V$SESSION

Idempotency

psql

Idempotent and Non-Idempotent Parts of psql

Idempotent Parts
  • Schema Creation (CREATE TABLE IF NOT EXISTS, CREATE SCHEMA IF NOT EXISTS)

    • Ensures tables and schemas exist without duplication.
  • Index Creation (CREATE INDEX IF NOT EXISTS)

    • Prevents duplicate index errors when re-executed.
  • Constraints (ALTER TABLE ADD CONSTRAINT IF NOT EXISTS)

    • Avoids redundant constraint creation.
  • Role Management (CREATE ROLE IF NOT EXISTS)

    • Ensures roles exist without overwriting existing ones.
  • Privileges (GRANT, REVOKE)

    • Permissions remain consistent across multiple executions.
Non-Idempotent Parts
  • Data Manipulation (INSERT INTO, COPY)

    • Running multiple times without checks may cause duplicates.
    • Use ON CONFLICT DO NOTHING to prevent unintended inserts.
  • Sequence Updates (ALTER SEQUENCE RESTART)

    • Can reset sequences incorrectly when re-executed.
  • Object Deletion (DROP TABLE, DROP SCHEMA)

    • Using DROP without IF EXISTS makes it non-idempotent.

pgrestore

Idempotent and Non-Idempotent Parts of pg_restore

Idempotent Parts
  • Schema Creation (CREATE TABLE, CREATE SCHEMA)

    • Safe to reapply; existing structures remain unchanged.
    • Use --if-exists to avoid errors.
  • Index Creation (CREATE INDEX)

    • Avoids duplication if the index already exists.
    • Ensure unique index names.
  • Constraints (ALTER TABLE ADD CONSTRAINT)

    • Foreign keys and other constraints do not duplicate when reapplied.
  • Ownership and Privileges (GRANT, REVOKE)

    • Permissions remain consistent upon multiple executions.
    • Use --no-owner to prevent conflicts.
Non-Idempotent Parts
  • Data Restoration (INSERT INTO, COPY)

    • Reapplying pg_restore without cleanup may cause duplicates.
    • Use --clean cautiously.
  • Sequence Values (ALTER SEQUENCE RESTART)

    • Can reset sequences incorrectly.
    • Use --disable-triggers when restoring dependent sequences.
  • Object Recreation (DROP TABLE, DROP SCHEMA)

    • Using --clean makes these commands non-idempotent.
    • Use --if-exists to prevent errors.

Backup File Formats

Comparison of Backup Formats in pg_verifybackup

Format Created By Characteristics Compatibility with pg_verifybackup Compatibility with pg_restore Compatibility with psql
Dump File (.dump) pg_dump -Fc Custom-format archive of structured database objects and data ❌ Not directly verifiable ✅ Can be restored using pg_restore ❌ Cannot use psql; requires conversion to SQL format
Text File (.sql) pg_dump -Fp Human-readable SQL containing CREATE TABLE, INSERT statements ❌ Not directly verifiable ❌ Cannot use pg_restore; must be loaded with psql ✅ Can be executed with psql -f backup.sql
Tar File (.tar) pg_basebackup --format=t Contains raw database files, including WAL logs ✅ Must be extracted before verification ❌ Cannot use pg_restore; requires direct file restoration ❌ Cannot use psql; raw database files must be handled manually
Base Backup (Raw Files) pg_basebackup Full database backup, including WAL and configuration files ✅ Fully supported by pg_verifybackup ❌ Cannot use pg_restore; must be restored manually ❌ Cannot use psql; files must be copied into the PostgreSQL data directory

Key Points

  • Base backups (pg_basebackup) are fully compatible with pg_verifybackup.
  • Dump files (.dump) require pg_restore for structured restoration.
  • SQL backups (.sql) can be directly loaded with psql -f.
  • Tar backups (.tar) and raw base backups require manual restoration.
  • Custom backups for granular restoration. Get the table of contents using -l. Edit the table of contents. Restore the edited table list using -L ( https://dba.stackexchange.com/questions/90574/pg-restore-certain-tables-only)

Programmatic Login

comparison:
  method_1: "Using pyperclip to store password"
  description_1: "Transfers the password via clipboard for execution."
  steps_1:
    - "Store the password in clipboard using `pyperclip.copy(password)`."
    - "Use Python's subprocess module to call `psql`, retrieving the password from clipboard."
  pros_1:
    - "Avoids storing the password in environment variables."
    - "Quick method for one-off authentication."
  cons_1:
    - "Security risk: clipboard contents can be accessed by other applications."
    - "Not suitable for automated scripts."
  
  method_2: "Using PGPASSWORD environment variable"
  description_2: "Sets the database password as an environment variable."
  steps_2:
    - "Define `PGPASSWORD=password` in Python’s subprocess call."
    - "Execute `psql` without prompting for credentials."
  pros_2:
    - "More secure than clipboard storage."
    - "Ideal for automation and scripting."
  cons_2:
    - "Exposes password temporarily in process environment."
    - "Not recommended for production use; better alternatives include `.pgpass` files or `PGSSLMODE` settings."

Depth vs Breadth

Comprehensive List of SQL-Focused Technical Roles

General SQL Roles

  • Database Engineer

    • Designs and maintains SQL-based database schemas
    • Implements indexing, partitioning, and replication strategies
    • Optimizes SQL queries for efficient performance
  • SQL Developer

    • Writes complex SQL queries, stored procedures, and triggers
    • Develops database logic for applications using SQL
    • Enhances transactional efficiency through SQL optimization
  • Data Engineer (SQL-Focused)

    • Builds ETL/ELT pipelines using SQL transformations
    • Works with SQL-based data warehouses and lakehouses
    • Implements SQL-driven data integrations and migrations
  • Database Administrator (DBA)

    • Manages SQL server infrastructure, security, and performance tuning
    • Writes SQL scripts for automation and database maintenance
    • Ensures SQL database integrity, backups, and recovery strategies

Commercial RDBMS-Specific Roles

  • Database Engineer (PostgreSQL, Oracle, SQL Server, etc.)

    • Designs and optimizes database schemas for commercial RDBMS platforms
    • Implements vendor-specific indexing, partitioning, and replication techniques
    • Works on high availability (HA) and disaster recovery (DR) strategies
  • SQL Developer (Oracle PL/SQL, T-SQL, etc.)

    • Develops platform-specific stored procedures, triggers, and functions
    • Optimizes queries using vendor-specific techniques (e.g., Oracle optimizer hints, SQL Server query tuning)
    • Implements transaction management tailored to commercial RDBMS requirements
  • Data Engineer (ETL/ELT with Commercial RDBMS)

    • Builds RDBMS-specific ETL workflows for structured data transformation
    • Works with Oracle SQL Loader, PostgreSQL COPY, SQL Server SSIS, or equivalent tools
    • Manages integrations using proprietary RDBMS extensions (e.g., SQL Server XML functions, Oracle JSON operators)
  • Database Administrator (DBA - PostgreSQL, Oracle, SQL Server, etc.)

    • Manages commercial RDBMS infrastructure, tuning, and licensing constraints
    • Oversees database upgrades, migration strategies, and vendor best practices
    • Ensures compliance with platform-specific security models, such as SQL Server Authentication or Oracle TDE

Cloud-Native RDBMS-Specific Roles

  • Cloud Database Engineer

    • Designs and optimizes SQL schemas for cloud-managed databases
    • Implements cloud-specific scaling strategies, such as auto-scaling storage or read replicas
    • Works with managed database services (AWS RDS, Azure SQL, Cloud SQL)
  • Cloud Data Engineer (SQL-Focused)

    • Builds ETL/ELT pipelines utilizing cloud-based storage (S3, BigQuery, Snowflake)
    • Implements SQL transformations optimized for distributed compute
    • Works with serverless query execution (AWS Athena, Azure Synapse, Google BigQuery SQL)
  • Cloud Database Administrator (Cloud DBA)

    • Manages SQL databases deployed in cloud environments
    • Optimizes query performance using cloud-specific tuning tools (e.g., AWS Performance Insights, Azure Query Store)
    • Implements security controls (IAM, encryption, VPC networking) for cloud databases
  • Cloud SQL Developer

    • Writes cloud-optimized SQL queries, leveraging serverless execution
    • Develops stored procedures tailored for multi-region cloud databases
    • Works with distributed database architectures (e.g., Spanner, YugabyteDB, CockroachDB)

Commercial SQL Products (MIT, Apache, GPL Licensed)

  • PostgreSQL (PostgreSQL License, similar to BSD)

    • Open-source, widely used in commercial applications
    • Supports advanced SQL features, extensions, and replication
  • MySQL (GPL License)

    • Popular open-source RDBMS with commercial support from Oracle
    • Offers transactional and analytical capabilities
  • MariaDB (GPL License)

    • Fork of MySQL with enhanced performance and scalability
    • Used in enterprise applications with commercial support
  • CockroachDB (Apache License 2.0)

    • Distributed SQL database designed for cloud-native applications
    • Provides high availability and horizontal scalability
  • YugabyteDB (Apache License 2.0)

    • Distributed SQL database optimized for transactional workloads
    • Supports PostgreSQL compatibility
  • TiDB (Apache License 2.0)

    • Hybrid transactional/analytical processing (HTAP) database
    • Designed for large-scale distributed SQL workloads
  • SQLite (Public Domain, some GPL-licensed extensions)

    • Lightweight SQL database used in embedded applications
    • Supports ACID transactions with minimal overhead
⚠️ **GitHub.com Fallback** ⚠️