database - dwilson2547/wiki

Databases: A Comprehensive Explanation

A database is an organized collection of structured data stored and accessed electronically. Databases enable efficient data management, including storage, retrieval, updating, and deletion of data. They are foundational to modern applications, from websites and mobile apps to enterprise systems and cloud services.

1. Why Use a Database?
2. Types of Databases
3. Relational Databases (SQL) Deep Dive
4. NoSQL Databases Deep Dive
5. Database Indexes
6. Database Transactions and Locking
7. Database Replication and Sharding
- 7.1. Replication
- 7.2. Sharding
8. Database Security
9. Database Backups and Recovery
10. Choosing a Database
11. Database Connection Pools
12. ORMs (Object-Relational Mappers)
13. Database as a Service (DBaaS)
14. Common Database Operations in Code
15. Database Optimization Techniques
16. Database Trends
17. Common Database Mistakes
18. Example: E-Commerce Database Schema
19. Summary

1. Why Use a Database?

Databases solve critical challenges in data management:

Persistence: Store data permanently (e.g., user accounts, product inventories).
Organization: Structure data for efficient access (e.g., tables, indexes).
Concurrency: Handle multiple users/transactions simultaneously.
Integrity: Enforce rules (e.g., constraints, relationships) to prevent corruption.
Scalability: Grow with application demands (e.g., from 100 to 10 million users).
Security: Control access via permissions, encryption, and auditing.

2. Types of Databases

2.1. Relational Databases (SQL)

Structure: Data stored in tables (rows and columns) with relationships between tables.
Language: SQL (Structured Query Language) for querying and manipulation.
Examples: PostgreSQL, MySQL, SQLite, Microsoft SQL Server, Oracle.
Use Cases: Complex queries, transactions (e.g., banking, e-commerce).
Example Table: | users Table | |--------------|------------|------------------| | id (PK) | name | email | |--------------|------------|------------------| | 1 | Alice | [email protected]| | 2 | Bob | [email protected] |

SQL Query Example:

SELECT * FROM users WHERE name = 'Alice';

2.2. NoSQL Databases

Structure: Non-tabular formats (e.g., key-value, document, column-family, graph).
Language: Varies by database (e.g., MongoDB uses JSON-like queries).
Examples:
- Document: MongoDB, CouchDB (store JSON-like documents).
- Key-Value: Redis, DynamoDB (simple key-value pairs).
- Column-Family: Cassandra, HBase (optimized for large-scale data).
- Graph: Neo4j (for relationships between data points).
Use Cases: High scalability, unstructured data (e.g., social networks, IoT, real-time analytics).

Example (MongoDB Document):

{
  "_id": 1,
  "name": "Alice",
  "email": "[email protected]",
  "address": {
    "city": "New York",
    "zip": "10001"
  }
}

2.3. In-Memory Databases

Structure: Data stored in RAM (not on disk) for ultra-fast access.
Examples: Redis, Memcached.
Use Cases: Caching, session storage, real-time analytics.

2.4. Time-Series Databases

Structure: Optimized for time-stamped data (e.g., metrics, logs).
Examples: InfluxDB, TimescaleDB.
Use Cases: Monitoring, IoT sensor data.

2.5. NewSQL Databases

Structure: Combine SQL’s consistency with NoSQL’s scalability.
Examples: Google Spanner, CockroachDB.
Use Cases: Global-scale applications requiring strong consistency.

3. Relational Databases (SQL) Deep Dive

3.1. Core Concepts

Tables: Collections of related data (e.g., users, orders).
Rows (Records): Individual entries in a table (e.g., a single user).
Columns (Fields): Attributes of a table (e.g., name, email).
Primary Key (PK): Unique identifier for a row (e.g., id).
Foreign Key (FK): Links rows in different tables (e.g., user_id in an orders table).
Indexes: Improve query performance by speeding up searches.
Constraints: Rules to maintain data integrity (e.g., UNIQUE, NOT NULL).

3.2. SQL Basics

Operation	SQL Command	Example
Create Table	`CREATE TABLE`	`CREATE TABLE users (id INT, name TEXT);`
Insert Data	`INSERT INTO`	`INSERT INTO users VALUES (1, 'Alice');`
Select Data	`SELECT`	`SELECT * FROM users;`
Update Data	`UPDATE`	`UPDATE users SET name = 'Bob' WHERE id = 1;`
Delete Data	`DELETE FROM`	`DELETE FROM users WHERE id = 1;`
Join Tables	`JOIN`	`SELECT * FROM users JOIN orders ON users.id = orders.user_id;`
Group Data	`GROUP BY`	`SELECT COUNT(*) FROM users GROUP BY city;`
Filter Data	`WHERE`	`SELECT * FROM users WHERE age > 18;`

3.3. Transactions

ACID Properties: Ensure reliable transactions.
- Atomicity: All operations in a transaction succeed or fail together.
- Consistency: Transactions bring the database from one valid state to another.
- Isolation: Concurrent transactions don’t interfere.
- Durability: Committed transactions persist even after crashes.

Example:

BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

3.4. Normalization

Goal: Reduce redundancy and improve data integrity.
Normal Forms:
- 1NF: No repeating groups (atomic values).
- 2NF: No partial dependencies (all non-key columns depend on the entire primary key).
- 3NF: No transitive dependencies (non-key columns depend only on the primary key).
Example:
- Bad (1NF Violation): | users Table | |--------------|------------------| | id | phones | |--------------|------------------| | 1 | "123-456, 789-012" |
- Good (3NF): | users Table | phones Table | |--------------|----------------|--------------|------------| | id (PK) | name | user_id (FK) | phone | |--------------|----------------|--------------|------------| | 1 | Alice | 1 | 123-456 | | | | 1 | 789-012 |

4. NoSQL Databases Deep Dive

4.1. Key-Value Stores

Structure: Simple key:value pairs.

Example (Redis):

SET user:1 "{'name': 'Alice', 'email': '[email protected]'}"
GET user:1

4.2. Document Databases

Structure: Store JSON-like documents.

Example (MongoDB):

db.users.insertOne({
  name: "Alice",
  email: "[email protected]",
  address: { city: "New York", zip: "10001" }
});
db.users.find({ name: "Alice" });

4.3. Column-Family Stores

Structure: Optimized for large-scale, distributed data (columns grouped into families).

Example (Cassandra):

INSERT INTO users (id, name, email) VALUES (1, 'Alice', '[email protected]');
SELECT * FROM users WHERE id = 1;

4.4. Graph Databases

Structure: Store entities (nodes) and relationships (edges).

Example (Neo4j):

CREATE (alice:Person {name: 'Alice'})-[:FRIENDS_WITH]->(bob:Person {name: 'Bob'})
MATCH (p:Person)-[:FRIENDS_WITH]->(friend) WHERE p.name = 'Alice' RETURN friend;

5. Database Indexes

Purpose: Speed up data retrieval by creating pointers to data.
Types:
- B-Tree: Balanced tree structure (default for most databases).
- Hash: Fast lookups for exact matches (e.g., WHERE id = 5).
- Full-Text: Search text content (e.g., LIKE '%search%').
- Composite: Indexes on multiple columns (e.g., (last_name, first_name)).
Example (SQL):
```
CREATE INDEX idx_name ON users(name);
```

6. Database Transactions and Locking

Optimistic Locking: Assumes conflicts are rare (uses version numbers).

UPDATE users SET balance = 100, version = 2 WHERE id = 1 AND version = 1;

Pessimistic Locking: Locks rows to prevent conflicts (e.g., SELECT ... FOR UPDATE).

BEGIN TRANSACTION;
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;
-- Update logic here
COMMIT;

7. Database Replication and Sharding

7.1. Replication

Master-Slave: One primary (master) handles writes; replicas (slaves) handle reads.
Use Case: Improve read performance and fault tolerance.

Example (PostgreSQL):

-- Configure replication in postgresql.conf
wal_level = replica
max_wal_senders = 5

7.2. Sharding

Horizontal Partitioning: Split data across multiple servers (shards) by a key (e.g., user ID).
Use Case: Scale write performance for large datasets.
Example:
- Shard 1: Users with IDs 1-1000
- Shard 2: Users with IDs 1001-2000

8. Database Security

Authentication: Verify user identities (e.g., passwords, certificates).
Authorization: Control access with roles/permissions (e.g., GRANT, REVOKE).
```
GRANT SELECT ON users TO read_only_user;
```
Encryption:
- At Rest: Encrypt data stored on disk (e.g., PostgreSQL’s pgcrypto).
- In Transit: Use TLS/SSL for client-server communication.
Auditing: Log access and changes (e.g., CREATE AUDIT POLICY in Oracle).

9. Database Backups and Recovery

Backup Types:
- Full Backup: Copy of the entire database.
- Incremental Backup: Only changes since the last backup.
- Point-in-Time Recovery (PITR): Restore to a specific moment.

Example (PostgreSQL):

pg_dump -U username -d dbname -f backup.sql  # Full backup
pg_restore -U username -d dbname backup.sql  # Restore

10. Choosing a Database

Criteria	Relational (SQL)	NoSQL
Data Structure	Structured (tables, rows, columns)	Flexible (documents, key-value)
Query Language	SQL	Database-specific (e.g., MongoQL)
Transactions	ACID compliance	Limited or none
Scalability	Vertical (scale up)	Horizontal (scale out)
Use Cases	Complex queries, transactions	High scalability, unstructured data
Examples	PostgreSQL, MySQL	MongoDB, Redis

11. Database Connection Pools

Purpose: Reuse database connections to reduce overhead.

Example (Python with psycopg2):

from psycopg2.pool import SimpleConnectionPool
pool = SimpleConnectionPool(1, 10, **connection_params**)
conn = pool.getconn()
# Use conn for queries
pool.putconn(conn)

12. ORMs (Object-Relational Mappers)

Purpose: Map database tables to object-oriented code (e.g., Python classes).
Examples:
- Python: SQLAlchemy, Django ORM.
- JavaScript: Sequelize, TypeORM.
- Java: Hibernate.

Example (SQLAlchemy):

from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine('sqlite:///mydb.db')
Base.metadata.create_all(engine)

13. Database as a Service (DBaaS)

Cloud-Hosted Databases: Managed by providers (e.g., AWS RDS, Google Cloud SQL).
Examples:
- AWS RDS: PostgreSQL, MySQL, Oracle.
- MongoDB Atlas: Managed MongoDB.
- Firebase Realtime Database: NoSQL for mobile/web apps.

14. Common Database Operations in Code

14.1. Python (SQLite with `sqlite3`)

import sqlite3
conn = sqlite3.connect('mydb.db')
cursor = conn.cursor()

# Create table
cursor.execute('''CREATE TABLE users
                  (id INTEGER PRIMARY KEY, name TEXT, email TEXT)''')

# Insert data
cursor.execute("INSERT INTO users (name, email) VALUES (?, ?)", ("Alice", "[email protected]"))

# Query data
cursor.execute("SELECT * FROM users")
rows = cursor.fetchall()
for row in rows:
    print(row)

conn.close()

14.2. Node.js (MongoDB with `mongoose`)

const mongoose = require('mongoose');
mongoose.connect('mongodb://localhost:27017/mydb');

const User = mongoose.model('User', {
  name: String,
  email: String
});

// Insert
const alice = new User({ name: "Alice", email: "[email protected]" });
await alice.save();

// Query
const users = await User.find({});
console.log(users);

14.3. Java (JDBC with PostgreSQL)

import java.sql.*;

Connection conn = DriverManager.getConnection(
    "jdbc:postgresql://localhost:5432/mydb", "user", "password");

Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery("SELECT * FROM users");
while (rs.next()) {
    System.out.println(rs.getString("name"));
}
conn.close();

15. Database Optimization Techniques

15.1. Query Optimization

Use Indexes: Speed up searches on frequently queried columns.

Avoid SELECT *: Fetch only needed columns.

SELECT name, email FROM users WHERE id = 1;  -- Better than SELECT *

Use Joins Wisely: Prefer joins over subqueries for large datasets.

15.2. Denormalization

Tradeoff: Sacrifice normalization for read performance (e.g., duplicate data to avoid joins).
Example: Store user_email in an orders table to avoid joining users.

15.3. Partitioning

Split large tables into smaller, manageable pieces (e.g., by date or region).

Example (PostgreSQL):

CREATE TABLE orders (
  id SERIAL,
  order_date DATE,
  amount DECIMAL
) PARTITION BY RANGE (order_date);

15.4. Caching

Cache Frequent Queries: Use Redis or Memcached to store results.

Example (Python with redis):

import redis
r = redis.Redis()
cached_users = r.get("users")
if not cached_users:
    # Query database and cache results
    r.set("users", json.dumps(users), ex=3600)  # Cache for 1 hour

16. Database Trends

Serverless Databases: Auto-scaling databases (e.g., AWS Aurora Serverless).
Multi-Model Databases: Support multiple data models (e.g., ArangoDB).
Edge Databases: Process data closer to users/devices (e.g., SQLite in mobile apps).
Blockchain Databases: Immutable, decentralized storage (e.g., BigchainDB).

17. Common Database Mistakes

No Indexes: Slow queries on large tables.
Over-Normalization: Excessive joins hurt performance.
No Backups: Risk of data loss.
Ignoring Security: SQL injection, weak passwords.
Not Monitoring: Unnoticed performance degradation.
Tight Coupling: Application logic tied to database schema.

18. Example: E-Commerce Database Schema

-- Users
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Products
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    price DECIMAL(10, 2) NOT NULL,
    stock INTEGER DEFAULT 0
);

-- Orders
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    created_at TIMESTAMP DEFAULT NOW(),
    status VARCHAR(20) DEFAULT 'pending'
);

-- Order Items
CREATE TABLE order_items (
    id SERIAL PRIMARY KEY,
    order_id INTEGER REFERENCES orders(id),
    product_id INTEGER REFERENCES products(id),
    quantity INTEGER NOT NULL,
    price DECIMAL(10, 2) NOT NULL
);

19. Summary

Databases store and manage structured data for applications.
Types:
- Relational (SQL): Structured, ACID-compliant (PostgreSQL, MySQL).
- NoSQL: Flexible, scalable (MongoDB, Redis).
- In-Memory: Fast access (Redis).
- Time-Series: Optimized for metrics (InfluxDB).
Key Concepts: Tables, SQL, transactions, indexes, normalization, replication.
Use Cases:
- SQL: Complex queries, transactions (e.g., banking).
- NoSQL: Scalability, unstructured data (e.g., social media).
Tools:
- ORMs: SQLAlchemy (Python), Sequelize (Node.js).
- GUI Tools: DBeaver, pgAdmin, MongoDB Compass.
- Cloud: AWS RDS, Google Cloud SQL, MongoDB Atlas.

Databases are the backbone of modern applications, enabling everything from simple blogs to global-scale platforms like Facebook or Netflix. Choosing the right database depends on your data structure, scalability needs, and performance requirements.

database - dwilson2547/wiki_demo GitHub Wiki

Databases: A Comprehensive Explanation

1. Why Use a Database?

2. Types of Databases

2.1. Relational Databases (SQL)

2.2. NoSQL Databases

2.3. In-Memory Databases

2.4. Time-Series Databases

2.5. NewSQL Databases

3. Relational Databases (SQL) Deep Dive

3.1. Core Concepts

3.2. SQL Basics

3.3. Transactions

3.4. Normalization

4. NoSQL Databases Deep Dive

4.1. Key-Value Stores

4.2. Document Databases

4.3. Column-Family Stores

4.4. Graph Databases

5. Database Indexes

6. Database Transactions and Locking

7. Database Replication and Sharding

7.1. Replication

7.2. Sharding

8. Database Security

9. Database Backups and Recovery

10. Choosing a Database

11. Database Connection Pools

12. ORMs (Object-Relational Mappers)

13. Database as a Service (DBaaS)

14. Common Database Operations in Code

14.1. Python (SQLite with sqlite3)

14.2. Node.js (MongoDB with mongoose)

14.3. Java (JDBC with PostgreSQL)

15. Database Optimization Techniques

15.1. Query Optimization

15.2. Denormalization

15.3. Partitioning

15.4. Caching

16. Database Trends

17. Common Database Mistakes

18. Example: E-Commerce Database Schema

19. Summary

⚠️ **GitHub.com Fallback** ⚠️

14.1. Python (SQLite with `sqlite3`)

14.2. Node.js (MongoDB with `mongoose`)

⚠️ GitHub.com Fallback ⚠️