database - dwilson2547/wiki_demo GitHub Wiki
A database is an organized collection of structured data stored and accessed electronically. Databases enable efficient data management, including storage, retrieval, updating, and deletion of data. They are foundational to modern applications, from websites and mobile apps to enterprise systems and cloud services.
- 1. Why Use a Database?
- 2. Types of Databases
- 3. Relational Databases (SQL) Deep Dive
- 4. NoSQL Databases Deep Dive
- 5. Database Indexes
- 6. Database Transactions and Locking
- 7. Database Replication and Sharding
- 8. Database Security
- 9. Database Backups and Recovery
- 10. Choosing a Database
- 11. Database Connection Pools
- 12. ORMs (Object-Relational Mappers)
- 13. Database as a Service (DBaaS)
- 14. Common Database Operations in Code
- 15. Database Optimization Techniques
- 16. Database Trends
- 17. Common Database Mistakes
- 18. Example: E-Commerce Database Schema
- 19. Summary
Databases solve critical challenges in data management:
- Persistence: Store data permanently (e.g., user accounts, product inventories).
- Organization: Structure data for efficient access (e.g., tables, indexes).
- Concurrency: Handle multiple users/transactions simultaneously.
- Integrity: Enforce rules (e.g., constraints, relationships) to prevent corruption.
- Scalability: Grow with application demands (e.g., from 100 to 10 million users).
- Security: Control access via permissions, encryption, and auditing.
-
Structure: Data stored in tables (rows and columns) with relationships between tables.
-
Language: SQL (Structured Query Language) for querying and manipulation.
-
Examples: PostgreSQL, MySQL, SQLite, Microsoft SQL Server, Oracle.
-
Use Cases: Complex queries, transactions (e.g., banking, e-commerce).
-
Example Table: |
users
Table | |--------------|------------|------------------| | id (PK) | name | email | |--------------|------------|------------------| | 1 | Alice | [email protected]| | 2 | Bob | [email protected] | -
SQL Query Example:
SELECT * FROM users WHERE name = 'Alice';
- Structure: Non-tabular formats (e.g., key-value, document, column-family, graph).
- Language: Varies by database (e.g., MongoDB uses JSON-like queries).
-
Examples:
- Document: MongoDB, CouchDB (store JSON-like documents).
- Key-Value: Redis, DynamoDB (simple key-value pairs).
- Column-Family: Cassandra, HBase (optimized for large-scale data).
- Graph: Neo4j (for relationships between data points).
- Use Cases: High scalability, unstructured data (e.g., social networks, IoT, real-time analytics).
-
Example (MongoDB Document):
{ "_id": 1, "name": "Alice", "email": "[email protected]", "address": { "city": "New York", "zip": "10001" } }
- Structure: Data stored in RAM (not on disk) for ultra-fast access.
- Examples: Redis, Memcached.
- Use Cases: Caching, session storage, real-time analytics.
- Structure: Optimized for time-stamped data (e.g., metrics, logs).
- Examples: InfluxDB, TimescaleDB.
- Use Cases: Monitoring, IoT sensor data.
- Structure: Combine SQL’s consistency with NoSQL’s scalability.
- Examples: Google Spanner, CockroachDB.
- Use Cases: Global-scale applications requiring strong consistency.
-
Tables: Collections of related data (e.g.,
users
,orders
). - Rows (Records): Individual entries in a table (e.g., a single user).
-
Columns (Fields): Attributes of a table (e.g.,
name
,email
). -
Primary Key (PK): Unique identifier for a row (e.g.,
id
). -
Foreign Key (FK): Links rows in different tables (e.g.,
user_id
in anorders
table). - Indexes: Improve query performance by speeding up searches.
-
Constraints: Rules to maintain data integrity (e.g.,
UNIQUE
,NOT NULL
).
Operation | SQL Command | Example |
---|---|---|
Create Table | CREATE TABLE |
CREATE TABLE users (id INT, name TEXT); |
Insert Data | INSERT INTO |
INSERT INTO users VALUES (1, 'Alice'); |
Select Data | SELECT |
SELECT * FROM users; |
Update Data | UPDATE |
UPDATE users SET name = 'Bob' WHERE id = 1; |
Delete Data | DELETE FROM |
DELETE FROM users WHERE id = 1; |
Join Tables | JOIN |
SELECT * FROM users JOIN orders ON users.id = orders.user_id; |
Group Data | GROUP BY |
SELECT COUNT(*) FROM users GROUP BY city; |
Filter Data | WHERE |
SELECT * FROM users WHERE age > 18; |
-
ACID Properties: Ensure reliable transactions.
- Atomicity: All operations in a transaction succeed or fail together.
- Consistency: Transactions bring the database from one valid state to another.
- Isolation: Concurrent transactions don’t interfere.
- Durability: Committed transactions persist even after crashes.
-
Example:
BEGIN TRANSACTION; UPDATE accounts SET balance = balance - 100 WHERE id = 1; UPDATE accounts SET balance = balance + 100 WHERE id = 2; COMMIT;
- Goal: Reduce redundancy and improve data integrity.
-
Normal Forms:
- 1NF: No repeating groups (atomic values).
- 2NF: No partial dependencies (all non-key columns depend on the entire primary key).
- 3NF: No transitive dependencies (non-key columns depend only on the primary key).
-
Example:
-
Bad (1NF Violation): |
users
Table | |--------------|------------------| | id | phones | |--------------|------------------| | 1 | "123-456, 789-012" | -
Good (3NF): |
users
Table |phones
Table | |--------------|----------------|--------------|------------| | id (PK) | name | user_id (FK) | phone | |--------------|----------------|--------------|------------| | 1 | Alice | 1 | 123-456 | | | | 1 | 789-012 |
-
-
Structure: Simple
key:value
pairs. -
Example (Redis):
SET user:1 "{'name': 'Alice', 'email': '[email protected]'}" GET user:1
- Structure: Store JSON-like documents.
-
Example (MongoDB):
db.users.insertOne({ name: "Alice", email: "[email protected]", address: { city: "New York", zip: "10001" } }); db.users.find({ name: "Alice" });
- Structure: Optimized for large-scale, distributed data (columns grouped into families).
-
Example (Cassandra):
INSERT INTO users (id, name, email) VALUES (1, 'Alice', '[email protected]'); SELECT * FROM users WHERE id = 1;
- Structure: Store entities (nodes) and relationships (edges).
-
Example (Neo4j):
CREATE (alice:Person {name: 'Alice'})-[:FRIENDS_WITH]->(bob:Person {name: 'Bob'}) MATCH (p:Person)-[:FRIENDS_WITH]->(friend) WHERE p.name = 'Alice' RETURN friend;
- Purpose: Speed up data retrieval by creating pointers to data.
-
Types:
- B-Tree: Balanced tree structure (default for most databases).
-
Hash: Fast lookups for exact matches (e.g.,
WHERE id = 5
). -
Full-Text: Search text content (e.g.,
LIKE '%search%'
). -
Composite: Indexes on multiple columns (e.g.,
(last_name, first_name)
).
-
Example (SQL):
CREATE INDEX idx_name ON users(name);
-
Optimistic Locking: Assumes conflicts are rare (uses version numbers).
UPDATE users SET balance = 100, version = 2 WHERE id = 1 AND version = 1;
-
Pessimistic Locking: Locks rows to prevent conflicts (e.g.,
SELECT ... FOR UPDATE
).BEGIN TRANSACTION; SELECT * FROM accounts WHERE id = 1 FOR UPDATE; -- Update logic here COMMIT;
- Master-Slave: One primary (master) handles writes; replicas (slaves) handle reads.
- Use Case: Improve read performance and fault tolerance.
-
Example (PostgreSQL):
-- Configure replication in postgresql.conf wal_level = replica max_wal_senders = 5
- Horizontal Partitioning: Split data across multiple servers (shards) by a key (e.g., user ID).
- Use Case: Scale write performance for large datasets.
-
Example:
- Shard 1: Users with IDs 1-1000
- Shard 2: Users with IDs 1001-2000
- Authentication: Verify user identities (e.g., passwords, certificates).
-
Authorization: Control access with roles/permissions (e.g.,
GRANT
,REVOKE
).GRANT SELECT ON users TO read_only_user;
-
Encryption:
-
At Rest: Encrypt data stored on disk (e.g., PostgreSQL’s
pgcrypto
). - In Transit: Use TLS/SSL for client-server communication.
-
At Rest: Encrypt data stored on disk (e.g., PostgreSQL’s
-
Auditing: Log access and changes (e.g.,
CREATE AUDIT POLICY
in Oracle).
-
Backup Types:
- Full Backup: Copy of the entire database.
- Incremental Backup: Only changes since the last backup.
- Point-in-Time Recovery (PITR): Restore to a specific moment.
-
Example (PostgreSQL):
pg_dump -U username -d dbname -f backup.sql # Full backup pg_restore -U username -d dbname backup.sql # Restore
Criteria | Relational (SQL) | NoSQL |
---|---|---|
Data Structure | Structured (tables, rows, columns) | Flexible (documents, key-value) |
Query Language | SQL | Database-specific (e.g., MongoQL) |
Transactions | ACID compliance | Limited or none |
Scalability | Vertical (scale up) | Horizontal (scale out) |
Use Cases | Complex queries, transactions | High scalability, unstructured data |
Examples | PostgreSQL, MySQL | MongoDB, Redis |
- Purpose: Reuse database connections to reduce overhead.
-
Example (Python with
psycopg2
):from psycopg2.pool import SimpleConnectionPool pool = SimpleConnectionPool(1, 10, **connection_params**) conn = pool.getconn() # Use conn for queries pool.putconn(conn)
- Purpose: Map database tables to object-oriented code (e.g., Python classes).
-
Examples:
- Python: SQLAlchemy, Django ORM.
- JavaScript: Sequelize, TypeORM.
- Java: Hibernate.
-
Example (SQLAlchemy):
from sqlalchemy import create_engine, Column, Integer, String from sqlalchemy.ext.declarative import declarative_base Base = declarative_base() class User(Base): __tablename__ = 'users' id = Column(Integer, primary_key=True) name = Column(String) engine = create_engine('sqlite:///mydb.db') Base.metadata.create_all(engine)
- Cloud-Hosted Databases: Managed by providers (e.g., AWS RDS, Google Cloud SQL).
-
Examples:
- AWS RDS: PostgreSQL, MySQL, Oracle.
- MongoDB Atlas: Managed MongoDB.
- Firebase Realtime Database: NoSQL for mobile/web apps.
import sqlite3
conn = sqlite3.connect('mydb.db')
cursor = conn.cursor()
# Create table
cursor.execute('''CREATE TABLE users
(id INTEGER PRIMARY KEY, name TEXT, email TEXT)''')
# Insert data
cursor.execute("INSERT INTO users (name, email) VALUES (?, ?)", ("Alice", "[email protected]"))
# Query data
cursor.execute("SELECT * FROM users")
rows = cursor.fetchall()
for row in rows:
print(row)
conn.close()
const mongoose = require('mongoose');
mongoose.connect('mongodb://localhost:27017/mydb');
const User = mongoose.model('User', {
name: String,
email: String
});
// Insert
const alice = new User({ name: "Alice", email: "[email protected]" });
await alice.save();
// Query
const users = await User.find({});
console.log(users);
import java.sql.*;
Connection conn = DriverManager.getConnection(
"jdbc:postgresql://localhost:5432/mydb", "user", "password");
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery("SELECT * FROM users");
while (rs.next()) {
System.out.println(rs.getString("name"));
}
conn.close();
- Use Indexes: Speed up searches on frequently queried columns.
-
Avoid
SELECT *
: Fetch only needed columns.SELECT name, email FROM users WHERE id = 1; -- Better than SELECT *
- Use Joins Wisely: Prefer joins over subqueries for large datasets.
- Tradeoff: Sacrifice normalization for read performance (e.g., duplicate data to avoid joins).
-
Example: Store
user_email
in anorders
table to avoid joiningusers
.
- Split large tables into smaller, manageable pieces (e.g., by date or region).
-
Example (PostgreSQL):
CREATE TABLE orders ( id SERIAL, order_date DATE, amount DECIMAL ) PARTITION BY RANGE (order_date);
- Cache Frequent Queries: Use Redis or Memcached to store results.
-
Example (Python with
redis
):import redis r = redis.Redis() cached_users = r.get("users") if not cached_users: # Query database and cache results r.set("users", json.dumps(users), ex=3600) # Cache for 1 hour
- Serverless Databases: Auto-scaling databases (e.g., AWS Aurora Serverless).
- Multi-Model Databases: Support multiple data models (e.g., ArangoDB).
- Edge Databases: Process data closer to users/devices (e.g., SQLite in mobile apps).
- Blockchain Databases: Immutable, decentralized storage (e.g., BigchainDB).
- No Indexes: Slow queries on large tables.
- Over-Normalization: Excessive joins hurt performance.
- No Backups: Risk of data loss.
- Ignoring Security: SQL injection, weak passwords.
- Not Monitoring: Unnoticed performance degradation.
- Tight Coupling: Application logic tied to database schema.
-- Users
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
email VARCHAR(100) UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
-- Products
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
price DECIMAL(10, 2) NOT NULL,
stock INTEGER DEFAULT 0
);
-- Orders
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
created_at TIMESTAMP DEFAULT NOW(),
status VARCHAR(20) DEFAULT 'pending'
);
-- Order Items
CREATE TABLE order_items (
id SERIAL PRIMARY KEY,
order_id INTEGER REFERENCES orders(id),
product_id INTEGER REFERENCES products(id),
quantity INTEGER NOT NULL,
price DECIMAL(10, 2) NOT NULL
);
- Databases store and manage structured data for applications.
-
Types:
- Relational (SQL): Structured, ACID-compliant (PostgreSQL, MySQL).
- NoSQL: Flexible, scalable (MongoDB, Redis).
- In-Memory: Fast access (Redis).
- Time-Series: Optimized for metrics (InfluxDB).
- Key Concepts: Tables, SQL, transactions, indexes, normalization, replication.
-
Use Cases:
- SQL: Complex queries, transactions (e.g., banking).
- NoSQL: Scalability, unstructured data (e.g., social media).
-
Tools:
- ORMs: SQLAlchemy (Python), Sequelize (Node.js).
- GUI Tools: DBeaver, pgAdmin, MongoDB Compass.
- Cloud: AWS RDS, Google Cloud SQL, MongoDB Atlas.
Databases are the backbone of modern applications, enabling everything from simple blogs to global-scale platforms like Facebook or Netflix. Choosing the right database depends on your data structure, scalability needs, and performance requirements.