Database and Storage solutions - kdwivedi1985/system-design GitHub Wiki

Relational Database (RDBMS)- MySQL, PostgreSQL, SQLServer, Oracle, YugabyteDB, CockroachDB

  • ACID Complaint
  • Use when schema is fixed, Referential Integrity and Constraints are need.
  • Supports complex querying
  • Use-cases: Banking and Finance, Ecommerce orders, payment transactions etc.

Key-Value Store - Redis, Memcache, DynamoDB

  • Key-Value stores are No-SQL databases, used for fast retrieval of data. Usually stores data as String.
  • No Schema, No querying and indexing on data
  • Use-case: Caching, session storage, Real-time analytics etc.

Document DB - MongoDB, CouchDB

  • It is No-SQL DB with flexible schema. Information is typically stored in JSON, BSON and XML.
  • Designed for high read/write.
  • Semi structured Schema, supports querying and indexing on data.
  • Use-case: Product Catalog, User Profile, Comments, real-time analytics etc.

In-Memory - Redis, Memcache

  • Temporary storage for fast data retrieval.
  • Use-case: Cache frequently used data- leaderboards, frequently retrieved products etc.

Wide Column DB - Cassandra, DynamoDB

  • It belongs to column familty. A "column family" (now called a "table") can contain rows with a huge number of columns (potentially millions), and each row can have a different set of columns.
  • It is another type of NoSQL DB which has rows and columns but each row may have different columns. Used for storing and querying large amount of data.
  • It supports indexing and querying on data columns, not just on partition keys.
  • Support extremly-high read/write requests.
  • Use-case: Write heavy apps- Web analytics (Click-stream data), Inventory updates etc..

Text Search - Solr, Elastic Cache

  • Use for fast retrieval of text data
  • Use-case: text search- product name, description etc., log analysis, google search etc.

Spatial - Redis, PostgreSQL, Oracle Spatial

  • Designed for storing, analyzing and easily retrieving geographical data (coordinates, location etc.)
  • Use-case: Uber, Search restaurants near me

Time-Series DB - InfluxDB, Prometheus, Graphite, TimescalDB, OpenTSDB

  • Specialized in storing, retrieving and analyzing timestamp based data.
  • Time-based partitioning, Primary index on time eables fast storage and retrieval based on timestamp.
  • Handles continuous writes from thousands from resources.
  • Use-case: Monitoring and Observability (Logs ect.), IOT and Sensor data, financial trading platform

Vector DB- FAISS, PineCone

  • Vector DBs are specialized to store, index, and search multi-dimensional vector data - numerical representation (Embeddings) of unstructured data (doc, text, image, audio, video etc).
  • Use-case: Retrieval-augmented generation (RAG) for LLMs. Semantic search, recommendation system, Chatbots etc.

Ledger Database - Amazon QLDB (Quantum Ledger),MS SQL Server Ledger, Ethereum (a block chain DB)

  • Specialized in blockchain transactions.
  • Each transaction is immutable and uses append-only mode which means every change is recorded as new entry and preserves the full audit-trail.
  • Transactions are cryptographically hashed to prevent tempering.
  • Use-case: Banking (transactions, balances etc.), supply-chain, healthcare, regulatory compliance, blockchain etc.

Blob/Object Storage- S3, Azure Blob Storage, MinIO (open-source), HDFS can act as blob store

  • Can store large amount of unstructured data e.g. Documents, media (Images, audio, video) and other content(vmware, db backup as blobs" (Binary Large Objects).
  • Types of Blob Storage
    • Block Blobs: Efficiently stores large files into chunks and supports parallel update/uploads. e.g. document, images, media, backups.
    • Append Blobs: Optimized for data needs to be appended frequently. e.g. logging, audit-trails etc.
    • Page Blobs: Stores random access files for read/write. Page blobs are collection of 512 byte pages. Used in Azure Blob Storage. e.g. Your VM is backed by disc in Azure Storage which is stored as page blobs. If you want to install any software, VM writes that to a specific offset corresponding to those changes.
  • Use-case: Image/Video repos, Data backups and archival, CDN etc.

Graph DB- Neo4j(open-source), Amazon Neptune, Oracle Graph Database,

  • Graph DBs are specialized at storing, managing and querying the data structured as complex interconnected network /Graph.
  • Data is represented as node, relationship as edge and properties (attributes describing node and edges).
  • Properties are key-value pairs and Graph DB is kind of No-SQL DB.
  • Use-case: Highly connected data- Social Network, Network and IT operations(Mapping and analyzing infrastructure dependencies)

Embedded Database- SQLite(Used in Mobile), HSQLDB (Java based embedded RDBMS), RocksDB(Fast read/write), DuckDB(Analytical DB for local)

  • These are light-weight DBs and tightly coupled within the application.
  • Runs as same process as application instead separate server.
  • Store, retrieves and manages data without needing a network connection.
  • Some Embedded DBs are ACID Complaint
  • Use-case:
    • Mobile Applications: Local data storage for apps on smartphones and tablets
    • Embedded Systems: Used in hardware devices like routers, IoT sensors
    • Edge Computing- IOT devices
    • Game Development/ Desktop Applications- Local store for use profile, settings, offline data etc.

Hierarchical Store - ZooKeeper

  • Data is structured as tree. One parent and many children
  • Zookeeper is not a Db, but considered as service.
  • Path is considered as key and data stored at a path is value.

Columnar Databases - Amazon Redshift, Google BigQuery, Snowflake, Apache Parquet (storage format)

  • Column oriented databases stores all the data by column instead of rows. Which means all values of a column as stored together. e.g. all customer names are stored together and address are stored together.
  • Provide fast DB reads by pulling only relevant columns.
  • Column DBs often use Partitioning and Sharding.
  • Example -
    • Column "id": [1, 2, 3]
    • Column "name": [Alice, Bob, Charlie]
    • Column "address": [NY, USA, LA, USA, Toronto, Canada]
    • Column "age": [32, 45, 28]
  • Use-case: Analytics, Business Intelligence, Reporting where only few columns are targeted. Data Aggregation(Sum, Avg) is easy, Improved compression.

Hadoop - Provides foundation for Data Lake (Handles structured and unstructured data)

  • Hadoop is a distributed storage and processing framework, not a database.
  • It provides the infrastructure for big data storage and computation.
  • Provides database like features on top of HDFS (Hadoop Distributed File System) using:
    • Hive: Adds SQL-like querying on top of Hadoop.
    • HBase: NoSQL wide-column DB build on top of Hadoop.
    • Impala, Drill, Presto: Enable fast, interactive analytics on data stored in Hadoop.