Database and Storage solutions - kdwivedi1985/system-design GitHub Wiki

Key-Value stores are No-SQL databases, used for fast retrieval of data. Usually stores data as String.
No Schema, No querying and indexing on data
Use-case: Caching, session storage, Real-time analytics etc.

It is No-SQL DB with flexible schema. Information is typically stored in JSON, BSON and XML.
Designed for high read/write.
Semi structured Schema, supports querying and indexing on data.
Use-case: Product Catalog, User Profile, Comments, real-time analytics etc.

Temporary storage for fast data retrieval.
Use-case: Cache frequently used data- leaderboards, frequently retrieved products etc.

It belongs to column familty. A "column family" (now called a "table") can contain rows with a huge number of columns (potentially millions), and each row can have a different set of columns.
It is another type of NoSQL DB which has rows and columns but each row may have different columns. Used for storing and querying large amount of data.
It supports indexing and querying on data columns, not just on partition keys.
Support extremly-high read/write requests.
Use-case: Write heavy apps- Web analytics (Click-stream data), Inventory updates etc..

Use for fast retrieval of text data
Use-case: text search- product name, description etc., log analysis, google search etc.

Designed for storing, analyzing and easily retrieving geographical data (coordinates, location etc.)
Use-case: Uber, Search restaurants near me

Specialized in storing, retrieving and analyzing timestamp based data.
Time-based partitioning, Primary index on time eables fast storage and retrieval based on timestamp.
Handles continuous writes from thousands from resources.
Use-case: Monitoring and Observability (Logs ect.), IOT and Sensor data, financial trading platform

Vector DBs are specialized to store, index, and search multi-dimensional vector data - numerical representation (Embeddings) of unstructured data (doc, text, image, audio, video etc).
Use-case: Retrieval-augmented generation (RAG) for LLMs. Semantic search, recommendation system, Chatbots etc.

Specialized in blockchain transactions.
Each transaction is immutable and uses append-only mode which means every change is recorded as new entry and preserves the full audit-trail.
Transactions are cryptographically hashed to prevent tempering.
Use-case: Banking (transactions, balances etc.), supply-chain, healthcare, regulatory compliance, blockchain etc.

Can store large amount of unstructured data e.g. Documents, media (Images, audio, video) and other content(vmware, db backup as blobs" (Binary Large Objects).
Types of Blob Storage
- Block Blobs: Efficiently stores large files into chunks and supports parallel update/uploads. e.g. document, images, media, backups.
- Append Blobs: Optimized for data needs to be appended frequently. e.g. logging, audit-trails etc.
- Page Blobs: Stores random access files for read/write. Page blobs are collection of 512 byte pages. Used in Azure Blob Storage. e.g. Your VM is backed by disc in Azure Storage which is stored as page blobs. If you want to install any software, VM writes that to a specific offset corresponding to those changes.
Use-case: Image/Video repos, Data backups and archival, CDN etc.

Graph DBs are specialized at storing, managing and querying the data structured as complex interconnected network /Graph.
Data is represented as node, relationship as edge and properties (attributes describing node and edges).
Properties are key-value pairs and Graph DB is kind of No-SQL DB.
Use-case: Highly connected data- Social Network, Network and IT operations(Mapping and analyzing infrastructure dependencies)

These are light-weight DBs and tightly coupled within the application.
Runs as same process as application instead separate server.
Store, retrieves and manages data without needing a network connection.
Some Embedded DBs are ACID Complaint
Use-case:
- Mobile Applications: Local data storage for apps on smartphones and tablets
- Embedded Systems: Used in hardware devices like routers, IoT sensors
- Edge Computing- IOT devices
- Game Development/ Desktop Applications- Local store for use profile, settings, offline data etc.

Column oriented databases stores all the data by column instead of rows. Which means all values of a column as stored together. e.g. all customer names are stored together and address are stored together.
Provide fast DB reads by pulling only relevant columns.
Column DBs often use Partitioning and Sharding.
Example -
- Column "id": [1, 2, 3]
- Column "name": [Alice, Bob, Charlie]
- Column "address": [NY, USA, LA, USA, Toronto, Canada]
- Column "age": [32, 45, 28]
Use-case: Analytics, Business Intelligence, Reporting where only few columns are targeted. Data Aggregation(Sum, Avg) is easy, Improved compression.

Hadoop is a distributed storage and processing framework, not a database.
It provides the infrastructure for big data storage and computation.
Provides database like features on top of HDFS (Hadoop Distributed File System) using:
- Hive: Adds SQL-like querying on top of Hadoop.
- HBase: NoSQL wide-column DB build on top of Hadoop.
- Impala, Drill, Presto: Enable fast, interactive analytics on data stored in Hadoop.