Use when schema is fixed, Referential Integrity and Constraints are need.
Supports complex querying
Use-cases: Banking and Finance, Ecommerce orders, payment transactions etc.
Key-Value Store - Redis, Memcache, DynamoDB
Key-Value stores are No-SQL databases, used for fast retrieval of data. Usually stores data as String.
No Schema, No querying and indexing on data
Use-case: Caching, session storage, Real-time analytics etc.
Document DB - MongoDB, CouchDB
It is No-SQL DB with flexible schema. Information is typically stored in JSON, BSON and XML.
Designed for high read/write.
Semi structured Schema, supports querying and indexing on data.
Use-case: Product Catalog, User Profile, Comments, real-time analytics etc.
In-Memory - Redis, Memcache
Temporary storage for fast data retrieval.
Use-case: Cache frequently used data- leaderboards, frequently retrieved products etc.
Wide Column DB - Cassandra, DynamoDB
It belongs to column familty. A "column family" (now called a "table") can contain rows with a huge number of columns (potentially millions), and each row can have a different set of columns.
It is another type of NoSQL DB which has rows and columns but each row may have different columns. Used for storing and querying large amount of data.
It supports indexing and querying on data columns, not just on partition keys.
Support extremly-high read/write requests.
Use-case: Write heavy apps- Web analytics (Click-stream data), Inventory updates etc..
Text Search - Solr, Elastic Cache
Use for fast retrieval of text data
Use-case: text search- product name, description etc., log analysis, google search etc.
Spatial - Redis, PostgreSQL, Oracle Spatial
Designed for storing, analyzing and easily retrieving geographical data (coordinates, location etc.)
Use-case: Uber, Search restaurants near me
Time-Series DB - InfluxDB, Prometheus, Graphite, TimescalDB, OpenTSDB
Specialized in storing, retrieving and analyzing timestamp based data.
Time-based partitioning, Primary index on time eables fast storage and retrieval based on timestamp.
Handles continuous writes from thousands from resources.
Use-case: Monitoring and Observability (Logs ect.), IOT and Sensor data, financial trading platform
Vector DB- FAISS, PineCone
Vector DBs are specialized to store, index, and search multi-dimensional vector data - numerical representation (Embeddings) of unstructured data (doc, text, image, audio, video etc).
Use-case: Retrieval-augmented generation (RAG) for LLMs. Semantic search, recommendation system, Chatbots etc.
Ledger Database - Amazon QLDB (Quantum Ledger),MS SQL Server Ledger, Ethereum (a block chain DB)
Specialized in blockchain transactions.
Each transaction is immutable and uses append-only mode which means every change is recorded as new entry and preserves the full audit-trail.
Transactions are cryptographically hashed to prevent tempering.
Blob/Object Storage- S3, Azure Blob Storage, MinIO (open-source), HDFS can act as blob store
Can store large amount of unstructured data e.g. Documents, media (Images, audio, video) and other content(vmware, db backup as blobs" (Binary Large Objects).
Types of Blob Storage
Block Blobs: Efficiently stores large files into chunks and supports parallel update/uploads. e.g. document, images, media, backups.
Append Blobs: Optimized for data needs to be appended frequently. e.g. logging, audit-trails etc.
Page Blobs: Stores random access files for read/write. Page blobs are collection of 512 byte pages. Used in Azure Blob Storage. e.g. Your VM is backed by disc in Azure Storage which is stored as page blobs. If you want to install any software, VM writes that to a specific offset corresponding to those changes.
Use-case: Image/Video repos, Data backups and archival, CDN etc.
Column oriented databases stores all the data by column instead of rows. Which means all values of a column as stored together. e.g. all customer names are stored together and address are stored together.
Provide fast DB reads by pulling only relevant columns.
Column DBs often use Partitioning and Sharding.
Example -
Column "id": [1, 2, 3]
Column "name": [Alice, Bob, Charlie]
Column "address": [NY, USA, LA, USA, Toronto, Canada]
Column "age": [32, 45, 28]
Use-case: Analytics, Business Intelligence, Reporting where only few columns are targeted. Data Aggregation(Sum, Avg) is easy, Improved compression.
Hadoop - Provides foundation for Data Lake (Handles structured and unstructured data)
Hadoop is a distributed storage and processing framework, not a database.
It provides the infrastructure for big data storage and computation.
Provides database like features on top of HDFS (Hadoop Distributed File System) using:
Hive: Adds SQL-like querying on top of Hadoop.
HBase: NoSQL wide-column DB build on top of Hadoop.
Impala, Drill, Presto: Enable fast, interactive analytics on data stored in Hadoop.