NoSQL and NoSQL Data Modeling - jellyfish-tom/TIL GitHub Wiki

[SOURCES]

NoSQL - why?

  • NoSQL databases are generally easier to scale, and their soft schema lets the data model evolve more easily over the lifespan of an application/client.

  • NoSQL DBs store embedded data better

    Ex: Document DBs store JSON nested in JSON nested in JSON, in a key-value architecture. Data fetching is therefore often faster, as there is no need for JOINs on the DB side. This is why, when designing a NoSQL DB, you should ask "how am I going to fetch the data?" instead of "what shape does my data have and how do its parts relate to each other?"
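    A minimal sketch of the nested-document idea, using a plain Python dict to stand in for a key-value/document store. The `store` dict, the `user:42` key and all field names are hypothetical and only serve as illustration; a real document DB would expose its own client API.

```python
# A plain dict stands in for a document store: key -> JSON-like document.
store = {
    "user:42": {
        "name": "Alice",
        "address": {"city": "Berlin", "geo": {"lat": 52.52, "lon": 13.40}},
        "orders": [
            {"id": 1, "items": [{"sku": "A1", "qty": 2}]},
            {"id": 2, "items": [{"sku": "B7", "qty": 1}]},
        ],
    }
}


def fetch_user(key):
    """One key lookup returns the whole nested document; no JOIN is involved."""
    return store[key]


user = fetch_user("user:42")
order_ids = [order["id"] for order in user["orders"]]
city = user["address"]["city"]
```

    Everything the app needs about the user arrives in one read, because the document was shaped around how it is fetched.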

Part of the reason there are so many different types of NoSQL databases lies in the CAP theorem, also known as Brewer's theorem.

The CAP theorem states that a distributed data store can guarantee at most two of the following three characteristics at the same time:

  • Consistency: Every read receives the most recent write or an error
  • Availability: Every request receives a response that is not an error
  • Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes

NoSQL data modeling

General Ideas

NoSQL data modeling often starts from the application-specific queries as opposed to relational modeling:

  • Relational modeling is typically driven by the structure of available data. The main design theme is “What answers do I have?”
  • NoSQL data modeling is typically driven by application-specific access patterns, i.e. the types of queries to be supported. The main design theme is “What questions do I have?”

So basically, when using a NoSQL DB, first analyze which queries your app will run, and then design your DB to answer those queries as well as possible.
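As a rough sketch of this query-first approach, assume the app's main question is "give me the newest posts of author X". The store below is shaped around exactly that question. The names `posts_by_author`, `add_post` and `newest_posts` are hypothetical, and an in-memory dict stands in for the database.

```python
from collections import defaultdict

# The query the app needs: "give me the newest posts of author X".
# So the store is keyed by author, and each value is kept newest-first.
posts_by_author = defaultdict(list)


def add_post(author, post_id, title, created_at):
    posts = posts_by_author[author]
    posts.append({"id": post_id, "title": title, "created_at": created_at})
    # Keep the list in the order the query wants, paying the cost at write time.
    posts.sort(key=lambda p: p["created_at"], reverse=True)


def newest_posts(author, limit):
    # The read is a single key lookup plus a slice.
    return posts_by_author[author][:limit]


add_post("tom", 1, "Hello", 100)
add_post("tom", 2, "CAP notes", 300)
add_post("tom", 3, "Modeling", 200)
titles = [p["title"] for p in newest_posts("tom", 2)]
```

The write path does the extra work so that the read path stays trivial, which is the trade-off this modeling style usually accepts.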

  • Data duplication and denormalization are first-class citizens

  • The soft schema of NoSQL DBs (entities in the same table may have different shapes), or basically "no schema", has its advantages:

    • Minimization of one-to-many relationships by means of nested entities and, consequently, reduction of joins.
    • Masking of “technical” differences between business entities and modeling of heterogeneous business entities using one collection of documents or one table.
  • With the data properly modeled, a NoSQL DB lets the client fetch data in a more straightforward manner, and often faster than in the SQL case. Techniques like denormalization allow data to be fetched in a single read without JOINs on the server side, which can speed things up significantly.
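
The soft-schema point above (heterogeneous business entities in one collection) might look like the following sketch. The `products` list and its fields are made-up sample data, and a Python list stands in for a collection of documents.

```python
# One "collection" holding business entities of different shapes.
products = [
    {"type": "book", "title": "Book A", "author": "Author A", "pages": 300},
    {"type": "album", "title": "Album B", "artist": "Artist B",
     "tracks": ["Track 1", "Track 2"]},
    {"type": "shirt", "title": "Plain tee", "sizes": ["S", "M", "L"]},
]


def titles_of(collection):
    # Fields shared by all documents can be queried uniformly.
    return [doc["title"] for doc in collection]


def with_field(collection, field):
    # Shape-specific fields are simply absent from the other documents.
    return [doc for doc in collection if field in doc]


all_titles = titles_of(products)
paged = with_field(products, "pages")
```

No schema migration is needed to add a fourth kind of product; the cost is that the application has to cope with fields that may be missing.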

Techniques

- Denormalization

Denormalization can be defined as the copying of the same data into multiple documents or tables in order to simplify/optimize query processing or to fit the user’s data into a particular data model.

Applicability: Key-Value Stores, Document Databases, BigTable-style Databases
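
A minimal sketch of denormalization, assuming a hypothetical blog where the author's name is copied into every post. The `authors` and `posts` dicts stand in for two collections; all names are illustrative.

```python
authors = {"a1": {"name": "Tom"}}

# The author's name is copied into every post, so a post can be rendered
# from a single document without looking up the author.
posts = {
    "p1": {"title": "Hello", "author_id": "a1", "author_name": "Tom"},
    "p2": {"title": "CAP notes", "author_id": "a1", "author_name": "Tom"},
}


def render_post(post_id):
    post = posts[post_id]
    return f'{post["title"]} by {post["author_name"]}'


def rename_author(author_id, new_name):
    # The price of duplication: every copy has to be updated.
    authors[author_id]["name"] = new_name
    updated = 0
    for post in posts.values():
        if post["author_id"] == author_id:
            post["author_name"] = new_name
            updated += 1
    return updated


before = render_post("p1")
touched = rename_author("a1", "Tomasz")
after = render_post("p1")
```

Reads become a single lookup, while a rename has to touch every document that holds a copy. Denormalization pays off when reads greatly outnumber such updates.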

- Designing One-To-N relation

You need to consider two factors:

  • Will the entities on the “N” side of the One-to-N ever need to stand alone?
  • What is the cardinality of the relationship: is it one-to-few; one-to-many; or one-to-squillions?

Based on these factors, you can pick one of the three basic One-to-N schema designs:

  1. Embed the N side if the cardinality is one-to-few and there is no need to access the embedded object outside the context of the parent object
  2. Use an array of references to the N-side objects if the cardinality is one-to-many or if the N-side objects should stand alone for any reasons
  3. Use a reference to the One-side in the N-side objects if the cardinality is one-to-squillions
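
The three designs above can be sketched with plain Python dicts standing in for documents. The entities (a person with addresses, a product with parts, a host with log messages) and all field names are hypothetical examples, not a prescribed schema.

```python
# 1. One-to-few: embed the N side in the parent document.
person = {
    "_id": "person1",
    "name": "Kate",
    "addresses": [{"city": "Gdansk"}, {"city": "Oslo"}],
}

# 2. One-to-many: the parent holds an array of references to standalone documents.
parts = {
    "part1": {"name": "bolt"},
    "part2": {"name": "nut"},
}
product = {"_id": "product1", "name": "widget", "part_ids": ["part1", "part2"]}

# 3. One-to-squillions: each N-side document holds a reference to its parent.
host = {"_id": "host1", "name": "web-01"}
log_messages = [
    {"host_id": "host1", "msg": "started"},
    {"host_id": "host1", "msg": "ready"},
    {"host_id": "host2", "msg": "started"},
]


def cities_of(person_doc):
    # Embedded data comes along with the parent; no second read.
    return [address["city"] for address in person_doc["addresses"]]


def parts_of(product_doc):
    # Each reference is resolved with an extra lookup.
    return [parts[part_id]["name"] for part_id in product_doc["part_ids"]]


def logs_of(host_doc):
    # The N side is filtered by its parent reference.
    return [m["msg"] for m in log_messages if m["host_id"] == host_doc["_id"]]
```

In design 3 the parent document never grows, which is why it suits very high cardinalities: an array of references in the parent would otherwise become unbounded.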

- Application side joins

If you need to join data that is stored in a NoSQL DB, you do it on the client side (in the app).

Of course, in many cases joins are inevitable. They should then be handled by the application and planned at design time, as opposed to relational models, where joins are handled at query execution time.
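
A rough sketch of an application-side join, assuming two hypothetical collections, `customers` and `orders`, that were already fetched from the DB as lists of documents. The function name and fields are illustrative.

```python
customers = [
    {"_id": "c1", "name": "Ann"},
    {"_id": "c2", "name": "Bob"},
]
orders = [
    {"_id": "o1", "customer_id": "c1", "total": 30},
    {"_id": "o2", "customer_id": "c2", "total": 15},
    {"_id": "o3", "customer_id": "c1", "total": 5},
]


def join_orders_with_customers(order_docs, customer_docs):
    # Build an in-memory index once, then resolve each reference in app code.
    by_id = {customer["_id"]: customer for customer in customer_docs}
    return [
        {
            "order_id": order["_id"],
            "customer": by_id[order["customer_id"]]["name"],
            "total": order["total"],
        }
        for order in order_docs
    ]


joined = join_orders_with_customers(orders, customers)
```

Indexing the customers first keeps the join linear in the number of documents instead of scanning the customer list for every order.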

If you have a complex data modeling problem, check the strategies here