Common Sharding Problems - rFronteddu/general_wiki GitHub Wiki
Most problems are due to the fact that operations across multiple tables or multiple rows in the same table will no longer run on the same server.
- Joins and Denormalization: Join that span database shards are not efficient since data has to be compiled from multiple servers. Common workaround is to denormalize so that queries that previously required joins can be done from a single table (denormalization can introduce data inconsistency).
- Referential Integrity: Enforcing data integrity constrains such as foreign keys in a shared database can be difficult. Most RDBMS do not support foreign keys constrains across databases on different servers. Applications that need this may have to enforce it in application code.
- Rebalancing: Sharding scheme may need to change due to changes in data distribution (city that develops ZIP problem) or shard load (hot spots for celebrities). Rebalancing without downtime is complicated, directory based partitioning can ease this process at the cost of increasing complexity and creating a new single point of failure (the lookup service/db).