4 Pillars of a Well-Architected System

Design System Architectures that are Secure, Resilient, High Performing and Cost Optimized.

Secure
Resilient
High Performing
Cost Optimized

Practices in a nutshell

For a Read-Heavy System: Consider using a Cache.
For a Write-Heavy System: Use Message Queues for async processing.
For a Low Latency Requirement: Consider using a Cache and CDN.
For Atomicity, Consistency, Isolation, Durability (ACID) Compliance: Go for RDBMS/SQL Database.
For Unstructured Data: Go for NoSQL Database.
For Complex Data (Videos, Images, Files): Go for Blob/Object storage.
For Complex Pre-computation: Use Message Queue & Cache.
For High-Volume Data Search: Consider search index, tries or search engine.
For Scaling SQL Database: Implement Database Sharding.
For High Availability, Performance, & Throughput: Use a Load Balancer.
For Global Data Delivery: Consider using a CDN.
For Graph Data (data with nodes, edges, and relationships): Utilize Graph Database.
For Scaling Various Components: Implement Horizontal Scaling.
For High-Performing Database Queries: Use Database Indexes.
For Bulk Job Processing: Consider Batch Processing & Message Queues.
For Server Load Management & Preventing DoS Attacks: Use a Rate Limiter.
For Microservices Architecture: Use an API Gateway.
For Single Point of Failure: Implement Redundancy.
For Fault-Tolerance and Durability: Implement Data Replication.
For Fast User-to-User Communication: Use WebSockets.
For Failure Detection in Distributed Systems: Implement a Heartbeat.
For Data Integrity: Use Checksum Algorithm.
For Efficient Server Scaling: Implement Consistent Hashing.
For Decentralized Data Transfer: Consider Gossip Protocol.
For Location-Based Functionality: Use Quadtree, Geohash, etc.
For Avoiding Specific Technology Names: Use generic terms.
For High Availability and Consistency Trade-Off: Consider Eventual Consistency.
For IP resolution & Domain Name Query: Mention DNS.
For Handling Large Data in Network Requests: Implement Pagination.
For Cache Eviction Policy: Prefer LRU (Least Recently Used).
For Traffic Spikes: Implement Autoscaling to manage resources dynamically.
For Analytics and Audit Trails: Consider using data lakes or append-only databases.
For Large-Scale Simultaneous Connections: Use Connection Pooling and consider using Protobuf to minimize data payload.
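
As one illustration from the list above, consistent hashing can be sketched in a few lines. This is a minimal sketch, not a production implementation: the `HashRing` class, the server names, and the choice of SHA-256 with 100 virtual nodes per server are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a stable 64-bit position on the ring."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (hypothetical helper class)."""

    def __init__(self, vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._positions: list[int] = []   # sorted ring positions
        self._owner: dict[int, str] = {}  # ring position -> server name

    def add(self, server: str) -> None:
        for i in range(self.vnodes):
            pos = _hash(f"{server}#{i}")
            if pos in self._owner:  # extremely unlikely collision; skip it
                continue
            bisect.insort(self._positions, pos)
            self._owner[pos] = server

    def remove(self, server: str) -> None:
        self._positions = [p for p in self._positions if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def lookup(self, key: str) -> str:
        if not self._positions:
            raise LookupError("ring is empty")
        # First position clockwise from the key's hash, wrapping to the start.
        idx = bisect.bisect_right(self._positions, _hash(key)) % len(self._positions)
        return self._owner[self._positions[idx]]


ring = HashRing()
for name in ("server-a", "server-b", "server-c"):
    ring.add(name)

keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("server-d")
after = {k: ring.lookup(k) for k in keys}

# Only keys claimed by the new server change owner; the rest stay put.
moved = [k for k in keys if before[k] != after[k]]
```

With a plain `hash(key) % server_count` scheme, adding a fourth server would remap most keys. Here only the keys that land on the new server's ring positions move.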

Real World Implementation Considerations

Grasp the Requirements

Functional Requirements: Define what the system is expected to accomplish, including features and functionality.
Non-Functional Requirements: Address aspects like performance, scalability, security, and availability.

Select the Appropriate Architecture

Monolithic vs. Microservices: Choose between building a monolithic application or decomposing it into microservices based on your system's needs.
Layered Architecture: Implement a layered structure (presentation, application, business, and data layers) to ensure a clear separation of concerns for easier management and scalability.

Scalability

Horizontal Scaling: Increase capacity by adding more servers to accommodate higher demand.
Vertical Scaling: Enhance the performance of existing servers by adding resources like CPU and RAM.
Load Balancing: Evenly distribute incoming traffic across multiple servers to prevent any single server from being overloaded.

Database Design

Normalization vs. Denormalization: Find a balance between normalized data models (which reduce redundancy) and denormalized models (which enhance read performance).
Choosing the Right Database: Opt for SQL when managing relational data, and NoSQL for unstructured data.
Sharding: Distribute data across multiple databases to efficiently manage large datasets and high traffic loads.
Locking Mechanisms: Implement Optimistic Concurrency Control or Pessimistic Concurrency Control to manage data access in concurrent environments.
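
Optimistic concurrency control is commonly implemented with a version column that every write must match. The sketch below assumes an in-memory `AccountStore` standing in for a database table; the class names and the `deposit` helper are hypothetical, and a real database would perform the version check and the write atomically in a single statement.

```python
from dataclasses import dataclass


class VersionConflict(Exception):
    """Raised when a row changed between the read and the write."""


@dataclass
class Account:
    balance: int
    version: int = 0


class AccountStore:
    """In-memory stand-in for a table with a version column (not thread-safe)."""

    def __init__(self) -> None:
        self._rows: dict[str, Account] = {}

    def insert(self, key: str, balance: int) -> None:
        self._rows[key] = Account(balance)

    def read(self, key: str) -> Account:
        row = self._rows[key]
        return Account(row.balance, row.version)  # snapshot copy

    def update(self, key: str, new_balance: int, expected_version: int) -> None:
        row = self._rows[key]
        # Same idea as: UPDATE ... SET balance = ?, version = version + 1
        #               WHERE id = ? AND version = ?
        if row.version != expected_version:
            raise VersionConflict(f"{key}: expected v{expected_version}, found v{row.version}")
        row.balance = new_balance
        row.version += 1


def deposit(store: AccountStore, key: str, amount: int, max_attempts: int = 3) -> None:
    """Read, compute, and write; on a conflict, re-read and try again."""
    for _ in range(max_attempts):
        snapshot = store.read(key)
        try:
            store.update(key, snapshot.balance + amount, snapshot.version)
            return
        except VersionConflict:
            continue
    raise VersionConflict(f"{key}: gave up after {max_attempts} attempts")
```

Pessimistic control would instead take a lock before reading, which avoids retries but blocks other writers for the duration of the transaction.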

Distributed Locking

Prevent Deadlocks: To avoid deadlocks, always assign a Time to Live (TTL) to locks, ensuring they expire if a process crashes before releasing the lock.
Renewal Mechanism: Implement a renewal system for situations where the process holding the lock requires additional time.
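Redis: Use the Redis SET command with the NX and EX/PX options for straightforward locking. Plain SETNX cannot set the TTL in the same atomic step, so a crash between SETNX and a separate EXPIRE call can leave a lock that never expires.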
Zookeeper: Offers more advanced coordination features with strong consistency guarantees.
Etcd: A viable alternative for distributed locking, providing strong consistency and fault tolerance.
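
The TTL and renewal rules above can be sketched without committing to a particular lock service. The following is a minimal, single-process illustration: `TTLLockTable` is a hypothetical in-memory stand-in for Redis, ZooKeeper, or etcd, and the injectable `clock` parameter is an assumption added so the behavior can be exercised without real waiting.

```python
import time
import uuid
from typing import Callable, Optional


class TTLLockTable:
    """In-memory stand-in for a lock service; shows TTL and renewal semantics only."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._locks: dict[str, tuple[str, float]] = {}  # name -> (owner token, expiry)

    def acquire(self, name: str, ttl: float) -> Optional[str]:
        """Return an owner token if the lock is free or expired, else None."""
        now = self._clock()
        held = self._locks.get(name)
        if held is not None and held[1] > now:
            return None
        token = uuid.uuid4().hex
        self._locks[name] = (token, now + ttl)
        return token

    def renew(self, name: str, token: str, ttl: float) -> bool:
        """Extend the TTL, but only for the current, unexpired owner."""
        now = self._clock()
        held = self._locks.get(name)
        if held is None or held[0] != token or held[1] <= now:
            return False
        self._locks[name] = (token, now + ttl)
        return True

    def release(self, name: str, token: str) -> bool:
        """Release only if the caller still owns the lock."""
        held = self._locks.get(name)
        if held is None or held[0] != token:
            return False
        del self._locks[name]
        return True
```

The owner token matters: without it, a slow process whose lock has already expired could release a lock that now belongs to someone else.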

Distributed Caching

Data Caching: Cache frequently accessed data to reduce the load on the database and improve response times.
Content Delivery Networks (CDNs): Implement CDNs to cache and serve static content closer to users for faster delivery.
Partitioning: Distribute data across multiple partitions or shards to efficiently manage high throughput.

Fault Tolerance and High Availability

Redundancy: Ensure duplication of critical components to eliminate single points of failure.
Failover Mechanisms: Implement automatic switching to a standby system when failures occur.
Backup and Recovery: Perform regular data backups and maintain a comprehensive recovery plan.

Monitoring and Logging

Real-Time Monitoring: Implement monitoring tools to continuously track the system's performance and health in real time.
Logging: Keep detailed logs to aid in debugging and analyzing system behavior.

APIs and Communication

RESTful APIs: Use REST for web services to ensure scalability and stateless communication.
Message Queues: Use message queues (e.g., RabbitMQ, Kafka) for asynchronous communication between services.
Real-Time Communication: Use WebSockets for fast, bidirectional communication between users.

Development Best Practices

Version Control: Implement version control systems such as Git to manage changes and enhance collaboration.
Continuous Integration/Continuous Deployment (CI/CD): Automate testing and deployment processes to ensure fast and dependable delivery of updates.
Code Reviews: Conduct regular code reviews to uphold quality and ensure consistency across the project.

Security

Authentication and Authorization: Verify user identities and ensure they have the appropriate access to resources.
Encryption: Secure sensitive data by encrypting it both at rest and during transmission.
Firewalls and Intrusion Detection: Employ network security tools to monitor and block unauthorized access attempts.

Data Consistency Requirements

Use strong consistency for critical, data-sensitive operations, such as updating an account balance in a banking application; optimistic locking is one way to protect these updates from concurrent modification.
Use eventual consistency for less critical operations, such as updating the transaction history.

Performance and Scalability

Enforcing data consistency with an optimistic locking mechanism can reduce performance and increase latency, because conflicting updates have to be detected and retried.

Error Handling and Recovery

Systems must gracefully handle version conflicts, service failures, and network issues. This includes implementing retry mechanisms, circuit breakers, and compensation logic for failed operations.
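
A retry mechanism of the kind mentioned above is often paired with exponential backoff and jitter so that many clients do not retry in lockstep. This is a minimal sketch under assumptions: the `retry` helper, its parameter names, and the default set of retryable exceptions are illustrative choices, not a prescribed API.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    operation: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying transient failures with capped exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay cap doubles each attempt; "full jitter" picks a random point below it.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

Retries are only safe for operations that are idempotent or otherwise protected against being applied twice; errors outside `retry_on` propagate immediately.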

12 Factor Methodology

The Twelve-Factor methodology covers how an app is developed, operated, and scaled.

It is a triangulation on ideal practices for app development, paying particular attention to the dynamics of the organic growth of an app over time, the dynamics of collaboration between developers working on the app’s codebase, and avoiding the cost of software erosion.

I. Codebase: One codebase tracked in revision control, many deploys

II. Dependencies: Explicitly declare and isolate dependencies

III. Config: Store config in the environment

IV. Backing services: Treat backing services as attached resources

V. Build, release, run: Strictly separate build and run stages

VI. Processes: Execute the app as one or more stateless processes

VII. Port binding: Export services via port binding

VIII. Concurrency: Scale out via the process model

IX. Disposability: Maximize robustness with fast startup and graceful shutdown

X. Dev/prod parity: Keep development, staging, and production as similar as possible

XI. Logs: Treat logs as event streams

XII. Admin processes: Run admin/management tasks as one-off processes
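
Factor III (config in the environment) is straightforward to illustrate. The sketch below is one possible shape, assuming hypothetical variable names (`DATABASE_URL`, `LOG_LEVEL`, `WORKER_COUNT`) and a hypothetical `Settings` class; the point is only that the same build reads different values in each deploy.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    log_level: str
    worker_count: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables; fail fast if a required one is missing."""
    try:
        database_url = env["DATABASE_URL"]  # required, no default
    except KeyError:
        raise RuntimeError("DATABASE_URL must be set in the environment") from None
    return Settings(
        database_url=database_url,
        log_level=env.get("LOG_LEVEL", "INFO"),         # optional, with a default
        worker_count=int(env.get("WORKER_COUNT", "2")),  # env values are always strings
    )
```

Passing the mapping as a parameter keeps the function easy to test with a plain dictionary instead of the real process environment.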

Key Components of System Design

System design is the art of creating a blueprint for a system that meets specified requirements, solves user problems, and handles future growth. A well-designed system is scalable, reliable, maintainable, and secure, making it an essential aspect of software engineering. But what exactly are the key components of system design, and what should you keep in mind while designing a system? Let’s explore.

Key Components of System Design

Architecture

The architecture of a system defines its structure and behavior. It is the foundation upon which everything else is built. There are several architectural patterns to choose from, such as monolithic, microservices, and serverless, each with its own pros and cons. Selecting the right architecture depends on the specific requirements of your application, such as scalability, complexity, and deployment needs.

Database Design

The database is the backbone of most applications, and how it’s designed has a significant impact on performance, scalability, and maintainability. Whether you choose SQL or NoSQL databases, it's crucial to design your data models and relationships carefully. Proper indexing, normalization (or denormalization where applicable), and understanding the query patterns are key to efficient database design.

APIs and Communication

APIs (Application Programming Interfaces) define how different components of your system interact with each other and with external systems. The choice of communication protocol (e.g., REST, GraphQL, gRPC) depends on factors like performance, flexibility, and simplicity. In a distributed system, it’s also important to design for fault-tolerant communication, considering retries, timeouts, and circuit breakers.
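
To make the circuit breaker idea concrete, here is a minimal sketch. The `CircuitBreaker` class, its thresholds, and the injectable `clock` are assumptions for illustration; production libraries typically add per-exception policies, metrics, and thread safety.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Stop calling a failing dependency for a while, then probe it again."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, operation: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; call rejected")
            # Timeout elapsed: half-open, so let this one trial call through.
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Failing fast while the circuit is open protects both sides: callers stop waiting on timeouts, and the struggling dependency gets room to recover.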

Caching

Caching is a critical component for optimizing performance by storing frequently accessed data in memory. By reducing the need to repeatedly fetch data from slower storage layers, caching helps improve response times and reduce load on the database. However, proper cache invalidation strategies must be implemented to avoid stale data.
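
A common way to combine caching with invalidation is the cache-aside pattern: read through the cache, expire entries after a TTL, and drop an entry explicitly when the underlying data is written. The sketch below assumes a hypothetical `CacheAside` class with an in-process dictionary as the cache; a shared cache such as Redis would follow the same flow.

```python
import time
from typing import Callable


class CacheAside:
    """Read-through cache with TTL expiry and explicit invalidation (illustrative)."""

    def __init__(
        self,
        load: Callable[[str], object],
        ttl: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load    # fetches the value from the slower source of truth
        self._ttl = ttl
        self._clock = clock
        self._entries: dict[str, tuple[object, float]] = {}  # key -> (value, expiry)

    def get(self, key: str) -> object:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                      # cache hit
        value = self._load(key)                  # miss or expired: go to the source
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key: str) -> None:
        """Call after writing to the source so readers do not see stale data."""
        self._entries.pop(key, None)
```

The TTL bounds how long stale data can survive if an invalidation is missed, while explicit invalidation keeps the common case fresh.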

Load Balancing

Load balancing distributes incoming traffic across multiple servers to ensure no single server is overwhelmed. This enhances availability and reliability, preventing downtime and improving user experience. Common strategies include round-robin, least connections, and weighted load balancing, depending on the system’s specific needs.
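
Two of the strategies named above, round-robin and least connections, can be sketched as follows. The class names and the `acquire`/`release` interface are assumptions for the example; real load balancers also track health checks and server weights.

```python
import itertools
from typing import Iterable


class RoundRobinBalancer:
    """Hand out servers in a fixed rotation."""

    def __init__(self, servers: Iterable[str]) -> None:
        self._cycle = itertools.cycle(list(servers))

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Send each request to the server with the fewest in-flight requests."""

    def __init__(self, servers: Iterable[str]) -> None:
        self._active = {server: 0 for server in servers}

    def acquire(self) -> str:
        # Ties go to the server listed first.
        server = min(self._active, key=self._active.get)
        self._active[server] += 1
        return server

    def release(self, server: str) -> None:
        """Call when the request finishes so the count stays accurate."""
        self._active[server] -= 1
```

Round-robin works well when requests cost roughly the same; least connections adapts better when some requests are much slower than others.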

Security

Security should be built into every layer of your system design. This includes securing data in transit and at rest, implementing authentication and authorization mechanisms, and protecting against common vulnerabilities like SQL injection, cross-site scripting (XSS), and denial of service (DoS) attacks. Regular security audits and updates are essential to maintain a secure system.
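
Rate limiting is one common defense against the denial-of-service traffic mentioned above. A token bucket is a typical way to implement it; the sketch below is a single-process illustration with a hypothetical `TokenBucket` class and an injectable `clock`, not a distributed limiter.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend tokens if the request fits, else False."""
        now = self._clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

In practice a limiter like this is usually keyed per client (for example per API key or IP address), so one noisy caller cannot exhaust capacity for everyone else.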

Scalability and Performance

Scalability is the ability of a system to handle increasing loads by adding resources. Horizontal scaling (adding more servers) and vertical scaling (upgrading existing servers) are two common approaches. Performance optimization includes ensuring fast response times, minimal latency, and efficient resource usage. Load testing and performance monitoring help in identifying bottlenecks.

Redundancy and Fault Tolerance

Redundancy involves duplicating critical components or functions to provide backup in case of failure. Fault tolerance is the system's ability to continue operating despite failures. Together, these concepts ensure high availability and reliability, which are crucial for mission-critical applications.

Monitoring and Logging

Monitoring and logging are essential for understanding the health and performance of your system. Tools like Prometheus, Grafana, and ELK (Elasticsearch, Logstash, Kibana) help track metrics, detect anomalies, and diagnose issues. A well-designed monitoring system can alert you to problems before they affect users.

User Experience (UX)

While system design is often focused on the backend, it’s essential to consider how design decisions affect the user experience. Fast, responsive, and intuitive interfaces contribute to user satisfaction. Ensure that your design supports smooth interactions, even under heavy loads.

Things to Remember While Designing a System

Understand the Requirements

Before you start designing, make sure you thoroughly understand the requirements. Talk to stakeholders, clarify any ambiguities, and consider both functional and non-functional requirements. Understanding the problem you’re solving is the first step toward creating a system that meets user needs.

Design for Scale

Even if you’re starting small, consider how your system will scale as user demand grows. Will your chosen architecture and components support horizontal and vertical scaling? Designing with scalability in mind from the start can save you from costly refactors later on.

Keep It Simple

Complexity is the enemy of reliability and maintainability. Strive for simplicity in your design, avoiding unnecessary features or overly complicated architectures. Simple designs are easier to understand, debug, and extend.

Think About Failure

Systems will inevitably fail, whether due to hardware issues, network outages, or software bugs. Designing with failure in mind ensures your system can recover gracefully. Implement redundancy, fault tolerance, and disaster recovery plans to minimize the impact of failures.

Prioritize Security

Security should never be an afterthought. Design your system with security in mind from the beginning, considering potential threats and vulnerabilities. Regularly update and patch your system to protect against new security risks.

Test Early and Often

Testing is critical to ensuring your system behaves as expected. Unit tests, integration tests, and load tests help identify issues early in the development process. Continuous testing also ensures that new features don’t break existing functionality.

Plan for Maintenance

Your system will need updates and changes over time. Design with maintainability in mind, using modular architecture, clear documentation, and version control. This makes it easier to implement new features, fix bugs, and adapt to changing requirements.

Document Your Design

Good documentation is essential for both current and future developers working on your system. Document your architecture, design decisions, APIs, and key components. Clear documentation makes onboarding new team members easier and ensures consistency across the team.

Leverage Existing Tools and Services

Don’t reinvent the wheel. Use existing tools, frameworks, and services that solve common problems, such as authentication, monitoring, or database management. Leveraging proven solutions saves time and reduces the risk of introducing bugs or vulnerabilities.

Get Feedback

Designing in isolation can lead to blind spots. Seek feedback from peers, stakeholders, and users throughout the design process. This helps identify potential issues, align your design with user needs, and improve the overall quality of the system.

System Design ‐ Essential Practices - FullstackCodingGuy/Developer-Fundamentals GitHub Wiki
