centralized_eks

Below are the technical justifications for keeping application pods deployed in an AWS EKS cluster within the same AWS account/VPC as their dependent services (OpenSearch, Neptune, Aurora), rather than deploying them into a centralized enterprise EKS cluster in a different AWS account/VPC:


1. Network Latency & Performance

  • When EKS pods run in the same VPC as OpenSearch, Neptune, and Aurora, traffic remains on the intra-VPC network (AWS backbone), ensuring low-latency, high-throughput connections.

  • If pods are deployed in a central enterprise EKS cluster in another account/VPC:

    • You’d need VPC Peering, Transit Gateway, or PrivateLink for cross-VPC traffic.
    • Each introduces additional network hops → higher latency, potential bottlenecks.
    • For databases (Aurora/Neptune) where query latency directly impacts user experience, this adds measurable degradation; a simple probe (sketched below) can quantify the difference.
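
To make the latency point concrete, a probe like the one below can compare TCP connect times from a pod in each placement. This is a minimal sketch; the hostnames and ports are placeholders for your actual Aurora/Neptune endpoints.

```python
import socket
import statistics
import time

# Placeholder endpoints -- substitute your actual service hostnames/ports.
ENDPOINTS = [
    ("aurora-cluster.cluster-abc123.us-east-1.rds.amazonaws.com", 5432),
    ("neptune-db.cluster-abc123.us-east-1.neptune.amazonaws.com", 8182),
]

def median_connect_ms(host: str, port: int, samples: int = 20) -> float:
    """Median TCP connect time in milliseconds over several samples."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        # Connect and immediately close; we only measure connection setup.
        with socket.create_connection((host, port), timeout=5):
            pass
        times.append((time.perf_counter() - start) * 1000)
    return statistics.median(times)

for host, port in ENDPOINTS:
    print(f"{host}:{port} -> {median_connect_ms(host, port):.2f} ms median")
```

Running the same probe from a local-VPC pod and from a central-cluster pod makes the added cross-VPC hops directly visible.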

2. Data Transfer Costs

  • Traffic that stays within the VPC and the same Availability Zone is free; cross-AZ traffic inside a VPC still incurs standard inter-AZ charges.
  • Cross-VPC/account communication via Transit Gateway adds per-GB data processing charges, and peered or TGW traffic that crosses AZs incurs inter-AZ data transfer charges on top.
  • High-volume services like OpenSearch queries or Aurora read/write replication can generate significant recurring costs if traffic has to cross account/VPC boundaries (a rough estimate follows).
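
To put rough numbers on this, the sketch below estimates the recurring charge for pushing database traffic through a Transit Gateway. The per-GB rates and monthly volume are assumptions for illustration; check current AWS pricing for your region.

```python
# Back-of-envelope Transit Gateway cost estimate.
# All rates and volumes below are assumptions -- verify against current
# AWS pricing for your region before using this in a real analysis.
TGW_DATA_PROCESSING_PER_GB = 0.02  # USD/GB, assumed TGW processing rate
INTER_AZ_PER_GB = 0.01             # USD/GB, assumed inter-AZ transfer rate

monthly_gb = 50_000  # e.g. OpenSearch indexing + Aurora replication traffic

tgw_cost = monthly_gb * TGW_DATA_PROCESSING_PER_GB
inter_az_cost = monthly_gb * INTER_AZ_PER_GB
print(f"TGW processing:   ${tgw_cost:,.0f}/month")
print(f"Inter-AZ charges: ${inter_az_cost:,.0f}/month")
print(f"Total added cost: ${tgw_cost + inter_az_cost:,.0f}/month")
```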

3. Availability & Resilience

  • Running the EKS cluster in the same VPC as dependencies:

    • Reduces reliance on shared enterprise networking constructs (Transit Gateway, Peering).
    • Avoids single points of failure in cross-account networking setups.
  • Centralized EKS cluster:

    • Outage in the Transit Gateway/Peering link → application loses access to critical databases.
    • Networking misconfiguration in the central account can ripple across many dependent workloads.

4. Security & IAM Boundaries

  • Keeping the EKS cluster in the same account as the application resources:

    • Tighter IAM scoping: Service accounts, IAM roles for service accounts (IRSA), and database authentication can be managed without cross-account trust complexities.
    • Simplifies security groups and NACLs since pods and databases can directly communicate within the VPC.
  • Central enterprise EKS cluster:

    • Requires cross-account IAM role assumptions and more complex trust policies (see the sketch below).
    • Security group rules must allow inbound from external VPC CIDRs (widening blast radius).
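
The difference shows up directly in application code. In the same-account case the pod's IRSA credentials are picked up by the default credential chain; from a central cluster the pod must first assume a role in the application account. A minimal sketch (the role ARN and account ID are hypothetical):

```python
import boto3

# Same account (IRSA): the pod's web-identity credentials are resolved
# automatically by the default credential chain -- no extra hop.
rds_local = boto3.client("rds", region_name="us-east-1")

# Cross-account (central EKS): assume a role in the application account
# first, then build clients from the temporary credentials.
sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn="arn:aws:iam::111122223333:role/app-db-access",  # hypothetical
    RoleSessionName="central-eks-pod",
)["Credentials"]

rds_remote = boto3.client(
    "rds",
    region_name="us-east-1",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
```

Every such assumed role also needs a matching trust policy in the application account, which is exactly the cross-account trust complexity the same-account layout avoids.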

5. Operational Simplicity

  • Same-account deployments reduce:

    • Complex networking setups (no need for PrivateLink/Peering).
    • Cross-team dependencies (you don’t need central platform teams to maintain connectivity).
  • Debugging latency, dropped packets, or IAM issues across accounts is operationally harder.


6. Database-Specific Considerations

  • Aurora: Optimized for low-latency connections from clients in the same VPC. Cross-VPC adds unpredictable jitter → impacts query performance.
  • Neptune: Designed for graph workloads where query traversal requires many network round trips. Even small increases in per-hop latency can massively degrade performance (quantified in the sketch below).
  • OpenSearch: Heavy data transfer (search queries, indexing, replication) can lead to high cross-VPC costs and slower indexing/search performance.
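
The Neptune point is worth quantifying: a traversal that issues many sequential round trips multiplies any per-hop latency increase. The figures below are illustrative assumptions, not measurements.

```python
# Illustrative latency amplification for a chatty graph workload.
# All figures are assumptions chosen for the arithmetic, not benchmarks.
round_trips = 200        # sequential round trips in one graph traversal
intra_vpc_ms = 0.5       # assumed per-round-trip latency, same VPC
cross_vpc_ms = 1.5       # assumed per-round-trip latency via TGW/peering

local_ms = round_trips * intra_vpc_ms
central_ms = round_trips * cross_vpc_ms
print(f"Same-VPC traversal:  {local_ms:.0f} ms")
print(f"Cross-VPC traversal: {central_ms:.0f} ms "
      f"({central_ms / local_ms:.1f}x slower)")
```

An extra millisecond per hop is invisible on a single query, but here it turns a 100 ms traversal into a 300 ms one.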

7. Scalability & Throughput

  • Applications in the same VPC can scale horizontally with minimal networking constraints.

  • Centralized EKS cluster:

    • Scaling is constrained by Transit Gateway bandwidth quotas and peering limits (see the sanity check below).
    • High-throughput services (real-time analytics, recommendation engines using Neptune/Aurora) may exceed cross-VPC bandwidth capabilities.
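
A quick sanity check against the per-attachment quota makes the constraint concrete. The 50 Gbps figure is the commonly documented Transit Gateway per-VPC-attachment limit, and the pod count and per-pod throughput are assumptions; verify current quotas for your region.

```python
# Sanity check: aggregate pod throughput vs. an assumed TGW quota.
pods = 400
mbps_per_pod = 150            # assumed steady-state throughput per pod
tgw_attachment_gbps = 50      # commonly documented per-attachment limit

aggregate_gbps = pods * mbps_per_pod / 1000
print(f"Aggregate demand: {aggregate_gbps:.1f} Gbps "
      f"({aggregate_gbps / tgw_attachment_gbps:.0%} of attachment quota)")
```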

8. Compliance & Data Residency

  • Some workloads (especially involving databases) may have compliance requirements that mandate traffic stay within the same account/VPC boundary.
  • Moving workloads to a central cluster complicates auditing and data residency guarantees.

Summary: Keeping application pods in the same AWS account/VPC as their dependent services (OpenSearch, Neptune, Aurora) ensures:

  • Lower latency & better performance
  • No cross-VPC data transfer costs
  • Simpler IAM & security group management
  • Higher reliability without dependency on central networking constructs
  • Better scaling characteristics for database-heavy workloads

If a central enterprise EKS cluster is mandated for governance reasons, you’d want to carefully weigh these trade-offs and consider hybrid approaches (local service clusters + enterprise cluster for shared workloads).


To make these trade-offs more digestible for leadership/architecture review, the table below compares the two models, including SQS considerations.


📊 Local EKS vs. Central Enterprise EKS for Application Pods (Including SQS)

| Dimension | Local EKS (Same Account/VPC as Services) | Central Enterprise EKS (External Account/VPC) |
|---|---|---|
| Network Latency | Intra-VPC for Aurora/Neptune/OpenSearch → lowest latency. SQS is regional, latency is minimal. | Cross-VPC adds hops for DBs (latency ↑). SQS still works regionally, but private access needs extra setup. |
| Data Transfer Costs | Same-VPC DB traffic is free (within an AZ). SQS: only per-request cost + optional VPC endpoint charges. | Cross-VPC DB traffic incurs TGW/Peering charges. SQS endpoint sharing across accounts adds fixed costs. |
| Throughput & Scale | No external limits, scales with demand. SQS access is simple and direct. | TGW/peering quotas can bottleneck DB-heavy workloads. SQS throughput OK, but IAM + networking overheads may limit adoption. |
| Reliability | No dependency on external networking. SQS available region-wide + local VPC endpoints. | DB access relies on TGW/Peering reliability. For SQS, endpoint sharing or public access introduces extra failure points. |
| Security & IAM | IRSA roles can directly grant fine-grained DB + SQS permissions in the same account. | Requires cross-account IAM trust for DBs + SQS. More complex policies and broader trust boundaries. |
| Operational Complexity | Straightforward networking and IAM. VPC endpoints for SQS easy to manage locally. | Requires coordination with central teams for TGW, endpoints, cross-account IAM. Harder to debug. |
| Database Performance | Aurora/Neptune queries optimized with local low-latency connections. | Increased network hops degrade performance for chatty DBs (Neptune especially). |
| SQS Access | Access via regional public endpoint or local VPC endpoint. Simple, same-account IAM policies. | Still accessible regionally, but private SQS access requires cross-account VPC endpoint sharing or PrivateLink. More complex IAM setup. |
| Compliance / Residency | Data + workloads stay in one account. Easier to audit DB + SQS access. | Cross-account traffic complicates audit trails; endpoint sharing adds governance overhead. |
| Governance / Standardization | More autonomy; less alignment with centralized enterprise patterns. | Stronger central governance; apps must conform to shared networking/IAM patterns, even if performance suffers. |

👉 Key point for SQS: Unlike DBs (Aurora/Neptune/OpenSearch), latency isn’t the main issue — instead, the trade-off is about IAM and networking complexity if accessed from a central EKS cluster.
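
To illustrate the two SQS access paths, a minimal boto3 sketch: the only code-level difference for private access is overriding `endpoint_url` to point at a VPC interface endpoint. The endpoint DNS name, account ID, and queue URL below are hypothetical.

```python
import boto3

# Default path: SQS regional public endpoint, same-account IAM policies.
sqs_public = boto3.client("sqs", region_name="us-east-1")

# Private path: send traffic through a VPC interface endpoint by
# overriding endpoint_url (hypothetical endpoint DNS name below).
sqs_private = boto3.client(
    "sqs",
    region_name="us-east-1",
    endpoint_url="https://vpce-0abc123-xyz.sqs.us-east-1.vpce.amazonaws.com",
)

# Hypothetical queue URL; the same call works over either path.
queue_url = "https://sqs.us-east-1.amazonaws.com/111122223333/app-queue"
sqs_private.send_message(QueueUrl=queue_url, MessageBody="hello")
```

From a central cluster, the private path additionally requires the shared endpoint (or PrivateLink) plus the cross-account IAM trust described in the table, which is where the complexity accumulates.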

