IT: My Private Cloud
Up Links

Public
- feralcoder public Home
- feralcoder IT
- Living Room Data Center

HOWTO Do It Yourself
- Kolla-Ansible OpenStack Deployment
- HOWTO: Install Kolla-Ansible
- HOWTO: Setup Docker Registries For OpenStack
- HOWTO: Kolla-Ansible Container Management
- HOWTO: Setup Octavia LBAAS
- HOWTO: Install Ceph
IAAS Mission / Requirements
Design a generally useful DIY reference architecture which is suitable for any environment.
Hardware Requirements:
- Enterprise-grade hardware WRT containerization, virtualization, and OOB support;
- Silent without much extra modification;
- Core dense primarily, memory expansive secondarily;
Service Requirements:
- Resilient Storage: Resistant to drive failures, node failures, and bit rot (i.e. block-level checksums);
- Multi-Tenant resource provisioning;
- Ability to provision VMs, Containers, and Bare Metal;
- Container-Level Orchestration
- LBAAS
Network Requirements:
- Isolation, Aggregation, Resiliency
- Tenant network tunneling (VXLAN, GRE, etc)
- DVR (distributed virtual routing)
- Service-Grade Load Balancing
Implicit in "generally useful" is "to anybody". This build should sit in the sweet spot where prices have fallen considerably with age, yet performance is still well ahead of all of the cheaper options. In 2020 Ivy Bridge Xeon is the clear winner in this regard, and sits at an inflection before Intel became more strategic with core counts.
The Technology
I'm deploying OpenStack Victoria, on top of Ceph Nautilus, on top of HPE Proliant servers and blades, and ProCurve networking.
Ceph offers OpenStack some huge operational benefits (a short client-side sketch follows this list):
- Images and volumes are fragmented across Ceph OSD nodes:
  - High-performance access
  - Transparent redundancy
  - Management and scalability
- Bluestore on-disk storage:
  - Massively more efficient than blocks-on-filesystem storage (Filestore)
  - Ground-up storage design to support distributed block / filesystem operations
  - Block-level checksums for automatic detection of bit rot
  - Allows for erasure coding on any type of storage pool - think "cluster RAID X+Y"
- Robust authentication built into the storage layer:
  - Could potentially serve multiple stacks with different requirements and data privileges
  - Auth lives with the data, for granular control and architectural security segmentation
- Robust regional clustering in the storage layer:
  - Data replication / offlining with other Ceph clusters, e.g. for active-standby configuration, or production / oversubscribed infrastructure separation
  - Regional mirroring supported in radosgw for multi-site clustering, allowing site-distributed stacks to share a storage layer
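To make "auth lives with the data" concrete, here's a minimal client-side sketch using the python-rados bindings: a cephx identity connects directly to the storage layer and round-trips an object. The pool name, keyring path, and object key are illustrative placeholders, not part of this deployment.

```python
# Minimal sketch: talk to the Ceph storage layer directly with python-rados.
# The keyring path, pool name, and object key below are placeholder assumptions.
import rados

# The cephx identity (and what it may touch) is enforced by Ceph itself,
# independent of whatever stack sits on top.
cluster = rados.Rados(
    conffile='/etc/ceph/ceph.conf',
    conf={'keyring': '/etc/ceph/ceph.client.admin.keyring'},
)
cluster.connect()
print("cluster fsid:", cluster.get_fsid())
print("pools:", cluster.list_pools())

# Round-trip one object to confirm read/write access under this identity.
pool = 'feralcoder-smoke'                 # hypothetical pool name
if not cluster.pool_exists(pool):
    cluster.create_pool(pool)
ioctx = cluster.open_ioctx(pool)
ioctx.write_full('smoke-object', b'hello from the storage layer')
print(ioctx.read('smoke-object'))
ioctx.close()
cluster.shutdown()
```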
OpenStack is an open-ended cloud offering:
- Octavia LBAAS - Service-grade load balancing, including SSL termination, redundant VIP units with failover, L7 application logic, and rich protocol/port monitoring. Load balancers live alongside compute instances for massive scalability. (See the API sketch after this list.)
- Magnum container cluster orchestration - Deploy Kubernetes, Mesos, and Swarm clusters from script.
- Manila File Store - Provision file system units with all the benefits of a Ceph backend.
- Multi-Tenancy - Share the stack across many isolated users.
- Network Insanity - As much as you can imagine, my head asplode.
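To ground the Octavia item above, here's a hedged sketch of what "service-grade load balancing" looks like from the API side, using openstacksdk: one load balancer, an HTTP listener, a round-robin pool with a health monitor, and two members. The cloud name, subnet UUID, and member addresses are placeholders, not this stack's actual tooling.

```python
# Hedged sketch of driving Octavia through openstacksdk. All names/UUIDs are
# placeholders; in a real cloud they come from your project's network setup.
import time
import openstack

conn = openstack.connect(cloud='feralcloud')   # hypothetical clouds.yaml entry

def wait_active(lb_id, timeout=600):
    """Octavia provisions asynchronously; block until the LB returns to ACTIVE."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if conn.load_balancer.get_load_balancer(lb_id).provisioning_status == 'ACTIVE':
            return
        time.sleep(5)
    raise TimeoutError('load balancer never went ACTIVE')

lb = conn.load_balancer.create_load_balancer(
    name='web-lb', vip_subnet_id='REPLACE-WITH-TENANT-SUBNET-UUID')
wait_active(lb.id)

listener = conn.load_balancer.create_listener(
    name='web-http', protocol='HTTP', protocol_port=80, load_balancer_id=lb.id)
wait_active(lb.id)

pool = conn.load_balancer.create_pool(
    name='web-pool', protocol='HTTP', lb_algorithm='ROUND_ROBIN',
    listener_id=listener.id)
wait_active(lb.id)

# Rich protocol/port monitoring: eject members that fail HTTP checks.
conn.load_balancer.create_health_monitor(
    pool_id=pool.id, type='HTTP', delay=5, timeout=3, max_retries=3)
wait_active(lb.id)

for addr in ('10.0.0.11', '10.0.0.12'):        # placeholder backend instances
    conn.load_balancer.create_member(pool, address=addr, protocol_port=80)
    wait_active(lb.id)
```

Octavia builds amphora VMs behind the scenes, alongside the compute instances, which is why the sketch waits for ACTIVE between steps.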
HPE Gear:
- Management: ILO destroys IPMI. Even ILO2. I have scripts to switch boot drives, repartition and dump images, switch back, and orchestrate the whole fleet through full wipe-and-rebuilds (a rough sketch of the idea follows this list). I could not do this with IPMI alone.
- Reliability: The hardware build quality is the best I've seen. Maintenance rarely involves a screwdriver.
- Support: Most "support-contract-required" tools are available for "free", somewhere. The HPE support site has modern updates for some very old hardware. There are helpful HPE support employees in forums who don't obstruct the DIY no-contract crowd.
- Blades: Rock. 2x10Gbit VirtualConnect ports per blade. Each VC port offers 4 'virtual NICs' per NIC chip, physical to the OS, with distinct pipelines and buffers. Up to 6x VC ports per blade with mezzanine expansion cards, up to 24 interfaces for the OS, each up to 10Gbps, and each with configurable rate for bandwidth isolation guarantees. And all without cables.
- Blade Network Options: Traditional CISCO network modules exist for the enclosure, for when advanced traditional networking is required, such as trunk ports with native VLANs to hosts. VirtualConnect ports in blades operate with these modules, but without the x4 port expansion. Blades can connect to VC modules and traditional network modules at the same time on different network cards.
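The fleet orchestration mentioned under Management is scripted locally and not published here; as a rough stand-in, the sketch below assumes the third-party python-hpilo package and pushes a group of servers into a one-time PXE boot for rebuild. Hostnames and credentials are placeholders.

```python
# Rough stand-in for fleet wipe-and-rebuild orchestration over iLO, assuming
# the third-party python-hpilo package. Hostnames/credentials are placeholders.
import hpilo

FLEET = ['bl460c-01-ilo', 'bl460c-02-ilo', 'dl380p-01-ilo']   # hypothetical iLO hostnames

for host in FLEET:
    ilo = hpilo.Ilo(host, login='Administrator', password='REPLACE-ME')
    # Boot from the network exactly once, then revert to the normal boot order.
    ilo.set_one_time_boot('network')
    if ilo.get_host_power_status() == 'OFF':
        ilo.set_host_power(host_power=True)     # power on straight into PXE
    else:
        ilo.reset_server()                      # warm reset into PXE
    print(host, '-> one-time PXE boot requested')
```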
My Platform
Software
OpenStack (via Kolla-Ansible)
I'm using OpenStack Victoria on CentOS 8. I've already tried and given up on MAAS (and especially Ubuntu), as well as the OpenStack-on-OpenStack installer.
IT: Kolla-Ansible OpenStack Deployment
IT: HOWTO: Install Kolla-Ansible
Ceph (via Ceph-Ansible)
Ceph Nautilus is my backend for all storage, and every OpenStack compute server is also a Ceph OSD (Hyper-Converged Infrastructure, or HCI).
In addition to providing block-level checksumming, Ceph now provides erasure coding, so redundancy can be implemented via parity instead of wholesale duplication. For example, previously N+2 redundancy in storage meant +200% volume usage. Now it can be achieved, for example, with 8+2 fragment+parity configuration, at +25% volume usage. While the latter may seem less reliable, it's actually more, because the items of redundancy are so much smaller and failure exposure per redundancy unit is greatly reduced. This is achieved at the cost of CPU cycles, which are in natural abundance on pure storage nodes...
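A quick arithmetic check of the overhead numbers above, comparing 3-way replication (survives two lost copies) against an 8+2 erasure-coded layout (survives two lost chunks):

```python
# Raw-capacity overhead beyond usable data, expressed as a fraction.
def replication_overhead(copies: int) -> float:
    return float(copies - 1)

def ec_overhead(k: int, m: int) -> float:
    """k data chunks plus m parity chunks per object."""
    return m / k

print(f"3x replication: +{replication_overhead(3):.0%}")   # +200%
print(f"EC 8+2:         +{ec_overhead(8, 2):.0%}")          # +25%
```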
Versions
Despite reasons to want to move to Wallaby (i.e. Manila issues), I was locked into OpenStack:Victoria on Ceph:Nautilus for a while... Nope: it's now OpenStack:Wallaby on Ceph:Nautilus, with hacks.
IT: My Cloud Versions
Hardware
I'm doing this in my Living Room Data Center.
IT: OpenStack Servers
Initially I had planned to use lower-criticality systems for my admin hosts, but even there I encountered disk and RAM reliability liabilities. Now every OpenStack component lives on some form of HPE Proliant server.
- bl460c Gen6 Blade: Westmere-based, control nodes.
- bl460c Gen8 Blade: Ivy-Bridge-based, primary control nodes.
- dl380p Gen8: Ivy-Bridge-based, compute and storage nodes.
Microarchitecture
Ivy Bridge - So much cores! I chose Ivy Bridge for compute and storage nodes because it was one of the last Intel chip lines to offer very core-dense configurations in "lower-end" dual-socket server CPUs.
Westmere - Low-power hyperthreading. I'm using Westmere for control nodes because it has lower-power options than Ivy Bridge and can populate my HP blade enclosure very cheaply.