Alcor v0.19 Release Plan - futurewei-cloud/alcor-int GitHub Wiki
Open-Source Plan for v0.19 Release
Tentative date: 09/30/2021
Release: Alcor v0.19
Release link: TBD
Alcor Performance & Scalability
- OpenStack Cross-Service Request-Level Profiling
- Environment Setup
- Set up OpenStack cluster with kolla-ansible, and enable OSProfiler and Rally
- Set up Alcor Controller service with K8s
- Set up an Elasticsearch search and analytics cluster in Alcor's K8s deployment
- Set up SIG storage local cluster
- Upgrade Rally version in Medina perf test cluster
- Upgrade OSProfiler version in Medina perf test cluster (Nova, Keystone, Cinder, Glance)
- Install Jaeger in Medina perf test cluster
- Use Jaeger's OpenTracing features in Alcor
- Service level tracing support
- Function level tracing support
- Propagate trace context from PM to DPM
- Tool and script development
- Containerize ACA to replace the Neutron Open vSwitch agent container in kolla-ansible
- Enable OSProfiler for Nova's Neutron client
- OSProfiler-to-Jaeger trace converter (moved to 12/30)
- Environment Setup
- Run Alcor performance tests via Rally
- v0.17 perf test plan
- Routing rule update new scenario E2E
- Write new Rally plugin and test configuration for routing rule update
- Customize the Python Neutron client to support updating/deleting subnet route tables
- Run new Rally test create-and-list-routers for L3 testing
- Consolidate Rally test scripts to run all tests at once
- Fix existing Rally test cases and Alcor bugs
- Router-related resource cleanup issue
- Issue 646: Add a flag to getOrCreateVpcRouter to allow customized router creation
- Separate Neutron and VPC router scenarios for getSubnetRouteTable
- Issue 648: Allow deleting a subnet without a gateway port
- Issue 654: Separate RM getOrCreateSubnetRouteTable into two APIs and consolidate POST with createNeutronSubnetRouteTable
- Issue 656: RM cannot handle concurrent insertion of routing entries (fixed in PR 658)
- PR 661: Fix duplicate gateway port creation
- Issue 662: Fix race condition in subnet deletion API (fixed in PR 663)
- Issue 666: IP address allocation not found, 404 error from IP Manager (under test)
- Issue 667: 412 Precondition Failed, IP address conflicts with an existing address (under test)
- Issue 668: Fix handling of failed compute node in DPM
- Regression Fix: Allow empty routing table when creating router state
- On-demand perf profiling with test controller
- Latency measurement on on-demand E2E
- Ignite R/W optimization
- NCM to Ignite profiling and latency data collection for both goal state provisioning and on-demand requests
- NCM cache write optimization
- Batch cache writes
- NCM-ACA communication channel optimization
- On-demand Query Channel optimization
- GoalState Push Channel optimization
- NCM gRPC client: gRPC channel/stub pool and keepalive for channels
- Compare performance of synchronous and asynchronous gRPC servers on ACA
- ACA Async gRPC server and thread pool implementation
- gRPC channel warm-up
- Make ACA thread pool size adjustable based on the host's core count (stretch goal)
- Host IP lookup optimization
- Change from shared completion queue to one completion queue (stretch goal)
- Performance report for channel optimization
- Fix ACA crash issue when concurrently processing a large number of GoalStates
- Alcor Control Agent major refactor for ultimate performance
- ACA perf profiling to improve ACA/OVS interaction
- Flow table installation latency improvement
- Driver communication layer that exchanges commands and events with OVS
- Import the libfluid library (for connection control) and migrate wrappers
- Import the openvswitch library (for flow control) and migrate wrappers
- Integrate with existing OVS control based on vconn
- Support normal flow operations (add/mod/del, etc.)
- Support the advanced bundling feature (requires OpenFlow 1.4 or later)
- Integrate with the on-demand engine's monitor and packet-out mechanism for DHCP and ARP (on-demand)
- Integration with Alcor test pipeline
- TC testing
- Kolla-ansible testing
- Jenkins testing
- Align coding style across the ACA codebase
New Features Development
- Feature E2E Integration
- Routing rule update new scenario E2E
- RM supports routing rule update
- DPM v2.0 supports routing rule update
- Support OVS programming of routing rules in ACA
- Routing rule scenario tests
- Network-level routing table update
- Subnet-level routing table update
- Scenario 1: Step 1 create subnet/port/router, Step 2 attach subnet to router, Step 3 add routing rule
- Scenario 2: Step 1 create subnet/router and attach subnet to router, Step 2 create port, Step 3 add routing rule
- Scenario 3: Step 1 create subnet/router and attach subnet to router, Step 2 add routing rule, Step 3 create port
- Prototype per-VM multi-port scenarios
Alcor Fundamental
- Alcor DevOps and CI/CD enhancement
- Support ACA build and deployment in Jenkins pipeline
- Fix and enhance the Alcor busybox ping test script to support all existing E2E scenarios
- Cover Alcor API GW in the busybox ping test
- AWS Jenkins environment hotfix
- Fix bugs
University Collaboration (stretch goal)
- VPC-based implementation for the Message Queue scale path (Min Chen/Luyao) - ETA 8/15 for E2E functionality, then start perf testing
- Scalability test framework for 1M-node regions and 100K-port VPCs (Jiawei/Min Chen) - ETA 8/30
- ML-based on-demand programming (Yan Yu/Shuang Liang) - requirements cleared, ETA 8/4 for related work