RCv3 recap - rackerlabs/otter GitHub Wiki

How did we get there?

RackConnect v3

Priority interrupt

Deadlines

  • RCv3: right now
  • Convergence: done Soon(TM)
  • Hence: have to do RCv3 in job code!

Idea

  • Implement RCv3 in convergence’s terms
  • Not more of the same dead-end job code

Convergence’s terms?

What does that even mean?

Convergence: theory

  • Remove single-job orientation
  • Sounds simple, but biggest change so far

Convergence: practice

  • Many shiny new things
  • e.g.: IStep, Request, pure HTTP
  • Great ideas! But not strictly convergence

Idea (“convergence’s terms”)

  • Implement feature in terms of IStep et al.
  • Use that in the old code

Why?

  • Convergence is a huge change no matter what
  • Breaking it up into small pieces is desirable
  • Non-essential parts can be done incrementally
  • Find needed features/warts while they’re easy to fix

To rephrase:

Put lots of new machinery into prod at the same time

or

Introduce pieces gradually, then just convergence logic

Latter is clearly desirable!

Why can we do it incrementally?

  • Requirements really quite similar (if not identical)
  • Add a server, add it to a load balancer…

What did we do?

First things first

  • Thank you, Ying!
  • Tons of reviews and deploys
  • A+++ would pair again

What is the real accomplishment?

  • Business perspective: clearly RCv3!
  • Technical perspective:
    • Convergence integration path
    • Kicking the tires on a lot of new stuff
    • Lots of cleanups; make next steps easier

Did we do too much?

  • Probably not; CLB-only assumption ran pretty deep
  • Lots of untested behavior: try to stick to what’s there
  • Contingency plan always at the ready
    • “Just get it done with treq”
    • Making the request isn’t the hard part
    • Getting right args at right call sites is

Dress for the API you want, not the API you have

  • API we have: tokens, endpoints, direct treq calls
  • API we want: IStep, Request, pure_http + effect

Shim stage 1

  • Pass a “request_func”
  • Actually just lambda *a, **kw: None
  • Has old values (auth token…) as attrs
  • Consolidates auth stuff on 1 object
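
Stage 1 could be sketched roughly as follows, assuming the placeholder is a do-nothing callable that only carries the old values as attributes. The helper names and attribute names (`auth_token`, `service_catalog`, `region`) are assumptions made for this example.

```python
def make_placeholder_request_func(auth_token, service_catalog, region):
    """Build a stage-1 shim: callable, but it does nothing yet."""
    request_func = lambda *a, **kw: None
    # Old-style values ride along as attributes, so all the auth-related
    # state lives on one object instead of three separate arguments.
    request_func.auth_token = auth_token
    request_func.service_catalog = service_catalog
    request_func.region = region
    return request_func


def legacy_headers(request_func):
    """Old code keeps working by reading the values off the shim."""
    return {"x-auth-token": request_func.auth_token}
```

The point of the placeholder is that call sites can already be rewired to accept one object before any behavior changes.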

Shim stage 2

  • lambda *a, **kw: None -> real request_func
  • Keep arguments for backwards compatibility
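
A sketch of stage 2 under the same assumptions: the placeholder becomes a real request function, while the old attributes stay in place for code that has not been migrated. The `send` parameter is a hypothetical stand-in for whatever actually performs HTTP, injected here so the example needs no network or third-party library.

```python
def make_request_func(auth_token, endpoint, send):
    """Build a stage-2 shim: same shape as stage 1, but it really sends."""
    def request_func(method, path, data=None):
        headers = {"x-auth-token": auth_token}
        url = endpoint.rstrip("/") + "/" + path.lstrip("/")
        return send(method, url, headers=headers, data=data)

    # Kept for backwards compatibility with code that still reads them.
    request_func.auth_token = auth_token
    request_func.endpoint = endpoint
    return request_func
```

Callers written against the stage-1 placeholder need no changes; only the factory differs.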

What do we do next?

RCv3 as a proof of concept

Suggestion: keep going :-)

Shim stage 3

  • Gradually replace direct treq code in worker

Shim stage 4

  • Remove old argument crutches
  • Support most (all?) operations for convergence in worker code

Won’t this mean tons of work?

  • Not really; API calls have to work anyway
  • Worst part: thread request_func through calls
  • Tricky part: work around poor tests
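
Threading `request_func` through a call chain might look like the sketch below. The function names and the configuration keys (`type`, `loadBalancerId`) are assumptions made for this example, not a description of the actual worker code.

```python
def add_to_rcv3(request_func, pool_id, server_id):
    """Innermost call: the only place that actually makes a request."""
    body = [{"load_balancer_pool": {"id": pool_id},
             "cloud_server": {"id": server_id}}]
    return request_func("POST", "load_balancer_pools/nodes", data=body)


def add_to_load_balancers(request_func, lb_configs, server_id):
    """Middle layer: passes request_func along without using it itself."""
    return [add_to_rcv3(request_func, cfg["loadBalancerId"], server_id)
            for cfg in lb_configs
            if cfg.get("type") == "RackConnectV3"]


def launch_server(request_func, lb_configs, server_id):
    """Outer layer: every caller in the chain grows one extra parameter."""
    return add_to_load_balancers(request_func, lb_configs, server_id)
```

The upside of the extra parameter is that a test can pass a recording fake at the top and check what reaches the bottom, which helps where existing tests are thin.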