OSPF Graceful Restart (RFC 3623) - GalaxyGorilla/frr GitHub Wiki

Graceful Restart (GR) for OSPF lets operators restart the OSPF control plane (in FRR, the ospfd daemon) without interrupting the forwarding plane. In a nutshell, the restarting router announces the restart to its OSPF neighbors using so-called 'grace' LSAs. The neighbors are then expected to act in 'helper mode' on behalf of the restarting router: they keep their routes unchanged (as long as the topology does not change) and continue to handle other LSAs properly. When the restarting router comes back up, it relearns the previous link state information from the LSAs its neighbors send. The whole GR process has to complete within a time frame called the 'grace interval', which is also announced to the neighbors in the 'grace' LSAs.
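To make the 'grace' LSA more concrete, here is a minimal sketch of how its TLV body could be encoded and parsed. The TLV types and lengths follow Appendix A of RFC 3623 (grace period, restart reason, IP interface address); the function names are hypothetical and are not taken from ospfd. The LSA header is left out.

```python
import ipaddress
import struct

# TLV types as defined in RFC 3623, Appendix A.
TLV_GRACE_PERIOD = 1
TLV_RESTART_REASON = 2
TLV_INTERFACE_ADDRESS = 3


def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    """Encode one TLV: 2-byte type, 2-byte length, value padded to 4 bytes."""
    padding = (-len(value)) % 4
    return struct.pack("!HH", tlv_type, len(value)) + value + b"\x00" * padding


def encode_grace_lsa_body(grace_period: int, reason: int, if_addr: str) -> bytes:
    """Build the TLV body of a grace LSA (hypothetical helper, no LSA header)."""
    return (
        encode_tlv(TLV_GRACE_PERIOD, struct.pack("!I", grace_period))
        + encode_tlv(TLV_RESTART_REASON, struct.pack("!B", reason))
        + encode_tlv(TLV_INTERFACE_ADDRESS, ipaddress.IPv4Address(if_addr).packed)
    )


def decode_grace_lsa_body(body: bytes) -> dict:
    """Parse the TLV body into a dict; unknown TLV types are skipped."""
    result = {}
    offset = 0
    while offset + 4 <= len(body):
        tlv_type, length = struct.unpack_from("!HH", body, offset)
        value = body[offset + 4 : offset + 4 + length]
        if len(value) != length:
            raise ValueError("truncated TLV")
        if tlv_type == TLV_GRACE_PERIOD and length == 4:
            result["grace_period"] = struct.unpack("!I", value)[0]
        elif tlv_type == TLV_RESTART_REASON and length == 1:
            result["reason"] = value[0]
        elif tlv_type == TLV_INTERFACE_ADDRESS and length == 4:
            result["if_addr"] = str(ipaddress.IPv4Address(value))
        # Advance past header, value and padding to the next 4-byte boundary.
        offset += 4 + length + (-length) % 4
    return result
```

For example, `encode_grace_lsa_body(120, 1, "192.0.2.1")` announces a 120 second grace interval with restart reason 1 (software restart), and `decode_grace_lsa_body` recovers the same three values on the helper side.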

The OSPF neighbors need to be GR capable to follow this procedure. However, OSPF GR is designed on the assumption that the node in question will be restarted anyway, regardless of the neighbors' capabilities. Hence there is no mechanism or feedback to guarantee that a restart is only executed when the environment provides the desired GR capabilities.

Comparison to BGP GR

Note that OSPF GR differs from BGP GR in how the restart is announced. In BGP, GR support is negotiated as a capability, and the neighbors simply assume a GR when the TCP session to the restarting node terminates; there is no explicit exchange of information about the restart itself. Furthermore, OSPF needs non-volatile local storage to keep the GR 'grace interval' timer information, which is fetched again after the restart. In comparison, BGP GR is stateless.

Proposed mode of operation for gracefully restarting FRR

The following process resembles BGP GR as implemented in FRR.

Trigger OSPF GR through CLI/NB (this floods the 'grace' LSAs and stales the RIB)
Kill the ospfd daemon (literally pkill ospfd, no CLI/NB interaction)
Start the ospfd daemon again

Since OSPF GR stores state in non-volatile memory, it is important that this state is shared in HA scenarios. There is also a dependency on failure detection technologies such as BFD, which needs to be considered. To date FRR has no administrative entity on top of the daemons to manage these problems (unlike the large vendors, which sell complete HA solutions).

Proposed compromises for a first implementation

To get an initial, somewhat testable and working solution in place for FRR, the following compromises are made:

Use a CLI command on the router level to trigger GR
Use zebra for temporarily storing the 'grace interval' timer information
Don't care about HA and failure detection
Only handle 'planned' restarts and no unplanned outages (e.g. due to a daemon crash), see section 5 of RFC 3623
Don't take Traffic Engineering considerations into account, see section 6 of RFC 3623

These compromises make it easy to write tests and integrate them into the CI, so that the core OSPF GR logic can be tested sufficiently without caring too much about operational issues. Removing the compromises later for a more mature solution should add little overhead.

It is not yet clear how to store the transient information (grace time) properly. The approach must also be compatible with the way FRR is restarted as a whole, e.g. when zebra is involved in the restart as well.
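Whatever storage ends up being used, the information that has to survive the restart is small: when the restart was announced and how long the grace interval is. The following sketch uses a JSON file purely as a stand-in for that storage (it is an assumption, not how zebra or ospfd store anything), and the function names are hypothetical.

```python
import json
from pathlib import Path


def save_grace_state(path: Path, grace_interval: int, now: float) -> None:
    """Record when the restart was announced and how long the grace interval is."""
    path.write_text(json.dumps({"started_at": now, "grace_interval": grace_interval}))


def remaining_grace_time(path: Path, now: float) -> float:
    """Return the seconds left in the grace interval, or 0.0 if none applies."""
    try:
        state = json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        # No usable state: treat this as a normal, non-graceful start.
        return 0.0
    elapsed = now - state["started_at"]
    return max(0.0, state["grace_interval"] - elapsed)
```

The caller would pass a wall-clock timestamp such as `time.time()` for `now` both before and after the restart, since the value has to stay comparable across two different processes. A result of 0.0 after the restart means the daemon should start normally instead of entering 'GR mode'.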

Implementation items

Pre-restart (restarting node)

Provide means for configuring OSPF GR on router level (enable GR, trigger GR)
Couple the CLI infrastructure to the opaque LSA 'framework'
Stale the RIB using zclient/zapi infrastructure
Provide means for creating and installing interface dependent 'grace' LSAs
Flood 'grace' LSAs to OSPF neighbors
Provide means for parsing 'grace' LSAs (for the neighbors, might be done as part of the helper stuff)
Store 'grace interval' timer information in zebra using zebra's GR capabilities
Optional: Store cryptographic sequence numbers in zebra (this needs a bit more research, probably won't be done)

Post-restart (restarting node)

Provide means to restore the 'grace interval' timer information from zebra and enter 'GR mode'
Don't originate LSAs of type 1-5,7
Don't modify or flush received self-originated LSAs
Don't install any routes into the RIB
Recover designated router election
Recover OSPF adjacencies from neighbor LSAs (and exit GR)
Consistency check for received LSAs (and optionally leave GR)
Various GR exit actions (flush 'grace' LSAs, originate certain LSAs etc, see 2.3 in RFC 3623)
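The post-restart items above can be pictured as a small state check. This is only a sketch with hypothetical names: it models the LSA origination restriction from the list and the three exit conditions as described in section 2.2 of RFC 3623 (adjacencies restored, inconsistent LSA received, grace period expired). The order in which the conditions are tested here is a choice of this sketch, not something the RFC mandates.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# LSA types that must not be originated while in 'GR mode'.
SUPPRESSED_LSA_TYPES = frozenset({1, 2, 3, 4, 5, 7})


class ExitReason(Enum):
    ADJACENCIES_RESTORED = "adjacencies restored"
    INCONSISTENT_LSA = "inconsistent LSA received"
    GRACE_PERIOD_EXPIRED = "grace period expired"


@dataclass(frozen=True)
class RestartState:
    expected_neighbors: frozenset  # router IDs known from pre-restart LSAs
    full_neighbors: frozenset      # router IDs that are fully adjacent again
    remaining_grace: float         # seconds left in the grace interval
    inconsistent_lsa_seen: bool = False


def may_originate(in_gr: bool, lsa_type: int) -> bool:
    """During GR, LSAs of type 1-5 and 7 are not originated."""
    return not (in_gr and lsa_type in SUPPRESSED_LSA_TYPES)


def exit_reason(state: RestartState) -> Optional[ExitReason]:
    """Return why GR should be exited, or None to stay in 'GR mode'."""
    if state.inconsistent_lsa_seen:
        return ExitReason.INCONSISTENT_LSA
    if state.remaining_grace <= 0:
        return ExitReason.GRACE_PERIOD_EXPIRED
    if state.expected_neighbors <= state.full_neighbors:
        return ExitReason.ADJACENCIES_RESTORED
    return None
```

Only `ADJACENCIES_RESTORED` is a successful restart; for the other two reasons the router falls back to a normal OSPF restart. In every case the exit actions from section 2.3 of RFC 3623 still apply.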

Helper mode (neighbor node)

Provide means for CLI configuration (enable/disable helper mode)
Provide means to receive and parse 'grace' LSAs
Enter GR helper mode when 'grace' LSA is received (this involves a series of checks, see 3.1 in RFC 3623)
Continue to advertise LSAs as if the restarting router remained in continuous OSPF operation
Exit GR helper mode (this involves a series of checks, see 3.2 in RFC 3623)
Recover designated router election on GR exit
Originate certain LSAs on GR exit (see end of 3.2 in RFC 3623)
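The 'series of checks' for entering helper mode can be sketched as a single predicate. The conditions below paraphrase section 3.1 of RFC 3623 as understood here and should be verified against the RFC text; the type and function names are hypothetical and do not correspond to existing ospfd code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraceRequest:
    neighbor_is_full: bool   # adjacency with the restarting router is in Full state
    lsa_age: int             # LS age of the received grace LSA, in seconds
    grace_period: int        # grace period announced in the grace LSA, in seconds
    topology_changed: bool   # relevant LSDB contents changed since the restart began


def should_enter_helper_mode(
    req: GraceRequest, helper_enabled: bool, restarting_ourselves: bool
) -> bool:
    """Decide whether to act as helper for the neighbor that sent the grace LSA."""
    if not helper_enabled:
        return False  # local policy (CLI) forbids helping
    if restarting_ourselves:
        return False  # a router in graceful restart cannot help others
    if not req.neighbor_is_full:
        return False  # only fully adjacent neighbors can be helped
    if req.lsa_age >= req.grace_period:
        return False  # the announced grace period has already expired
    if req.topology_changed:
        return False  # the restarting router's view would be out of date
    return True
```

If the predicate returns False, the neighbor is handled as in a regular OSPF restart, i.e. the adjacency is torn down once it times out.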