Resilience Engineering - kimschles/schlesinger-knowledge GitHub Wiki
The Future of DevOps is Resilience Engineering
Amy Tobey at Failover Conf, 21 April 2020
Terms
Resilience Engineering
In the fields of engineering and construction, resilience is the ability to absorb or avoid damage without suffering complete failure and is an objective of design, maintenance and restoration for buildings and infrastructure, as well as communities.
- Designing systems so that they can recover from failure
Socio-technical systems
- A system created by people who leverage technology
- Example: Daft Punk
Common Ground
- When a group of people has a shared context that is communicated through shared language and rituals
- Example: a jazz combo that can create music together by calling jazz standards and agreeing on musical keys and styles
Cognitive Capacity
- How much thinking juice you have 😁
- Spoon Theory is a metaphor some disabled people use to describe capacity (the number of "spoons" available in a day), and how the tasks of everyday living can use it up faster for a person with a disability than for someone who is able-bodied.
Joint cognitive systems
From the Flight Safety Foundation:
a system in which humans interact with machines and each other to maintain control of a safety-critical activity.
Adaptive Capacity
the ability of institutions and networks to learn and to store knowledge and experience
Resilience Engineering and DevOps
- The cause of an outage is never human error. It is the environment and system that led a human to make the decision that caused the outage.
- There is no such thing as a root cause. There is such a thing as the most likely reason an outage occurred.
- We must learn from successes, not just failures. Don't just do postmortems; study what happens when things are going well.