DevOps - henk52/knowledgesharing GitHub Wiki

Introduction

Purpose

References

Vocabulary

  • Automation: allows you to make successful processes repeatable (pub17b,4).
  • CAMS: culture, automation, measurement and sharing.
  • Continuous Delivery (CD): a set of practices that ensure code can be deployed to production at any time (pup17b,6).
  • Continuous Integration (CI): the practice of integrating and testing new code against the existing code base with every change; it is a necessary part of the continuous delivery process (pup17b,6).
  • Continuous Deployment: automatically deploying code that has successfully passed through the testing stage (pup17b,6).
    • Continuous deployment is the ultimate version of continuous delivery, in which every change that makes it through automated tests is automatically deployed to production (pub17b,10).
  • GitOps: "x as code" plus pushing or pulling the changes. See: What is GitOps, How GitOps works and Why it's so useful.
  • IDP: Internal Developer Platform.
  • Platform Engineering: takes care of setting up the DevOps servers and provides templates etc. See: What is Platform Engineering and how it fits into DevOps and Cloud world.
  • pull-based architecture: prevents you from inadvertently passing code that fails automated tests to the next stage of development (pub17b,4).
  • SRE: Site Reliability Engineering.
    • Complementary to DevOps.
    • Uses the same DevOps principles.
    • More focused on reliability and keeping the system stable.

Overview

Continuous delivery is not a thing — it’s a process. Getting to where you’re doing continuous delivery is itself a process. That’s because it requires changes to tooling, to processes, and most important, to how people work together, and who works together(pub17b,11).

Release software fast with quality.

  • Dev
    • Plan
    • Code
    • build
    • test
  • Ops
    • release
    • deploy
    • operate
    • monitor
      • postmortem
  • Loop back to start

Purpose of DevOps

  • focus on getting that feedback loop as short as possible so we can actually detect correlations, and discern cause and effect(pub17b,9)

DevOps concepts

  • Planning tools
    • jira
    • wekan
  • Code repository
  • Infrastructure
    • On-prem
    • cloud providers
      • Azure
      • aws
      • google cloud
    • deployment automation
      • foreman
      • terraform
      • ansible
  • Networking and security
    • Know this to the extent of being able to prepare the servers to run the application
      • but not to completely take over managing these servers.
    • Firewall, proxy servers
    • load balancers
    • http/https
    • IP, DNS Name resolution
  • Containers
    • Virtualization
    • Containers
      • Docker
  • Telemetry
    • logs
      • fluent-bit
    • metrics
      • prometheus
    • trace
      • open telemetry
  • Build automation CI/CD
  • Container orchestration
    • k8s
      • grafana
      • loki
      • Prometheus
      • traefik proxy
      • cert manager
  • Monitoring
    • monitor software
    • monitor infrastructure
    • prometheus
    • (nagios)
  • Infrastructure as code
    • infrastructure provisioning
      • terraform
    • configuration management
      • ansible
      • chef
      • puppet
  • Scripting language
    • Bash
    • Powershell
    • Python / Ruby / Golang

Security steps

  • SCA (Software Composition Analysis) and SAST (Static Application Security Testing)
  • DAST (Dynamic Application Security Testing)
  • Security in IaC (Infrastructure as Code)?
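
A minimal sketch of where these scans could sit in a CI pipeline, written as GitLab CI YAML; the tool choices (Trivy for SCA, Semgrep for SAST, OWASP ZAP for DAST), the image names and the TEST_ENV_URL variable are illustrative assumptions, not something this wiki prescribes:

stages:
  - security
  - dast

sca:
  stage: security
  image: aquasec/trivy:latest            # assumption: Trivy for dependency (SCA) scanning
  script:
    - trivy fs --exit-code 1 .           # scan the checked-out tree, fail the job on findings

sast:
  stage: security
  image: returntocorp/semgrep:latest     # assumption: Semgrep for static analysis
  script:
    - semgrep scan --config auto

dast:
  stage: dast
  image: ghcr.io/zaproxy/zaproxy:stable  # assumption: OWASP ZAP baseline scan of a deployed test environment
  script:
    - zap-baseline.py -t "$TEST_ENV_URL" # TEST_ENV_URL is a placeholder CI variable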

The pillars of DevOps

  • configuration management lets you make sure the development, testing and production environments are closely matched, so any errors that new code could cause in production are discovered — and corrected — long before deployment(pub17b,7).

  • Continuous Integration: Deployment becomes much less stressful when changes are small and tested at every step. And if you need to, it’s easier to roll back changes to your code, changes to the environment, or more importantly, both together(pub17b,8).

  • Notes

    • If there are errors, each deployment should be a small enough change that it’s easy to roll back to the last known good state(pub17b,11).

Learning path

  • DevOps boot camp

  • Code repository

    • Version control with git
  • Build and package management

    • Languages
      • Rust
      • Javascript
      • Python
      • Java
    • package app
    • run test
  • Containers

    • Docker
  • Infrastructure

    • On-prem
    • cloud providers
      • Azure
      • aws
      • google cloud
  • Container orchestration

    • minikube
    • k8s deployment
    • k8s on KVM
    • AWS - EKS
    • Azure k8s
    • OpenShift
  • Artifact repository

    • Nexus
    • dockerhub
    • gitlab?
  • Build automation CI/CD

    • Jenkins
    • github
    • gitlab?
  • Infrastructure as code

    • infrastructure provisioning
      • terraform
    • configuration management
      • ansible
    • Helm
    • flux/argo?
  • Telemetry

    • logs
      • fluent-bit
    • metrics
      • prometheus
    • Trace
      • jaeger
  • Monitoring

    • monitor software
    • monitor infrastructure
    • prometheus
    • (nagios)
  • Networking and security

    • Know this to the extent of being able to prepare the servers to run the application
      • but not to completely take over managing these servers.
    • Firewall, proxy servers
    • load balancers
    • http/https
    • IP, DNS Name resolution
  • HA

    • ChaosMonkey
  • Scripting language

    • Bash
    • Powershell
    • Python / Ruby / Golang

Learning Telemetry

Telemetry - fluent-bit - logs

Introducing DevOps into a new team

    1. The first real step is to follow your build process and write it all down (pub17b,12).
    2. Identify what to automate first:
    • Which steps take the most time?
    • Which steps are the most error-prone and/or require the most human intervention?

Introducing Platform engineering

  • Start with providing something at least one team needs right now (e.g. a k8s cluster).
  • Identify the common tools in use. See: How to implement IDP successfully.
    • Look at what tools each team is using; these could then be the first tools offered as a service.
    • You need to work closely with the app teams.
    • Teams are happy to work with you if they see you are solving an issue or removing a bottleneck.
    • Don't start by, for example, forcing them to move to a new CI/CD tool that you want to standardize on. You are then only adding to their workload.

[I do wonder, is this following the DevOps summary papers' suggestions on rolling out DevOps?]

  • v1.0 - taking load off developers

    • Prove that you make their work easier.
  • v2.0 - work on consistency

  • Teams use pre-configured services via the IDP

  • Best practices are then baked into the services

State of DevOps report summaries

Notes from 2018 - State of DevOps

  • Stage-0: Foundation

    • Deployment patterns for building applications and services are reused
    • Monitoring and alerting are configurable by the team operating the service.
      • Empowered teams that run applications and services in production can define what a good service is; how to determine whether it’s operating properly; and how they’ll find out when it’s not.
    • Deployment patterns for building applications or services are reused.
    • Testing patterns for building applications or services are reused.
    • Teams contribute improvements to tooling provided by other teams.
    • Configurations are managed by a configuration management tool.
    • Rearchitect applications based on business needs.
  • Stage-1: Common tech stacks -> Reduce complexity

    • Teams deploy on a standard set of operating systems.
    • Build on a standard set of technology.
      • standardizing with an eye to what is optimal for all applications, not just a few applications. Use proven technologies and reliable processes for what goes into production, and provide clear processes and guidelines for adding any new technology to enable product incubations, research and experimentation.
      • While standardizing the tech stack provides clear business benefits, rigidly adhering to standards can put a damper on learning and innovation. The key is to regularly revisit standards and build in exceptions for innovation and experimentation.
    • Put application configurations in version control.
      • Separating data from code is low-hanging fruit, and makes sense in these early stages. It also builds the foundation for automated deployment. With app configurations in version control, you can track who makes what changes, and roll back changes as needed.
        • for example, etcd, ZooKeeper, and Consul.
    • Test infrastructure changes before deploying to production.
      • also provides the foundation for creating reusable deployment patterns, which you can’t do unless you have a standard way of testing changes.
  • Stage-2: Standardize and reduce variability

    • Build on a standard set of technology.
      • The variation could be caused by
        • Adoption of new technologies to replace many functions of older technologies; yet the older technologies never actually get removed.
        • Homegrown products that don’t follow any common industry standards and lack common interfaces.
        • A proliferation of tools that overlap and haven’t been rationalized.
      • A primary anti-pattern to watch for at this stage is each team normalizing on its own standards. This will lead to a greater degree of global variance, and is exactly the wrong direction. p49
      • %T% Work towards all teams using the same tools or concepts to deploy the apps.
      • standardize on proven technologies, optimizing for the 80 percent cases and your global use cases. This can be done only in collaboration with other teams. p49
      • The main benefit in this stage is reducing variables and therefore complexity, buying time for further investments in collaboration, automation, sharing, and metrics in subsequent stages.
      • The number of variables in any process or system is directly proportional to its complexity. With fewer variables in play, it is easier to execute a process. And with fewer variables, you can also isolate them, modify them and measure the impact of each change. Next you reduce the variables to optimize flow. Then you make changes in those variables to further optimize output.
      • Start by choosing foundational elements to normalize on — for example, you could select a single relational database management system and a single key value store. p50
        • You can also reduce variables by normalizing your testing workflows, build, and shipping patterns. p50
      • Ideally, teams driving better understanding of their problem domain are innovating, and with technology where warranted.
      • There should be a lower barrier to trying something, but the barrier should rise significantly when it comes to introducing a new piece of technology into a production lifecycle. p50
      • The key benefits of standardizing a team’s patterns and technologies are: p50
        • Faster delivery velocity.
        • More flexibility for development staff to work on different applications, services or components.
        • Reduced surface area for security vulnerabilities.
        • Fewer moving parts to maintain, upgrade and learn.
      • Organizations can move faster when a single operating system, or a small set of operating systems, is the standard. You save time on patching, tuning, upgrading and troubleshooting when there’s just one OS or at least a very small number in use. p51
      • Beyond operating system standardization is the rest of the technology stack.
        • The owners and choosers of the technologies in play here can vary.
        • Standardizing across many teams on technology choices like database systems, message queues, logging aggregation utilities, monitoring/metrics instrumentation and collection, and key value stores allows for any lessons learned in supporting and maintaining those tools to be reapplied to other applications and teams.
    • Put system configurations in version control.
      • Keeping system configurations in version control is also one of the first steps to adopting software development practices for infrastructure. This in turn is key to automated infrastructure delivery, and a building block toward infrastructure as code. p53
    • Teams deploy on a single standard operating system.
    • Deployment patterns for building applications and services are reused.
      • stronger case for unified deployment process flow, tools and patterns. Failures can be investigated and managed uniformly across different services, so the teams responsible for deployment are less likely to have to go back to service authors when a deployment fails. p57
      • In organizations where deployment patterns are truly mastered, multiple applications use the same pipelines and jobs for deployment; only the application name and possibly a few other parameters are fed to the job as configuration. p57
      • With deployments standardized and reused to this degree, any optimization to the deployment job or pipeline is immediately consumed by all applications, so the benefits multiply quickly. p57
      • When each team invents its own deployment patterns, that limits agility, and the team doesn’t have time to spend on truly differentiating work.
      • This also makes it harder for developers and infrastructure engineers to move between teams, which further limits agility (and, by the way, makes it harder for your people to grow and develop at your organization, threatening retention).
    • The primary goal of architecture changes is to support standardization and align with its goals — greater velocity and easier maintainability. p53
  • Stage 3: Expand DevOps practices

    • Infrastructure changes are tested before deploying to production.
      • while some lend themselves to automation with a reasonable amount of effort, other changes are just too infrequent or expensive to validate in an automated fashion. p58
      • So don’t get too locked into the method — just make sure that you validate infrastructure changes prior to a production deployment.
      • For example, when replacing core network switches in a data center, the engineers should be sure they understand the new switch, have tested its capabilities, have a deployment plan, and know they must validate functionality. p58
    • Individuals can do work without manual approval outside the team
      • Empowering teams and individuals certainly supports the spirit of a DevOps evolution, in addition to getting work done more quickly. p55
      • When someone can get work done with minimal handoffs, approvals and wait time, they’re happier and more productive. p55
      • Make it harder to fail.
    • Individuals can make changes without significant wait times.
      • It’s helpful to look at the reasons for each wait and ask what would have to change in order to eliminate it. p59
      • When processes are simpler and consistent, they’re also easier to automate, which comes in handy as organizations progress toward self-service.p59
    • Service changes can be made during business hours.
      • Some organizations do maintenance only during business hours, making use of canary deployments, blue/green deployments or active/passive sides of an application. p60
        • These architecture and deployment patterns optimize for rolling change through the system often, and allow for a relatively easy backout plan if a change goes awry. p60
      • you need to demonstrate success in making changes reliably so the business partners and stakeholders of your service trust your abilities. p60
    • Post-incident reviews occur and results are shared.
      • Post-incident reviews are a blameless look back at what happened during an incident, how it happened, and what improvements could be made to shorten the duration of the incident, improve the understanding of the systems behind the incident, and prevent it from happening again.
      • Improvements from a well-run post-incident review can include revisiting and simplifying processes; updating communication patterns; and working from a position of empathy with other stakeholders of the application or service.
      • Once a post-incident review is done, share the results. People who were not directly involved may be able to learn something. They may spot a flaw in an adjacent process. p61
      • Some organizations share results with their customers publicly, while others make them available to internal customers and stakeholders. The more you share, the more collaboration and trust you’ll foster. p61
    • Teams build on a standard set of technologies.
      • Some organizations begin by standardizing on entry points for deployment — for example, to deploy any application, you type ./deploy <environment>. p56
      • standardizing on technologies is an ongoing effort, not a single moment in time. p61
    • Teams use continuous integration.
      • The important things to optimize for are feedback cycle time and correctness. p61
        • Correctness also matters, so CI systems require maintenance, adjustments and improvement over time. p61
        • For example, if you add a new operating system or browser to your support matrix, all relevant jobs should be able to pick it up.
      • When feedback cycle times are short, more iterations can occur, and so quality improves. p61
        • it may make sense to run only fast tests during working hours, and wait to run slower tests at night or during a weekly window when feedback cycle time is not as critical. p61
    • Infrastructure teams use version control.
      • The use of version control by infrastructure teams has a significant impact on Stage 3 of DevOps evolution, and is also an associated practice for Stage 4. p61
      • Use of version control makes it easy to recreate environments for testing and troubleshooting, boosting throughput for both Dev and Ops. p66
        • It also reduces the time to recover if an error is identified in production. p66
  • Stage 4: Automate infrastructure delivery - the objective driving infrastructure automation at this stage is to provide greater agility to the entire business, not just for a single team.

    • About this stage:
      • It often begins with teams automating for their own needs, and then begins to align with the business.
      • infrastructure automation develops to provide uniform capabilities and services for technology delivery.
      • The goal is to provide more reliable services and capabilities through a formal automation pipeline and workflow that couple with the services and applications built on that infrastructure. p63
    • System configurations are automated. p64
      • You need control over your infrastructure layer in order to achieve agility with the applications and services running on top of it. p64
      • Once you can repeatably deal with account creation/removal, load balancer configuration changes, security patches and monitoring policy updates, you’re no longer being held back by infrastructure that lags behind changing business and application demands. p64
      • Configurations for systems are normally built or rendered from a source of truth (version control) using an automation framework. p64
        • either: automating all change, giving them completely repeatable, rebuildable systems
        • or: automate the most common tasks – where the return on investment is easy for other teams and management to see; leaving the complicated or infrequent changes to be dealt with in a more ad-hoc manner.
      • Benefits
        • Overall speed: Automated tasks should be faster than manually completed tasks.
        • Consistency: Automated tasks follow a set process and thus produce predictable results.
        • Documented behavior: Tasks now have a defined way they are supposed to work, so are easier to troubleshoot.
        • Portability: With the right automation framework, teams can use content written by others to improve velocity and maintenance of their automation library.
      • When you begin automating infrastructure, automate items you run into with the highest frequency across the widest swath of infrastructure components.
        • This will have a big impact, free up your own time in meaningful ways, and buy you time to work on more complex automations.
    • Provisioning is automated.
      • Instead of treating each service request as a one-off, operations teams develop and offer a menu of standardized services aligned with business objectives. p65
      • Provisioning can be the automatic creation of a resource of nearly any type. p65
        • Most often, teams use the word when they’re talking about OS instances, network connectivity, storage, and accounts.
      • As with system configurations, it’s best to begin with the most frequently requested item; gain some wins, consistency and time savings; and then move onto the next most frequent request
      • As with most steps in the DevOps evolution, you want to choose tasks that will win the confidence — even gratitude — of others both inside and outside your team. p65
      • Application configurations are in version control
        • Application configurations should be versioned, auditable, contain history, and ideally, the reasons why they’ve been changed. p66
    • Security policy configurations are automated.
      • The best way to adhere to security policy is to know whether you’re compliant, and fix systems when you’re non-compliant. p67
      • As security policy automation gets a bit more mature, the use of configuration management systems emerges. Configuration management enables policy to be enforced upon system convergence, and reports to be handled in a standard way. p67
      • Some teams may run static analysis on code via their continuous integration pipelines. p67
    • Resources are made available via self-service pipelines. p73
  • Stage 5: Provide self-service capabilities

    • About
      • Application architecture moves beyond standardizing on technologies and begins to evolve towards working with and supporting cloud migration, container adoption, and proliferating microservices. p69
      • Security policy automation moves from servicing the needs of a team to becoming the baseline for how security and compliance are measured throughout a department, or even the entire organization. p69
      • Additionally, automated provisioning advances to provisioning of whole environments for developers, testers and other technical staff. p69
    • Incident responses are automated.
      • All of this means there’s a huge amount of value to be gained by automating incident response. p70
      • Automating eliminates unnecessary distractions, improves time to remediation by reducing handoffs, and ensures that your remediation processes are consistently applied.
      • think about your automation as being there to augment human judgement. p70
      • Focus on the processes and systems that let you identify issues, as well as those you deploy when responding. p70
      • Make it simple for your operators to get to whatever data they need to form a judgement,
        • and once they’ve done so, automate response processes — things like
          • adding a malicious IP to all your firewalls across your infrastructure;
          • collating data for later forensics;
          • or completely isolating an infected machine.
    • Application developers deploy testing environments on their own.
      • Teams should build self-service systems for themselves and then their adjacent teams, next expanding outwards through the organization. p71
        • This is exactly what the data shows successful teams do.
    • Success metrics for projects are visible.
    • Provisioning is automated.
    • In Stage 5, rearchitecting applications for the business means performing more fundamental surgery: adopting the 12-factor app methodology, moving to microservices, adopting containers or replacing components with cloud services. p72
    • Security teams are involved in technology design and deployment
      • Testing security requirements as a part of the automated testing process. p72
      • Creating pre-approved, easy-to-consume libraries, packages, toolchains and processes for developers and IT operations to use in their work. p72
    • Contributors to success in Stage 5
      • if you don’t put the work in to customize catalog items for your business, you won’t see the truly significant gains that are possible. p73
      • Whether teams are building something themselves or deploying an off-the-shelf self-service catalog, it’s critical that the platform can truly operate as an underlying substrata for other solutions such as CI/CD
    • Success metrics for projects are visible p75
      • (number of builds done, number of failed, how long was the pipeline down after a break)
      • Once success metrics are clearly defined and visible to everyone, you'll find it's far easier to get agreement on what needs to be addressed next for the health of the business. p75
  • One or more teams automate a few key things; they reclaim time that used to be spent putting out fires; and they invest that time in further improvements, which helps to build momentum and support for change within their team.

  • automated measurement of business objectives.

  • focus on delivering business value rather than just technology.

  • Our discovery that all the fundamental practices enable or rely on sharing tells us that the key to scaling DevOps success is adoption of practices that promote sharing.

    • It makes sense: When people see something that’s going well, they want to replicate that success, and of course people want to share their successes
  • Collaboration and sharing across team boundaries. That sharing is critical to defining the problems an organization faces and coming up with solutions that work for all teams. (p44 of DevOps 2018)

  • IT capabilities as a service to the business, rather than treating IT as a cost center that executes work orders. p68

2017 - State of DevOps

Planning

Planning tools

Testing

junit format

<?xml version="1.0" ?>
<testsuites>
    <testsuite errors="0" failures="0" name="first test suite" tests="1">
        <testcase classname="some.class.name" name="Test1" time="123.456000">
            <system-out>
                standard out of the test run
            </system-out>
            <system-err>
                standard error of the test run
            </system-err>
        </testcase>
    </testsuite>
</testsuites>

Cucumber testing framework

Telemetry

GitOps

Push vs. Pull
  • Push
    • Pros
      • Easy to setup/understand
      • Works with most tools/infra setups
    • Cons
      • Not as secure as Pull based model
        • ports have to be opened
        • CI tool has to be given access to the production environment
          • (TODO how is this different to the Pull model getting access???)
  • Pull
    • Pros
      • Fast/efficient
        • (TODO how is this faster than push? Push starts immediately whereas pull runs on an interval)
      • Tighter integration
        • (TODO how?)
      • More secure than the Push model
        • (TODO How?)
        • Don't need to open the firewall
          • (TODO what about the network path to do the pull)
        • Password is not needed externally (TODO can you 'k exec' the argo cd container?)
    • Cons
      • Limited to k8s

ArgoCD

  • Continuous delivery tool.
  • Compares the actual state with the (desired) git state and applies the git state. See: ArgoCD Tutorial for Beginners | GitOps CD for Kubernetes.
    • Argo CD can also send an alert instead of making the change.
  • Uses existing k8s functionality
    • using etcd to store data
    • using k8s controllers for monitoring and comparing actual and desired state.
  • Defines an 'Application' CRD that binds a git repo to a k8s server and namespace (a sketch follows this list)
    • The cluster can be either the cluster Argo CD runs in or an external cluster.
    • You can define multiple applications
    • You can group multiple applications in another CRD called AppProject
  • In a multi-cluster setup comprising Dev, Stage and Prod, you could have an Argo CD instance running in each cluster.
  • In a multi-cluster setup where each cluster runs in a different datacenter, you might have a single Argo CD instance orchestrating all the clusters.
  • How do we test the configuration update?
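
A minimal sketch of such an Application manifest (the repo URL, path and namespace names are placeholders; it assumes Argo CD is installed in the argocd namespace and deploys to the cluster it runs in):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app                       # placeholder application name
  namespace: argocd                  # namespace where Argo CD is installed
spec:
  project: default
  source:
    repoURL: https://github.com/example/my-app-config.git   # placeholder git repo holding the manifests
    targetRevision: HEAD
    path: deploy                     # placeholder path to the k8s manifests inside the repo
  destination:
    server: https://kubernetes.default.svc   # the cluster Argo CD itself runs in
    namespace: my-app
  syncPolicy:
    automated:                       # let Argo CD apply the git state automatically
      prune: true
      selfHeal: true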

Installing Argo CD

Deploying applications to argo CD

  • Make sure that you have pushed your changes to your git repo.
  • Then, the first time, run kubectl apply -f application.yaml.
  • After this it is enough to push new changes to the git repo.

Argo CD application manual update

  • click refresh
    • compare the latest code in the git repo with the live state
  • click sync
    • Move to target state, by actually applying the changes to the k8s cluster

Troubleshooting Argo CD

Image version change is not picked up

The service was also not shown in the Argo CD web UI.

It turned out I had copied and pasted the deployment into both the deployment.yaml file and the service.yaml file.

It also showed a yellow warning about the app resource, but I didn't understand what it meant.

Internal Developer Platform (IDP)

  graph TD;
      User---GUI;
      User---GIT;
      GUI---A[Interface\nControl plane];
      GIT---A;
      Secrets---A;
      Schemas---A;
      Pipelines---A;
      Pipelines---Images;
      A---elsewhere;
      A---AWS;
      A---Azure;
      A---Google;
      A---DB1[DB];
      A---DB2[DB];
      GUI---GIT;

Control plane

  • Crossplane

sync from git with gitops

  • ArgoCD
  • or flux

IDP tools

Argo CD

Let's do GitOps in Kubernetes! ArgoCD Tutorial

Crossplane

Install the crossplane cli

Install crossplane in your k8s cluster

  • Install Crossplane

  • helm repo add crossplane-stable https://charts.crossplane.io/stable

  • helm repo update

  • helm install crossplane --namespace crossplane-system --create-namespace crossplane-stable/crossplane

  • kubectl get pods -n crossplane-system

  • kubectl api-resources | grep crossplane

Give crossplane access to AWS

  • create a user in AWS and create credentials for that user
  • get the key and secret
  • save the key and secret in an aws-credentials.txt file
  • kubectl create secret generic aws-secret -n crossplane-system --from-file=creds=$HOME/tmp/aws-credentials.txt
  • kubectl describe secret aws-secret
  • create the provider.yaml file
  • kubectl apply -f provider.yaml
  • vi s3bucket.yaml (a sketch of this file is shown below, after provider.yaml)
  • kubectl apply -f s3bucket.yaml
  • watch kubectl get bucket
  • kubectl describe bucket
  • look around
  • kubectl delete -f s3bucket.yaml

aws-credentials.txt

[default]
aws_access_key_id = xxx
aws_secret_access_key = xxxx

provider.yaml (The ProviderConfig depends on the Provider, so you might need to split the two into separate files and create the Provider first.)

apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-aws-s3
spec:
  package: xpkg.upbound.io/upbound/provider-aws-s3:v0.42.0
---
apiVersion: aws.upbound.io/v1beta1
kind: ProviderConfig
metadata:
  name: default
spec:
  credentials:
    source: Secret
    secretRef:
      namespace: crossplane-system
      name: aws-secret
      key: creds
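
s3bucket.yaml (a minimal sketch of the managed resource referenced in the steps above; the bucket name and region are placeholders, and the apiVersion/kind should match what the installed provider-aws-s3 package actually serves.)

apiVersion: s3.aws.upbound.io/v1beta1
kind: Bucket
metadata:
  name: my-crossplane-test-bucket      # placeholder; S3 bucket names must be globally unique
spec:
  forProvider:
    region: eu-west-1                  # placeholder region
  providerConfigRef:
    name: default                      # matches the ProviderConfig defined in provider.yaml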

Backstage

  • A First Look at Backstage.io

  • Backstage guide articles

  • How To Build A UI For An Internal Developer Platform (IDP) With Port?

  • Port & ArgoCD: Building a Unified Developer Experience - Part 3/3

  • Building developer portals with Backstage

  • Developer Portals: Building Backstage templates and plugins

  • run the javascript docker image

  • curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash

  • export NVM_DIR="$([ -z "${XDG_CONFIG_HOME-}" ] && printf %s "${HOME}/.nvm" || printf %s "${XDG_CONFIG_HOME}/nvm")"

  • [ -s "$NVM_DIR/nvm.sh" ] && . "$NVM_DIR/nvm.sh"

  • curl -o- -L https://yarnpkg.com/install.sh | bash -s -- --version 1.22.22

    • to install the version that fits with Node.js version 18
  • export PATH="$HOME/.yarn/bin:$HOME/.config/yarn/global/node_modules/.bin:$PATH"

  • nvm install 18

  • nvm use 18

  • npx @backstage/create-app@latest

    • y
    • my-backstage-app
  • cd my-backstage-app

  • vi app-config.yaml

    • change baseUrl: http://localhost:3000
    • to baseUrl: http://0.0.0.0:3000
  • Follow the authentication process:

  • yarn upgrade

  • plugin-catalog-backend-module-github-org

    • also try what is on
  • Update app-config.yaml (a hedged sketch of the relevant entries follows this list)

  • Update packages/app/src/App.tsx

  • Update packages/backend/src/index.ts

  • yarn add --cwd packages/backend @backstage/integration

    • has some gyp errors.
  • yarn add --cwd packages/backend @backstage/plugin-catalog-backend-module-github

  • yarn add --cwd packages/backend @backstage/plugin-catalog-backend-module-github-org

  • yarn install

  • yarn dev

  • Follow the github thing

  • Update app-config.yaml

  • Update packages/app/src/App.tsx

  • Update packages/backend/src/index.ts

  • Update examples/org.yaml to change "guest" to my GH username.
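
A minimal sketch of the app-config.yaml entries involved in the GitHub setup above, assuming the standard Backstage GitHub integration and GitHub auth provider; the environment variable names are placeholders, and the catalog provider configuration for the github-org module is not shown:

integrations:
  github:
    - host: github.com
      token: ${GITHUB_TOKEN}                         # personal access token used by the catalog modules

auth:
  environment: development
  providers:
    github:
      development:
        clientId: ${AUTH_GITHUB_CLIENT_ID}           # from the GitHub OAuth app
        clientSecret: ${AUTH_GITHUB_CLIENT_SECRET}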

TODO what tool can I use for log aggregation?

gitlab as auth for backstage

  • start the gitlab server
  • click the 'Admin Area' button at the bottom of the left bar
  • Click 'Applications'
  • Click 'Add new application'
  • follow description in GitLab Authentication Provider

Failed to sign-in, unable to resolve user identity


Login failed, user profile does not contain an email


Origin http://172.17.0.2:3000 is not allowed

Auth fails with "Login failed; caused by NotAllowedError: Origin '...' is not allowed"

change the app.baseUrl in app-config.yaml to the IP address the browser is using to access the website (TODO how is this handled when it is running through e.g. nginx???)

app:
  title: Scaffolded Backstage App
  baseUrl: http://172.17.0.2:3000
Sign in using GitHub xxx

Origin 'http://172.17.0.2:3000' is not allowed
[1] 2024-06-01T21:35:30.713Z rootHttpRouter info ::ffff:172.17.0.1 - - [01/Jun/2024:21:35:30 +0000] "GET /api/auth/github/start?scope=read%3Auser&origin=http%3A%2F%2F172.17.0.2%3A3000&flow=popup&env=development HTTP/1.1" 302 0 "http://172.17.0.2:3000/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:126.0) Gecko/20100101 Firefox/126.0" type=incomingRequest
[1] 2024-06-01T21:35:31.075Z rootHttpRouter info ::ffff:172.17.0.1 - - [01/Jun/2024:21:35:31 +0000] "GET /api/auth/github/handler/frame?code=6854f0903f3fda0b52ab&state=6e6f6e63653d7a434e35716a326b724f4261646d25324253475a6437717725334425334426656e763d646576656c6f706d656e74266f726967696e3d687474702533412532462532463137322e31372e302e322533413330303026666c6f773d706f7075702673636f70653d7265616425334175736572 HTTP/1.1" 200 - "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:126.0) Gecko/20100101 Firefox/126.0" type=incomingRequest
[1] 2024-06-01T21:35:31.100Z rootHttpRouter info ::ffff:172.17.0.1 - - [01/Jun/2024:21:35:31 +0000] "GET /favicon.ico HTTP/1.1" - - "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:126.0) Gecko/20100101 Firefox/126.0" type=incomingRequest

Error: Cannot find module '@backstage/plugin-catalog-backend-module-github-org'

  • yarn why @backstage/plugin-catalog-backend-module-github-org
    • response: error We couldn't find a match!
  • yarn add --cwd packages/backend @backstage/integration
    • has some gyp errors.
  • yarn add --cwd packages/backend @backstage/plugin-catalog-backend-module-github
  • yarn add --cwd packages/backend @backstage/plugin-catalog-backend-module-github-org
[0] <i> [webpack-dev-server] 404s will fallback to '/index.html'
[1] node:internal/modules/cjs/loader:1140
[1]   const err = new Error(message);
[1]               ^
[1] 
[1] Error: Cannot find module '@backstage/plugin-catalog-backend-module-github-org'
[1] Require stack:
[1] - /home/devenv/my-backstage-app/packages/backend/src/index.ts
[1]     at Module._resolveFilename (node:internal/modules/cjs/loader:1140:15)
[1]     at Module._load (node:internal/modules/cjs/loader:981:27)
[1]     at Module.require (node:internal/modules/cjs/loader:1231:19)
[1]     at require (node:internal/modules/helpers:177:18)
[1]     at <anonymous> (/home/devenv/my-backstage-app/packages/backend/src/index.ts:25:13)
[1]     at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
[1]   code: 'MODULE_NOT_FOUND',
[1]   requireStack: [ '/home/devenv/my-backstage-app/packages/backend/src/index.ts' ]
[1] }
[1] 
[1] Node.js v18.20.3

Unknown auth provider github

{
  "error": {
    "name": "NotFoundError",
    "message": "Unknown auth provider github",
    "stack": "NotFoundError: Unknown auth provider github\n
        at <anonymous> (/home/devenv/my-backstage-app/node_modules/@backstage/plugin-auth-backend/src/service/router.ts:164:11)\n
        at handleReturn (/home/devenv/my-backstage-app/node_modules/express-promise-router/lib/express-promise-router.js:24:23)\n
        at /home/devenv/my-backstage-app/node_modules/express-promise-router/lib/express-promise-router.js:64:7\n
        at Layer.handle [as handle_request] (/home/devenv/my-backstage-app/node_modules/express/lib/router/layer.js:95:5)\n
        at trim_prefix (/home/devenv/my-backstage-app/node_modules/express/lib/router/index.js:328:13)\n
        at /home/devenv/my-backstage-app/node_modules/express/lib/router/index.js:286:9\n
        at param (/home/devenv/my-backstage-app/node_modules/express/lib/router/index.js:365:14)\n
        at param (/home/devenv/my-backstage-app/node_modules/express/lib/router/index.js:376:14)\n
        at Function.process_params (/home/devenv/my-backstage-app/node_modules/express/lib/router/index.js:421:3)\n
        at next (/home/devenv/my-backstage-app/node_modules/express/lib/router/index.js:280:10)"
  },
  "request": {
    "method": "GET",
    "url": "/api/auth/github/start?scope=read%3Auser&origin=http%3A%2F%2F172.17.0.2%3A3000&flow=popup&env=development"
  },
  "response": {
    "statusCode": 404
  }
}

alpine seems to not work

  • apk add py3-setuptools
  • apk add build-base
  • apk add musl-dev
  • dev86
  • apk add linux-headers
  • apk add --force-refresh coreutils

Scratchpad

Splitting Source code and App configuration into two separate repos

Debugging clusters

  • Prompt: Help Me Debug a Cluster! - Anusha Ragunathan & Lili Wan, Intuit Inc

  • cluster golden signals (7.14), derived from the "four golden signals"?

    • The four pillars:
      • errors
      • saturation
      • latency
      • traffic
    • Cluster golden signals is a collection of algorithms, quality metrics and dashboards that provides a single pane of glass to view the health and availability of a k8s cluster, while also providing a single "golden signal" to get alerted on. 7.42
    • identify the core critical components of a k8s cluster and bucket them by functionality
      • control plane
      • authentication
      • autoscaling
      • network
      • critical cluster addons
      • bootstrap addon metrics
      • AWS metrics(cloud metrics)
    • For each component, generate a Prometheus golden signal. 8.26
      • Health is generated using an algorithm that relies on:
        • error SLAs
        • statistical formulas for anomaly detection
      • Health can have one of 3 values: Healthy, Degraded or Critical
    • Generate overall health and availability of the cluster using the component golden signals and aggregation algorithm 8.47
      • A cluster can only be healthy if all components are healthy
      • if just one component is degraded then the whole cluster is degraded
      • if just one component is critical then the whole cluster is critical
    • Build dashboards to surface the metrics
    • setup alerting based on the health of the cluster
    • (A closer look at errors and how to describe them as a prometheus rule) 10.36
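
A minimal sketch of how the error part of a component golden signal could be expressed as a Prometheus alerting rule, assuming an error-rate SLA on the API server; the metric, threshold and labels are illustrative, not taken from the talk:

groups:
  - name: cluster-golden-signals
    rules:
      - alert: ApiServerErrorRateHigh
        # ratio of 5xx apiserver responses over the last 5 minutes, compared to an assumed 1% error SLA
        expr: |
          sum(rate(apiserver_request_total{code=~"5.."}[5m]))
            /
          sum(rate(apiserver_request_total[5m])) > 0.01
        for: 10m
        labels:
          severity: degraded          # maps to the Healthy/Degraded/Critical states described above
        annotations:
          summary: "API server 5xx error rate above 1% for 10 minutes"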

Reduce MTTR

Key: Identify root cause.

  • AI for platform debugging
    • k8sgpt