DevOps - henk52/knowledgesharing GitHub Wiki

Introduction

Purpose

References

Vocabulary

  • Automation: allows you to make successful processes repeatable (pub17b,4).
  • CAMS: culture, automation, measurement and sharing.
  • Continuous Delivery (CD): a set of practices that ensure code can be deployed to production at any time (pup17b,6).
  • Continuous Integration (CI): the practice of integrating and testing new code against the existing code base with every change; it is a necessary part of the continuous delivery process (pup17b,6).
  • Continuous Deployment: automatically deploying code that has successfully passed through the testing stage (pup17b,6).
    • Continuous deployment is the ultimate version of continuous delivery, in which every change that makes it through automated tests is automatically deployed to production (pub17b,10).
  • GitOps: "x as code" plus pushing or pulling the changes. See: What is GitOps, How GitOps works and Why it's so useful.
  • IDP: Internal Developer Platform.
  • Platform Engineering: takes care of setting up the DevOps servers and provides templates etc. See: What is Platform Engineering and how it fits into DevOps and Cloud world.
  • pull-based architecture: prevents you from inadvertently passing code that fails automated tests to the next stage of development (pub17b,4).
  • SRE: Site Reliability Engineering.
    • Complementary to DevOps.
    • Uses the same DevOps principles.
    • More focused on reliability and keeping the system stable.

Overview

Continuous delivery is not a thing — it’s a process. Getting to where you’re doing continuous delivery is itself a process. That’s because it requires changes to tooling, to processes, and most important, to how people work together, and who works together(pub17b,11).

Release software fast with quality.

  • Dev
    • Plan
    • Code
    • build
    • test
  • Ops
    • release
    • deploy
    • operate
    • monitor
      • postmortem
  • Loop back to start

Purpose of DevOps

  • focus on getting that feedback loop as short as possible so we can actually detect correlations, and discern cause and effect(pub17b,9)

DevOps concepts

  • Planning tools
    • jira
    • wekan
  • Code repository
  • Infrastructure
    • On-prem
    • cloud providers
      • Azure
      • aws
      • google cloud
    • deployment automation
      • foreman
      • terraform
      • ansible
  • Networking and security
    • Know this to the extent of being able to prepare the servers to run the application
      • but not to completely take over managing these servers.
    • Firewall, proxy servers
    • load balancers
    • http/https
    • IP, DNS Name resolution
  • Containers
    • Virtualization
    • Containers
      • Docker
  • Telemetry
    • logs
      • fluent-bit
    • metrics
      • prometheus
    • trace
      • open telemetry
  • Build automation CI/CD
  • Container orchestration
    • k8s
      • grafana
      • loki
      • Prometheus
      • traefik proxy
      • cert manager
  • Monitoring
    • monitor software
    • monitor infrastructure
    • prometheus
    • (nagios)
  • Infrastructure as code
    • infrastructure provisioning
      • terraform
    • configuration management
      • ansible
      • chef
      • puppet
  • Scripting language
    • Bash
    • Powershell
    • Python / Ruby / Golang

Security steps

  • SCA (Software Composition Analysis) and SAST (Static Application Security Testing)
  • DAST (Dynamic Application Security Testing)
  • Security in IaC (Infrastructure as Code)?
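
A minimal sketch of where these scans could sit in a CI pipeline, written as GitLab CI YAML; the tool choices (Trivy for SCA, Semgrep for SAST, OWASP ZAP for DAST), the image names and the TEST_ENV_URL variable are illustrative assumptions, not something this wiki prescribes:

stages:
  - security
  - dast

sca:
  stage: security
  image: aquasec/trivy:latest            # assumption: Trivy for dependency (SCA) scanning
  script:
    - trivy fs --exit-code 1 .           # scan the checked-out tree, fail the job on findings

sast:
  stage: security
  image: returntocorp/semgrep:latest     # assumption: Semgrep for static analysis
  script:
    - semgrep scan --config auto

dast:
  stage: dast
  image: ghcr.io/zaproxy/zaproxy:stable  # assumption: OWASP ZAP baseline scan of a deployed test environment
  script:
    - zap-baseline.py -t "$TEST_ENV_URL" # TEST_ENV_URL is a placeholder CI variable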

The pillars of DevOps

  • configuration management lets you make sure the development, testing and production environments are closely matched, so any errors that new code could cause in production are discovered — and corrected — long before deployment(pub17b,7).

  • Continuous Integration: Deployment becomes much less stressful when changes are small and tested at every step. And if you need to, it’s easier to roll back changes to your code, changes to the environment, or more importantly, both together(pub17b,8).

  • Notes

    • If there are errors, each deployment should be a small enough change that it’s easy to roll back to the last known good state(pub17b,11).

Learning path

  • DevOps boot camp

  • Code repository

    • Version control with git
  • Build and package management

    • Languages
      • Rust
      • Javascript
      • Python
      • Java
    • package app
    • run test
  • Containers

    • Docker
  • Infrastructure

    • On-prem
    • cloud providers
      • Azure
      • aws
      • google cloud
  • Container orchestration

    • minikube
    • k8s deployment
    • k8s on KVM
    • AWS - EKS
    • Azure k8s
    • OpenShift
  • Artifact repository

    • Nexus
    • dockerhub
    • gitlab?
  • Build automation CI/CD

    • Jenkins
    • github
    • gitlab?
  • Infrastructure as code

    • infrastructure provisioning
      • terraform
    • configuration management
      • ansible
    • Helm
    • flux/argo?
  • Telemetry

    • logs
      • fluent-bit
    • metrics
      • prometheus
    • Trace
      • jaeger
  • Monitoring

    • monitor software
    • monitor infrastructure
    • prometheus
    • (nagios)
  • Networking and security

    • Know this to the extent of being able to prepare the servers to run the application
      • but not to completely take over managing these servers.
    • Firewall, proxy servers
    • load balancers
    • http/https
    • IP, DNS Name resolution
  • HA

    • ChaosMonkey
  • Scripting language

    • Bash
    • Powershell
    • Python / Ruby / Golang

Learning Telemetry

Telemetry - fluent-bit - logs

Introducing DevOps into a new team

    1. The first real step is to follow your build process and write it all down (pub17b,12).
    2. Identify what to automate first:
    • Which steps take the most time?
    • Which steps are the most error-prone and/or require the most human intervention?

Introducing Platform engineering

  • Start with providing something at least one team needs right now (e.g. a k8s cluster).
  • Identify the common tools in use. See: How to implement IDP successfully.
    • Look at what tools each team is using; these could then be the first tools offered as a service.
    • You need to work closely with the app teams.
    • Teams are happy to work with you if they see you are solving an issue or removing a bottleneck.
    • Don't start by, for example, forcing them to move to a new CI/CD tool that you want to standardize on. You are then only adding to their workload.

[I do wonder, is this following the DevOps summary papers' suggestions on rolling out DevOps?]

  • v1.0 - taking load off developers

    • Prove that you make their work easier.
  • v2.0 - work on consistency

  • Teams use pre-configured services via the IDP

  • Best practices are then baked into the services

State of DevOps report summaries

Notes from 2018 - State of DevOps

  • Stage-0: Foundation

    • Deployment patterns for building applications and services are reused
    • Monitoring and alerting are configurable by the team operating the service.
      • Empowered teams that run applications and services in production can define what a good service is; how to determine whether it’s operating properly; and how they’ll find out when it’s not.
    • Deployment patterns for building applications or services are reused.
    • Testing patterns for building applications or services are reused.
    • Teams contribute improvements to tooling provided by other teams.
    • Configurations are managed by a configuration management tool.
    • Rearchitect applications based on business needs.
  • Stage-1: Common tech stacks -> Reduce complexity

    • Teams deploy on a standard set of operating systems.
    • Build on a standard set of technology.
      • standardizing with an eye to what is optimal for all applications, not just a few applications. Use proven technologies and reliable processes for what goes into production, and provide clear processes and guidelines for adding any new technology to enable product incubations, research and experimentation.
      • While standardizing the tech stack provides clear business benefits, rigidly adhering to standards can put a damper on learning and innovation. The key is to regularly revisit standards and build in exceptions for innovation and experimentation.
    • Put application configurations in version control.
      • Separating data from code is low-hanging fruit, and makes sense in these early stages. It also builds the foundation for automated deployment. With app configurations in version control, you can track who makes what changes, and roll back changes as needed.
        • for example, etcd, ZooKeeper, and Consul.
    • Test infrastructure changes before deploying to production.
      • also provides the foundation for creating reusable deployment patterns, which you can’t do unless you have a standard way of testing changes.
  • Stage-2: Standardize and reduce variability

    • Build on a standard set of technology.
      • The variation could be caused by
        • Adoption of new technologies to replace many functions of older technologies; yet the older technologies never actually get removed.
        • Homegrown products that don’t follow any common industry standards and lack common interfaces.
        • A proliferation of tools that overlap and haven’t been rationalized.
      • A primary anti-pattern to watch for at this stage is each team normalizing on its own standards. This will lead to a greater degree of global variance, and is exactly the wrong direction. p49
      • %T% Work towards all teams using the same tools or concepts to deploy the apps.
      • standardize on proven technologies, optimizing for the 80 percent cases and your global use cases. This can be done only in collaboration with other teams. p49
      • The main benefit in this stage is reducing variables and therefore complexity, buying time for further investments in collaboration, automation, sharing, and metrics in subsequent stages.
      • The number of variables in any process or system is directly proportional to its complexity. With fewer variables in play, it is easier to execute a process. And with fewer variables, you can also isolate them, modify them and measure the impact of each change. Next you reduce the variables to optimize flow. Then you make changes in those variables to further optimize output.
      • Start by choosing foundational elements to normalize on — for example, you could select a single relational database management system and a single key value store. p50
        • You can also reduce variables by normalizing your testing workflows, build, and shipping patterns. p50
      • Ideally, teams driving better understanding of their problem domain are innovating, and with technology where warranted.
      • There should be a lower barrier to trying something, but the barrier should rise significantly when it comes to introducing a new piece of technology into a production lifecycle. p50
      • The key benefits of standardizing a team’s patterns and technologies are: p50
        • Faster delivery velocity.
        • More flexibility for development staff to work on different applications, services or components.
        • Reduced surface area for security vulnerabilities.
        • Fewer moving parts to maintain, upgrade and learn.
      • Organizations can move faster when a single operating system, or a small set of operating systems, is the standard. You save time on patching, tuning, upgrading and troubleshooting when there’s just one OS or at least a very small number in use. p51
      • Beyond operating system standardization is the rest of the technology stack.
        • The owners and choosers of the technologies in play here can vary.
        • Standardizing across many teams on technology choices like database systems, message queues, logging aggregation utilities, monitoring/metrics instrumentation and collection, and key value stores allows for any lessons learned in supporting and maintaining those tools to be reapplied to other applications and teams.
    • Put system configurations in version control.
      • Keeping system configurations in version control is also one of the first steps to adopting software development practices for infrastructure. This in turn is key to automated infrastructure delivery, and a building block toward infrastructure as code. p53
    • Teams deploy on a single standard operating system.
    • Deployment patterns for building applications and services are reused.
      • stronger case for unified deployment process flow, tools and patterns. Failures can be investigated and managed uniformly across different services, so the teams responsible for deployment are less likely to have to go back to service authors when a deployment fails. p57
      • In organizations where deployment patterns are truly mastered, multiple applications use the same pipelines and jobs for deployment; only the application name and possibly a few other parameters are fed to the job as configuration. p57
      • With deployments standardized and reused to this degree, any optimization to the deployment job or pipeline is immediately consumed by all applications, so the benefits multiply quickly. p57
      • When each team invents its own deployment patterns, that limits agility, and the team doesn’t have time to spend on truly differentiating work.
      • This also makes it harder for developers and infrastructure engineers to move between teams, which further limits agility (and, by the way, makes it harder for your people to grow and develop at your organization, threatening retention).
    • The primary goal of architecture changes is to support standardization and align with its goals — greater velocity and easier maintainability. p53
  • Stage 3: Expand DevOps practices

    • Infrastructure changes are tested before deploying to production.
      • while some lend themselves to automation with a reasonable amount of effort, other changes are just too infrequent or expensive to validate in an automated fashion. p58
      • So don’t get too locked into the method — just make sure that you validate infrastructure changes prior to a production deployment.
      • For example, when replacing core network switches in a data center, the engineers should be sure they understand the new switch, have tested its capabilities, have a deployment plan, and know they must validate functionality. p58
    • Individuals can do work without manual approval outside the team
      • Empowering teams and individuals certainly supports the spirit of a DevOps evolution, in addition to getting work done more quickly. p55
      • When someone can get work done with minimal handoffs, approvals and wait time, they’re happier and more productive. p55
      • Make it harder to fail.
    • Individuals can make changes without significant wait times.
      • It’s helpful to look at the reasons for each wait and ask what would have to change in order to eliminate it. p59
      • When processes are simpler and consistent, they’re also easier to automate, which comes in handy as organizations progress toward self-service.p59
    • Service changes can be made during business hours.
      • Some organizations do maintenance only during business hours, making use of canary deployments, blue/green deployments or active/passive sides of an application. p60
        • These architecture and deployment patterns optimize for rolling change through the system often, and allow for a relatively easy backout plan if a change goes awry. p60
      • you need to demonstrate success in making changes reliably so the business partners and stakeholders of your service trust your abilities. p60
    • Post-incident reviews occur and results are shared.
      • Post-incident reviews are a blameless look back at what happened during an incident, how it happened, and what improvements could be made to shorten the duration of the incident, improve the understanding of the systems behind the incident, and prevent it from happening again.
      • Improvements from a well-run post-incident review can include revisiting and simplifying processes; updating communication patterns; and working from a position of empathy with other stakeholders of the application or service.
      • Once a post-incident review is done, share the results. People who were not directly involved may be able to learn something. They may spot a flaw in an adjacent process. p61
      • Some organizations share results with their customers publicly, while others make them available to internal customers and stakeholders. The more you share, the more collaboration and trust you’ll foster. p61
    • Teams build on a standard set of technologies.
      • Some organizations begin by standardizing on entry points for deployment — for example, to deploy any application, you type ./deploy <environment>. p56
      • standardizing on technologies is an ongoing effort, not a single moment in time. p61
    • Teams use continuous integration.
      • The important things to optimize for are feedback cycle time and correctness. p61
        • Correctness also matters, so CI systems require maintenance, adjustments and improvement over time. p61
        • For example, if you add a new operating system or browser to your support matrix, all relevant jobs should be able to pick it up.
      • When feedback cycle times are short, more iterations can occur, and so quality improves. p61
        • it may make sense to run only fast tests during working hours, and wait to run slower tests at night or during a weekly window when feedback cycle time is not as critical. p61
    • Infrastructure teams use version control.
      • The use of version control by infrastructure teams has a significant impact on Stage 3 of DevOps evolution, and is also an associated practice for Stage 4. p61
      • Use of version control makes it easy to recreate environments for testing and troubleshooting, boosting throughput for both Dev and Ops. p66
        • It also reduces the time to recover if an error is identified in production. p66
  • Stage 4: Automate infrastructure delivery - the objective driving infrastructure automation at this stage is to provide greater agility to the entire business, not just for a single team.

    • About this stage:
      • It often begins with teams automating for their own needs, and then begins to align with the business.
      • infrastructure automation develops to provide uniform capabilities and services for technology delivery.
      • The goal is to provide more reliable services and capabilities through a formal automation pipeline and workflow that couple with the services and applications built on that infrastructure. p63
    • System configurations are automated. p64
      • You need control over your infrastructure layer in order to achieve agility with the applications and services running on top of it. p64
      • Once you can repeatably deal with account creation/removal, load balancer configuration changes, security patches and monitoring policy updates, you’re no longer being held back by infrastructure that lags behind changing business and application demands. p64
      • Configurations for systems are normally built or rendered from a source of truth (version control) using an automation framework. p64
        • either: automating all change, giving them completely repeatable, rebuildable systems
        • or: automate the most common tasks – where the return on investment is easy for other teams and management to see; leaving the complicated or infrequent changes to be dealt with in a more ad-hoc manner.
      • Benefits
        • Overall speed: Automated tasks should be faster than manually completed tasks.
        • Consistency: Automated tasks follow a set process and thus produce predictable results.
        • Documented behavior: Tasks now have a defined way they are supposed to work, so are easier to troubleshoot.
        • Portability: With the right automation framework, teams can use content written by others to improve velocity and maintenance of their automation library.
      • When you begin automating infrastructure, automate items you run into with the highest frequency across the widest swath of infrastructure components.
        • This will have a big impact, free up your own time in meaningful ways, and buy you time to work on more complex automations.
    • Provisioning is automated.
      • Instead of treating each service request as a one-off, operations teams develop and offer a menu of standardized services aligned with business objectives. p65
      • Provisioning can be the automatic creation of a resource of nearly any type. p65
        • Most often, teams use the word when they’re talking about OS instances, network connectivity, storage, and accounts.
      • As with system configurations, it’s best to begin with the most frequently requested item; gain some wins, consistency and time savings; and then move onto the next most frequent request
      • As with most steps in the DevOps evolution, you want to choose tasks that will win the confidence — even gratitude — of others both inside and outside your team. p65
      • Application configurations are in version control
        • Application configurations should be versioned, auditable, contain history, and ideally, the reasons why they’ve been changed. p66
    • Security policy configurations are automated.
      • The best way to adhere to security policy is to know whether you’re compliant, and fix systems when you’re non-compliant. p67
      • As security policy automation gets a bit more mature, the use of configuration management systems emerges. Configuration management enables policy to be enforced upon system convergence, and reports to be handled in a standard way. p67
      • Some teams may run static analysis on code via their continuous integration pipelines. p67
    • Resources are made available via self-service pipelines. p73
  • Stage 5: Provide self-service capabilities

    • About
      • Application architecture moves beyond standardizing on technologies and begins to evolve towards working with and supporting cloud migration, container adoption, and proliferating microservices. p69
      • Security policy automation moves from servicing the needs of a team to becoming the baseline for how security and compliance are measured throughout a department, or even the entire organization. p69
      • Additionally, automated provisioning advances to provisioning of whole environments for developers, testers and other technical staff. p69
    • Incident responses are automated.
      • All of this means there’s a huge amount of value to be gained by automating incident response. p70
      • Automating eliminates unnecessary distractions, improves time to remediation by reducing handoffs, and ensures that your remediation processes are consistently applied.
      • think about your automation as being there to augment human judgement. p70
      • Focus on the processes and systems that let you identify issues, as well as those you deploy when responding. p70
      • Make it simple for your operators to get to whatever data they need to form a judgement,
        • and once they’ve done so, automate response processes — things like
          • adding a malicious IP to all your firewalls across your infrastructure;
          • collating data for later forensics;
          • or completely isolating an infected machine.
    • Application developers deploy testing environments on their own.
      • Teams should build self-service systems for themselves and then their adjacent teams, next expanding outwards through the organization. p71
        • This is exactly what the data shows successful teams do.
    • Success metrics for projects are visible.
    • Provisioning is automated.
    • In Stage 5, rearchitecting applications for the business means performing more fundamental surgery: adopting the 12-factor app methodology, moving to microservices, adopting containers or replacing components with cloud services. p72
    • Security teams are involved in technology design and deployment
      • Testing security requirements as a part of the automated testing process. p72
      • Creating pre-approved, easy-to-consume libraries, packages, toolchains and processes for developers and IT operations to use in their work. p72
    • Contributors to success in Stage 5
      • if you don’t put the work in to customize catalog items for your business, you won’t see the truly significant gains that are possible. p73
      • Whether teams are building something themselves or deploying an off-the-shelf self-service catalog, it’s critical that the platform can truly operate as an underlying substrata for other solutions such as CI/CD
    • Success metrics for projects are visible p75
      • (number of builds done, number of failed, how long was the pipeline down after a break)
      • Once success metrics are clearly defined and visible to everyone, you'll find it's far easier to get agreement on what needs to be addressed next for the health of the business. p75
  • One or more teams automate a few key things; they reclaim time that used to be spent putting out fires; and they invest that time in further improvements, which helps to build momentum and support for change within their team.

  • automated measurement of business objectives.

  • focus on delivering business value rather than just technology.

  • Our discovery that all the fundamental practices enable or rely on sharing tells us that the key to scaling DevOps success is adoption of practices that promote sharing.

    • It makes sense: When people see something that’s going well, they want to replicate that success, and of course people want to share their successes
  • Collaboration and sharing across team boundaries. That sharing is critical to defining the problems an organization faces and coming up with solutions that work for all teams. (p44 of DevOps 2018)

  • IT capabilities as a service to the business, rather than treating IT as a cost center that executes work orders. p68

2017 - State of DevOps

Planning

Planning tools

Testing

junit format

<?xml version="1.0" ?>
<testsuites>
    <testsuite errors="0" failures="0" name="first test suite" tests="1">
        <testcase classname="some.class.name" name="Test1" time="123.456000">
            <system-out>
                standard out of the test run
            </system-out>
            <system-err>
                standard error of the test run
            </system-err>
        </testcase>
    </testsuite>
</testsuites>

Cucumber testing framework

Telemetry

GitOps

Push vs. Pull
  • Push
    • Pros
      • Easy to setup/understand
      • Works with most tools/infra setups
    • Cons
      • Not as secure as Pull based model
        • ports have to be opened
        • CI tool has to be given access to the production environment
          • (TODO how is this different to the Pull model getting access???)
  • Pull
    • Pros
      • Fast/efficient
        • (TODO how is this faster than push? Push starts immediately whereas pull runs on an interval)
      • Tighter integration
        • (TODO how?)
      • More secure than the Push model
        • (TODO How?)
        • Don't need to open the firewall
          • (TODO what about the network path to do the pull)
        • Password is not needed externally (TODO can you 'k exec' the argo cd container?)
    • Cons
      • Limited to k8s

ArgoCD

  • Continuous delivery tool.
  • Compares the actual state with the (desired) git state and applies the git state. See: ArgoCD Tutorial for Beginners | GitOps CD for Kubernetes.
    • Argo CD can also send an alert instead of making the change.
  • Uses existing k8s functionality
    • using etcd to store data
    • using k8s controllers for monitoring and comparing actual and desired state.
  • Defines an 'Application' CRD that binds a git repo to a k8s server and namespace (a sketch follows this list)
    • The cluster can be either the cluster Argo CD runs in or an external cluster.
    • You can define multiple applications
    • You can group multiple applications in another CRD called AppProject
  • In a multi-cluster setup comprising Dev, Stage and Prod, you could have an Argo CD instance running in each cluster.
  • In a multi-cluster setup where each cluster runs in a different datacenter, you might have a single Argo CD instance orchestrating all the clusters.
  • How do we test the configuration update?
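
A minimal sketch of such an Application manifest (the repo URL, path and namespace names are placeholders; it assumes Argo CD is installed in the argocd namespace and deploys to the cluster it runs in):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app                       # placeholder application name
  namespace: argocd                  # namespace where Argo CD is installed
spec:
  project: default
  source:
    repoURL: https://github.com/example/my-app-config.git   # placeholder git repo holding the manifests
    targetRevision: HEAD
    path: deploy                     # placeholder path to the k8s manifests inside the repo
  destination:
    server: https://kubernetes.default.svc   # the cluster Argo CD itself runs in
    namespace: my-app
  syncPolicy:
    automated:                       # let Argo CD apply the git state automatically
      prune: true
      selfHeal: true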

Installing Argo CD

Deploying applications to argo CD

  • Make sure that you have pushed your changes to your git repo.
  • Then, the first time, run kubectl apply -f application.yaml.
  • After this it is enough to push new changes to the git repo.

Argo CD application manual update

  • click refresh
    • compare the latest code in the git repo with the live state
  • click sync
    • Move to target state, by actually applying the changes to the k8s cluster

Troubleshooting Argo CD

Image version change is not picked up

The service was also not shown in the Argo CD web UI.

It turned out I had copied and pasted the deployment into both the deployment.yaml file and the service.yaml file.

It also showed a yellow warning about the app resource, but I didn't understand what it meant.

Internal Developer Platform (IDP)

  graph TD;
      User---GUI;
      User---GIT;
      GUI---A[Interface\nControl plane];
      GIT---A;
      Secrets---A;
      Schemas---A;
      Pipelines---A;
      Pipelines---Images;
      A---elsewhere;
      A---AWS;
      A---Azure;
      A---Google;
      A---DB1[DB];
      A---DB2[DB];
      GUI---GIT;

Control plane

  • Crossplane

sync from git with gitops

  • ArgoCD
  • or flux

IDP tools

Argo CD

Let's do GitOps in Kubernetes! ArgoCD Tutorial

Crossplane

Install the crossplane cli

Install crossplane in your k8s cluster

  • Install Crossplane

  • helm repo add crossplane-stable https://charts.crossplane.io/stable

  • helm repo update

  • helm install crossplane --namespace crossplane-system --create-namespace crossplane-stable/crossplane

  • kubectl get pods -n crossplane-system

  • kubectl api-resources | grep crossplane

Give crossplane access to AWS

  • create a user in AWS and create credentials for that user
  • get the key and secret
  • save the key and secret in an aws-credentials.txt file
  • kubectl create secret generic aws-secret -n crossplane-system --from-file=creds=$HOME/tmp/aws-credentials.txt
  • kubectl describe secret aws-secret
  • create the provider.yaml file
  • kubectl apply -f provider.yaml
  • vi s3bucket.yaml (a sketch of this file is shown below, after provider.yaml)
  • kubectl apply -f s3bucket.yaml
  • watch kubectl get bucket
  • kubectl describe bucket
  • look around
  • kubectl delete -f s3bucket.yaml

aws-credentials.txt

[default]
aws_access_key_id = xxx
aws_secret_access_key = xxxx

provider.yaml (The ProviderConfig depends on the Provider, so you might need to split the two into separate files and create the Provider first.)

apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-aws-s3
spec:
  package: xpkg.upbound.io/upbound/provider-aws-s3:v0.42.0
---
apiVersion: aws.upbound.io/v1beta1
kind: ProviderConfig
metadata:
  name: default
spec:
  credentials:
    source: Secret
    secretRef:
      namespace: crossplane-system
      name: aws-secret
      key: creds
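
s3bucket.yaml (a minimal sketch of the managed resource referenced in the steps above; the bucket name and region are placeholders, and the apiVersion/kind should match what the installed provider-aws-s3 package actually serves.)

apiVersion: s3.aws.upbound.io/v1beta1
kind: Bucket
metadata:
  name: my-crossplane-test-bucket      # placeholder; S3 bucket names must be globally unique
spec:
  forProvider:
    region: eu-west-1                  # placeholder region
  providerConfigRef:
    name: default                      # matches the ProviderConfig defined in provider.yaml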

Backstage

  • A First Look at Backstage.io

  • Backstage guide articles

  • How To Build A UI For An Internal Developer Platform (IDP) With Port?

  • Port & ArgoCD: Building a Unified Developer Experience - Part 3/3

  • Building developer portals with Backstage

  • Developer Portals: Building Backstage templates and plugins

  • run the javascript docker image

  • curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash

  • export NVM_DIR="$([ -z "${XDG_CONFIG_HOME-}" ] && printf %s "${HOME}/.nvm" || printf %s "${XDG_CONFIG_HOME}/nvm")"

  • [ -s "$NVM_DIR/nvm.sh" ] && . "$NVM_DIR/nvm.sh"

  • curl -o- -L https://yarnpkg.com/install.sh | bash -s -- --version 1.22.22

    • to install the version that fits with Node.js version 18
  • export PATH="$HOME/.yarn/bin:$HOME/.config/yarn/global/node_modules/.bin:$PATH"

  • nvm install 18

  • nvm use 18

  • npx @backstage/create-app@latest

    • y
    • my-backstage-app
  • cd my-backstage-app

  • vi app-config.yaml

    • change baseUrl: http://localhost:3000
    • to baseUrl: http://0.0.0.0:3000
  • Follow the authentication process:

  • yarn upgrade

  • plugin-catalog-backend-module-github-org

    • also try what is on
  • Update app-config.yaml (a hedged sketch of the relevant entries follows this list)

  • Update packages/app/src/App.tsx

  • Update packages/backend/src/index.ts

  • yarn add --cwd packages/backend @backstage/integration

    • has some gyp errors.
  • yarn add --cwd packages/backend @backstage/plugin-catalog-backend-module-github

  • yarn add --cwd packages/backend @backstage/plugin-catalog-backend-module-github-org

  • yarn install

  • yarn dev

  • Follow the github thing

  • Update app-config.yaml

  • Update packages/app/src/App.tsx

  • Update packages/backend/src/index.ts

  • Update examples/org.yaml to change "guest" to my GH username.
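
A minimal sketch of the app-config.yaml entries involved in the GitHub setup above, assuming the standard Backstage GitHub integration and GitHub auth provider; the environment variable names are placeholders, and the catalog provider configuration for the github-org module is not shown:

integrations:
  github:
    - host: github.com
      token: ${GITHUB_TOKEN}                         # personal access token used by the catalog modules

auth:
  environment: development
  providers:
    github:
      development:
        clientId: ${AUTH_GITHUB_CLIENT_ID}           # from the GitHub OAuth app
        clientSecret: ${AUTH_GITHUB_CLIENT_SECRET}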

TODO what tool can I use for log aggregation?

gitlab as auth for backstage

  • start the gitlab server
  • click the 'Admin Area' button at the bottom of the left bar
  • Click 'Applications'
  • Click 'Add new application'
  • follow description in GitLab Authentication Provider

Failed to sign-in, unable to resolve user identity


Login failed, user profile does not contain an email


Origin http://172.17.0.2:3000 is not allowed

Auth fails with "Login failed; caused by NotAllowedError: Origin '...' is not allowed"

change the app.baseUrl in app-config.yaml to the IP address the browser is using to access the website (TODO how is this handled when it is running through e.g. nginx???)

app:
  title: Scaffolded Backstage App
  baseUrl: http://172.17.0.2:3000
Sign in using GitHub xxx

Origin 'http://172.17.0.2:3000' is not allowed
[1] 2024-06-01T21:35:30.713Z rootHttpRouter info ::ffff:172.17.0.1 - - [01/Jun/2024:21:35:30 +0000] "GET /api/auth/github/start?scope=read%3Auser&origin=http%3A%2F%2F172.17.0.2%3A3000&flow=popup&env=development HTTP/1.1" 302 0 "http://172.17.0.2:3000/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:126.0) Gecko/20100101 Firefox/126.0" type=incomingRequest
[1] 2024-06-01T21:35:31.075Z rootHttpRouter info ::ffff:172.17.0.1 - - [01/Jun/2024:21:35:31 +0000] "GET /api/auth/github/handler/frame?code=6854f0903f3fda0b52ab&state=6e6f6e63653d7a434e35716a326b724f4261646d25324253475a6437717725334425334426656e763d646576656c6f706d656e74266f726967696e3d687474702533412532462532463137322e31372e302e322533413330303026666c6f773d706f7075702673636f70653d7265616425334175736572 HTTP/1.1" 200 - "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:126.0) Gecko/20100101 Firefox/126.0" type=incomingRequest
[1] 2024-06-01T21:35:31.100Z rootHttpRouter info ::ffff:172.17.0.1 - - [01/Jun/2024:21:35:31 +0000] "GET /favicon.ico HTTP/1.1" - - "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:126.0) Gecko/20100101 Firefox/126.0" type=incomingRequest

Error: Cannot find module '@backstage/plugin-catalog-backend-module-github-org'

  • yarn why @backstage/plugin-catalog-backend-module-github-org
    • response: error We couldn't find a match!
  • yarn add --cwd packages/backend @backstage/integration
    • has some gyp errors.
  • yarn add --cwd packages/backend @backstage/plugin-catalog-backend-module-github
  • yarn add --cwd packages/backend @backstage/plugin-catalog-backend-module-github-org
[0] <i> [webpack-dev-server] 404s will fallback to '/index.html'
[1] node:internal/modules/cjs/loader:1140
[1]   const err = new Error(message);
[1]               ^
[1] 
[1] Error: Cannot find module '@backstage/plugin-catalog-backend-module-github-org'
[1] Require stack:
[1] - /home/devenv/my-backstage-app/packages/backend/src/index.ts
[1]     at Module._resolveFilename (node:internal/modules/cjs/loader:1140:15)
[1]     at Module._load (node:internal/modules/cjs/loader:981:27)
[1]     at Module.require (node:internal/modules/cjs/loader:1231:19)
[1]     at require (node:internal/modules/helpers:177:18)
[1]     at <anonymous> (/home/devenv/my-backstage-app/packages/backend/src/index.ts:25:13)
[1]     at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
[1]   code: 'MODULE_NOT_FOUND',
[1]   requireStack: [ '/home/devenv/my-backstage-app/packages/backend/src/index.ts' ]
[1] }
[1] 
[1] Node.js v18.20.3

Unknown auth provider github

{
  "error": {
    "name": "NotFoundError",
    "message": "Unknown auth provider github",
    "stack": "NotFoundError: Unknown auth provider github\n
        at <anonymous> (/home/devenv/my-backstage-app/node_modules/@backstage/plugin-auth-backend/src/service/router.ts:164:11)\n
        at handleReturn (/home/devenv/my-backstage-app/node_modules/express-promise-router/lib/express-promise-router.js:24:23)\n
        at /home/devenv/my-backstage-app/node_modules/express-promise-router/lib/express-promise-router.js:64:7\n
        at Layer.handle [as handle_request] (/home/devenv/my-backstage-app/node_modules/express/lib/router/layer.js:95:5)\n
        at trim_prefix (/home/devenv/my-backstage-app/node_modules/express/lib/router/index.js:328:13)\n
        at /home/devenv/my-backstage-app/node_modules/express/lib/router/index.js:286:9\n
        at param (/home/devenv/my-backstage-app/node_modules/express/lib/router/index.js:365:14)\n
        at param (/home/devenv/my-backstage-app/node_modules/express/lib/router/index.js:376:14)\n
        at Function.process_params (/home/devenv/my-backstage-app/node_modules/express/lib/router/index.js:421:3)\n
        at next (/home/devenv/my-backstage-app/node_modules/express/lib/router/index.js:280:10)"
  },
  "request": {
    "method": "GET",
    "url": "/api/auth/github/start?scope=read%3Auser&origin=http%3A%2F%2F172.17.0.2%3A3000&flow=popup&env=development"
  },
  "response": {
    "statusCode": 404
  }
}

alpine seems to not work

  • apk add py3-setuptools
  • apk add build-base
  • apk add musl-dev
  • dev86
  • apk add linux-headers
  • apk add --force-refresh coreutils

Scratchpad

Splitting Source code and App configuration into two separate repos

Debugging clusters

  • Prompt: Help Me Debug a Cluster! - Anusha Ragunathan & Lili Wan, Intuit Inc

  • cluster golden signals (7.14), derived from the "four golden signals"?

    • The four pillars:
      • errors
      • saturation
      • latency
      • traffic
    • Cluster golden signals is a collection of algorithms, quality metrics and dashboards that provides a single pane of glass to view the health and availability of a k8s cluster, while also providing a single "golden signal" to get alerted on. 7.42
    • identify the core critical components of a k8s cluster and bucket them by functionality
      • control plane
      • authentication
      • autoscaling
      • network
      • critical cluster addons
      • bootstrap addon metrics
      • AWS metrics(cloud metrics)
    • For each component, generate a Prometheus golden signal. 8.26
      • Health is generated using an algorithm that relies on:
        • error SLAs
        • statistical formulas for anomaly detection
      • Health can have one of 3 values: Healthy, Degraded or Critical
    • Generate overall health and availability of the cluster using the component golden signals and aggregation algorithm 8.47
      • A cluster can only be healthy if all components are healthy
      • if just one component is degraded then the whole cluster is degraded
      • if just one component is critical then the whole cluster is critical
    • Build dashboards to surface the metrics
    • setup alerting based on the health of the cluster
    • (A closer look at errors and how to describe them as a prometheus rule) 10.36
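
A minimal sketch of how the error part of a component golden signal could be expressed as a Prometheus alerting rule, assuming an error-rate SLA on the API server; the metric, threshold and labels are illustrative, not taken from the talk:

groups:
  - name: cluster-golden-signals
    rules:
      - alert: ApiServerErrorRateHigh
        # ratio of 5xx apiserver responses over the last 5 minutes, compared to an assumed 1% error SLA
        expr: |
          sum(rate(apiserver_request_total{code=~"5.."}[5m]))
            /
          sum(rate(apiserver_request_total[5m])) > 0.01
        for: 10m
        labels:
          severity: degraded          # maps to the Healthy/Degraded/Critical states described above
        annotations:
          summary: "API server 5xx error rate above 1% for 10 minutes"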

Reduce MTTR

Key: Identify root cause.

  • AI for platform debugging
    • k8sgpt