Logbook
Week 11 - 11.04.2025
Security
A. Risk identification
Identify assets (e.g. web application)
- The codebase
- The database
- Maybe Grafana and Prometheus
- User data (Password, email and username)
Identify threat sources (e.g. SQL injection)
- SQL Injection
- Exposing ports
- Unsafe passwords/usernames for DB, Grafana or Prometheus
- Exposing secrets on GitHub, such as uploading a `.env` file
- No MFA or 2FA implemented in the system
- Exposing API authorisation hash (anyone can post messages under any username with a command)
- Using old, insecure libraries
Construct risk scenarios (e.g. Attacker performs SQL injection on web application to download sensitive user data)
1a. Attacker performs SQL injection on the web application to download user data
1b. Attacker performs SQL injection on the web application to delete user data
2. An attacker gets access to our environment secrets and uses them to access our VM, DB, dashboards or Discord
3. An attacker exploits a vulnerability in a library we use to do malicious things to our application
4. An attacker leaks passwords, leading to unrecoverable damage to business reputation
5. An attacker encrypts our DB, leaving us unable to retrieve it
6. An attacker guesses or brute-forces weak passwords, gaining unauthorised access to users' accounts
B. Risk Analysis
Determine likelihood and impact
We're using the following levels of impact / probability
Impact: Insignificant, Negligible, Marginal, Critical, Catastrophic
Probability: Certain, Likely, Possible, Unlikely, Rare
Use a Risk Matrix to prioritise risk of scenarios
Scenario | Impact | Probability | Comment |
---|---|---|---|
1a | Critical | Unlikely | We use the Sequel gem, which supports prepared statements and parameterised queries, helping mitigate SQL injection (see the sketch below) |
1b | Catastrophic | Unlikely | Same as above. Catastrophic, because if we lost all our data we could not keep running in production |
2 | Catastrophic | Rare | Humans make mistakes, but we have added `.env*` to our `.gitignore`, so a developer would have to force-add it for it to be exposed on GitHub |
3 | Depends, but from Negligible to Catastrophic | Certain | It really depends on which vulnerability is disclosed |
4 | Critical | Unlikely | How would they get our DB? Same as 1a, 1b and 2, we think we're safe... right? |
5 | Catastrophic | Unlikely | For the same reasons as above. However, this did happen to us early on, because we used the default password for our DB |
6 | Negligible | Likely | They would be able to make weird tweets and impersonate a user. We can't prevent it unless we force people to use MFA or add a lockout after three failed login attempts |
Chris and Rakul also did a graph where we mapped the scenarios. Rakul has a picture of it.
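To make the mitigation for scenarios 1a/1b concrete, here is a minimal sketch of how queries look with Sequel; the connection string, table and column names are illustrative assumptions, not necessarily our exact schema.

```ruby
# Minimal sketch of how Sequel mitigates SQL injection (connection string,
# table and column names are illustrative assumptions).
require "sequel"

DB = Sequel.sqlite("minitwit.db")

def find_user(username)
  # The username is passed to Sequel as data and escaped/bound safely,
  # never spliced into a raw SQL string by hand.
  DB[:user].where(username: username).first
end

# Even a malicious input is treated as a plain string value, not as SQL:
find_user("'; DROP TABLE user; --")
```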
Discuss what are you going to do about each of the scenarios
We think only scenarios 3 and 6 require action from us.
Vulnerability 3:
- I've enabled Dependabot to keep dependencies secure and up-to-date
- I've enabled CodeQL analysis / Code Scanning to find security vulnerabilities and errors in our repo
- I've enabled Secret Protection and Push Protection, which blocks pushes that contain supported secret types
Vulnerability 6:
- Would require us to set up 2FA or a lockout after three failed login attempts. It probably isn't worth it for the scope of this project (a rough sketch of what a lockout could look like follows below).
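For reference, a rough sketch of what such a lockout could look like in our Sinatra app if we ever decide it is worth it; this is not implemented, and the threshold and lockout window are arbitrary assumptions.

```ruby
# Hypothetical in-memory lockout after 3 failed login attempts (not implemented).
# The numbers (3 attempts, 15 minutes) are arbitrary assumptions.
FAILED_LOGINS = Hash.new { |h, user| h[user] = { count: 0, locked_until: nil } }

def locked_out?(username)
  entry = FAILED_LOGINS[username]
  entry[:locked_until] && Time.now < entry[:locked_until]
end

def record_failed_login(username)
  entry = FAILED_LOGINS[username]
  entry[:count] += 1
  entry[:locked_until] = Time.now + 15 * 60 if entry[:count] >= 3
end

def record_successful_login(username)
  FAILED_LOGINS.delete(username)   # reset the counter on success
end
```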
Logging
One month later ...
I looked at these articles:
- Fluentd vs Logstash: A Comparison of Log Collectors
- Filebeat vs. Logstash: The Evolution of a Log Shipper
I don't know yet which one is the right fit for us.
Week 10 - 04.04.2025
Switching DO account
DigitalOcean supports creating droplet snapshots and sharing them across teams, which is basically what we found out we could do.
How to Transfer a Droplet Snapshot to a Different Team
How to Create Snapshots of Droplets
The only things we lose are some data and, of course, the IP address, which is not copied over but reassigned to a new random one. This will hopefully be resolved when we implement HTTPS.
Week 8 - 21.03.2025
Prep material
Created this script, `test_endpoints.sh`, with help from an LLM:
```bash
#!/bin/bash

# Define your endpoints
ENDPOINTS=(
  "http://139.59.204.182:4567/public"
  "http://139.59.204.182:4567/register"
  "http://139.59.204.182:4567/login"
  "http://139.59.204.182:4567/Marilu%20Mondloch"
)

# Loop through each endpoint and measure response time
for endpoint in "${ENDPOINTS[@]}"; do
  echo "Testing: $endpoint"
  curl -o /dev/null -s -w "
DNS lookup: %{time_namelookup}s
TCP connect: %{time_connect}s
TLS handshake: %{time_appconnect}s
Time to first byte (TTFB): %{time_starttransfer}s
Total time: %{time_total}s
\n" "$endpoint"
  # Alternative: report only the total time
  # curl -o /dev/null -s -w "Total time: %{time_total}s\n" "$endpoint"
  echo "----------------------"
done
```
Output:
```
Testing: http://139.59.204.182:4567/public
DNS lookup: 0.000975s
TCP connect: 0.043837s
TLS handshake: 0.000000s
Time to first byte (TTFB): 2.037110s
Total time: 2.046651s
----------------------
Testing: http://139.59.204.182:4567/register
DNS lookup: 0.001239s
TCP connect: 0.036644s
TLS handshake: 0.000000s
Time to first byte (TTFB): 0.095604s
Total time: 0.095791s
----------------------
Testing: http://139.59.204.182:4567/login
DNS lookup: 0.000766s
TCP connect: 0.023801s
TLS handshake: 0.000000s
Time to first byte (TTFB): 0.049605s
Total time: 0.054326s
----------------------
Testing: http://139.59.204.182:4567/Marilu%20Mondloch
DNS lookup: 0.000721s
TCP connect: 0.023651s
TLS handshake: 0.000000s
Time to first byte (TTFB): 0.153706s
Total time: 0.175663s
----------------------
```
Week 7 - ???
Week 6 - 07.03.2025
Migrating the DB from MySQL to Postgres
To figure out the best way to migrate our DB from MySQL to another database, we did some research online. During that research, we found a data loading tool, pgloader, an open-source project that can migrate from an existing database into Postgres.
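As a sketch, the migration with pgloader can be as simple as a single command pointing at the source and target connection strings; the hosts, credentials and database names below are placeholders, not our actual setup.

```bash
# Hypothetical one-shot migration with pgloader; connection strings are placeholders.
pgloader mysql://minitwit_user:password@localhost/minitwit \
         postgresql://minitwit_user:password@localhost/minitwit
```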
Discussing branch strategy
We had a discussion about our branch strategy in order to see if we should change it up or not.
- Trunk based development
- Always merge development changes into main, using a feature-flagging setup to show/hide the features you are working on. That lets you continuously work on features directly in main while still hiding an unfinished feature from the end-user. Whenever a feature is ready to be shown to the end-user, you can toggle it on and test it on a group of users.
- Pros: Quick to push a release, since you only have to flip the feature's flag to true to show it.
- Cons: Too much overhead for us to actually implement in our project.
- GitFlow development
- Basically you have four types of branches: main, release branches, dev, and feature branches. What is in main is what is deployed to the end-user. Release branches hold the features that are ready for a release. Dev is the development branch that developers branch out from in order to work on features. Features are developed on feature branches; once a feature is ready, it gets merged into dev. When something is ready for a release, dev is merged into the release branch. This allows you to perform user tests on the release branch while still developing on dev.
- Cons: Takes longer for features to go from development into main; might be blocked by other releases.
- Pros: ensures UI testing of the application.
- Current setup: simple GitFlow, i.e. GitFlow without a release branch; a hybrid with trunk-based development.
- We currently have a mix of these two. We have a main branch and a dev branch. Everything in main is what is deployed and shown to the end-user. We don't use flags to toggle features; instead we rely on our dev branch being in a deployable state at all times. If we experience a bug in our deployed application, bugfixes branch out from main. This allows us to fix bugs and deploy the fix within 24 hours.
- Pros: Won't be blocked by other releases. Conflicts are solved from feature and dev branches.
- Cons: Feature and dev branches might conflict.
Conclusion
Each strategy has pros and cons, but due to the overhead of implementing trunk-based development, we decided to keep our current setup.
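For reference, the bugfix flow under this setup looks roughly like the following; the branch names are examples, not a prescription.

```bash
# Example bugfix flow under our current setup (branch names are illustrative):
git switch main
git pull
git switch -c hotfix/fix-login-redirect   # bugfix branches out from main
# ...fix the bug, commit, open a PR into main...
# after the fix is merged and deployed, merge main back into dev so the
# branches do not drift apart:
git switch dev
git merge main
```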
Preparation notes
The following questions were raised in the preparation material. This section logs how we answered them.
CPU load during the last hour/the last day.
To inspect the CPU load, we can use the `uptime` or `top` commands on our droplet. First SSH into the droplet or use the console from DigitalOcean.
$ uptime
$ 20:11:21 up 5 days, 7:58, 1 user, load average: 0.01, 0.02, 0.00
Explanation: The command was run at 20:11:21. The server has been up for 5 days, 7 hours and 58 minutes. The droplet has 1 logged-in user, and the three load averages cover the last 1, 5 and 15 minutes before `uptime` was executed.
We can also run `top`, which gives a more detailed overview and is interactive.
$ top
$ top - 20:17:17 up 5 days, 8:04, 1 user, load average: 0.15, 0.06, 0.01
Tasks: 100 total, 1 running, 99 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.3 us, 0.3 sy, 0.0 ni, 99.0 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 1963.9 total, 197.2 free, 721.9 used, 1044.8 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 1045.6 avail Mem
Explanation: The first line is the same as in `uptime`. The second line summarizes the states of all system tasks: the total number of processes, followed by how many of them are running, sleeping, stopped, or zombie. The third line tells you about CPU utilization. These figures are normalized and displayed as percentages (without the % symbol), so all the values on this line should add up to 100% regardless of the number of CPUs. The fourth and fifth lines tell you about memory and swap usage.
For an in-depth walkthrough, read the reference article.
Average response time of your application's front page.
I used ApacheBench (`ab`) to get statistics on the response time. It is installed by default on macOS. I ran a benchmark with 1000 requests (heh).
$ ab -n 1000 -c 1 http://139.59.204.182:4567/public/
$ Benchmarking 139.59.204.182 (be patient)
...
Server Software:
Server Hostname: 139.59.204.182
Server Port: 4567
Document Path: /public/
Document Length: 73 bytes
Concurrency Level: 1
Time taken for tests: 40.200 seconds
Complete requests: 1000
Failed requests: 0
Non-2xx responses: 1000
Total transferred: 1094000 bytes
HTML transferred: 73000 bytes
Requests per second: 24.88 [#/sec] (mean)
Time per request: 40.200 [ms] (mean)
Time per request: 40.200 [ms] (mean, across all concurrent requests)
Transfer rate: 26.58 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 15 17 2.6 17 60
Processing: 19 23 3.6 22 84
Waiting: 19 22 3.4 22 84
Total: 35 40 5.2 39 105
Percentage of the requests served within a certain time (ms)
50% 39
66% 39
75% 40
80% 41
90% 43
95% 46
98% 56
99% 68
100% 105 (longest request)
From this output we can read off the `Time per request`, which is 40.20 milliseconds.
Amount of users registered in your system.
Not sure how to answer this, as I'm not sure how to connect to the DB :( I tried to run the following command from the host machine, but it lists 0. Where is our DB?
$ sqlite3 minitwit.db "SELECT COUNT(*) FROM user;"
$ 0
Average amount of followers a user has.
Not sure how to answer this as I'm not sure how to connect to the DB :(
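For the record, the queries themselves are straightforward, assuming the standard MiniTwit schema (a `user` table and a `follower` table of `(who_id, whom_id)` rows); the open question is where the database file actually lives, probably inside the Docker container on the droplet.

```bash
# Assuming the standard MiniTwit schema; minitwit.db must be the real DB file,
# which on the droplet probably lives inside the application's Docker container.
sqlite3 minitwit.db "SELECT COUNT(*) FROM user;"
sqlite3 minitwit.db "SELECT COUNT(*) * 1.0 / (SELECT COUNT(*) FROM user) AS avg_followers FROM follower;"
```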
Week 5 - 28.02.2025
Finishing CI/CD setup
We finished working on the CI/CD setup, so that every time a push is made to `main`, the `continous-deployment.yml` action is run. We had a lot of different issues implementing this setup, and this section will try to explain the steps we took to make it all work. PR 71 also has some additional notes that might come in handy.
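For context, the core of the deployment is essentially: build and push the Docker images, then SSH into the droplet and restart the services. Below is a minimal sketch of the kind of deploy step such a workflow runs over SSH; the key path, directory and compose setup are assumptions for illustration, not our actual workflow file.

```bash
# Hypothetical deploy step run by the GitHub Action over SSH.
# $SSH_USER, $SSH_HOST, the key path and /minitwit are placeholders / repository secrets.
ssh -i ~/.ssh/do_ssh_key "$SSH_USER@$SSH_HOST" <<'EOF'
  cd /minitwit
  docker compose pull    # fetch the images pushed to Docker Hub by the build job
  docker compose up -d   # restart the services with the new images
EOF
```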
Changing from private-owned repo to organisation
We found that a huge blocker for integrating CI/CD into our repository was that the repository was owned by a private user. This meant that the owner of the repo was the only person who could change the secret variables stored in it. We therefore opted to move our repository into an organisation, allowing all collaborators to go to the repository settings and change whatever needed to be changed. This removed the burden of relying on one person in order to do something with our project.
SSH security problems
We had a lot of issues giving all team members access to our DO droplet and allowing them to SSH into the machine. We followed the steps of adding our personal SSH keys to our DO team project, but somehow developers were still experiencing a `Permission denied (publickey)` error. Even when we SSH'd into the droplet from the DO platform and peeked into `~/.ssh/authorized_keys`, we could see that the SSH keys were correctly copied over, but we still weren't able to SSH. We ended up manually copying the SSH keys over to our droplet, and since we didn't want to do this every time the droplet was created, we got help from ChatGPT to create a method that fetches the SSH keys registered on our DigitalOcean account and copies them into the virtual machine when booting it up. This method is defined in our `Vagrantfile`:
```ruby
require "net/http"
require "json"
require "uri"

# Fetches all public SSH keys registered on the DigitalOcean account,
# so they can be installed on the droplet during provisioning.
def fetch_digitalocean_ssh_keys(token)
  uri = URI("https://api.digitalocean.com/v2/account/keys")
  request = Net::HTTP::Get.new(uri)
  request["Authorization"] = "Bearer #{token}"

  response = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) do |http|
    http.request(request)
  end

  if response.code == "200"
    keys = JSON.parse(response.body)["ssh_keys"]
    keys.map { |key| key["public_key"] }   # return just the public key strings
  else
    puts "Failed to fetch SSH keys: #{response.body}"
    exit(1)
  end
end
```
Although this seemed like a weird workaround, SSH access was a huge bottleneck for the CI/CD integration, and this was the solution that worked for us to allow both developers and the GitHub Action to SSH into the machine and build the Docker images. We also googled the issue and found several people experiencing similar problems, so we opted for this solution.
Article addressing the problem.
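For completeness, a rough sketch of how the fetched keys can be installed during provisioning; this is an illustration, and our actual Vagrantfile may differ.

```ruby
# Hypothetical provisioning step that appends the fetched public keys to
# authorized_keys on the droplet (illustrative; our actual Vagrantfile may differ).
Vagrant.configure("2") do |config|
  ssh_keys = fetch_digitalocean_ssh_keys(ENV["DIGITAL_OCEAN_TOKEN"])

  config.vm.provision "shell", inline: (
    ["mkdir -p /root/.ssh"] +
    ssh_keys.map { |key| "echo '#{key}' >> /root/.ssh/authorized_keys" }
  ).join("\n")
end
```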
Environment variables
https://github.com/Only-Smiles/DevOps-2025/wiki/Setting-environment-variables-for-the-project
Refactoring Our Frontend and API Endpoints into One Application
Previously, we had separate Ruby applications for our frontend endpoints and API endpoints.
This resulted in a lot of code duplication, where both frontend and backend endpoints performed the same operations but returned different responses (JSON for API and redirects for the web frontend). Additionally, SQL queries were directly embedded in all of our endpoints, which we wanted to refactor into a dedicated database (DB) class.
After refactoring, we now have an API Controller and a Web Controller. Both controllers call functions from our `DBHelper` class to perform operations such as retrieving messages, getting user IDs by username, etc. We also implemented an `AuthHelper`, which is used for all authentication-related functions such as login, registration, and more.
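A rough sketch of the shape of this setup; the method, table and column names here are illustrative assumptions, not necessarily our exact implementation.

```ruby
# Illustrative sketch of the shared helper; method, table and column names are
# assumptions, not necessarily our exact implementation.
require "sequel"

class DBHelper
  def initialize(db)
    @db = db   # a Sequel database connection
  end

  def get_user_id(username)
    @db[:user].where(username: username).get(:user_id)
  end

  def public_messages(limit = 30)
    @db[:message].where(flagged: 0).reverse(:pub_date).limit(limit).all
  end
end

# Both controllers reuse the same helper: the API controller renders the result
# as JSON, while the web controller renders an HTML template.
```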
Create automatic releases and add a development branch
We added a GitHub Action that automatically creates a release whenever a pull request is merged into main. In order not to have too many releases due to the high number of pull requests (~10 per week right now), we added a development branch. All changes are first merged into the development branch (after >1 approval), and once a few features have accumulated, they are merged into main.
How much do we as a group adhere to the three ways of the DevOps handbook
Flow
We track all of the work that needs to be done in Github issues, so that it is visible to everybody what needs to be done and who is working on what. Furthermore, the development branch for a feature is always linked to its issue. Typically, each person picks up one issue at a time, unless two issues are closely related. The idea of "reducing batch sizes" is naturally implemented by the structure of the course. We develop our product iteratively, without aiming for perfection the first time around. This leads to faster deployment of changes as well as more opportunities for learning. This does not mean that each commit is tested and deployed into production immediately since some of our commits produce non-working results. A person works on an issue from start to finish, thus reducing the handoffs. So far we have not done any analysis into discovering bottlenecks in our process. This will come later on in the course.
Feedback
Feedback on features is currently provided in two ways: (1) automatically by a test script upon every push, and (2) manually by at least one reviewer for each pull request. Swarming is a tactic that we plan on applying in the future, but it does not really apply to what we have done so far. Quality is pushed close to the source by having a small self-organising team that does not rely on outsiders.
Continual Learning and Experimentation
Not sure how relevant this is to us since it is mostly company-related. However, our team works similarly to a "generative organisation" where responsibility is shared and new ideas are welcome.
Week 4 - 21.02.2025
Implementing API for MiniTwit
This week, we focused on creating API endpoints in our Ruby application that return JSON responses along with the correct HTTP status codes.
To manage the workload, we created issues for each endpoint being tested in `minitwit_sim_api_test.py`, allowing us to split the tasks efficiently.
We initially created a separate Ruby application for the API, acknowledging that this would lead to code duplication between the frontend and API endpoints. However, our plan was to prioritize getting the API up and running before refactoring. The reason for this prioritization was to ensure that the API was fully functional before the simulation started, rather than spending too much time perfecting the architecture, which could have delayed our API implementation.
We succeeded in this approach and managed to pass all of our tests.
Choice of CI/CD system
For the scope of this project GitHub Actions suffices: it is a well-known tool with lots of online tutorials to follow, it is free for our public repository, and it was also mentioned in the course. It matches our OS and our language of choice, provides secret storage for builds and live logs, and integrates easily with our existing choice of version control platform.
- These variables are passed into Vagrant, meaning they are stored in our deployment/droplet on DigitalOcean:

  export DOCKER_USERNAME=<your_docker_hub_username>
  export DOCKER_PASSWORD=<your_docker_hub_password>
  export DIGITAL_OCEAN_TOKEN=<your_digital_ocean_token>
  vagrant up --provider=digital_ocean

- These secrets also need to be stored in the GitHub repository. Note that the username should not be the email, and SSH_HOST is only the IP, without http or a port number.
  - DOCKER_USERNAME: username for hub.docker.com
  - DOCKER_PASSWORD: access token for that username on hub.docker.com
  - SSH_USER: the user as whom we will connect to the server at DigitalOcean; default is root
  - SSH_KEY: the private SSH key we generated earlier (not the public key; if you followed the instructions, it should be located at ~/.ssh/do_ssh_key)
  - SSH_HOST: the IP address of the server (or DNS name) we created on DigitalOcean, which you noted down earlier
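As a convenience, the same repository secrets can also be set from the command line with GitHub's `gh` CLI instead of clicking through the settings UI; the values below are placeholders.

```bash
# Setting the repository secrets with the gh CLI (values are placeholders).
gh secret set DOCKER_USERNAME --body "<your_docker_hub_username>"
gh secret set DOCKER_PASSWORD --body "<your_docker_hub_access_token>"
gh secret set SSH_USER --body "root"
gh secret set SSH_HOST --body "<droplet_ip>"
gh secret set SSH_KEY < ~/.ssh/do_ssh_key   # reads the private key from the file
```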
The files needed for the project can be found under https://github.com/itu-devops/itu-minitwit-ci, mainly ./itu-minitwit-ci/.github/workflows and the files mentioned within it.
Week 3 - 14.02.2025
We started by creating issues for the different endpoints.
We considered using GitHub Pages as an alternative to DigitalOcean for hosting, but GitHub Pages only serves static content, and we also had group members who had used DigitalOcean before.
Implement an API for the simulator in your ITU-MiniTwit
We split up in pairs, with one working on deployment and one on implementing the endpoints. This ended up being harder than expected because we had issues just getting the test script to run properly. Specifically, the check for the request coming from the simulator always fails.
We split our repository into frontend and API, which means there is a bit of code duplication. The reason for this is that the simulator wants status codes and JSON responses, but our frontend returns HTML and redirects. We took the opportunity to also restructure the files, so that each endpoint is in its own file and can be implemented independently.
Laurids implemented the `register` and `fllws` endpoints, and that is where we ended for the week; the rest of the endpoints will be implemented next week.
Vagrant with local virtual machine
We started by creating a Vagrant file that sets up a local virtual machine, installs the necessary dependencies required to run our program, and executes it. We based our Vagrant file on the code provided in the exercises with only minor modifications. Since everyone in our group except one person uses macOS with an ARM64 architecture, we couldn't use VirtualBox. Instead, some of us used UTM, while others opted for VMware.
To use UTM, we used this installation guide. To use VMware, we used this installation guide.
After successfully configuring a Vagrant file that could spin up an Ubuntu virtual machine and run our web server locally, we decided to refactor it into a hosted virtual machine using DigitalOcean.
Vagrant with Digital Ocean
Choosing DigitalOcean was an easy decision for us since the GitHub Student Pack provides $200 in credits. From the course material, we knew that our web server would need to handle a high volume of traffic, and DigitalOcean allows us to easily scale up our web server to accommodate more requests.
We also explored the DigitalOcean API and web portal and found their documentation and UI very intuitive and easy to follow.
To set up our Vagrant file, we used a mix of DigitalOcean's official documentation and LLM-generated guidance. Since most of the heavy lifting had already been done when creating the local Vagrant file, we quickly managed to get a working VM running through the API.
Vagrant with a Restricted (Floating) IP
Note: DigitalOcean has renamed from "Floating IP" to "Restricted IP," so I will refer to it as a Restricted IP, even though the project work instructions on GitHub still call it a Floating IP.
One issue with our previous approach is that we receive a new public IP address every time we deploy a new version of MiniTwit. Another issue is that deploying a new version requires us to destroy the currently running VM and create a new one, leading to downtime where users can't access MiniTwit.
To solve this, we first created a Restricted IP in the DigitalOcean dashboard and assigned it to our running VM. Now, when we deploy a new version, we simply create a new VM and use the DigitalOcean dashboard to reassign the Restricted IP to the new instance.
To avoid doing this manually, we automated the process in our Vagrant file using the DigitalOcean API. Now, when we run vagrant up, it does the following:
1. Creates a new droplet with a unique name: `webserver-#{Time.now.strftime('%Y%m%d%H%M')}`.
2. Checks whether a Restricted IP already exists.
   - If not, it creates a new one via the API.
   - If it does exist, we reuse the existing one.
3. Reassigns the Restricted IP to the new VM (sketched below) and deletes the old VM.
This eliminates downtime and ensures that the IP address remains the same across deployments.
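The reassignment step boils down to one API call. Here is a hedged sketch of what it can look like, assuming DigitalOcean's Floating/Reserved IP actions endpoint; error handling is omitted and this is not necessarily our exact Vagrantfile code.

```ruby
# Hypothetical sketch of reassigning the Restricted (Floating) IP to a new
# droplet via the DigitalOcean API; error handling is omitted.
require "net/http"
require "json"
require "uri"

def assign_restricted_ip(token, ip, droplet_id)
  uri = URI("https://api.digitalocean.com/v2/floating_ips/#{ip}/actions")
  request = Net::HTTP::Post.new(uri, "Content-Type" => "application/json",
                                     "Authorization" => "Bearer #{token}")
  request.body = { type: "assign", droplet_id: droplet_id }.to_json
  Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) { |http| http.request(request) }
end
```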
Week 2 - 07.02.2025
Refactor ITU-MiniTwit to another language and technology of your choice.
Brainstorming sesh
We started a brainstorming session where we listed the different tech stacks that we could use. The following stacks were put on the board:
- Julia
- Rust
- Express
- Ruby
- Nim
- Elixir
- Go
Everyone got to research the programming languages for 10 minutes before we assigned stars to the languages.
We used a method from the Software Architecture course where we each got 5 stars that we could distribute among the languages as we liked. It was important to us that the chosen language was stable and had extensive documentation. We used this for reference.
After placing stars, we eliminated languages bottom-up, leaving us with only two to choose from:
- Ruby with Sinatra
- Pros: Lightweight. Simplicity and productivity. Helge recommended it. No-one has explored it before, and we'd like to learn.
- Cons:
- Typescript/Javascript with Nest.js and Express.js
- Pros:
- Cons: Everyone has tried it before and we'd like to learn something new
We discussed the pros and cons of the two languages and voted. The results were:
- 5 in favor of Ruby
- 1 in favor of Express
Given this, we decided to go with Ruby for the backend and Sinatra as the framework.
Refactoring from Python to Ruby
We decided to refactor the code together by putting it up on the screen and all of us refactoring it as a group. However, we did not make much progress, so we decided to split up and continue the refactoring individually or in pairs. We use Discord for communication, so we agreed to keep each other up to date by communicating what we've done.
Known issues
The Mac users had issues ensuring that the correct Ruby version was used. We discovered that an older Ruby version was already installed on their laptops, and that the PATH resolved to this version instead of the new one installed by Homebrew. To solve the issue, we updated the PATH on our laptops to point to the Homebrew installation first.
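The fix was along these lines, assuming Homebrew on Apple Silicon and zsh; the exact paths may differ per machine.

```bash
# Put the Homebrew-installed Ruby first on the PATH (Apple Silicon / zsh assumed).
echo 'export PATH="/opt/homebrew/opt/ruby/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc
ruby -v      # should now report the Homebrew-installed version
which ruby   # should point into /opt/homebrew/...
```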
Branch naming convention
We discussed that we should keep some sort of branch naming convention. We used this article for naming conventions. We didn't end up choosing any specific syntax, but we agreed that we should have some kind of structure to link our branches to GitHub issues.
Week 1 - 31.01.2025
1. Adding Version Control
Added version control with Git and set up branch protection for main.
2. Try to develop a high-level understanding of ITU-MiniTwit.
Done
3. Migrate ITU-MiniTwit to run on a modern computer running Linux
We added Poetry to our project to manage packages and added all dependencies. We recompiled flag_tool with gcc.
To convert minitwit, we used 2to3, which removed an unnecessary import and added parentheses to a print statement.
We used shellcheck to lint control.sh and fixed the warnings. We also used dos2unix to fix line-ending issues from Windows.