Home - OmarAbdelSamea/fossology GitHub Wiki
Making FOSSology architecture microservice friendly
- Microservice Architecture
- Improvements over old cluster install
- Separate agents
- Docker Images
- Too many agents ... Too many services
- Migrating configuration from local files to shared key-value pair storage
- Available agents in microservice
- List of Kubernetes manifests
- List of Dockerfiles
- Pull Request
- Known issues and drawbacks
- Acknowledgments
- Contact Information
FOSSology is an open-source license compliance software system and toolkit. As a toolkit, we can run license, copyright, and export control scans from the command line, and as a system, it provides a database and web UI to give a compliance workflow.
FOSSology is designed in a modular fashion, but it certainly does not follow a microservices architecture. If an agent's logic changes, the whole source code has to be rebuilt and reinstalled, whereas in a microservices architecture only that agent needs to be rebuilt and redeployed.
And that is what this project has accomplished: with the use of cloud technologies like Docker and Kubernetes, FOSSology can be installed on the cloud in a microservice-oriented way.
- Modules are installed in containers instead of VMs, which saves space and time and allows the use of modern cloud technologies like Docker and Kubernetes as well as CI/CD tools like Jenkins, Travis CI, and GitHub Actions.
- The ability to install agents separately and install only the required agents: the system identifies the available agents and modifies the scheduler configuration and the UI accordingly.
- Easy installation using Kubernetes with a simple `kubectl` command: the cluster is up and running in a couple of minutes, versus the old install, which required creating a VM for each module and establishing SSH communication between all machines, taking a couple of hours.
- Easy scaling in or out using Kubernetes services: any agent can be scaled up dynamically when needed using a few simple commands.
- Key-value pair storage (`etcd`) instead of conf files in every container. This gives one shared place for the configuration data used by the agents and the scheduler, so a change needs to be applied only once, whereas in the old cluster install the same change had to be made to the conf files inside every VM, a hectic and time-consuming process.
All existing FOSSology installation methods require installing all agents; you cannot update, delete, or deploy a single agent, and with more than 25 agents, updating just one is a hectic process. With the new architecture, every agent is treated as a separate module, so only the required agents need to be installed, saving time and resources, and Kubernetes allows agents to be scaled in or out. Whenever an agent is added to or removed from the cluster, the scheduler does not need to be restarted: all configuration is reloaded automatically. The FOSSology scheduler was designed to test an agent's host machine before spawning the agent and running the job; with the new architecture, the scheduler identifies which agent is installed on each host and spawns every agent from its own host.
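As an illustration only (the names and data shapes below are assumptions, not FOSSology's actual scheduler code), the restart-free behavior can be sketched as re-reading the set of registered agents before each dispatch:

```python
# Illustrative sketch, not FOSSology source: the scheduler consults the shared
# registry (e.g. keys in etcd) before dispatching, so agents added or removed
# from the cluster are picked up without restarting the scheduler.

def discover_agents(registry: dict) -> set:
    """Return the names of agents currently registered."""
    return set(registry)

def dispatch(job_agents: list, registry: dict) -> list:
    """Keep only the requested agents that are actually deployed right now."""
    available = discover_agents(registry)
    return [a for a in job_agents if a in available]

registry = {"nomos": "nomos-host", "ojo": "ojo-host", "ununpack": "ununpack-host"}
print(dispatch(["nomos", "copyright", "ununpack"], registry))
# ['nomos', 'ununpack']  -- copyright is skipped because it is not deployed
```

The point is that availability is computed at dispatch time from shared state rather than from a configuration baked in at scheduler start-up.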
To separate the agents, FOSSology's Debian packages are used: FOSSology is packed into Debian packages with the `fo-debuild` command-line tool. After `fo-debuild` finishes building the packages, a Docker multi-stage build copies only the packages into the image, bringing the image size down from 1.5 GB to only 15 MB. This first Docker image, `fossology/packages`, is used as a base image and contains all the Debian packages: agents, scheduler, web, common, etc.
```dockerfile
FROM debian:buster-slim as builder
LABEL maintainer="Fossology <[email protected]>"

...

RUN ./utils/fo-debuild --no-sign --no-tar

FROM scratch
WORKDIR /fossology_packages/fossology
COPY --from=builder /fossology_packages/fossology/packages /fossology_packages/fossology/packages
```
To create the other images (scheduler, agent, web, etc.), `fossology/packages` is used as the base image for a multi-stage build, and the appropriate Debian package is copied into the container along with the `fossology-common` package, which contains all the shared libraries.
```dockerfile
FROM fossology/packages:latest as builder
LABEL maintainer="Fossology <[email protected]>"

...

COPY --from=builder /fossology_packages/fossology/packages/fossology-common_*_amd64.deb .
COPY --from=builder /fossology_packages/fossology/packages/fossology-ununpack_*_amd64.deb .
```
To avoid creating a Kubernetes Service for each agent, which would mean more than 25 services if all agents are installed, a headless service is used: a single service handles all agents and gives each agent a unique DNS name.
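As a minimal sketch, a headless service is an ordinary Service with `clusterIP: None`; the metadata name matches the `agents-headless` manifest listed later on this page, while the selector label and port below are assumptions, not the project's actual manifest:

```yaml
# Illustrative headless service sketch. clusterIP: None makes Kubernetes
# publish the individual pod IPs in DNS instead of a single virtual IP,
# so each agent pod gets its own record, e.g.
# <pod>.agents-headless.<namespace>.svc.cluster.local
apiVersion: v1
kind: Service
metadata:
  name: agents-headless
spec:
  clusterIP: None
  selector:
    tier: agent        # assumed label shared by all agent deployments
  ports:
    - port: 5555       # assumed agent port
```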
In a microservice architecture, each agent has its own conf file inside its container, and the FOSSology hosts would have to be hardcoded in the scheduler container before deployment. To solve this, all conf files are moved into a key-value pair database; the selected database system is etcd. Each new agent interfaces with etcd through its RESTful API: on startup, the agent opens its conf file and issues PUT requests to add its configuration, including its host details, to etcd.
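For illustration, here is a minimal sketch of how an agent could build such a PUT request for etcd's v3 HTTP gateway (`/v3/kv/put`), which expects base64-encoded keys and values. The key layout (`fossology/<agent>/host`) and the value are assumptions for the example, not the project's actual schema:

```python
import base64
import json

def etcd_put_body(key: str, value: str) -> str:
    """Build the JSON body for a request to etcd's v3 gateway endpoint
    /v3/kv/put, which requires key and value as base64-encoded strings."""
    return json.dumps({
        "key": base64.b64encode(key.encode()).decode(),
        "value": base64.b64encode(value.encode()).decode(),
    })

# Hypothetical key layout for an agent registering its host details:
body = etcd_put_body("fossology/nomos/host", "nomos.agents-headless")

# Sanity check: the key round-trips through the base64 encoding.
decoded = json.loads(body)
assert base64.b64decode(decoded["key"]).decode() == "fossology/nomos/host"
# The agent would then POST this body to http://<etcd-host>:2379/v3/kv/put
```

Because every agent writes its configuration into the same store at startup, the scheduler can read one consistent view instead of collecting conf files from each container.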
- Ojo
- Copyright
- Ununpack
- Wget_agent
- Nomos
- Adj2nest
Module | Manifests |
---|---|
Database | 1. db-statefulset 2. db-nodeport 3. db-persistentvolumeclaim 4. db-secret |
Scheduler | 1. scheduler-deployment 2. scheduler-clusterIP 3. scheduler-persistentvolumeclaim 4. scheduler-configmap |
Etcd | 1. etcd-deployment 2. etcd-nodeport 3. etcd-persistentvolumeclaim |
Ojo | 1. ojo-deployment |
Copyright | 1. copyright-deployment |
Nomos | 1. nomos-deployment |
Adj2nest | 1. adj2nest-deployment |
Wget_agent | 1. wgetagent-deployment |
Ununpack | 1. ununpack-deployment |
Web | 1. web-deployment 2. web-loadbalancer |
Agents Service | 1. agents-headless |
Module | Dockerfile |
---|---|
Packages base image | Dockerfile.pkg |
Scheduler | Dockerfile.scheduler |
Ojo | Dockerfile.ojo |
Copyright | Dockerfile.copyright |
Nomos | Dockerfile.nomos |
Adj2nest | Dockerfile.adj2nest |
Wget_agent | Dockerfile.wgetagent |
Ununpack | Dockerfile.ununpack |
Web | Dockerfile.web |
- feat(core): Microservices Architecture #2086
- docs(microservices): Intro & reports weeks1 - 4 #3
- docs(microservice): added weekly reports 5 - 9 #23
- docs(microservices): added week 10 and setup #28
- Although containers save a decent amount of space compared to VMs, separating the agents uses more space overall, since each agent has all libraries and dependencies installed in its own container. In the old build system, the ununpack and adj2nest agents are in the same package, which forces both containers to use the same image; this issue is solved in the new build system developed in GSoC'21.
- The scheduler does not tolerate possible errors in the configuration retrieved from etcd, which can lead to failures when data is missing.
- If a new agent is added while etcd is not running, its configuration will not be added, so the scheduler will not be aware of this agent.
- The fossy user has read/write permissions on the `/root/.kube` folder inside the scheduler container, so that kubectl commands can be run from inside the container to communicate with the agents.
Tasks | Completed | Remarks |
---|---|---|
Dockerfile template | ✔️ | Dockerfiles for all modules and 6 agents |
Separating agents | ✔️ | Separate container for each agent; scheduler core code modified to work with separate agents |
Kubernetes Manifests | ✔️ | Kubernetes deployments, services, and PVCs are provided |
Kubernetes Config Maps and Secrets | ✔️ | Kubernetes config maps for environment variables and secrets for the database username and password |
ETCD setup | ✔️ | ETCD Kubernetes deployment, service, and PVC; scheduler core code modified to read data from etcd instead of conf files |
Docker and Kubernetes test | ❌ | Will be provided upon confirmation from the community on the initial version of the project |
Google Summer of Code is the best experience I have had in my college years so far. I worked on large-scale, industry-grade projects with talented people who devote their time to the Open Source community.
Special thanks to my mentors Gaurav Mishra, Michael C. Jaeger, Anupam Ghosh, Klaus Gmeinwieser, and Vasudev Maduri. Thank you for your support, not only during GSoC but even before: you helped me find my way into the Open Source community and achieve my goal of making a contribution with real impact.
I also want to give special thanks to my fellow student developers. You are all talented and hardworking; I learned from each of you, and I am glad I had the opportunity to be part of this community.