
Google Summer of Code Proposals 2021

Gaurav Mishra edited this page Jun 24, 2021 · 46 revisions

GSoC 2021 Run Students and Topic Links

Name Topic Link
Shruti Agarwal UI https://github.com/Shruti3004/fossology/wiki
Avinal Kumar Build system https://github.com/avinal/fossology/wiki
Aman Dwivedi UI https://github.com/Aman-Codes/fossology/wiki
Shreya Singh Minerva https://github.com/SinghShreya05/fossology/wiki
Sarita Singh Scancode https://github.com/itssingh/fossology/wiki
Omar AbdelSamea Microservice https://github.com/OmarAbdelSamea/fossology/wiki
Kaushlendra Pratap Copyright https://github.com/Kaushl2208/fossology/wiki

Intro

We, the FOSSology project, would like to apply for GSoC. Please see two main resources for finding out more about FOSSology in general:

Meetings: Check out the Meetings table

Interested in Application? - Getting Grip

If you are interested in applying, great! We encourage your application. The question is how to get started with a topic. Just a few points:

Examples from Last Year

We were awarded three slots last year. Please see the results here:

Also, and this was a lot of fun, some YouTube videos were created:

Mentors

So far, the following users have volunteered as mentors:

  • @ag4ums (Anupam)
  • @GMishx (Gaurav)
  • @shaheemazmalmmd (Shaheem)
  • @mcjaeger (Michael)
  • @viv9k (Vivek)
  • @NicolasToussaint (Nicolas)
  • @sjha2048 (Sahil)
  • @hastagAB (Ayush)
  • @vasudevmaduri (Vasudev)
  • (Klaus Gmeinwieser)

Interested in becoming a mentor? Please reach out to us!

Topic Proposals

The following are the topics from last year. Please edit this wiki or reach out to us to add more proposals for GSoC 2021.

  1. Atarashi: Generating OSS License dataset
  2. Migrating more pages to new UI
  3. Copyright False Positive Recognition with ML
  4. Migrate from Travis to Github Actions
  5. Integration of Open Source Review Toolkit
  6. Integration of Scancode Toolkit
  7. Integration of Reuse Project
  8. Making FOSSology architecture more microservice friendly

Atarashi: Generating OSS License dataset

Goal: To create/generate OSS License dataset for Atarashi

To implement any machine learning or deep learning algorithm, we need a better and bigger dataset of SPDX licenses. Unfortunately, no such dataset for open source licenses exists on the web.

There is a loose implementation that builds n-grams from different permutations and combinations of license text paragraphs. Ref: SPDX-OSS Dataset. This method needs further improvement to make the dataset more accurate and realistic.

A few suggested improvements:

  • Shifting from txt files to SPDX JSON endpoint
  • Differentiating License Header from Full Text
  • Adding FOSSology Nomos agent STRINGS.in regex in dataset creation
  • Apart from these we strongly encourage students to pitch in some new ideas/methods/algorithms to achieve more accurate results.
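
As an illustration of the paragraph n-gram idea mentioned above, here is a minimal sketch in Python. It is not the code of the referenced SPDX-OSS Dataset; the function names and the blank-line paragraph splitting are assumptions made for this example. Each contiguous run of paragraphs from one license text becomes one training sample.

```python
def split_paragraphs(license_text):
    """Split a license text into paragraphs on blank lines (an assumption)."""
    return [p.strip() for p in license_text.split("\n\n") if p.strip()]


def paragraph_ngrams(license_text, max_n=None):
    """Return every contiguous run of 1..max_n paragraphs as one sample."""
    paragraphs = split_paragraphs(license_text)
    if max_n is None:
        max_n = len(paragraphs)
    samples = []
    for n in range(1, max_n + 1):
        for start in range(len(paragraphs) - n + 1):
            samples.append("\n\n".join(paragraphs[start:start + n]))
    return samples


if __name__ == "__main__":
    # Hypothetical three-paragraph license text, for illustration only.
    text = "Permission is granted.\n\nThe notice must be kept.\n\nNo warranty."
    for sample in paragraph_ngrams(text, max_n=2):
        print(repr(sample))
```

A text with k paragraphs yields k(k+1)/2 samples when `max_n` is not limited, so a cap is probably needed for long licenses.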

Resources:

Category Rating
Low Hanging Fruit -
Risk/Exploratory **
Fun/Peripheral **
Core Development **
Project Infrastructure -

Migrating more pages to new UI

FOSSology already has a template-based UI built with Symfony-Twig. In addition, FOSSology uses jQuery-UI, DataTables, etc. to generate the pages. These have been in use for a long time, and FOSSology needs a fresh look. The idea is to rewrite the UI using newer technologies such as Angular, Bootstrap, Vue.js or ReactJS. The Twig-based templating could be considered for reuse.

The FOSSology UI started out in a very 1990s HTML style, which works but does not look good today. Plain HTML does have the advantage that it runs on a variety of machines and platforms. Moreover, some of the PHP scripts do not even return HTML, just character streams. Today, the user base of FOSSology has grown and many people use the web UI. The goal of this project is to consistently introduce a new UI technology to FOSSology that helps build more modern pages.

We have already reworked a few UI pages using Bootstrap; see PR 1774. It can also serve as a reference for work with Angular or ReactJS.

Last year, Sahil proposed a design that can be used as a reference for a fresh look. Vivek has also created a proposal wiki that gives a good start into the new approach.

Category Rating
Low Hanging Fruit -
Risk/Exploratory *
Fun/Peripheral *
Core Development ***
Project Infrastructure -

Copyright False Positive Recognition with ML

FOSSology has a copyright scanner that scans files and packages for copyright statements. The current scanner's findings are based on regular expressions and include some false positives.

The goal of this project is to find one or more machine learning techniques that improve the copyright findings by detecting the false positives.
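
To make the idea concrete, here is a minimal sketch of one possible technique: a small naive Bayes text classifier that labels a scanner finding as a real copyright statement or a false positive. This is not FOSSology's approach and not a recommendation. The tokenizer, the class name, and the eight training strings are all invented for illustration; a real project would need a labeled dataset of actual scanner findings.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase words, numbers, and a few symbols typical of copyright lines."""
    return re.findall(r"[a-z]+|\d+|[©@()]", text.lower())


class CopyrightClassifier:
    """Multinomial naive Bayes with add-one smoothing (toy example)."""

    def __init__(self):
        self.token_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def fit(self, samples):
        for text, is_real in samples:
            tokens = tokenize(text)
            self.token_counts[is_real].update(tokens)
            self.doc_counts[is_real] += 1
            self.vocab.update(tokens)

    def _log_score(self, tokens, label):
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        denom = sum(self.token_counts[label].values()) + len(self.vocab)
        for token in tokens:
            score += math.log((self.token_counts[label][token] + 1) / denom)
        return score

    def is_copyright(self, text):
        tokens = tokenize(text)
        return self._log_score(tokens, True) >= self._log_score(tokens, False)


# Hypothetical training data: True = real statement, False = false positive.
TRAINING = [
    ("Copyright (C) 2019 Siemens AG", True),
    ("Copyright 2008 Hewlett-Packard Development Company", True),
    ("(c) 2020 Jane Doe <jane@example.com>", True),
    ("© 2015 Example Corp.", True),
    ("copyright notice must be retained in all copies", False),
    ("the copyright holder shall not be liable", False),
    ("see the copyright statement in the file header", False),
    ("if the copyright owner permits redistribution", False),
]

classifier = CopyrightClassifier()
classifier.fit(TRAINING)

if __name__ == "__main__":
    print(classifier.is_copyright("Copyright (C) 2021 Example GmbH"))
    print(classifier.is_copyright("the above copyright notice shall be included"))
```

With so little training data the classifier is only a demonstration of the workflow: train on labeled findings, then filter new findings from the regex-based scanner.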

Category Rating
Low Hanging Fruit -
Risk/Exploratory ***
Fun/Peripheral **
Core Development ***
Project Infrastructure *

Migrate from Travis to Github Actions

The FOSSology project needs a continuous integration system to automate builds and tests. Since its beginning on GitHub, FOSSology has used only the free Travis service for OSS projects.

It now seems that using GitHub Actions as a general workflow automation will provide more versatile ways of implementing concrete continuous build and test services. The current continuous integration builds implemented with Travis therefore need to be migrated to GitHub Actions. In addition, it would be desirable to update the continuous build targets and the platforms used to the most recent versions.

Category Rating
Low Hanging Fruit -
Risk/Exploratory *
Fun/Peripheral *
Core Development *
Project Infrastructure ***

Integrating Open Source Review Toolkit

Build systems fetch the required dependencies (libraries/artifacts) of a project while building it. It is important to get insight into these dependencies for license compliance checks.

The OSS Review Toolkit (ORT) is an open source project that helps to find the dependencies of a project.

The goal of this project is to render the project dependencies reported by ORT and display them in the FOSSology UI. Dependencies can then be scheduled directly from the UI and scanned with FOSSology.

Category Rating
Low Hanging Fruit -
Risk/Exploratory -
Fun/Peripheral **
Core Development ***
Project Infrastructure *

Integrating Scancode Toolkit

The ScanCode toolkit is a well-established license scanner similar to nomos or monk. It implements a number of different approaches and integrates them into one application for identifying and classifying license-relevant content in packages.

The basic idea is to call the command line interface of the scancode package directly from the FOSSology application. FOSSology will pass a single file and take the result from the scancode command line call.
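
The call described above could look roughly like the following Python sketch. It assumes the `scancode` executable is on the PATH and uses its `--license`, `--copyright` and `--json-pp` options. The function names and the default output file name are hypothetical, and the layout of each entry under `files` depends on the ScanCode version, so the sketch returns those entries unchanged.

```python
import json
import subprocess


def build_scancode_command(file_path, output_path):
    """Build the command line for scanning one file (assumes scancode on PATH)."""
    return [
        "scancode",
        "--license",
        "--copyright",
        "--json-pp", output_path,
        file_path,
    ]


def extract_file_results(scan_json):
    """Return the per-file entries from a ScanCode JSON report."""
    return json.loads(scan_json).get("files", [])


def scan_single_file(file_path, output_path="scancode-result.json"):
    """Run scancode on one file and return its per-file results."""
    subprocess.run(build_scancode_command(file_path, output_path), check=True)
    with open(output_path, encoding="utf-8") as handle:
        return extract_file_results(handle.read())
```

Keeping the command construction and the result parsing in separate functions makes both testable without ScanCode installed.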

Major work will be required to adapt the existing integration of Atarashi, in order to reuse that effort for a similar integration of scancode.

Important links:

Category Rating
Low Hanging Fruit **
Risk/Exploratory *
Fun/Peripheral **
Core Development **
Project Infrastructure *

Integration of Reuse Project

Integrating REUSE with fossology

Copyright and licensing is difficult, especially when reusing software from different projects that are released under various different licenses. REUSE makes it easier for you to declare the licenses under which your works are released, and it also makes it easier for a computer to understand how your project is licensed. The specification defines a standardized method for declaring copyright and licensing for software projects. REUSE also helps in creating a bill of materials with just one simple command.

The goals of this project:

  • Check the license and copyright information for each file of a package.
  • List the files without licensing information, so that the user of that package can be notified.
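
The two goals above can be sketched in Python as follows. This is a simplified illustration, not the REUSE tool and not FOSSology code. It only looks for the `SPDX-License-Identifier` and `SPDX-FileCopyrightText` tags, either in the file itself or in an adjacent `.license` file; the function names are hypothetical, and other parts of the REUSE specification are ignored.

```python
import re
from pathlib import Path

LICENSE_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+)")
COPYRIGHT_TAG = re.compile(r"SPDX-FileCopyrightText:\s*(.+)")


def read_reuse_info(path):
    """Collect SPDX tags from a file, or from its '<name>.license' sidecar."""
    sidecar = path.with_name(path.name + ".license")
    source = sidecar if sidecar.exists() else path
    text = source.read_text(encoding="utf-8", errors="replace")
    return {
        "licenses": [m.strip() for m in LICENSE_TAG.findall(text)],
        "copyrights": [m.strip() for m in COPYRIGHT_TAG.findall(text)],
    }


def files_missing_info(root):
    """List files under root that lack a license tag or a copyright tag."""
    missing = []
    for path in sorted(Path(root).rglob("*")):
        # Sidecar files describe other files; they are not checked themselves.
        if not path.is_file() or path.name.endswith(".license"):
            continue
        info = read_reuse_info(path)
        if not info["licenses"] or not info["copyrights"]:
            missing.append(path.relative_to(root).as_posix())
    return missing


if __name__ == "__main__":
    for name in files_missing_info("."):
        print("missing licensing information:", name)
```

The captured tag values run to the end of the line, so a trailing comment closer such as `*/` would need to be stripped in a real implementation.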

Important links

Existing issues:

  • Integrate REUSE standard: #1592
  • Support the SPDX-FileCopyrightText keyword explicitly in the copyright agent #1513
  • Detect 'foobar.license' files, as part of REUSE.Software standard #1833
Category Rating
Low Hanging Fruit -
Risk/Exploratory **
Fun/Peripheral **
Core Development ***
Project Infrastructure -

Making FOSSology architecture more microservice friendly

FOSSology is designed in a modular fashion, but it does not follow a microservice architecture. For example, if there is a change in monk's logic, the whole source code has to be built and installed again, whereas in a microservice architecture only monk would need to be built and installed/deployed.

Existing behavior: In its current state, FOSSology is able to run in a cluster mode. It can connect to other nodes in a cluster over SSH and execute agents. But it is not very flexible and has the following drawbacks:

  • When adding or removing a node, the scheduler needs to be restarted.
  • Every node needs to have the same set of agents.
    • For example, you cannot have one machine running only monk and one running only copyright.
  • Because of the current version check mechanism in the scheduler, the whole code base needs to be redeployed for a single change to get a new version string.
  • The current architecture requires fo-postinstall to run on every new deployment to ensure a correct DB schema and migrations.
    • Even though this is not a major setback, it causes problems for some implementations. See #1841, #1818 and #1809.
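
The first two drawbacks can be illustrated with a small Python sketch of a node registry in which nodes join or leave at runtime and each node announces its own set of agents. This is purely hypothetical and does not reflect how the FOSSology scheduler works or how the project should be implemented; the class and method names are invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class NodeRegistry:
    """In-memory map of host name to the agents that host can run (toy example)."""

    nodes: dict = field(default_factory=dict)

    def register(self, host, agents):
        """Add or update a node without restarting anything."""
        self.nodes[host] = set(agents)

    def unregister(self, host):
        """Remove a node; unknown hosts are ignored."""
        self.nodes.pop(host, None)

    def hosts_for(self, agent):
        """Return the hosts that can currently run the given agent."""
        return sorted(h for h, agents in self.nodes.items() if agent in agents)


if __name__ == "__main__":
    registry = NodeRegistry()
    registry.register("node-a", ["monk"])
    registry.register("node-b", ["copyright"])
    print(registry.hosts_for("monk"))
    registry.unregister("node-a")
    print(registry.hosts_for("monk"))
```

In a Kubernetes deployment this bookkeeping would more likely be delegated to the platform's own service discovery than written by hand.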

Goal: Once FOSSology follows a more microservice-friendly architecture, it can be deployed on Kubernetes and scale out or scale in based on demand.

Resources:

Category Rating
Low Hanging Fruit -
Risk/Exploratory ***
Fun/Peripheral **
Core Development **
Project Infrastructure ***

Meetings table

All GSoC calls, sorted by topic. The meetings are open for everyone to join.

Topic(s) Timings Meeting link ICS
General meeting Thursdays 15:30 - 16:30 UTC Microsoft Teams .ics
FOSSology UI Project Wednesdays 15:30 - 16:30 UTC Microsoft Teams .ics
Atarashi – Minerva Wednesdays Fridays 12:30 - 13:30 UTC Microsoft Teams, jitsi .ics
Copyright False Positive Detection Using ML
Making FOSSology architecture microservice friendly
- - - -
New Build System and improving CI/CD workflow Fridays 11:30 - 12:30 UTC Microsoft Teams .ics
Integrating ScanCode Toolkit Tuesdays 09:30 - 10:00 UTC Microsoft Teams .ics
- - - -