Google Summer of Code 2018 - HelikarLab/candis GitHub Wiki

Google Summer of Code 2018


by Rupav Jain

Candis phase II


The following were the major goals for this summer:

  • Include more DNA microarray IO handlers than just Affymetrix CEL files.
  • User authentication.
  • Database-driven application.
  • Download CEL files from NCBI using the Entrez utility.
  • Documentation
  • Unit Testing - frontend
  • Unit Testing - backend

Work Done


Community-Bonding Phase

24th April 2018 - 13th May 2018

In this phase, my mentors walked me through candis in detail. Mr. Achilles and Dr. Akram wanted candis to be production-ready by the end of the GSoC programme, so I was required to implement user authentication, make candis a database-driven application, and deploy it on a platform for everyone to use.

Candis already had a feature to convert microarray gene expression data from CEL files into equivalent ARFF files. ARFF is the input format of the Weka platform, which is used for data analysis and predictive modelling. Mr. Achilles wanted me to implement a function that converts these ARFF files into a pandas DataFrame, which opens up the option of using Python directly for the same machine learning tasks with scikit-learn, Keras or TensorFlow. This was done in PR #98.

Before merging anything into the codebase, I was required to integrate a continuous integration tool, Travis CI, with candis. This ensures that a commit is added to the codebase only after a successful Travis build. We also needed testing tools to check that the app works as expected. I chose pytest for testing the Flask backend and Jest for testing the React frontend. Jest is developed by Facebook, which also created React, so it already had good support online. By the end of this phase I had set up my development environment and Travis with test coverage tools. I had also added a few test cases using Jest.
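To make the ARFF-to-pandas conversion mentioned above more concrete, here is a minimal sketch of reading a dense ARFF document with only the Python standard library. It is not the code from PR #98; the function name `parse_arff` and the sample data are invented for illustration, and the parser ignores sparse rows, date attributes and attribute names that contain spaces.

```python
import csv

# Toy ARFF document; the genes, values and labels are made up for this sketch.
SAMPLE = """\
% toy expression data, invented for illustration
@relation expression
@attribute gene_a numeric
@attribute gene_b numeric
@attribute label {tumor,normal}
@data
5.1,3.5,tumor
4.9,?,normal
"""

NUMERIC_TYPES = ("numeric", "real", "integer")


def parse_arff(text):
    """Parse a dense ARFF document into (relation, attribute_names, rows)."""
    relation = None
    attributes = []  # (name, type) pairs in declaration order
    data_lines = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        if in_data:
            data_lines.append(line)
            continue
        keyword = line.split(None, 1)[0].lower()
        if keyword == "@relation":
            relation = line.split(None, 1)[1].strip("'\"")
        elif keyword == "@attribute":
            _, name, kind = line.split(None, 2)
            attributes.append((name.strip("'\""), kind.strip()))
        elif keyword == "@data":
            in_data = True

    rows = []
    for values in csv.reader(data_lines, skipinitialspace=True):
        row = {}
        for (name, kind), value in zip(attributes, values):
            if value == "?":  # ARFF marks missing values with "?"
                row[name] = None
            elif kind.lower() in NUMERIC_TYPES:
                row[name] = float(value)
            else:  # nominal and string attributes stay as text
                row[name] = value
        rows.append(row)
    return relation, [name for name, _ in attributes], rows


relation, names, rows = parse_arff(SAMPLE)
```

With pandas installed, `pandas.DataFrame(rows, columns=names)` turns the parsed rows into a DataFrame. For real files, a library reader such as `scipy.io.arff.loadarff` is likely a better starting point than a hand-written parser.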

I-Phase: Adding Entrez

14th May 2018 - 13th June 2018

In this phase, I started implementing a feature to download CEL files from NCBI. Mr. Achilles had already set up an entrez module for this purpose; I was required to enhance and complete it. I configured the get-candis script so that it can install candis on a bare Linux/macOS container or machine without any trouble. I also added some basic UI features: deleting stages from a pipeline, deleting a whole pipeline, default user settings, etc. Refer to #56, #51 and #52. I completed the Entrez feature in this phase. This was the layout/logic used in designing the Entrez utility in candis:

candis-entrez-layout

The UI for this utility consists of two modal pages: a Formik form to search for data, and a ReactDataGrid data grid from which the user selects and downloads one of the many CEL files available.

See the entrez feature in action:
candis-entrez-forms
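The search step behind the form can be sketched as a call to NCBI's E-utilities `esearch` endpoint. The snippet below only builds the request URL and performs no network access. The function name `build_esearch_url`, the choice of the `gds` (GEO DataSets) database and the search term are assumptions made for illustration; the candis entrez module may query NCBI differently.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Public base URL of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, db="gds", retmax=20):
    """Return an esearch URL that looks up `term` in the Entrez database `db`."""
    params = {
        "db": db,           # Entrez database to search (assumed: GEO DataSets)
        "term": term,       # free-text query typed into the search form
        "retmax": retmax,   # maximum number of record IDs to return
        "retmode": "json",  # ask for a JSON response instead of XML
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


url = build_esearch_url("leukemia", retmax=5)
query = parse_qs(urlparse(url).query)  # decoded parameters, for inspection
```

The record IDs returned by such a search would then fill the data grid, and the user's selection decides which CEL files to download.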

In this period, there was a lot of discussion about making candis a database-driven application. We agreed on PostgreSQL as the database and SQLAlchemy as the ORM.

II-Phase: Connecting Candis to database

14th June 2018 - 13th July 2018

In this phase, I implemented user authentication, added database support, and made the endpoints private. For user authentication, I used JWT (JSON Web Token) authentication instead of session management. I created tables for storing user data, the candis pipelines a user creates, and the response of each endpoint (error or success). An advantage of user authentication is that a user can have private pipelines. For this, I had to implement a one-to-many relationship between the users and pipelines tables. Refer to PR #121 for more details. For building forms (without tears 😭) and validating data in React, I used Formik and Yup respectively.
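The one-to-many relationship between users and pipelines can be illustrated with a small schema. To keep the sketch self-contained it uses the standard library's `sqlite3` with an in-memory database rather than PostgreSQL and SQLAlchemy, and all table, column and function names are hypothetical, not taken from PR #121.

```python
import sqlite3

# Each pipeline row points at exactly one user; a user may own many pipelines.
SCHEMA = """
CREATE TABLE users (
    id       INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE
);
CREATE TABLE pipelines (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE
);
"""


def make_db():
    """Create an in-memory database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


def add_user(conn, username):
    cur = conn.execute("INSERT INTO users (username) VALUES (?)", (username,))
    return cur.lastrowid


def add_pipeline(conn, user_id, name):
    cur = conn.execute(
        "INSERT INTO pipelines (user_id, name) VALUES (?, ?)", (user_id, name)
    )
    return cur.lastrowid


def pipelines_for(conn, user_id):
    """Return only the pipelines owned by the given user."""
    rows = conn.execute(
        "SELECT name FROM pipelines WHERE user_id = ? ORDER BY id", (user_id,)
    )
    return [name for (name,) in rows]


conn = make_db()
alice = add_user(conn, "alice")
bob = add_user(conn, "bob")
add_pipeline(conn, alice, "cel-to-arff")
add_pipeline(conn, alice, "classifier")
add_pipeline(conn, bob, "baseline")
```

Filtering every pipeline query by the authenticated user's ID is what keeps pipelines private: `pipelines_for(conn, alice)` never returns Bob's rows.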

User authentication in action...
candis-user-auth
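As a rough illustration of how token-based authentication differs from server-side sessions, the sketch below signs and verifies an HS256 JSON Web Token using only the standard library. The helper names, the secret and the claims are invented for this example, and it does not reflect which JWT library candis uses. A production application would normally rely on a maintained JWT library instead of hand-written code.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data):
    """Base64url-encode bytes without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    """Decode base64url text, restoring the stripped padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input, secret):
    return hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()


def encode_token(payload, secret):
    """Build header.payload.signature for the given claims."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def decode_token(token, secret, now=None):
    """Return the claims if the signature is valid and the token has not expired."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    # compare_digest avoids leaking how many leading bytes matched
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    now = time.time() if now is None else now
    if "exp" in payload and now >= payload["exp"]:
        raise ValueError("token expired")
    return payload


token = encode_token({"sub": "alice", "exp": 1000}, "s3cret")
claims = decode_token(token, "s3cret", now=999)
```

Because the claims travel inside the signed token, the server can authenticate a request to a private endpoint without looking up a session record.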

Final-Phase: Finishing up - deploying!

14th July 2018 - 14th August 2018

In this phase, I deployed the application on a DigitalOcean droplet. Following Mr. Achilles' suggestions, I deployed the app with gunicorn as the backend application server and nginx as a reverse proxy that forwards requests from the client to the gunicorn application. For monitoring and controlling the processes, I used a systemd unit file. The deployment was successful.
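The gunicorn side of such a setup can be captured in a small Python configuration file. This is a sketch and not the configuration used on the droplet: the file name `gunicorn.conf.py`, the loopback address and port, and the worker count are all assumptions.

```python
# gunicorn.conf.py (hypothetical file name and values)
import multiprocessing

# Listen on the loopback interface only, so that nginx is the sole public
# entry point and proxies requests to this address. The port is an assumption.
bind = "127.0.0.1:8000"

# Rule of thumb from the gunicorn documentation: (2 x CPU cores) + 1 workers.
workers = multiprocessing.cpu_count() * 2 + 1

# Log to stdout/stderr so that the process supervisor (here systemd)
# can collect the output.
accesslog = "-"
errorlog = "-"
```

A systemd unit would then start gunicorn with `-c gunicorn.conf.py` followed by the application's `module:app` path (not given in this report) and restart it if the process exits, while nginx forwards client requests to the `bind` address.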

What I Learned

How to tackle problems, and write quality code!


Read more about candis and my experience in the first phase of GSoC'18: WordPress blog. Get started with testing: Medium story.

Future Work


  • Documentation and backend test cases still need to be done.
  • Currently, candis parses Affymetrix CEL files only. More DNA microarray IO handlers need to be implemented.
  • We have a function to convert ARFF data into a pandas DataFrame. Weka is currently used for data analysis and predictive modelling of DNA microarrays, but with a pandas DataFrame we can implement the same using Python machine learning libraries such as scikit-learn. This way we can avoid using the JVM, python-weka-wrapper and python-javabridge in candis.

Credits


It was a dream come true ✨ for me when I was given the opportunity to work on candis in Google Summer of Code 2018. I am extremely thankful to my mentors for giving me this opportunity. Mr. Achilles and Dr. Akram have been great mentors, guiding and supporting me through this remarkable journey. I am indebted to Mr. Achilles for sharing his knowledge and helping me grow in the field of web development. It was Mr. Achilles who introduced me to a very useful editor, vim. Mr. Achilles focusses on code quality, the scalability of an application, and established principles (like the 12-factor app, YAGNI and KISS) when building a web application. This helped me write quality code and structure the codebase as closely as possible to what Mr. Achilles had set up in candis before I started working. The faith my mentors placed in me boosted my confidence, and I was able to tackle and solve issues ranging from easy to hard in the project. Special thanks to Dr. Akram for giving invaluable input on improving the user interface of candis.
