Sprint 1: Back end Research - SD-Group-11/ml-frontend GitHub Wiki
Back-end Research Report
Preface: What is the backend?
In short, the backend of a web app is everything that happens on the server side. It receives requests from clients and contains the logic to send the appropriate data back to them. The backend also includes the database, which persistently stores all of the application's data.
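To make the request/response cycle concrete, here is a minimal sketch using only Python's standard library rather than Django. The `/api/users` route and the in-memory `USERS` dictionary are hypothetical stand-ins for a real endpoint and database.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory "database"; a real backend would query PostgreSQL or MySQL.
USERS = {1: {"username": "alice"}, 2: {"username": "bob"}}


def route(path):
    """Server-side logic: map a request path to (HTTP status, JSON-serialisable body)."""
    if path == "/api/users":
        return 200, list(USERS.values())
    return 404, {"error": "not found"}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Receive the client's request, run the logic, and send the data back as JSON.
        status, body = route(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(host="localhost", port=8000):
    """Block and handle requests until interrupted (Ctrl+C)."""
    HTTPServer((host, port), Handler).serve_forever()
```

Calling `serve()` and visiting `http://localhost:8000/api/users` would return the two users as JSON, while any other path gets a 404. A framework like Django takes over the roles of `route` and `Handler` here, with much more structure around them.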
The following report covers research into the backend of our Machine Learning platform. We go into detail on aspects such as database management for storing user and application data, as well as some backend frameworks that may come in handy.
This research was done with our group discussions in mind, and ultimately serves to complement the frontend decision of either a Python or a JavaScript/Node.js based application (or hey, maybe even a mix of both!).
Storing User data
The platform will require a number of different data structures and kinds of information to be stored and processed. The obvious ones are usernames, passwords, contact details and other user metadata such as login history. In the second semester we will also need to start thinking about storing user-created and uploaded data.
Collecting Entry training and test data
Users are going to input and upload various types and formats of data to be processed by the platform. It is worth bearing in mind that a possible feature is storing this data, along with the results, so that users can view it in the future and perhaps even compare it with other users who have similar use cases. That's where the database comes in.
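As a rough illustration of how the data described above could map onto relational tables, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL/MySQL so it runs anywhere; all table and column names are hypothetical, and it assumes uploaded files are kept on disk with only their paths stored in the database.

```python
import sqlite3

# sqlite3 ships with Python, so it stands in here for PostgreSQL/MySQL.
# All table and column names are hypothetical.
SCHEMA = """
CREATE TABLE users (
    id            INTEGER PRIMARY KEY,
    username      TEXT NOT NULL UNIQUE,
    email         TEXT NOT NULL,
    password_hash TEXT NOT NULL              -- a hash, never the plain password
);
CREATE TABLE login_history (
    id        INTEGER PRIMARY KEY,
    user_id   INTEGER NOT NULL REFERENCES users(id),
    logged_in TEXT NOT NULL                  -- ISO-8601 timestamp
);
CREATE TABLE datasets (
    id        INTEGER PRIMARY KEY,
    user_id   INTEGER NOT NULL REFERENCES users(id),
    name      TEXT NOT NULL,
    file_path TEXT NOT NULL                  -- assumption: the file itself lives on disk
);
CREATE TABLE results (
    id         INTEGER PRIMARY KEY,
    dataset_id INTEGER NOT NULL REFERENCES datasets(id),
    metric     TEXT NOT NULL,
    value      REAL NOT NULL
);
"""


def create_db():
    """Create an in-memory database with the schema above."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces REFERENCES when this is on
    conn.executescript(SCHEMA)
    return conn


def results_for_user(conn, username):
    """Return (dataset name, metric, value) for every result belonging to a user."""
    return conn.execute(
        """
        SELECT d.name, r.metric, r.value
        FROM results AS r
        JOIN datasets AS d ON d.id = r.dataset_id
        JOIN users AS u ON u.id = d.user_id
        WHERE u.username = ?
        ORDER BY d.name, r.metric
        """,
        (username,),
    ).fetchall()
```

Because each result points at a dataset and each dataset points at a user, letting users view their past results (or comparing results across users) becomes a single query.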
What are we trying to achieve?
After reviewing the basic requirements of our backend, our task is to compare the available options and find the one that best suits our implementation. As the platform is still in the early phases of development, it is difficult to pinpoint an exact solution. However, the decision to build a web-based platform already filters out some of the options.
A few popular database management systems have been shortlisted for comparison:
MongoDB
~~MongoDB is an extremely popular NoSQL database used around the world. It is a document database: records are stored as flexible, JSON-like documents rather than as rows in fixed tables. This lets the developer define the structure of the stored data, which is a key point for a platform that may need to store many different types of data. MongoDB is efficient and returns query results quickly, and many see it as having one of the simpler syntaxes, which makes it very attractive.~~
Of course, these great features come at a price. Joining data from multiple collections is complex and expensive, which may lead to overcapitalisation: a lot of effort spent for unsatisfactory results. With flexibility comes responsibility! Failing to index data correctly can lead to extremely poor database performance. Relationships are also hard to define correctly in a document database, and minor flaws can result in duplicated data that becomes inconsistent or corrupted.
Pros:
Flexibility
Efficiency
Simplicity
Cons:
Difficult, expensive joins
Poor performance without correct indexing
Hard-to-define relationships and duplicated data
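The join and duplication problems can be sketched without a database at all. MongoDB does offer a `$lookup` aggregation stage for joins, but a common pattern is to copy (denormalise) fields between documents to avoid joins, and that is where duplicates can drift out of sync. Below, plain Python dicts stand in for MongoDB documents (no driver is used), and all collection and field names are hypothetical.

```python
# Plain Python dicts stand in for MongoDB documents; names are hypothetical.


def make_collections():
    users = [{"_id": 1, "username": "alice"}]
    # Denormalised: each dataset also copies its owner's username for quick reads.
    datasets = [
        {"_id": 10, "owner_id": 1, "owner_name": "alice", "name": "iris"},
        {"_id": 11, "owner_id": 1, "owner_name": "alice", "name": "mnist"},
    ]
    return users, datasets


def join_owners(users, datasets):
    """Application-side join: resolve each dataset's owner_id to a username."""
    by_id = {user["_id"]: user for user in users}
    return [(d["name"], by_id[d["owner_id"]]["username"]) for d in datasets]


def rename_user(users, user_id, new_name):
    """Update the users collection only; the copies inside datasets are missed."""
    for user in users:
        if user["_id"] == user_id:
            user["username"] = new_name


def stale_copies(users, datasets):
    """Names of datasets whose copied owner_name no longer matches the user record."""
    by_id = {user["_id"]: user for user in users}
    return [
        d["name"]
        for d in datasets
        if d["owner_name"] != by_id[d["owner_id"]]["username"]
    ]
```

After `rename_user(users, 1, "alice_w")`, `join_owners` reports the new name but `stale_copies` returns both datasets: the duplicated `owner_name` fields are now inconsistent. A relational database avoids this by storing the username once and joining on a foreign key.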
EDIT:
This research was done before we learned that Django and Django REST framework would be used. Django's built-in database backends are all relational, which makes a NoSQL DBMS like MongoDB very difficult to use, so the rest of this report focuses only on relational DBMSs (RDBMSs).
PostgreSQL & MySQL
These two database management systems have been grouped together because both are relational, as opposed to a document database such as MongoDB. PostgreSQL and MySQL are two very powerful, widely used databases that could potentially fulfil every requirement of the platform. Both allow data to be inserted, edited, manipulated and queried with ease using SQL, so their syntax is very similar: just what is needed.
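Since both systems speak SQL, the basic operations look almost the same in either. The sketch below uses Python's built-in sqlite3 module as a stand-in so it runs without a database server; the `users` table is hypothetical. One real difference: sqlite3 uses `?` placeholders, while the common PostgreSQL and MySQL drivers (psycopg2, mysqlclient) use `%s`. With Django, the ORM generates this SQL for us.

```python
import sqlite3


def crud_demo():
    """Insert, edit, query and delete one row; return the queried row and final count."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, email TEXT)")

    # Insert: placeholders pass values separately from the SQL, avoiding SQL injection.
    conn.execute(
        "INSERT INTO users (username, email) VALUES (?, ?)",
        ("alice", "alice@old.example"),
    )
    # Edit an existing row.
    conn.execute(
        "UPDATE users SET email = ? WHERE username = ?",
        ("alice@new.example", "alice"),
    )
    # Query it back.
    row = conn.execute(
        "SELECT username, email FROM users WHERE username = ?", ("alice",)
    ).fetchone()
    # Delete it and count what is left.
    conn.execute("DELETE FROM users WHERE username = ?", ("alice",))
    remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

    conn.close()
    return row, remaining
```

Calling `crud_demo()` returns `(('alice', 'alice@new.example'), 0)`.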
MySQL:
- MySQL is practically the go-to database management system for web applications.
- It is fast and efficient for typical web workloads.
- Its reliability makes it a great option.
- Due to its popularity, there is plenty of support, documentation and community help online for almost every problem we could encounter.
- It grants access to the database, objects and connections via roles and privileges.
- BIG PRO: everyone in the group has experience with it from prior WITS courses (namely the MC project in 2nd year).
PostgreSQL:
- PostgreSQL is well suited to handling complex queries and data operations.
- It natively supports a rich variety of data types, including JSON, hstore and XML. You can also define your own data types and set up custom functions.
- It offers more features than MySQL, which could make some tasks easier.
- Its structure is highly customizable, which lets developers cater for all their needs.
- PostgreSQL doesn't restrict the size of your databases.
- It has table inheritance (a child table inherits its columns, along with all CHECK and NOT NULL constraints, from one or more parent tables).
- It has more rigorous constraint checking.
- (Slightly biased, but worth noting) The creators of Django recommend PostgreSQL (_The Definitive Guide to Django_, p. 15).
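Constraint checking means the database itself rejects rows that break declared rules, instead of relying on application code to catch them. For comparison, MySQL only began enforcing `CHECK` constraints in version 8.0.16; earlier versions parsed and ignored them. The sketch below uses Python's sqlite3 module as a stand-in (it does enforce `NOT NULL`, `UNIQUE` and `CHECK`); the `training_jobs` table is hypothetical.

```python
import sqlite3


def create_db():
    """In-memory database with a hypothetical table guarded by three constraints."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE training_jobs (
            id     INTEGER PRIMARY KEY,
            name   TEXT NOT NULL UNIQUE,               -- no two jobs share a name
            epochs INTEGER NOT NULL CHECK (epochs > 0) -- reject nonsensical values
        )
        """
    )
    return conn


def try_insert(conn, name, epochs):
    """Return True if the row was stored, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO training_jobs (name, epochs) VALUES (?, ?)",
                (name, epochs),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Here `try_insert(conn, "baseline", 10)` succeeds, while a duplicate name, `epochs=0`, or a missing name are each rejected by the database with `sqlite3.IntegrityError`.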
Backend Frameworks
Apologies in advance if this next section is a bit all over the place: the second semester is when we focus on the backend, so this section will become more refined closer to that time. EDIT: Django has already been selected as the full-stack framework.
Django
A full-stack web framework that lets Python run on the backend while serving the HTML, CSS and JS that make up the frontend (through its templates and static files). Additional packages, such as Django REST framework, would need to be used on top of it to get the full functionality of a modern web app, such as a REST API. This would be the ideal option if we build all of our models in Python and want to keep the language consistent throughout the project. Django supports both PostgreSQL and MySQL, so it is perfect for our use, and an informed decision between the two will be made later on.
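Because Django supports both databases, switching between them is mostly a matter of the `DATABASES` setting in `settings.py`. Below is a minimal sketch of that setting; the `DB_*` environment variable names, the database name and the user are hypothetical placeholders. The PostgreSQL backend additionally needs the psycopg (or psycopg2) driver installed, and the MySQL backend needs mysqlclient.

```python
import os

# Hypothetical: the DB_VENDOR environment variable picks the backend, so the same
# project can be pointed at either database without editing code.
DB_VENDOR = os.environ.get("DB_VENDOR", "postgresql")

# Django's built-in engine paths, paired with each server's default port.
ENGINES = {
    "postgresql": ("django.db.backends.postgresql", "5432"),
    "mysql": ("django.db.backends.mysql", "3306"),
}

engine, default_port = ENGINES[DB_VENDOR]

DATABASES = {
    "default": {
        "ENGINE": engine,
        "NAME": os.environ.get("DB_NAME", "ml_platform"),  # placeholder name
        "USER": os.environ.get("DB_USER", "ml_user"),      # placeholder user
        "PASSWORD": os.environ.get("DB_PASSWORD", ""),     # keep secrets out of version control
        "HOST": os.environ.get("DB_HOST", "localhost"),
        "PORT": os.environ.get("DB_PORT", default_port),
    }
}
```

Running with `DB_VENDOR=mysql` would point the same project at MySQL, which could make a side-by-side trial of the two cheaper before the group commits to one.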
Node.js
~~Node.js is the current god-tier platform for JavaScript on the web. Strictly speaking, it is a JavaScript runtime rather than a framework, and it runs on the server side (in the browser, JavaScript runs on the client side). It is extremely scalable and can even run Python scripts as child processes (crazy, right!). There are tonnes of free and open-source packages and documentation online that make using it a breeze, which takes me to the next framework.~~
Express.js
I thought I would mention Express, as it is one of the most widely used web frameworks built on top of Node.js. It handles routing and the HTTP request/response cycle on the server side, while calls to and from the database are made through separate driver or ORM packages used alongside it.
Wrapping Up
After reviewing these database management systems, we have not reached a definitive answer, but we now have a clearer understanding of how to approach meeting the platform's needs. Each system has many strong points, which is reassuring. Going forward, the group's choice of system will be much better informed thanks to this research, which should help avoid overcapitalisation.
Resources:
https://blog.panoply.io/postgresql-vs.-mysql