System Architecture - Serendipitously/pfive GitHub Wiki

# System Architecture

PFive will be a data-driven web application that improves on the browse and search capabilities of ex-hentai.org.

Overview of technologies

Core App

Frontend: React.js

Realistically we can use any modern single-page app framework (Angular.js, Angular 2, Ember.js, and React.js, to name a few). I'm leaning towards React because I haven't used it in depth before and it would be a good learning experience. The front end is not going to be particularly complicated: the filter page will be pretty much the only non-trivial element, so I'm not worried about the front-end scope getting out of hand. My dad is currently researching React for his own company, so he will be a good source of examples and advice to jump-start us.

Backend: Django (Python)

Django is a fully mature Python-based server framework. We use it at work and I'd like to build a full application with it from the ground up to learn it better. There are plenty of tutorials out there for setting up a Django RESTful backend. Django REST Framework is another open-source framework that makes building a RESTful API on Django even easier. We can start by following their quickstart guide.

Job Queuing: Celery

Celery is an open-source Python library for defining and scheduling server tasks. We use this at work for several important tasks and I'd like to learn more about it. Celery will let us schedule and automate the process of scanning ex-hentai, and further down the line we can use it for other ops tasks (emails, DB cleanup, etc.).

Database: PostgreSQL

Postgres is the preferred database to use with a Django server, and Django comes with excellent integration with it. It's likely we will use the default SQLite database that Django ships with out of the box for a bit, but I'd like to switch over to Postgres soon. Heroku also comes with Postgres integration by default, which will make deployment easy.
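
One way the SQLite-to-Postgres switch could be handled in `settings.py` is sketched below. It assumes the connection string arrives in a `DATABASE_URL` environment variable (which is how Heroku Postgres exposes it). The `database_config` helper is a hypothetical name of my own, not part of Django.

```python
import os
from urllib.parse import urlparse


def database_config(url):
    """Build a Django DATABASES entry from a postgres:// URL (hypothetical helper).

    Falls back to a local SQLite file when no URL is given.
    """
    if not url:
        return {"ENGINE": "django.db.backends.sqlite3", "NAME": "db.sqlite3"}
    parts = urlparse(url)
    return {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": parts.path.lstrip("/"),
        "USER": parts.username or "",
        "PASSWORD": parts.password or "",
        "HOST": parts.hostname or "",
        "PORT": str(parts.port or ""),
    }


# Local development has no DATABASE_URL set, so this resolves to SQLite.
DATABASES = {"default": database_config(os.environ.get("DATABASE_URL"))}
```

With this shape the same settings file works locally and on Heroku, and the switch to Postgres becomes a matter of setting one environment variable.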

Search and Filtering: Elasticsearch

Elasticsearch is an open-source, scalable document indexing and processing framework built on Apache Lucene. It is free if we host our own (which I'm sure we can do on Heroku). Elasticsearch will take care of the text matching, fuzzy searching, and indexing of the galleries for us; however, we will still need to design the data model that uses it. ES uses its own JSON-based query language that we will have to learn. This looks like a promising tutorial.
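
To give a feel for the JSON query language, here is a small sketch that builds a query body as a plain Python dict. The `title` and `tags` field names are guesses at a data model we haven't designed yet, and `build_gallery_query` is a hypothetical helper. The query combines a fuzzy `match` on the title with exact `term` filters on tags inside a `bool` query.

```python
import json


def build_gallery_query(text, tags=(), size=20):
    """Return an Elasticsearch query body (field names are assumptions)."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored full-text match that tolerates small typos.
                "must": [
                    {"match": {"title": {"query": text, "fuzziness": "AUTO"}}}
                ],
                # Exact, unscored filters: every listed tag must be present.
                "filter": [{"term": {"tags": tag}} for tag in tags],
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_gallery_query("example", ["tag-a"]), indent=2))
```

The resulting dict is what would be sent as the body of a search request, whichever client library we end up using.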

HTML Parsing: BeautifulSoup

Beautiful Soup is a Python library for easily parsing and navigating HTML pages from an external source. It appears to provide a very simple interface to search the DOM tree for specific elements and return data from them. This looks like exactly what we need for the sync tasks.
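
A minimal sketch of the kind of DOM search this allows is below. The sample markup, the class names, and the `parse_galleries` helper are invented for illustration. The real listing pages will have a different structure, so the selectors would need to be adapted.

```python
from bs4 import BeautifulSoup

# Invented sample markup standing in for a fetched listing page.
SAMPLE_HTML = """
<div class="gallery"><a href="/g/1/">First title</a>
  <span class="tag">tag-a</span><span class="tag">tag-b</span></div>
<div class="gallery"><a href="/g/2/">Second title</a>
  <span class="tag">tag-c</span></div>
"""


def parse_galleries(html):
    """Extract title, URL, and tags from each gallery block (hypothetical layout)."""
    soup = BeautifulSoup(html, "html.parser")  # stdlib parser, no extra dependency
    galleries = []
    for node in soup.find_all("div", class_="gallery"):
        link = node.find("a")
        galleries.append({
            "title": link.get_text(strip=True),
            "url": link["href"],
            "tags": [t.get_text(strip=True)
                     for t in node.find_all("span", class_="tag")],
        })
    return galleries
```

A sync task could call `parse_galleries` on each fetched page and hand the resulting dicts to the database and search index.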

Support Services

Deployment: Heroku AWS Hosting

Heroku is a virtual server abstraction on top of Amazon Web Services EC2 instances. It is an industry standard for startups and has excellent integration with GitHub, Postgres hosting, and CircleCI, which will make automating public deployments very easy. Heroku lets user accounts create free instances with limited RAM, processing speed, and number of database connections, but these instances will be perfectly adequate for our needs.

Continuous Integration: CircleCI

CircleCI is a hosted continuous integration service with excellent GitHub integration. We use it for our CI and deployment needs at work, and I now use it on all my personal projects. CircleCI is very easy to set up and will let us offload running our tests and deployments to the cloud. You can link your GitHub account to CircleCI and get one container for free, which will be fine for our small project.