PFive Project Outline - Serendipitously/pfive GitHub Wiki

Purpose

The purpose of PFive (name subject to change) is to create a search engine that interfaces with exhentai.org and improves on the site's built-in search functions, so that results from the site better match the user's preferences.

Stages of Design

All design features herein can be roughly separated into four different categories, and will be marked with the appropriate initials, to indicate the urgency and immediacy of the particular idea:

  1. Function (FCN): These are parts of the design that are required for the project to work at all, and should be included in the first iteration of it. All unmarked requirements are assumed to be part of this category by default.

  2. Feature (FTR): These are parts of the design not immediately required for testing or use by a closed group of users, but necessary if the site ever becomes publicly available.

  3. Reach (RCH): These are functionalities that are "good-to-have", and should only be considered either if it turns out the implementation will be effortless, or when the designers get really bored.

  4. Dream (DRM): These aren't realistic, but a man can dream (and fap to it).

Project Goals

To create a regularly updating, persistent database on a remote server, a search algorithm to parse through it, and a webpage to deliver the content in a simple, clear interface, for use by members of Houkago Tea Time.

FTR: The database should be publicly accessible, with a reasonable amount of security, and should have a UI usable by people who aren't technically savvy.

RCH: To monetize this project, through advertisement, donations or by selling the algorithm and idea to an established adult site.

DRM: To incorporate other adult sites within the search results, to produce a comprehensive porn search engine.

Search Functionality & Algorithm

The search algorithm is designed to address the mutable and flexible nature of human fetishes, so that a greater proportion of the content delivered is relevant to the user's interests.

Existing Search Functions

The existing exhentai.org search bar can exclude by rating and category (doujinshi, manga, CG etc.) and search via keywords; pfive must at a minimum be able to mimic this ability.
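As a rough illustration of this baseline, the sketch below filters a gallery record by keyword, minimum rating and excluded categories. The `Gallery` record, its field names and the `basic_match` function are all hypothetical; nothing here reflects how exhentai itself stores or searches its data.

```python
from dataclasses import dataclass, field


@dataclass
class Gallery:
    """Hypothetical record for one gallery; the field names are assumptions."""
    title: str
    category: str          # e.g. "doujinshi", "manga", "CG"
    rating: float          # star rating, 0.0 to 5.0
    tags: frozenset = field(default_factory=frozenset)


def basic_match(gallery, keyword="", min_rating=0.0, excluded_categories=()):
    """Return True if the gallery passes the keyword, rating and category filters."""
    if gallery.category in excluded_categories:
        return False
    if gallery.rating < min_rating:
        return False
    # Case-insensitive substring match on the title; an empty keyword matches all.
    return keyword.lower() in gallery.title.lower()
```

Keeping this step separate from the tag logic means the tag-based categories described below can be layered on top without changing it.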

The Include/Exclude Concept

The bulk of the algorithm involves searching through exhentai's numerous and actively curated tags, through four different categories that are defined by the user, as below:

FTR: The UI should include a set of popular or recommended tags for each category, so users can add to their search without being intimately familiar with the exhentai system

FTR: There should be a "Junda mode" tickbox next to the recommendations. When ticked, the most included tags and most excluded tags swap places.

RCH: The site should include a list of tags used in the database, and have some sort of auto-complete ability so users can search for tags that they vaguely know of.
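One possible way to do the auto-complete part is a prefix lookup over a sorted list of known tags. This is only a sketch: the `complete` function is hypothetical, it assumes the tag list is already lower-cased and sorted, and it only handles prefixes, not fuzzier "vaguely known" spellings.

```python
from bisect import bisect_left


def complete(prefix, sorted_tags, limit=10):
    """Return up to `limit` tags from `sorted_tags` that start with `prefix`.

    Assumes `sorted_tags` is a sorted list of lower-case tag names.
    """
    prefix = prefix.lower()
    # Binary search for the first tag that is >= the prefix.
    start = bisect_left(sorted_tags, prefix)
    matches = []
    for tag in sorted_tags[start:]:
        if not tag.startswith(prefix) or len(matches) >= limit:
            break
        matches.append(tag)
    return matches
```

For example, with the tags `english`, `spanish`, `stockings`, `translated` and `vanilla`, the prefix `s` yields `spanish` and `stockings`.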

"Hard Include" (HI)

These are tags that must be found in the manga/doujin/etc. (henceforth "target") for it to appear within the search results. Examples of tags that a user might want to put under this category include "English", "[series name]" or "[author name]".

There are two ways to tackle multiple HI tags: either deliver only results that contain all of the specified tags (a logical AND), or deliver results that contain any of the tags (a logical OR). Further discussion is required to determine which one is preferable.

RCH: The algorithm should be able to handle a combination of AND/ORing of tags, and have an interface that allows users to easily use this AND/OR function (DRM: without knowing anything about boolean logic).
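A combined AND/OR query could be represented as a small nested expression and evaluated recursively. The tuple format and the `matches` function below are assumptions made for illustration, not a decided design.

```python
def matches(expr, tags):
    """Evaluate a Hard Include expression against a target's set of tags.

    `expr` is either a tag name (str) or a tuple ("and" | "or", sub-expressions...).
    This representation is hypothetical.
    """
    if isinstance(expr, str):
        return expr in tags
    op, *operands = expr
    if op == "and":
        return all(matches(e, tags) for e in operands)
    if op == "or":
        return any(matches(e, tags) for e in operands)
    raise ValueError(f"unknown operator: {op!r}")


# (<english> OR <spanish>) AND <translated>
query = ("and", ("or", "english", "spanish"), "translated")
```

With this query, a target tagged `spanish` and `translated` matches, while one tagged only `english` does not. A plain all-AND or all-OR search is just the single-level case of the same structure.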

"Soft Include" and "Soft Exclude" (SI/SE)

This is the main point of innovation within the whole project. The algorithm should look through a potential target's tags, and tally how many SIs and SEs it has. Each SI tag counts as +1, and each SE tag as -1. The target will only be delivered if the final value is at or above a predetermined (FTR: user-defined) threshold.

RCH: The user should be able to define a weight for each tag that he decides to be SI or SE.
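The tally can be sketched as a sum over a tag-to-weight mapping, which covers both the basic ±1 case and the weighted RCH case. The function names are hypothetical, and the comparison is taken as inclusive, an assumption based on the worked example later in this outline, where a tally of exactly 10 passes a threshold of +10.

```python
def build_weights(soft_include, soft_exclude):
    """Basic case: every SI tag is worth +1 and every SE tag -1."""
    weights = {tag: 1 for tag in soft_include}
    weights.update({tag: -1 for tag in soft_exclude})
    return weights


def soft_score(tags, weights):
    """Sum the weights of the target's tags; tags not in `weights` count as 0."""
    return sum(weights.get(tag, 0) for tag in tags)


def passes_threshold(tags, weights, threshold):
    """Assumed inclusive: a score equal to the threshold is delivered."""
    return soft_score(tags, weights) >= threshold
```

For the weighted variant, the user's own mapping (for example `{"a": 7, "b": -4}`) can be passed to `soft_score` directly in place of the output of `build_weights`.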

"Hard Exclude" (HE)

These are tags the user considers offensive, unreadable, uninteresting or irrelevant. Any target with one of these tags is automatically ignored.

An Example Search

To elucidate this convoluted algorithm in simpler terms, a realistic example of a user is presented below:

Jimmy is a bland, boring, vanilla-loving fapper, with an insufferable fetish for sapphic amorism (AKA a yuri fan). His power level is too low for original lewds, but he can read some Spanish. He offers the following search criteria:

  • Keyword "c89"

  • Stars: >4

  • HI: <english> <translated>

    HI (RCH): (<english> OR <spanish>) AND <translated>

  • SI: <yuri> <vanilla> <stockings> <futanari> <nakadashi> <uncensored> <loli>

    SI (RCH): <yuri (+7)> <vanilla (+4)> <stockings (+2)> <futanari (+5)> <nakadashi (+1)> <uncensored (+6)> <loli (+10)>

  • SE: <anal> <bakunyuu> <censored>

    SE (RCH): <anal (-1)> <big breasts (-4)> <censored (-1)> <urination (-5)>

  • HE: <cheating> <ntr> <scat> <yaoi>

  • (FTR) Threshold: +10

In this example, any English- or Spanish-translated gallery rated 4 stars or more and including the term c89 in its title is eligible (Spanish only under the RCH rule). Natively English galleries are not, because they lack the <translated> tag.

(RCH) Examples of sets of tags that will pass

  • <yuri> <vanilla> <censored> (+7+4-1 = 10)
  • <loli> <urination> <uncensored> (+10-5+6 = 11)

Examples of sets that won't pass

  • <yuri> <vanilla> <ntr> (<ntr> is a Hard Exclude)
  • <vanilla> <nakadashi> (+4+1 < 10)
  • <yuri> <vanilla> <big breasts> (+7+4-4 < 10)
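The pass/fail cases above can be checked with a short sketch. It uses only a subset of the example's weights, skips the keyword, rating and Hard Include steps, and treats the threshold as inclusive; the `evaluate` function and constant names are hypothetical.

```python
# Subset of the (RCH) weights from the example search above.
WEIGHTS = {"yuri": 7, "vanilla": 4, "nakadashi": 1, "censored": -1, "big breasts": -4}
HARD_EXCLUDE = {"cheating", "ntr", "scat", "yaoi"}
THRESHOLD = 10


def evaluate(tags):
    """Return (delivered, score); score is None when a Hard Exclude tag is hit."""
    tags = set(tags)
    # Hard Exclude is checked first: one hit rejects the target outright.
    if tags & HARD_EXCLUDE:
        return False, None
    score = sum(WEIGHTS.get(tag, 0) for tag in tags)
    return score >= THRESHOLD, score


print(evaluate(["yuri", "vanilla", "censored"]))     # (True, 10)
print(evaluate(["yuri", "vanilla", "ntr"]))          # (False, None)
print(evaluate(["vanilla", "nakadashi"]))            # (False, 5)
print(evaluate(["yuri", "vanilla", "big breasts"]))  # (False, 7)
```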

UI

The user interface must at the same time be packed with these complex options yet be clear enough to be used by just some guy.

TODO: mock up samples of the UI done terribly in an unholy combination of MS Word and MS Paint.

Results

The results should be displayed in a table similar to the one exhentai natively produces, with the following additions:

  • Tags Hit: there should be a list of tags, (RCH) with the ones that are in the search highlighted
  • (RCH) Relevance: the score tally of SIs and SEs
  • (RCH) Title Picture: to save bandwidth, the image is direct-linked to exhentai
  • Link to Exhentai Gallery: self-explanatory

(FTR) Additionally, the results should be sortable by:

  • Name
  • Date Uploaded
  • Relevance
  • Star Rating
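Sorting by any of these columns could be driven by a single table of key functions. The `Result` record, its field names and `sort_results` are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Result:
    """Hypothetical row of the results table."""
    name: str
    uploaded: date
    relevance: int
    stars: float


# One key function per sortable column.
SORT_KEYS = {
    "name": lambda r: r.name.lower(),
    "date": lambda r: r.uploaded,
    "relevance": lambda r: r.relevance,
    "stars": lambda r: r.stars,
}


def sort_results(results, by="relevance", descending=True):
    """Return a new list sorted by the chosen column."""
    return sorted(results, key=SORT_KEYS[by], reverse=descending)
```

Because `sorted` is stable, sorting twice (for example by name, then by relevance) gives a predictable tie-break order.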

TODO: sample results page

Additional Features

Here is a list of additional cool features that should be added to enhance the core search ability.

Bypass Sad Panda

A notification that the user must be logged in and able to use exhentai to access the results.

FTR: a function to check whether the exhentai cookie exists

RCH: automatically add a cookie that allows the user to use exhentai if he is not already able.

(FTR) Save Searches

The user should be able to save different searches, either locally through cookies, or on the server.

RCH: the user should be able to combine the results of multiple saved searches

(RCH) Subscribe to Searches

The user should be able to "star" or "subscribe" to some of their own searches. When the server updates its database from exhentai, it will check the new entries against the "starred" searches for each user and give them a notification of any hits (either in the app or in an email).
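The update-time check might look something like the sketch below, where each starred search is reduced to a predicate over a gallery. The data shapes and the `collect_notifications` function are hypothetical, and delivery (in-app or email) is left out.

```python
from collections import defaultdict


def collect_notifications(new_galleries, starred_searches):
    """Match newly added galleries against each user's starred searches.

    `starred_searches` maps a user name to a list of (search_name, predicate)
    pairs, where predicate(gallery) returns True on a hit. Both shapes are
    assumptions for this sketch.
    """
    notifications = defaultdict(list)
    for user, searches in starred_searches.items():
        for search_name, predicate in searches:
            hits = [g for g in new_galleries if predicate(g)]
            if hits:
                notifications[user].append((search_name, hits))
    return dict(notifications)
```

Only the galleries added in the latest update are scanned, so the cost per update scales with the number of new entries rather than the size of the whole database.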

User Authentication

Some sort of password should be used to prevent drive-by attacks by unscrupulous DDOSers.

FTR: Users should be able to make an account to save their preferences

RCH: Users should be able to log in with their exhentai accounts, and the info from their exhentai account should be used to enhance searches (see next section)

(RCH) So That's What You're Into (AKA automatic mode)

With the consent of the user, his exhentai favourites are parsed, and the tags on his favourited doujins are tallied. A more advanced algorithm generates an editable list of HI/SI tags based on how frequently these tags appear. The algorithm must be weighted by the type of tag as well as by frequency, e.g. author tags might be worth more than fetish tags.
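A minimal version of this tally, assuming each favourite is available as a list of (tag type, tag name) pairs, could weight every occurrence by its type. The weight values, the pair format and `suggest_tags` are all made up for illustration.

```python
from collections import Counter

# Hypothetical per-type weights: one author hit counts as much as three fetish hits.
TYPE_WEIGHTS = {"author": 3.0, "fetish": 1.0}


def suggest_tags(favourites, type_weights=TYPE_WEIGHTS, top_n=5):
    """Rank tags across a user's favourites by type-weighted frequency.

    `favourites` is an iterable of galleries, each an iterable of
    (tag_type, tag_name) pairs. Unknown tag types get a weight of 1.0.
    """
    scores = Counter()
    for gallery_tags in favourites:
        # set() so a tag listed twice on one gallery is only counted once.
        for tag_type, name in set(gallery_tags):
            scores[(tag_type, name)] += type_weights.get(tag_type, 1.0)
    return scores.most_common(top_n)
```

The ranked list would then be shown to the user for editing rather than applied directly, as the section describes.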

(DRM) MAL compatibility

The user enters their MAL username, and the program automatically generates SI and SE based on the user's watched list and the scores he gave them (DRMx2: with reference to his average score). For example, I'd see an awful lot of Oreimo doujins but no TTGL doujins.

(DRM) Update User

Through emails or some sort of messaging system, the user is informed when X number of new hits appear within his saved searches, with time-limited contact frequency.

(DRMx2) Automatic Collection

The site automatically generates download links or torrents for new results in saved searches, and delivers them automagically to the user via RSS or similar protocols, because collecting porn yourself is hard work.

TODO: Program Flowchart

TODO: Backend Design
