Feature Ideas - TheSentimentalists/TeamRepository GitHub Wiki

A tracker of 'ideas' for the final product - these will eventually be assigned into milestones to help us build an MVP and future iterations.

Milestones (MoSCoW)

MUST

  • FRONTEND: Input box to enter a URL
  • FRONTEND: Button to submit URL
  • API: Create a PUT /analyse endpoint to submit URLs, return a requestID
  • FRONTEND: URL is submitted to the API
  • API: Accept URL submitted, store it in the database, and pass to the backend
  • API: Return the 'requestID' to the frontend
  • BACKEND: Receive the submitted URL
  • BACKEND: Verify the URL is correctly formatted
  • BACKEND: Validate the results of the credibility score
  • BACKEND: Call to GATE API to retrieve credibility score, source and category
  • BACKEND: Store the GATE API results in the database
  • BACKEND: Return the results to the database
  • API: Create an /analyse/ endpoint to retrieve results
  • FRONTEND: Poll the API for the result
  • API: Retrieves the result from the DB to return to the frontend
  • FRONTEND: Display a 'credibility score'
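
The MUST flow above (submit a URL, get a request ID back, store the results, retrieve them later) could be sketched roughly as below. This is only an illustration under stated assumptions: the in-memory dictionary stands in for the database, and the function names, field names and the "pending"/"complete" status values are hypothetical, not part of the agreed design.

```python
import uuid
from typing import Dict, Optional

# Hypothetical in-memory stand-in for the database (assumption for this sketch).
_requests: Dict[str, dict] = {}


def submit_url(url: str) -> str:
    """Mimic PUT /analyse: store the URL and return a new request ID."""
    request_id = str(uuid.uuid4())
    _requests[request_id] = {"url": url, "status": "pending", "result": None}
    return request_id


def store_result(request_id: str, credibility_score: int,
                 source: str, category: str) -> None:
    """Mimic the backend storing the GATE API results against a request."""
    record = _requests[request_id]
    record["result"] = {
        "credibility_score": credibility_score,
        "source": source,
        "category": category,
    }
    record["status"] = "complete"


def get_result(request_id: str) -> Optional[dict]:
    """Mimic the retrieval endpoint: return status and result, or None if unknown."""
    record = _requests.get(request_id)
    if record is None:
        return None
    return {"status": record["status"], "result": record["result"]}
```

The frontend would keep calling the retrieval step until the status changes from "pending" to "complete", then display the credibility score.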

SHOULD

SHOULD SOON

  • BACKEND: Retrieve the article text from the URL
  • BACKEND: Store the article text, summary and metadata in the DB
  • BACKEND: Sentiment (Polarity/Subjectivity) Analysis
  • FRONTEND: Display Polarity/Subjectivity separately
  • FRONTEND: Progressbar-style 'Bias Score', Green/Red, 0-100
  • FRONTEND: Receive an error from the API

SHOULD LATER

  • FRONTEND: Validate the URL format/show an error if it's not a URL
  • FRONTEND: 'Processing' Icon/Loading Page
  • FRONTEND: Display the article text alongside the result

COULD

  • BACKEND: Store the article image link in the DB
  • FRONTEND: Display article summary
  • FRONTEND: Display the article 'header' image

WON'T

  • FRONTEND: Upload a document
  • FRONTEND: Enter a text block
  • FRONTEND: Document search

Frontend

The user's experience and what they can do, broken into three phases: input (entering the URL, document or article info), processing (sending that input to the backend and waiting for a response) and output (showing the results returned from the backend).

Input

  • Enter a URL
  • Validate the URL format/show an error if it's not a URL
  • Enter a text block
  • Upload a document (what format?)

Processing

  • Show a 'processing'/loading page

Output

  • Display the 'bias score'
  • Display the analysis outcome
  • Display the article text
  • Display the article image
  • Search feature: user can type a word to check if it exists in the document

Backend

Summary of features we could implement in the backend, broken down into the three backend phases: ingestion (getting the article), analysis (processing the article to get various results) and output (collating and calculating summary results to return to the frontend).

Ingestion

  • Receive a raw text block
  • Receive a URL
  • Validate the URL format/show an error if it's not a URL
  • Extract the 'article' from the webpage
  • Reject non-article type pages
  • Retrieve a web page's contents
  • Receive a document
  • Convert document into processable text

Analysis

  • Analyse the article text from the URL with TextBlob; outputs are:
    • Polarity (-1 to 1)
    • Subjectivity (0 to 1)
  • Analyse the URL's reliability with the GATE Source Credibility API ("-1" if no rating is available); outputs are:
    • URL Reliability (0 to 100)
    • Rating Source (Media Bias/Fact Check, or others)
    • URL Category (UNSpecified, left, left center, center, right center, right, pro science, Conspiracy, fake news, clickbait, Conservative, fake, Liberal, Satire, and unreliable)
  • Check GATE API Generic Opinion Mining (sentiment analysis - polarity, score, sarcasm, etc)
  • Extract a summary of the article using newspaper
  • Extract keywords from the article using newspaper
  • Cross-analysis of docs (spaCy library)
  • Extract a URL for the article image
  • Translate to a different language (get language as input)

Output

  • Generate a single 'bias score' based on output scores
  • Return individual results scores
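
One possible way to combine the individual scores into a single 0-100 'bias score' is sketched below. The formula is purely an assumption for discussion (an equal-weight average of polarity strength, subjectivity and unreliability), not a decided method. It treats a reliability of -1 as "no rating available" and leaves it out of the average.

```python
NO_RATING = -1  # value the tracker notes for "no rating available"


def bias_score(polarity: float, subjectivity: float,
               reliability: int = NO_RATING) -> int:
    """Combine the analysis outputs into one 0-100 score (hypothetical formula).

    polarity:     -1 to 1; only its strength (absolute value) is used
    subjectivity: 0 to 1
    reliability:  0 to 100, or -1 if no rating is available
    """
    components = [abs(polarity), subjectivity]
    if reliability != NO_RATING:
        # Less reliable sources push the score up.
        components.append((100 - reliability) / 100)
    # Equal-weight average, scaled to 0-100.
    return round(100 * sum(components) / len(components))
```

A higher score means "more biased" under these assumptions. The weights are the obvious thing to tune once real results are available.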

Infrastructure

How it all glues together

  • Use API Gateway to provide an interface to the backend
  • Use a single lambda for the entire backend
  • Use multiple lambdas and step functions to parallelise the backend
  • Use polling to provide updates
  • Provide websockets to allow streaming updates
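
The polling option could work roughly as follows. This is a sketch only: `fetch` is a hypothetical callable standing in for a request to the retrieval endpoint, and the "complete" status value, interval and timeout are assumptions.

```python
import time
from typing import Callable, Optional


def poll_for_result(fetch: Callable[[], dict],
                    interval: float = 1.0,
                    timeout: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep) -> Optional[dict]:
    """Call fetch() repeatedly until it reports completion or the timeout passes.

    Returns the completed response, or None if the timeout is reached first.
    """
    deadline = time.monotonic() + timeout
    while True:
        response = fetch()
        if response.get("status") == "complete":
            return response
        # Give up if another wait would take us past the deadline.
        if time.monotonic() + interval > deadline:
            return None
        sleep(interval)
```

Websockets would remove the need for this loop, at the cost of a more involved API Gateway setup.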