Feature Ideas - TheSentimentalists/TeamRepository GitHub Wiki
A tracker of 'ideas' for the final product - these will eventually be assigned into milestones to help us build an MVP and future iterations.
- FRONTEND: Input box to enter a URL
- FRONTEND: Button to submit URL
- API: Create a PUT /analyse endpoint to submit URLs, return a requestID
- FRONTEND: URL is submitted to the API
- API: Accept URL submitted, store it in the database, and pass to the backend
- API: Return the 'requestID' to the frontend
- BACKEND: Receive the submitted URL
- BACKEND: Verify the URL is correctly formatted
- BACKEND: Validate the results of the credibility score
- BACKEND: Call to GATE API to retrieve credibility score, source and category
- BACKEND: Store the GATE API results in the database
- BACKEND: Return the results to the database
- API: Create an /analyse/ endpoint to retrieve results
- FRONTEND: Poll the API for the result
- API: Retrieve the result from the DB to return to the frontend
- FRONTEND: Display a 'credibility score'
- BACKEND: Retrieve the article text from the URL
- BACKEND: Store the article text, summary and metadata in the DB
- BACKEND: Sentiment (Polarity/Subjectivity) Analysis
- FRONTEND: Display Polarity/Subjectivity separately
- FRONTEND: Progressbar-style 'Bias Score', Green/Red, 0-100
- FRONTEND: Receive an error from the API
- FRONTEND: Validate the URL format/show an error if it's not a URL
- FRONTEND: 'Processing' Icon/Loading Page
- FRONTEND: Display the article text alongside the result
- BACKEND: Store the article image link in the DB
- FRONTEND: Display article summary
- FRONTEND: Display the article 'header' image
- FRONTEND: Upload a document
- FRONTEND: Enter a text block
- FRONTEND: Document search
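
The submit-then-retrieve flow in the list above (submit a URL, get a request ID back, fetch the result later) could be sketched roughly as follows. This is only an illustration: the in-memory dictionary stands in for the database, and the function names `submit_url`, `store_result` and `get_result`, as well as the status strings, are assumptions and not part of any agreed design.

```python
import uuid

# Hypothetical in-memory stand-in for the database; real storage is not decided here.
_requests = {}


def submit_url(url):
    """Stand-in for PUT /analyse: store the URL and return a new request ID."""
    request_id = str(uuid.uuid4())
    _requests[request_id] = {"url": url, "status": "processing", "result": None}
    return request_id


def store_result(request_id, result):
    """Stand-in for the backend writing its results back to the database."""
    _requests[request_id]["result"] = result
    _requests[request_id]["status"] = "complete"


def get_result(request_id):
    """Stand-in for the retrieval endpoint: return the current status and result."""
    record = _requests.get(request_id)
    if record is None:
        return {"status": "not_found"}
    return {"status": record["status"], "result": record["result"]}
```

The frontend would keep the returned request ID and ask for the result until the status is no longer "processing".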
The user's experience and what they can do, broken into three phases: input (entering the URL, document or article info), processing (sending that input to the backend and waiting for a response) and output (showing results returned from the backend).
- Enter a URL
- Validate the URL format/show an error if it's not a URL
- Enter a text block
- Upload a document (what format?)
- Show a 'processing'/loading page
- Display the 'bias score'
- Display the analysis outcome
- Display the article text
- Display the article image
- Search feature: user can type a word to check if it exists in the document
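
As a rough idea of what "validate the URL format" might mean in practice, here is a minimal sketch using only the Python standard library. The rule it applies (an `http` or `https` scheme plus a non-empty host) is an assumption, and `is_valid_url` is a hypothetical helper name; the real check may need to be stricter or looser.

```python
from urllib.parse import urlparse


def is_valid_url(text):
    """Return True if text looks like an absolute http(s) URL with a host."""
    try:
        parts = urlparse(text.strip())
    except ValueError:
        # urlparse raises ValueError for some malformed inputs (e.g. bad IPv6 hosts).
        return False
    # Assumption: only web URLs with a host are acceptable.
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

A check like this only confirms the shape of the input. It does not confirm that the page exists or that it contains an article.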
Summary of features we could implement in the backend, broken down into the three backend phases: ingestion (getting the article), analysis (processing the article to get various results) and output (collating and calculating summary results to return to the frontend).
- Receive a raw text block
- Receive a URL
- Validate the URL format/show an error if it's not a URL
- Extract the 'article' from the webpage
- Reject non-article type pages
- Retrieve a web page's contents
- Receive a document
- Convert document into processable text
- Analyse the article text with TextBlob, outputs are:
  - Polarity (-1 to 1)
  - Subjectivity (0 to 1)
- Analyse the URL reliability with the GATE Source Credibility API ("-1" if no rating available!), outputs are:
  - URL Reliability (0 to 100)
  - Rating Source (Media Bias/Fact Check, or others)
  - URL Category (UNSpecified, left, left center, center, right center, right, pro science, Conspiracy, fake news, clickbait, Conservative, fake, Liberal, Satire, and unreliable)
- Check GATE API Generic Opinion Mining (sentiment analysis - polarity, score, sarcasm, etc)
- Extract a summary of the article using newspaper
- Extract keywords from the article using newspaper
- Cross-analysis of docs (spaCy library)
- Extract a URL for the article image
- Translate to a different language (get language as input)
- Generate a single 'bias score' based on output scores
- Return individual results scores
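
How the single 'bias score' is calculated has not been decided. Purely as a starting point for discussion, the sketch below averages three 0-100 components taken from the ranges listed above: the magnitude of the polarity, the subjectivity, and the inverse of the URL reliability. The formula, the equal weighting and the name `bias_score` are all assumptions. The reliability component is skipped when the credibility lookup reports "-1" (no rating available).

```python
def bias_score(polarity, subjectivity, reliability=-1):
    """Combine analysis outputs into one 0-100 score (hypothetical formula).

    polarity:     -1 to 1 (sentiment polarity)
    subjectivity:  0 to 1 (sentiment subjectivity)
    reliability:   0 to 100, or -1 when no rating is available
    """
    # Strongly positive or strongly negative text both count towards the score.
    components = [abs(polarity) * 100, subjectivity * 100]
    if reliability >= 0:
        # A less reliable source pushes the score up.
        components.append(100 - reliability)
    # Assumption: equal weighting of whichever components are available.
    return round(sum(components) / len(components))
```

The individual scores would still be returned alongside this number so the frontend can display them separately.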
How it all glues together
- Use API Gateway to provide an interface to the backend
- Use a single lambda for the entire backend
- Use multiple lambdas and step functions to parallelise the backend
- Use polling to provide updates
- Provide websockets to allow streaming updates
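
For the polling option, a client-side loop might look something like the sketch below. It assumes a `fetch` callable that takes a request ID and returns a dictionary with a "status" key, and that "processing" is the status used while work is ongoing. Both are assumptions for illustration, as are the function name `poll_for_result` and its default interval and timeout.

```python
import time


def poll_for_result(fetch, request_id, interval=1.0, timeout=30.0, sleep=time.sleep):
    """Call fetch(request_id) repeatedly until it stops reporting 'processing'.

    Returns the final response, or {"status": "timeout"} once the timeout
    has passed. The sleep function is injectable so the loop can be tested.
    """
    deadline = time.monotonic() + timeout
    while True:
        response = fetch(request_id)
        if response.get("status") != "processing":
            return response
        if time.monotonic() >= deadline:
            return {"status": "timeout"}
        sleep(interval)
```

Polling is simple to build behind API Gateway, at the cost of extra requests. Websockets would remove the repeated calls but need a persistent connection.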