WebInterface Documentation - NewsAnalyseTool/Documentation GitHub Wiki
This is the documentation of the web interface. You can find the corresponding repository here.
- React (Typescript)
- Vite
- Scala Play
- sbt
- ReactiveMongo for Scala Play
To better understand the application, you should know the basics of the Play Framework, such as the package structure and application properties, which can be found here and here.
The backend provides a simple REST API for aggregating the analyzed data. Requests reach the controller layer, which takes care of returning the right response.
The aggregation logic is placed in the service layer of the backend, which consists of a single `AggregationService.scala` class. It aggregates the needed features into a response object, which is then returned to the frontend through the REST API.
One further layer is the repository, which takes care of accessing MongoDB through ReactiveMongo. Here we have defined certain queries for the database, such as a query based on a date range. Furthermore, this layer abstracts away details like the number of databases or the collections that are accessed.
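The contract of such a repository can be sketched as follows. This is only an illustration (the real implementation is a Scala class using ReactiveMongo, and all names here are invented); it is written in TypeScript for consistency with the type definitions later on this page:

```typescript
// Hypothetical sketch of the repository abstraction: callers ask for
// documents in a date range and never see which database or collection
// is involved.
type Doc = { source: string; date: string };

interface DocumentRepository {
  findByDateRange(start: string, end: string): Doc[];
}

// An in-memory stand-in showing the contract; the real repository would
// query MongoDB through ReactiveMongo instead.
class InMemoryRepository implements DocumentRepository {
  constructor(private docs: Doc[]) {}

  findByDateRange(start: string, end: string): Doc[] {
    // ISO-8601 date strings compare correctly as plain strings.
    return this.docs.filter((d) => d.date >= start && d.date <= end);
  }
}
```

Because callers only depend on the interface, swapping one database for several (as described below) stays local to the repository implementation.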
The core of the backend is defined in the model layer, which defines the representations of the objects handled in this application, e.g. `AnalyzedDocument`, `Source` and `TrendResponse`.
Serving the ReactJS frontend app upon landing on the "/" path of the server was accomplished thanks to this tutorial, which was slightly adjusted to our needs, together with scripts defined in the `package.json` file.
Abstracting away the details of connecting to the actual database behind a repository turned out to be very helpful in our case: midway through the project, when the backend app was already able to connect to one database, we decided to create a distinct MongoDB database on the same server for each source type. With the repository pattern, we only had to include the new databases in the repository class and handle the logic there, with minimal changes to the other layers.
The model layer became a bit messy over time and could be improved by separating the model into request/response entities and database entities (e.g. DTOs, data transfer objects). Nevertheless, in a backend of this size it is still quite clear, but this could become a problem if the API grows.
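The suggested split could look roughly like this. All names here are hypothetical, and the real backend would express this in Scala rather than TypeScript; this is just a sketch of the idea:

```typescript
// Hypothetical sketch of separating database entities from transfer objects.
type AnalyzedDocumentEntity = {
  _id: string;                        // MongoDB-internal id, never sent out
  source: string;
  sentiment: "pos" | "neu" | "neg";
  publishedAt: string;
};

// The response DTO exposes only what the frontend actually needs.
type AnalyzedDocumentDto = {
  source: string;
  sentiment: "pos" | "neu" | "neg";
};

function toDto(e: AnalyzedDocumentEntity): AnalyzedDocumentDto {
  return { source: e.source, sentiment: e.sentiment };
}
```

With such a mapping, changes to the database schema no longer leak into the API contract, and vice versa.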
For the application to run on the server, we decided to extract sensitive information like database connection credentials into a `db.conf` file under `conf/`, or alternatively to include the same information in the `application.conf` file.
After that, the application can be started with `sbt run` for development purposes. If you intend to run it in production mode, you need to start it with `sbt runProd`, which requires an application key defined in the `application.conf` file.
During startup, the app bundles the frontend with the script defined in `package.json` and serves it when it's done.
The frontend was developed with React in Typescript and uses the MaterialUI library for charts and some other small components. If you are wondering why almost every component has an associated .css file even though we used MaterialUI: the library was only added at the end of the project, more as a fix for another library we had used to draw the diagrams before. Most components are therefore "clean" React components that import a stylesheet.
There are several components, but the most important ones are described below:
The root of all components is of course the `MainComponent.tsx`. It not only contains all other components, but also handles requests to the backend and state that needs to be shared between components.
The `SelectionRowComponent.tsx` is what you see at the top of the frontend. It handles the selection of a time span and provides the buttons to switch between the general data view and the trend data view.
When the general data view is selected, you will see a number of rows with the same structure in a kind of list. Each of these "entries" in the list is a `NewsSourceElement.tsx` component. Each of them displays the data for a particular news source, and also contains the pie chart for the distribution of categories from that news source and the donut chart for the distribution of sentiment.
This component is repeated in a list like the NewsSourceElement component, but it is displayed only when the trend data view is selected. It consists mainly of a graph showing how the sentiment of a news source's articles evolves over time.
- Python >= 3.11.6
- npm 10.4.0
If you just want to get a first idea of what the project looks like, you can use a test backend, which does not need any connections to a database, so it is quite easy to set up. It was originally developed so that the frontend could be implemented without waiting for the backend to be complete. Therefore it only sends back random data, but this is enough to show all functionalities. Since it was easiest to set up a small backend with Flask, you need Python to run it.
Follow these steps to set up and run the frontend:
- Clone the repo as usual
- Run `npm install` in the `ui` folder to install all the necessary dependencies
- Go into the `test_backend` folder and create a Python environment with `python3 -m venv venv`
- Activate the virtual environment with `source venv/bin/activate`
- Now, run `pip install -r requirements.txt` to install the necessary dependencies
- The backend can be started with `python app.py`
- Start the frontend with `npm run dev` back in the `ui` folder
You should now be able to see the frontend in your browser at http://localhost:5173/.
Inside the `ui` folder you will find a `config.json` file. Change the IP address and port to where the frontend can reach the API of the real backend. Starting the backend was already described further up this page.
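As an illustration, the frontend could derive its API base URL from such a config. The field names `ip` and `port` used here are assumptions; check the actual `config.json` for the real keys:

```typescript
// Illustrative only: the field names in config.json may differ from the
// `ip` and `port` assumed here.
type FrontendConfig = { ip: string; port: number };

function apiBaseUrl(config: FrontendConfig): string {
  return `http://${config.ip}:${config.port}`;
}

// e.g. a config pointing at a locally running backend:
const base = apiBaseUrl({ ip: "127.0.0.1", port: 9000 });
```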
This is the definition of the API that the backend provides to the frontend. Both the endpoints and the expected responses have been defined so that both components could be developed independently. As the project evolves, this definition may be adapted to allow for additional functionality. All versions of the API can be found in the sections below. Don't be surprised that the routes don't contain the version, like `/api/v2/...`. The first version was implemented only by the test backend, not by the real backend. So the second version, which will be the final one, is in a sense the only one really used.
This is the second version of the API specification.
In this version, a second endpoint was defined. The first endpoint (still) responds with general data for all news sources. The new endpoint returns the trend of positive, neutral and negative articles. In order to minimize the amount of computation in the frontend, the backend still calculates and provides (partially redundant) data to the frontend, which can then simply read it from the JSON object.
The first endpoint provides the general data of all news sources in a specific time period.
The frontend will send the request to `/api/data`. The start and end dates must be specified as parameters in the requested URL and must follow the ISO-8601 date format (yyyy-mm-dd):
http://127.0.0.1:5000/api/data?startDate=2023-01-01&endDate=2023-12-31
The start date as well as the end date are always included in the interval. To request data from a single day, both parameters must be equal.
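Building such a request URL is straightforward; the following helper is hypothetical (not part of the project) and only shows the parameter names from the specification above:

```typescript
// Hypothetical helper that builds a request URL for the /api/data endpoint.
function buildDataUrl(base: string, startDate: string, endDate: string): string {
  const params = new URLSearchParams({ startDate, endDate });
  return `${base}/api/data?${params}`;
}

// Requesting a single day: both parameters carry the same date.
const singleDay = buildDataUrl("http://127.0.0.1:5000", "2023-01-01", "2023-01-01");
// The frontend could then issue the request with fetch(singleDay).
```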
The JSON object returned by the backend must have a certain structure in order for the frontend to process it. (This structure differs from the one specified in V1 in that it adds the neutral-sentiment fields.)
The type expected by the frontend has been defined using Typescript:
type Response = {
totalArticles: number; // number of articles from all sources
totalCategories: number; // number of categories from all sources
sources: { // list of news sources
name: string; // name of the source (e.g. "New York Times")
articleCount: number; // number of articles from this source
articlePerc: number; // percentage of articles from this source out of all sources
categoryCount: number; // number of categories from this source
posArticles: number; // number of positive articles from this source
posArticlesPerc: number; // percentage of positive articles from this source
neuArticles: number; // number of neutral articles from this source
neuArticlesPerc: number; // percentage of neutral articles from this source
negArticles: number; // number of negative articles from this source
negArticlesPerc: number; // percentage of negative articles from this source
categories: { // list of labels for articles or groups to which articles belong (e.g. "Politics", "Economy"); perhaps more detailed or specific to the source (e.g. "World War II" (detailed), "r/news" (specific to Reddit))
name: string; // the name of the category (see examples above)
count: number; // how many articles or posts belong to this category in the time period specified by the request
pos: number; // number of the articles with positive sentiment
posPerc: number; // percentage of positive articles out of all articles from this source in this category
neu: number; // number of the articles with neutral sentiment
neuPerc: number; // percentage of neutral articles out of all articles from this source in this category
neg: number; // number of articles with negative sentiment
negPerc: number; // percentage of negative articles out of all articles from this source in this category
}[];
}[];
};

This is an example of a JSON object as it might be returned from the backend:
{
"totalArticles": 135,
"totalCategories": 7,
"sources": [
{
"name": "Reddit",
"articleCount": 54,
"articlePerc": 40,
"categoryCount": 3,
"posArticles": 20,
"posArticlesPerc": 37.04,
"neuArticles": 14,
"neuArticlesPerc": 25.93,
"negArticles": 20,
"negArticlesPerc": 37.04,
"categories": [
{
"name": "r/news",
"count": 23,
"pos": 14,
"posPerc": 60.87,
"neu": 4,
"neuPerc": 17.39,
"neg": 5,
"negPerc": 21.74
},
{
"name": "r/politics",
"count": 23,
"pos": 4,
"posPerc": 17.39,
"neu": 9,
"neuPerc": 39.13,
"neg": 10,
"negPerc": 43.48
},
{
"name": "r/worldnews",
"count": 8,
"pos": 2,
"posPerc": 25,
"neu": 1,
"neuPerc": 12.5,
"neg": 5,
"negPerc": 62.5
}
]
},
{
"name": "New York Times",
"articleCount": 38,
"articlePerc": 28.15,
"categoryCount": 2,
"posArticles": 11,
"posArticlesPerc": 28.95,
"neuArticles": 10,
"neuArticlesPerc": 26.32,
"negArticles": 17,
"negArticlesPerc": 44.74,
"categories": [
{
"name": "Football",
"count": 18,
"pos": 8,
"posPerc": 44.44,
"neu": 2,
"neuPerc": 11.11,
"neg": 8,
"negPerc": 44.44
},
{
"name": "Elections",
"count": 20,
"pos": 3,
"posPerc": 15,
"neu": 8,
"neuPerc": 40,
"neg": 9,
"negPerc": 45
}
]
},
{
"name": "Tagesschau",
"articleCount": 43,
"articlePerc": 31.85,
"categoryCount": 2,
"posArticles": 13,
"posArticlesPerc": 30.23,
"neuArticles": 17,
"neuArticlesPerc": 39.53,
"negArticles": 13,
"negArticlesPerc": 30.23,
"categories": [
{
"name": "Sport",
"count": 11,
"pos": 4,
"posPerc": 36.36,
"neu": 2,
"neuPerc": 18.18,
"neg": 5,
"negPerc": 45.45
},
{
"name": "Berlin",
"count": 32,
"pos": 9,
"posPerc": 28.12,
"neu": 15,
"neuPerc": 46.88,
"neg": 8,
"negPerc": 25
}
]
}
]
}
This endpoint should provide data to display the trends of positive, neutral and negative articles from a news source over a time span.
The frontend will request the route `/api/trend`. A possible route might look like the following:
http://127.0.0.1:5000/api/trend?startDate=2023-01-01&endDate=2023-12-31
First of all, we define the structure of the response JSON object in Typescript as follows:
type Response = {
source: string; // The source to which the data points belong
datapoints: { // List of all data points that will be displayed as the trend. One data point in that list is equal to a day
date: string; // The date of the data point
pos: number; // Number of positive articles at that date
neut: number; // Number of neutral articles at that date
neg: number; // Number of negative articles at that date
}[];
}[];
As you can see, the response is actually a list containing the trend data for all news sources. The frontend can then select the parts it wants to display. An example response:
[
{
"source": "Reddit",
"datapoints": [
{
"date": "2023-01-01",
"pos": 12,
"neut": 11,
"neg": 11
},
{
"date": "2023-01-02",
"pos": 13,
"neut": 9,
"neg": 1
},
{
"date": "2023-01-03",
"pos": 15,
"neut": 2,
"neg": 4
},
{
"date": "2023-01-04",
"pos": 4,
"neut": 7,
"neg": 8
},
{
"date": "2023-01-05",
"pos": 11,
"neut": 12,
"neg": 15
},
{
"date": "2023-01-06",
"pos": 14,
"neut": 0,
"neg": 14
},
{
"date": "2023-01-07",
"pos": 2,
"neut": 13,
"neg": 6
}
]
},
{
"source": "Tagesschau",
"datapoints": [
{
"date": "2023-01-01",
"pos": 14,
"neut": 12,
"neg": 10
},
{
"date": "2023-01-02",
"pos": 0,
"neut": 15,
"neg": 13
},
{
"date": "2023-01-03",
"pos": 3,
"neut": 6,
"neg": 5
},
{
"date": "2023-01-04",
"pos": 5,
"neut": 15,
"neg": 3
},
{
"date": "2023-01-05",
"pos": 5,
"neut": 0,
"neg": 11
},
{
"date": "2023-01-06",
"pos": 4,
"neut": 4,
"neg": 4
},
{
"date": "2023-01-07",
"pos": 13,
"neut": 13,
"neg": 12
}
]
},
{
"source": "NewYorkTimes",
"datapoints": [
{
"date": "2023-01-01",
"pos": 9,
"neut": 4,
"neg": 9
},
{
"date": "2023-01-02",
"pos": 5,
"neut": 14,
"neg": 12
},
{
"date": "2023-01-03",
"pos": 14,
"neut": 14,
"neg": 8
},
{
"date": "2023-01-04",
"pos": 6,
"neut": 0,
"neg": 14
},
{
"date": "2023-01-05",
"pos": 10,
"neut": 9,
"neg": 7
},
{
"date": "2023-01-06",
"pos": 5,
"neut": 7,
"neg": 15
},
{
"date": "2023-01-07",
"pos": 10,
"neut": 5,
"neg": 8
}
]
}
]
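Selecting one source's trend out of this list is a simple lookup. The helper below is hypothetical (not part of the actual frontend) and uses a shortened sample with the first two Reddit data points from the example above:

```typescript
// Types mirroring the trend response defined above.
type DataPoint = { date: string; pos: number; neut: number; neg: number };
type TrendEntry = { source: string; datapoints: DataPoint[] };

// Hypothetical helper: pick one source's data points out of the full
// response, e.g. to feed a single chart.
function trendFor(response: TrendEntry[], source: string): DataPoint[] {
  return response.find((entry) => entry.source === source)?.datapoints ?? [];
}

const sample: TrendEntry[] = [
  {
    source: "Reddit",
    datapoints: [
      { date: "2023-01-01", pos: 12, neut: 11, neg: 11 },
      { date: "2023-01-02", pos: 13, neut: 9, neg: 1 },
    ],
  },
];

// Sum of negative articles across the sampled days: 11 + 1 = 12.
const redditNeg = trendFor(sample, "Reddit").reduce((sum, p) => sum + p.neg, 0);
```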
This is the first version of the API specification. It is now marked as deprecated because it uses the request body of GET requests to parse content on the server side, ignoring the [HTTP recommendations](https://www.rfc-editor.org/rfc/rfc2616#section-4.3).
The backend needs to provide a single endpoint for the frontend. When the user wants to update the data, a request is sent to the backend with a start and end date in the request body. The dates are given as strings and must follow the ISO-8601 date format (yyyy-mm-dd).
A request for the news data from February to August (both included) would have the following body:
{
"start": "2023-02-01",
"end": "2023-08-31"
}
The JSON object returned by the backend must have a certain structure in order for the frontend to process it.
The type expected by the frontend has been defined using Typescript:
type Response = {
totalArticles: number; // number of articles from all sources
totalCategories: number; // number of categories from all sources
sources: { // list of news sources
name: string; // name of the source (e.g. "New York Times")
articleCount: number; // number of articles from this source
articlePerc: number; // percentage of articles from this source out of all sources
categoryCount: number; // number of categories from this source
posArticles: number; // number of positive articles from this source
posArticlesPerc: number; // percentage of positive articles from this source
negArticles: number; // number of negative articles from this source
negArticlesPerc: number; // percentage of negative articles from this source
categories: { // list of labels for articles or groups to which articles belong (e.g. "Politics", "Economy"); perhaps more detailed or specific to the source (e.g. "World War II" (detailed), "r/news" (specific to Reddit))
name: string; // the name of the category (see examples above)
count: number; // how many articles or posts belong to this category in the time period specified by the request
pos: number; // number of the articles with positive sentiment
posPerc: number; // percentage of positive articles out of all articles from this source in this category
neg: number; // number of articles with negative sentiment
negPerc: number; // percentage of negative articles out of all articles from this source in this category
}[];
}[];
};
This is an example of a JSON object as it might be returned from the backend:
{
"totalArticles" : 171,
"totalCategories" : 10,
"sources": [
{
"name": "Reddit",
"articleCount": 76,
"articlePerc" : 44.44,
"categoryCount": 2,
"posArticles": 19,
"posArticlesPerc": 25,
"negArticles": 57,
"negArticlesPerc": 75,
"categories": [
{
"name": "r/politics",
"count": 31,
"pos": 2,
"posPerc": 6.45,
"neg": 29,
"negPerc": 93.55
},
{
"name": "r/news",
"count": 45,
"pos": 17,
"posPerc": 37.78,
"neg": 28,
"negPerc": 62.22
}
]},
{
"name": "New York Times",
"articleCount": 72,
"articlePerc" : 42.11,
"categoryCount": 4,
"posArticles": 28,
"posArticlesPerc": 38.89,
"negArticles": 44,
"negArticlesPerc": 61.11,
"categories": [
{
"name": "Politics",
"count": 34,
"pos": 12,
"posPerc": 35.29,
"neg": 22,
"negPerc": 64.71
},
{
"name": "Economy",
"count": 18,
"pos": 5,
"posPerc": 27.78,
"neg": 13,
"negPerc": 72.22
},
{
"name": "Technology",
"count": 12,
"pos": 7,
"posPerc": 58.33,
"neg": 5,
"negPerc": 41.67
},
{
"name": "Sports",
"count": 8,
"pos": 4,
"posPerc": 50,
"neg": 4,
"negPerc": 50
}
]},
{
"name": "Tagesschau",
"articleCount": 23,
"articlePerc" : 13.45,
"categoryCount": 4,
"posArticles": 6,
"posArticlesPerc": 6,
"negArticles": 17,
"negArticlesPerc": 94,
"categories": [
{
"name": "Military",
"count": 3,
"pos": 2,
"posPerc": 66.67,
"neg": 1,
"negPerc": 33.33
},
{
"name": "Corona",
"count": 13,
"pos": 0,
"posPerc": 0,
"neg": 13,
"negPerc": 100
},
{
"name": "Elections",
"count": 5,
"pos": 3,
"posPerc": 60,
"neg": 2,
"negPerc": 40
},
{
"name": "Football",
"count": 2,
"pos": 1,
"posPerc": 50,
"neg": 1,
"negPerc": 50
}
]}
]
}
As you can see, there is redundant data in the JSON object. For example, the total number of articles can be calculated from the number of articles per news source. In order to minimize the amount of computation in the frontend, the backend calculates this data and provides it (partially redundantly) to the frontend, which can then simply read it from the JSON object. Of course, the backend has the additional task of ensuring data consistency.
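The consistency the backend has to guarantee can be made explicit with a small check. This is only a sketch (not actual project code) over the fields relevant to the article totals; the example data from V1 above passes it:

```typescript
// Minimal sketch of a consistency check over the redundant fields.
type SourceSummary = { name: string; articleCount: number; articlePerc: number };
type GeneralResponse = { totalArticles: number; sources: SourceSummary[] };

function isConsistent(r: GeneralResponse): boolean {
  // The total must equal the sum of the per-source counts.
  const sum = r.sources.reduce((acc, s) => acc + s.articleCount, 0);
  // Percentages in the examples are rounded to two decimals, so re-derive
  // them with a small tolerance instead of comparing exactly.
  const percOk = r.sources.every(
    (s) => Math.abs(s.articlePerc - (100 * s.articleCount) / r.totalArticles) < 0.005
  );
  return sum === r.totalArticles && percOk;
}

const ok = isConsistent({
  totalArticles: 171,
  sources: [
    { name: "Reddit", articleCount: 76, articlePerc: 44.44 },
    { name: "New York Times", articleCount: 72, articlePerc: 42.11 },
    { name: "Tagesschau", articleCount: 23, articlePerc: 13.45 },
  ],
}); // true
```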