WebInterface Documentation - NewsAnalyseTool/Documentation GitHub Wiki

This is the documentation of the web interface. You can find the corresponding repository here.

Technology Stack

Frontend:

  • React (TypeScript)
  • Vite

Backend:

  • Scala Play
  • sbt
  • ReactiveMongo for Scala Play

Backend

To understand the application, you should know the basics of the Play Framework, such as its package structure and application configuration, which are described here and here.

Backend Architecture

The backend provides a simple REST API that serves aggregated analysis data. Requests reach the controller layer, which is responsible for returning the right response. The aggregation logic lives in the service layer. This layer consists of a single class, AggregationService.scala, which aggregates the required features into a response object that is then returned to the frontend through the REST API. The repository layer accesses MongoDB through ReactiveMongo. It defines the database queries, such as a query for a date range, and hides details such as how many databases and collections are accessed. The core of the backend is the model layer, which defines the representations of the objects handled by the application, e.g. AnalyzedDocument, Source and TrendResponse.
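The request flow through these layers can be sketched roughly as follows. This is only a TypeScript illustration for consistency with the rest of this page: the real backend is written in Scala with Play and ReactiveMongo. The names `ArticleRepository`, `findByDateRange`, `aggregate` and `handleDataRequest`, as well as the simplified document fields, are hypothetical stand-ins for the corresponding Scala classes.

```typescript
// Hypothetical sketch of the backend layering; the real code is Scala.

// Model layer: a simplified analyzed document (fields are assumptions).
interface AnalyzedDocument {
  source: string;
  category: string;
  sentiment: "pos" | "neu" | "neg";
  date: string; // ISO-8601, yyyy-mm-dd
}

// Repository layer: hides how and where documents are stored.
interface ArticleRepository {
  findByDateRange(start: string, end: string): AnalyzedDocument[];
}

// Service layer: turns raw documents into a response object.
class AggregationService {
  private readonly repo: ArticleRepository;

  constructor(repo: ArticleRepository) {
    this.repo = repo;
  }

  // Counts articles per source (a tiny subset of the real aggregation).
  aggregate(start: string, end: string): Record<string, number> {
    const counts: Record<string, number> = {};
    for (const doc of this.repo.findByDateRange(start, end)) {
      counts[doc.source] = (counts[doc.source] ?? 0) + 1;
    }
    return counts;
  }
}

// Controller layer: validates the request and delegates to the service.
function handleDataRequest(
  service: AggregationService,
  params: { startDate?: string; endDate?: string },
): { status: number; body: unknown } {
  const { startDate, endDate } = params;
  if (!startDate || !endDate) {
    return { status: 400, body: { error: "startDate and endDate are required" } };
  }
  return { status: 200, body: service.aggregate(startDate, endDate) };
}

// In-memory repository used only for this example.
const docs: AnalyzedDocument[] = [
  { source: "Reddit", category: "r/news", sentiment: "pos", date: "2023-01-01" },
  { source: "Reddit", category: "r/news", sentiment: "neg", date: "2023-01-02" },
  { source: "Tagesschau", category: "Sport", sentiment: "neu", date: "2023-02-01" },
];
const repo: ArticleRepository = {
  // ISO-8601 date strings compare chronologically as plain strings.
  findByDateRange: (start, end) =>
    docs.filter((d) => d.date >= start && d.date <= end),
};
const service = new AggregationService(repo);
```

Each layer only talks to the one directly below it, which is what makes it possible to change the storage details without touching the controller.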

Backend Implementation Details

Serving a React Frontend

The React frontend is served when a client requests the "/" path of the server. This setup follows this tutorial, slightly adjusted to our needs, and relies on scripts defined in the package.json file.

Repository Pattern

Hiding the details of the database connection behind a repository proved very helpful in our case. Midway through the project, when the backend could already connect to one database, we decided to create a separate MongoDB database on the same server for each source type. Because of the repository pattern, we only had to add the new databases to the repository class and handle the logic there, with minimal changes to the other layers.
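The following is a minimal sketch of how a repository can hide a one-database-per-source layout. It is written in TypeScript for consistency with the rest of this page, whereas the real implementation uses ReactiveMongo in Scala. The `Database` interface, the `inMemoryDb` helper, the `SourceRepository` class and the database names `reddit_db` and `tagesschau_db` are all hypothetical.

```typescript
// Hypothetical stand-in for a MongoDB database handle.
interface Database {
  name: string;
  countBetween(start: string, end: string): number;
}

// Simplified in-memory database that only stores article dates.
function inMemoryDb(name: string, dates: string[]): Database {
  return {
    name,
    // Inclusive range; ISO-8601 strings compare lexicographically.
    countBetween: (start, end) =>
      dates.filter((d) => d >= start && d <= end).length,
  };
}

// The repository is the only place that knows there is one database per source.
class SourceRepository {
  private readonly databases: Map<string, Database>;

  constructor(databases: Map<string, Database>) {
    this.databases = databases;
  }

  sources(): string[] {
    return Array.from(this.databases.keys());
  }

  // Callers ask by source name; which database is queried stays hidden.
  countArticles(source: string, start: string, end: string): number {
    const db = this.databases.get(source);
    if (db === undefined) {
      throw new Error(`Unknown source: ${source}`);
    }
    return db.countBetween(start, end);
  }
}

// Adding a source means adding one entry here; callers stay unchanged.
const repository = new SourceRepository(
  new Map<string, Database>([
    ["Reddit", inMemoryDb("reddit_db", ["2023-01-01", "2023-01-05"])],
    ["Tagesschau", inMemoryDb("tagesschau_db", ["2023-01-03"])],
  ]),
);
```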

Model

The model layer became somewhat messy over time. It could be improved by separating request/response entities from database entities (e.g. by using DTOs, data transfer objects). For a backend of this size it is still manageable, but it could become a problem if the API grows.

Starting the Application

To run the application on the server, we moved sensitive information such as the database credentials into a separate db.conf file under conf/. Alternatively, the same settings can be placed directly in application.conf. The application can then be started with sbt run for development. To run it in production mode, start it with sbt runProd, which requires an application secret (play.http.secret.key) to be defined in application.conf. During startup, the app builds the frontend with the script defined in package.json and serves it once the build is done.


Frontend

The frontend was developed with React in TypeScript and uses the MaterialUI library for charts and a few other small components. You may wonder why almost every component has an associated .css file even though MaterialUI is used. The reason is that the library was added at the end of the project, mainly as a fix for another library we had previously used to draw the diagrams. Most components are therefore "clean" React components that import their own stylesheet.

React Components

There are several components, but the most important ones are described below:

The root of the component tree is MainComponent.tsx. It contains all other components, sends the requests to the backend, and holds the state that has to be shared between components.

The SelectionRowComponent.tsx is what you see at the top of the frontend. It handles the selection of a time span and has the buttons to switch between the general data view and the trend data view.

When the general data view is selected, you see a list of rows that all have the same structure. Each entry in this list is a NewsSourceElement.tsx component that displays the data for one news source. This includes a pie chart of the category distribution and a donut chart of the sentiment distribution for that source.

The trend data view uses a component that is repeated in a list like the NewsSourceElement component, but it is displayed only when the trend data view is selected. It mainly consists of a graph showing how the sentiment of a news source's articles evolves over time.

Quick Installation

Requirements

  • Python >= 3.11.6
  • npm 10.4.0

If you just want to get a first impression of the project, you can use a test backend. It does not need a database connection and is therefore easy to set up. It was originally developed so that the frontend could be implemented without waiting for the backend to be completed. It only returns random data, but this is enough to demonstrate all features. Since a small backend was easiest to set up with Flask, you need Python to run it.

Follow these steps to set up and run the frontend:

  1. Clone the repo as usual
  2. Run npm install in the ui folder to install all the necessary dependencies
  3. Go into the test_backend folder and create a Python virtual environment with python3 -m venv venv
  4. Activate the virtual environment with source venv/bin/activate
  5. Run pip install -r requirements.txt to install the necessary dependencies
  6. The backend can be started with python app.py
  7. Start the frontend with npm run dev back in the ui folder

You should now be able to see the frontend in your browser at http://localhost:5173/.

Complete Installation

The ui folder contains a config.json file. Set the IP address and port in this file to the address at which the frontend can reach the API of the real backend. How to start the backend is described in Starting the Application above.
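The exact keys in `config.json` are not documented on this page. The sketch below assumes hypothetical `ip` and `port` fields and shows how the frontend could turn them into the base URL for the /api/data and /api/trend requests. Check the actual file for the real key names.

```typescript
// Hypothetical shape of ui/config.json; the real keys may differ.
type ApiConfig = { ip: string; port: number };

// Builds the base URL that the frontend prepends to /api/data and /api/trend.
function apiBaseUrl(config: ApiConfig): string {
  if (!Number.isInteger(config.port) || config.port < 1 || config.port > 65535) {
    throw new Error(`Invalid port: ${config.port}`);
  }
  return `http://${config.ip}:${config.port}`;
}
```

For example, `apiBaseUrl({ ip: "127.0.0.1", port: 5000 })` yields the address used by the test backend in the request examples on this page.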


Request-Response Cycle

This section defines the API that the backend provides to the frontend. Both the endpoints and the expected responses were specified in advance so that the two components could be developed independently. As the project evolves, this definition may be adapted to support additional functionality. All versions of the API are listed in the sections below. Note that the routes do not contain a version prefix such as /api/v2/.... The first version was only implemented by the test backend and never by the real backend. As a result, the second version, which is the final one, is effectively the only one in use.

API v2 (Latest)

This is the second version of the API specification.

This version adds a second endpoint. The first endpoint still responds with general data for all news sources, while the new endpoint returns the trend of positive, neutral and negative articles. To minimize computation in the frontend, the backend still calculates and provides some partially redundant data, so the frontend can simply read the values from the JSON object.

1. General Data Endpoint

The first endpoint provides the general data of all news sources in a specific time period.


The frontend sends the request to /api/data. The start and end dates must be passed as query parameters in the URL and must follow the ISO-8601 date format (yyyy-mm-dd):

http://127.0.0.1:5000/api/data?startDate=2023-01-01&endDate=2023-12-31

The start date as well as the end date are always included in the interval. To request data from a single day, both parameters must be equal.
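A client might build this URL from two `Date` objects as sketched below. The function names `pad2`, `toIsoDate` and `buildDataUrl` are hypothetical, and the base URL is the test backend address from the example above. The sketch formats dates from their local calendar fields rather than with `toISOString()`. The reason is that `toISOString()` converts to UTC and could shift a local midnight to the previous day.

```typescript
// Pads a month or day number to two digits.
function pad2(n: number): string {
  return n < 10 ? `0${n}` : String(n);
}

// Formats a Date as yyyy-mm-dd using its local calendar day.
function toIsoDate(date: Date): string {
  return `${date.getFullYear()}-${pad2(date.getMonth() + 1)}-${pad2(date.getDate())}`;
}

// Builds the general data URL; the server treats both dates as inclusive.
function buildDataUrl(baseUrl: string, start: Date, end: Date): string {
  if (start.getTime() > end.getTime()) {
    throw new Error("startDate must not be after endDate");
  }
  const query =
    `startDate=${encodeURIComponent(toIsoDate(start))}` +
    `&endDate=${encodeURIComponent(toIsoDate(end))}`;
  return `${baseUrl}/api/data?${query}`;
}
```

To request a single day, pass the same date as both `start` and `end`.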

Response JSON-Object

The JSON object returned by the backend must have a specific structure for the frontend to process it. This structure differs from the one specified in v1 because it additionally contains the number and percentage of neutral articles.

The type expected by the frontend has been defined using TypeScript:

type Response = {
    totalArticles: number;          // number of articles from all sources
    totalCategories: number;        // number of categories from all sources

    sources: {                      // list of news sources
        name: string;               // name of the source (e.g. "New York Times")
        articleCount: number;       // number of articles from this source
        articlePerc: number;        // percentage of articles from this source out of all sources
        categoryCount: number;      // number of categories from this source
        posArticles: number;        // number of positive articles from this source
        posArticlesPerc: number;    // percentage of positive articles from this source
        neuArticles: number;        // number of neutral articles from this source
        neuArticlesPerc: number;    // percentage of neutral articles from this source
        negArticles: number;        // number of negative articles from this source
        negArticlesPerc: number;    // percentage of negative articles from this source
        categories: {               // list of labels for articles or groups to which articles belong (e.g. "Politics", "Economy"); perhaps more detailed or specific to the source (e.g. "World War II" (detailed), "r/news" (specific to Reddit))
            name: string;           // the name of the category (see examples above)
            count: number;          // how many articles or posts belong to this category in the time period specified by the request
            pos: number;            // number of the articles with positive sentiment
            posPerc: number;        // percentage of positive articles out of all articles from this source in this category
            neu: number;            // number of the articles with neutral sentiment
            neuPerc: number;        // percentage of neutral articles out of all articles from this source in this category
            neg: number;            // number of articles with negative sentiment
            negPerc: number;        // percentage of negative articles out of all articles from this source in this category
        }[];
    }[];
};
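Most fields of a source entry are derived from its per-category sentiment counts. The sketch below shows one way to compute such an entry, with percentages rounded to two decimal places. The `CategoryCounts` type and the `perc` and `buildSource` helpers are hypothetical, and the real aggregation lives in the Scala `AggregationService`. Its rounding may also differ: `Math.round` rounds halves up, whereas the example below shows 28.125 as 28.12. `articlePerc` is left out because it needs the article count of all sources.

```typescript
// Raw sentiment counts for one category (hypothetical input type).
type CategoryCounts = { name: string; pos: number; neu: number; neg: number };

// Percentage of part in total, rounded to two decimals (0 if total is 0).
function perc(part: number, total: number): number {
  return total === 0 ? 0 : Math.round((part / total) * 10000) / 100;
}

// Builds one source entry (without articlePerc) from its category counts.
function buildSource(name: string, categories: CategoryCounts[]) {
  const cats = categories.map((c) => {
    const count = c.pos + c.neu + c.neg;
    return {
      name: c.name,
      count,
      pos: c.pos,
      posPerc: perc(c.pos, count),
      neu: c.neu,
      neuPerc: perc(c.neu, count),
      neg: c.neg,
      negPerc: perc(c.neg, count),
    };
  });
  // Sums one sentiment field over all categories of the source.
  const sum = (f: (c: CategoryCounts) => number) =>
    categories.reduce((acc, c) => acc + f(c), 0);
  const pos = sum((c) => c.pos);
  const neu = sum((c) => c.neu);
  const neg = sum((c) => c.neg);
  const articleCount = pos + neu + neg;
  return {
    name,
    articleCount,
    categoryCount: cats.length,
    posArticles: pos,
    posArticlesPerc: perc(pos, articleCount),
    neuArticles: neu,
    neuArticlesPerc: perc(neu, articleCount),
    negArticles: neg,
    negArticlesPerc: perc(neg, articleCount),
    categories: cats,
  };
}
```

Feeding in the three Reddit categories from the example below reproduces its `articleCount` of 54 and `posArticlesPerc` of 37.04.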

Example JSON-Object

{
  "totalArticles": 135,
  "totalCategories": 7,
  "sources": [
    {
      "name": "Reddit",
      "articleCount": 54,
      "articlePerc": 40,
      "categoryCount": 3,
      "posArticles": 20,
      "posArticlesPerc": 37.04,
      "neuArticles": 14,
      "neuArticlesPerc": 25.93,
      "negArticles": 20,
      "negArticlesPerc": 37.04,
      "categories": [
        {
          "name": "r/news",
          "count": 23,
          "pos": 14,
          "posPerc": 60.87,
          "neu": 4,
          "neuPerc": 17.39,
          "neg": 5,
          "negPerc": 21.74
        },
        {
          "name": "r/politics",
          "count": 23,
          "pos": 4,
          "posPerc": 17.39,
          "neu": 9,
          "neuPerc": 39.13,
          "neg": 10,
          "negPerc": 43.48
        },
        {
          "name": "r/worldnews",
          "count": 8,
          "pos": 2,
          "posPerc": 25,
          "neu": 1,
          "neuPerc": 12.5,
          "neg": 5,
          "negPerc": 62.5
        }
      ]
    },
    {
      "name": "New York Times",
      "articleCount": 38,
      "articlePerc": 28.15,
      "categoryCount": 2,
      "posArticles": 11,
      "posArticlesPerc": 28.95,
      "neuArticles": 10,
      "neuArticlesPerc": 26.32,
      "negArticles": 17,
      "negArticlesPerc": 44.74,
      "categories": [
        {
          "name": "Football",
          "count": 18,
          "pos": 8,
          "posPerc": 44.44,
          "neu": 2,
          "neuPerc": 11.11,
          "neg": 8,
          "negPerc": 44.44
        },
        {
          "name": "Elections",
          "count": 20,
          "pos": 3,
          "posPerc": 15,
          "neu": 8,
          "neuPerc": 40,
          "neg": 9,
          "negPerc": 45
        }
      ]
    },
    {
      "name": "Tagesschau",
      "articleCount": 43,
      "articlePerc": 31.85,
      "categoryCount": 2,
      "posArticles": 13,
      "posArticlesPerc": 30.23,
      "neuArticles": 17,
      "neuArticlesPerc": 39.53,
      "negArticles": 13,
      "negArticlesPerc": 30.23,
      "categories": [
        {
          "name": "Sport",
          "count": 11,
          "pos": 4,
          "posPerc": 36.36,
          "neu": 2,
          "neuPerc": 18.18,
          "neg": 5,
          "negPerc": 45.45
        },
        {
          "name": "Berlin",
          "count": 32,
          "pos": 9,
          "posPerc": 28.12,
          "neu": 15,
          "neuPerc": 46.88,
          "neg": 8,
          "negPerc": 25
        }
      ]
    }
  ]
}

2. Trend Data Endpoint

This endpoint should provide data to display the trends of positive, neutral and negative articles from a news source over a time span.


The frontend will request the route /api/trend. A possible route might look like the following:

http://127.0.0.1:5000/api/trend?startDate=2023-01-01&endDate=2023-12-31

Response JSON-Object

First, we define the structure of the response JSON object in TypeScript as follows:

type Response = {
    source: string;             // The source to which the data points belong
    datapoints: {               // List of all data points displayed as the trend; each data point corresponds to one day
        date: string;           // The date of the data point
        pos: number;            // Number of positive articles at that date
        neut: number;           // Number of neutral articles at that date
        neg: number;            // Number of negative articles at that date
    }[];
}[];

As you can see, the response is actually a list containing the trend data for all news sources. The frontend can then select the parts it wants to display.
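The selection step might look roughly like the sketch below, assuming the chart wants three parallel series (one per sentiment) plus date labels for a single source. `TrendResponse` mirrors the type above. `seriesFor` is a hypothetical helper and is not taken from the repository.

```typescript
type TrendResponse = {
  source: string;
  datapoints: { date: string; pos: number; neut: number; neg: number }[];
}[];

// Extracts the date labels and one series per sentiment for a source.
// Returns null if the response contains no entry for that source.
function seriesFor(response: TrendResponse, source: string) {
  const entry = response.find((e) => e.source === source);
  if (entry === undefined) return null;
  // Copy, then sort by date; ISO-8601 strings sort chronologically as text.
  const points = entry.datapoints
    .slice()
    .sort((a, b) => (a.date < b.date ? -1 : a.date > b.date ? 1 : 0));
  return {
    dates: points.map((p) => p.date),
    pos: points.map((p) => p.pos),
    neut: points.map((p) => p.neut),
    neg: points.map((p) => p.neg),
  };
}
```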

Example JSON-Object

[
  {
    "source": "Reddit",
    "datapoints": [
      {
        "date": "2023-01-01",
        "pos": 12,
        "neut": 11,
        "neg": 11
      },
      {
        "date": "2023-01-02",
        "pos": 13,
        "neut": 9,
        "neg": 1
      },
      {
        "date": "2023-01-03",
        "pos": 15,
        "neut": 2,
        "neg": 4
      },
      {
        "date": "2023-01-04",
        "pos": 4,
        "neut": 7,
        "neg": 8
      },
      {
        "date": "2023-01-05",
        "pos": 11,
        "neut": 12,
        "neg": 15
      },
      {
        "date": "2023-01-06",
        "pos": 14,
        "neut": 0,
        "neg": 14
      },
      {
        "date": "2023-01-07",
        "pos": 2,
        "neut": 13,
        "neg": 6
      }
    ]
  },
  {
    "source": "Tagesschau",
    "datapoints": [
      {
        "date": "2023-01-01",
        "pos": 14,
        "neut": 12,
        "neg": 10
      },
      {
        "date": "2023-01-02",
        "pos": 0,
        "neut": 15,
        "neg": 13
      },
      {
        "date": "2023-01-03",
        "pos": 3,
        "neut": 6,
        "neg": 5
      },
      {
        "date": "2023-01-04",
        "pos": 5,
        "neut": 15,
        "neg": 3
      },
      {
        "date": "2023-01-05",
        "pos": 5,
        "neut": 0,
        "neg": 11
      },
      {
        "date": "2023-01-06",
        "pos": 4,
        "neut": 4,
        "neg": 4
      },
      {
        "date": "2023-01-07",
        "pos": 13,
        "neut": 13,
        "neg": 12
      }
    ]
  },
  {
    "source": "NewYorkTimes",
    "datapoints": [
      {
        "date": "2023-01-01",
        "pos": 9,
        "neut": 4,
        "neg": 9
      },
      {
        "date": "2023-01-02",
        "pos": 5,
        "neut": 14,
        "neg": 12
      },
      {
        "date": "2023-01-03",
        "pos": 14,
        "neut": 14,
        "neg": 8
      },
      {
        "date": "2023-01-04",
        "pos": 6,
        "neut": 0,
        "neg": 14
      },
      {
        "date": "2023-01-05",
        "pos": 10,
        "neut": 9,
        "neg": 7
      },
      {
        "date": "2023-01-06",
        "pos": 5,
        "neut": 7,
        "neg": 15
      },
      {
        "date": "2023-01-07",
        "pos": 10,
        "neut": 5,
        "neg": 8
      }
    ]
  }
]

API v1 (Deprecated)

This is the first version of the API specification. It is deprecated because it sends the request parameters in the body of a GET request and parses them on the server side. This goes against the [HTTP recommendations](https://www.rfc-editor.org/rfc/rfc2616#section-4.3).

Endpoints

The backend needs to provide a single endpoint for the frontend. When the user wants to update the data, a request is sent to the backend with a start and end date in the request body. The dates are given as a string and must follow the ISO-8601 date format (yyyy-mm-dd).

A request for the news data from February through August 2023 (both months included) would have the following body:

{
    "start": "2023-02-01",
    "end": "2023-08-31"
}
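When porting a client from v1 to v2, the date range simply moves from the JSON body into the query string. The following is a minimal sketch of that conversion. The helper name `v1BodyToV2Query` is hypothetical, and the format check only validates the yyyy-mm-dd shape, not whether the date exists in the calendar.

```typescript
// Request body used by the deprecated v1 API.
type V1Body = { start: string; end: string };

// Expresses the same inclusive range as v2 query parameters.
function v1BodyToV2Query(body: V1Body): string {
  // Checks the yyyy-mm-dd shape only; e.g. 2023-02-30 would still pass.
  const isoDate = /^\d{4}-\d{2}-\d{2}$/;
  if (!isoDate.test(body.start) || !isoDate.test(body.end)) {
    throw new Error("Dates must use the yyyy-mm-dd format");
  }
  return `startDate=${encodeURIComponent(body.start)}&endDate=${encodeURIComponent(body.end)}`;
}
```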

Response JSON-Object

The JSON object returned by the backend must have a certain structure in order for the frontend to process it.

The type expected by the frontend has been defined using TypeScript:

type Response = {
    totalArticles: number;          // number of articles from all sources
    totalCategories: number;        // number of categories from all sources

    sources: {                      // list of news sources
        name: string;               // name of the source (e.g. "New York Times")
        articleCount: number;       // number of articles from this source
        articlePerc: number;        // percentage of articles from this source out of all sources
        categoryCount: number;      // number of categories from this source
        posArticles: number;        // number of positive articles from this source
        posArticlesPerc: number;    // percentage of positive articles from this source
        negArticles: number;        // number of negative articles from this source
        negArticlesPerc: number;    // percentage of negative articles from this source
        categories: {               // list of labels for articles or groups to which articles belong (e.g. "Politics", "Economy"); perhaps more detailed or specific to the source (e.g. "World War II" (detailed), "r/news" (specific to Reddit))
            name: string;           // the name of the category (see examples above)
            count: number;          // how many articles or posts belong to this category in the time period specified by the request
            pos: number;            // number of the articles with positive sentiment
            posPerc: number;        // percentage of positive articles out of all articles from this source in this category
            neg: number;            // number of articles with negative sentiment
            negPerc: number;        // percentage of negative articles out of all articles from this source in this category
        }[];
    }[];
};

This is an example of a JSON object as it might be returned by the backend:

{
    "totalArticles" : 171,
    "totalCategories" : 10,
    "sources": [
        {
            "name": "Reddit",
            "articleCount": 76,
            "articlePerc" : 44.44,
            "categoryCount": 2,
            "posArticles": 19,
            "posArticlesPerc": 25,
            "negArticles": 57,
            "negArticlesPerc": 75,
            "categories": [
                {
                    "name": "r/politics",
                    "count": 31,
                    "pos": 2,
                    "posPerc": 6.45,
                    "neg": 29,
                    "negPerc": 93.55

                },
                {
                    "name": "r/news",
                    "count": 45,
                    "pos": 17,
                    "posPerc": 37.78,
                    "neg": 28,
                    "negPerc": 62.22
                }
        ]},
        {
            "name": "New York Times",
            "articleCount": 72,
            "articlePerc" : 42.11,
            "categoryCount": 4,
            "posArticles": 28,
            "posArticlesPerc": 38.89,
            "negArticles": 44,
            "negArticlesPerc": 61.11,
            "categories": [
                {
                    "name": "Politics",
                    "count": 34,
                    "pos": 12,
                    "posPerc": 35.29,
                    "neg": 22,
                    "negPerc": 64.71
                },
                {
                    "name": "Economy",
                    "count": 18,
                    "pos": 5,
                    "posPerc": 27.78,
                    "neg": 13,
                    "negPerc": 72.22
                },
                {
                    "name": "Technology",
                    "count": 12,
                    "pos": 7,
                    "posPerc": 58.33,
                    "neg": 5,
                    "negPerc": 41.67
                },
                {
                    "name": "Sports",
                    "count": 8,
                    "pos": 4,
                    "posPerc": 50,
                    "neg": 4,
                    "negPerc": 50
                }
        ]},
        {
            "name": "Tagesschau",
            "articleCount": 23,
            "articlePerc" : 13.45,
            "categoryCount": 4,
            "posArticles": 6,
            "posArticlesPerc": 26.09,
            "negArticles": 17,
            "negArticlesPerc": 73.91,
            "categories": [
                {
                    "name": "Military", 
                    "count": 3,
                    "pos": 2,
                    "posPerc": 66.67,
                    "neg": 1,
                    "negPerc": 33.33
                },
                {
                    "name": "Corona", 
                    "count": 13,
                    "pos": 0,
                    "posPerc": 0,
                    "neg": 13,
                    "negPerc": 100
                },
                {
                    "name": "Elections", 
                    "count": 5,
                    "pos": 3,
                    "posPerc": 60,
                    "neg": 2,
                    "negPerc": 40
                },
                {
                    "name": "Football", 
                    "count":  2,
                    "pos":  1,
                    "posPerc": 50,
                    "neg": 1,
                    "negPerc": 50
                }
        ]}
    ]
}

As you can see, the JSON object contains redundant data. For example, the total number of articles could be calculated from the number of articles per news source. To minimize computation in the frontend, the backend calculates these values and provides them, partially redundantly, so the frontend can simply read them from the JSON object. In return, the backend is responsible for keeping the data consistent.
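Because the backend is responsible for consistency, a check like the one below could be useful in tests. This is a hypothetical sketch and not part of the repository. It assumes that `totalCategories` is the sum of the per-source `categoryCount` values, as in both examples on this page. It checks only counts, because percentages are rounded. Neutral counts are optional so that both v1 and v2 responses are accepted. The type and function names are illustrative.

```typescript
// Only the fields needed for the check; neutral counts are optional (v1 has none).
type CheckedCategory = { name: string; count: number; pos: number; neu?: number; neg: number };
type CheckedSource = {
  name: string;
  articleCount: number;
  categoryCount: number;
  posArticles: number;
  neuArticles?: number;
  negArticles: number;
  categories: CheckedCategory[];
};
type CheckedResponse = { totalArticles: number; totalCategories: number; sources: CheckedSource[] };

// Returns human-readable inconsistencies between redundant fields (empty if none).
function findInconsistencies(r: CheckedResponse): string[] {
  const problems: string[] = [];
  let articles = 0;
  let categories = 0;
  for (const s of r.sources) {
    articles += s.articleCount;
    categories += s.categoryCount;
    if (s.categories.length !== s.categoryCount) {
      problems.push(`${s.name}: categoryCount does not match categories`);
    }
    if (s.posArticles + (s.neuArticles ?? 0) + s.negArticles !== s.articleCount) {
      problems.push(`${s.name}: sentiment counts do not add up to articleCount`);
    }
    const categorySum = s.categories.reduce((acc, c) => acc + c.count, 0);
    if (categorySum !== s.articleCount) {
      problems.push(`${s.name}: category counts do not add up to articleCount`);
    }
    for (const c of s.categories) {
      if (c.pos + (c.neu ?? 0) + c.neg !== c.count) {
        problems.push(`${s.name}/${c.name}: sentiment counts do not add up to count`);
      }
    }
  }
  if (articles !== r.totalArticles) problems.push("totalArticles mismatch");
  if (categories !== r.totalCategories) problems.push("totalCategories mismatch");
  return problems;
}
```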
