Hits - BB-Media-IT/Data-Hub GitHub Wiki

This document details the schemas used to manage the Hits product data. During the delivery process, a dedicated S3 Bucket is created for each client. Each bucket has a main folder named Hits, which contains files in JSONL format and a subfolder named latest.

Document Structure

This document is organized into the following sections:

  • Details of the S3 Buckets
  • Update frequency and scope
  • File Description

Details of the S3 Buckets

Each client receives an S3 Bucket where data is organized into specific folders to ensure efficient management and quick access. The folder names reflect the types of data they contain, facilitating the identification and specific processing of each data set:

Example:

  • s3://bucket-client/Hits

This folder contains a latest subfolder, which is periodically updated with the most recent snapshot so that clients always have access to current data.

Update frequency and scope

  • Data in the S3 Bucket is updated weekly.

🚀 A weekly updated demo is available in the bucket s3://bb-media-data/hits/. Connect with the AWS CLI or any S3-compatible client, setting the endpoint with --endpoint-url https://nyc3.digitaloceanspaces.com.

Command example:

    aws s3 --endpoint-url https://nyc3.digitaloceanspaces.com cp s3://bb-media-data/hits/ /Demo/BB-Media --recursive

File Description

This section describes the files contained in the Hits folder, explaining the structure and data types of each one. The schemas are also available in YAML.
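As a minimal sketch of how these JSONL files can be consumed, the snippet below reads a file line by line using only the Python standard library. The function name `read_jsonl` is an assumption made for illustration and is not part of the delivery.

```python
import json
from pathlib import Path
from typing import Iterator


def read_jsonl(path: str) -> Iterator[dict]:
    """Yield one record (dict) per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

For example, `records = list(read_jsonl(path))` loads every record into memory, where `path` points to wherever the downloaded file was copied. Iterating over the generator directly avoids holding the whole file in memory.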

Hits

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| UID | string | Hash that universally identifies the movie or series | c4d8bb0055f0cf866ec6f5d16b5471c5 |
| IMDBId | string | ID from IMDB | tt13016388 |
| TMDBId | string | ID from TMDB | 108545 |
| DateHits | string | First day of the analyzed week | 2024-04-01 |
| WeekOn | integer | Week number of the year corresponding to the date the data is processed (1 to 52/53) | 14 |
| YearOn | integer | Year corresponding to the date the data is processed | 2024 |
| Title | string | Title as listed on IMDB | 3 Body Problem |
| Year | integer | Year the movie or series was released, from IMDB | 2024 |
| Country | string | Country where the scores are calculated (ISO Alpha-2 code) | US |
| Type | string | Content type | Movie |
| Genre | string | Primary genre of the title: the most common genre in collaborative databases | Action |
| Scores | array | Per-source scores; see the Scores section | - |
| Position | integer | Position of the title within the current country and content type | 1 |
| DeltaPositionInt | integer | Position change from the previous week to the current one; null if the title was not in the previous week's data | 3 |
| Average | numeric | Average Hits score within the corresponding Country and Type | 1.55 |
| HitsRelative | numeric | (Hits score / Average) - 1; compares the Hits score with the Average ("how many times is the Hits score over the Average") | 63.66 |
| HitsLocal | numeric | Compares the Hits score with the Hits score of the same title in its origin country | 100 |
| ReleaseDate | string | Release date of the title in the corresponding country | 2022-08-02 |
| HitsCategory | string | Category assigned to a title according to its Hits score | Unicorn |
| TrendScore | numeric | Relative score based on the difference between the moving averages of the last week and the previous 2 weeks, divided by the latter moving average | 1.888 |
| TrendCategory | string | Category assigned to a title according to its TrendScore | Average Trend |
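The HitsRelative and TrendScore descriptions above are formulas, so a short Python sketch may help make them concrete. The function and parameter names are assumptions for illustration. How the moving averages are computed is not specified in this document, so they are taken as inputs, and returning None for a zero denominator is an assumed convention.

```python
from typing import Optional


def hits_relative(hits_score: float, average: float) -> float:
    """(Hits score / Average) - 1, as given in the HitsRelative description."""
    return hits_score / average - 1


def trend_score(last_week_avg: float, previous_two_weeks_avg: float) -> Optional[float]:
    """Difference between the two moving averages, divided by the latter one.

    Returns None when the previous average is zero (assumed handling;
    this document does not say what happens in that case).
    """
    if previous_two_weeks_avg == 0:
        return None
    return (last_week_avg - previous_two_weeks_avg) / previous_two_weeks_avg
```

A title whose Hits score is twice the Average has a HitsRelative of 1, and a title scoring exactly the Average has a HitsRelative of 0.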

Scores

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| Score | numeric | - | 100 |
| Source | string | Source of the score (see the definitions below) | YoutubeRaw |

Raw scores have no limits; all others are standardized between 0 and 100. All scores are calculated for a specific week. The sources are:

  • Piracy: The number of views and downloads of the title on pirate websites.
  • Twitter: The number of published tweets (from public profiles, using the user's location to determine the country) that include hashtags with the title of the content.
  • Imdb: The number of votes registered on IMDB.
  • Cdb: The number of views from collaborative databases (for example filmaffinity, letterboxd, sensacine).
  • Youtube: The number of weekly views of the first 10 videos related to the title.
  • Hits: The weighted average of the different sources (20% Piracy + 15% Twitter + 10% Search Engine + 25% Collaborative DB Votes + 20% Collaborative DB Popularity + 10% Youtube).
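The weighted average behind the Hits score can be sketched in Python as follows. The weights are taken from the formula above; the dictionary keys simply repeat the component names used in that formula, and how those names map to actual Source values in the data is not stated here, so treat the keys as hypothetical. The helper `scores_by_source` assumes each entry of the Scores array is an object with Score and Source fields, as described in the table.

```python
from typing import Dict, List

# Weights as listed in the Hits formula; keys repeat the names used there.
HITS_WEIGHTS: Dict[str, float] = {
    "Piracy": 0.20,
    "Twitter": 0.15,
    "Search Engine": 0.10,
    "Collaborative DB Votes": 0.25,
    "Collaborative DB Popularity": 0.20,
    "Youtube": 0.10,
}


def scores_by_source(scores: List[dict]) -> Dict[str, float]:
    """Turn a Scores array into a {Source: Score} mapping."""
    return {entry["Source"]: entry["Score"] for entry in scores}


def weighted_hits(component_scores: Dict[str, float]) -> float:
    """Weighted average of the standardized (0-100) component scores.

    Raises KeyError if a component is missing (assumed strictness).
    """
    return sum(
        weight * component_scores[name] for name, weight in HITS_WEIGHTS.items()
    )
```

Because the weights sum to 1, a title that scores 100 on every standardized component also gets a weighted Hits score of 100.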