Online Audits - BB-Media-IT/Data-Hub GitHub Wiki

This document describes the schemas used to manage Online Audits product data. During delivery, a dedicated S3 bucket is created for each client. Each bucket has a main folder named OnlineAudits that contains files in JSONL format, and inside this folder there is a subfolder named latest.

Document Structure

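This document is organized into the following sections:

  • Details of the S3 Buckets
  • Update frequency and scope
  • File Description: Piracy Platforms Data and Piracy Platforms Scraping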

Details of the S3 Buckets

Each client receives an S3 bucket in which data is organized into specific folders for efficient management and quick access. The OnlineAudits/latest folder contains two main files, one with platform data and one with scraping data:

Example

  • s3://bucket-client/OnlineAudits/latest/piracy_platforms_data.jsonl
  • s3://bucket-client/OnlineAudits/latest/piracy_scraping.jsonl
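Most S3 clients take a bucket name and an object key rather than a full s3:// URI. The following is a minimal sketch, using only the standard library, that builds the URIs of the two latest files and splits them into (bucket, key). It assumes the folder layout shown in the example; bucket-client is the placeholder name from the example paths, and the helper names are hypothetical.

```python
from urllib.parse import urlparse

# Placeholder bucket name from the example paths; each client's real bucket name differs.
BUCKET = "bucket-client"
PREFIX = "OnlineAudits/latest"
FILES = ("piracy_platforms_data.jsonl", "piracy_scraping.jsonl")


def latest_uri(bucket: str, filename: str) -> str:
    """Build the S3 URI of a file in the latest snapshot folder."""
    return f"s3://{bucket}/{PREFIX}/{filename}"


def split_s3_uri(uri: str) -> tuple:
    """Split an s3:// URI into (bucket, key), the form most S3 clients expect."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


if __name__ == "__main__":
    for name in FILES:
        uri = latest_uri(BUCKET, name)
        bucket, key = split_s3_uri(uri)
        print(uri, "->", bucket, key)
        # With boto3 installed and credentials configured, the object could be read with:
        # boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].iter_lines()
```

Because each line of a JSONL file is an independent JSON document, the file can be processed line by line without loading it entirely into memory.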

The latest subfolder is updated with each new snapshot, so clients always have access to the most recent data.

Update frequency and scope

  • Data in the S3 Bucket is updated monthly.
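Since the data is refreshed monthly, a consumer may want to check whether the files it reads belong to a recent snapshot. A minimal sketch, assuming the Date field (YYYY-MM-DD) carries the snapshot date; the function names and the one-month tolerance are hypothetical choices, not part of the delivery.

```python
from datetime import date


def snapshot_month(date_str: str) -> tuple:
    """Return (year, month) from a YYYY-MM-DD Date value."""
    d = date.fromisoformat(date_str)
    return d.year, d.month


def months_behind(date_str: str, today: date) -> int:
    """Number of calendar months between the snapshot's month and today's month."""
    year, month = snapshot_month(date_str)
    return (today.year - year) * 12 + (today.month - month)


def is_stale(date_str: str, today: date, tolerance: int = 1) -> bool:
    """Flag a snapshot older than `tolerance` months (hypothetical threshold).

    A tolerance of 1 allows for a new monthly snapshot not yet being
    available at the very start of a month.
    """
    return months_behind(date_str, today) > tolerance


if __name__ == "__main__":
    print(is_stale("2025-06-01", date.today()))
```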

File Description

This section describes the files in the OnlineAudits folder, including the structure and data type of each field. If you wish to see the schemas in YAML, click here.

Piracy Platforms Data

| Field | Type | Description | Example |
|---|---|---|---|
| Date | string | Data creation date (YYYY-MM-DD) | 2025-06-01 |
| Platform | string | Platform commercial name | cuevana |
| PlatformCode | string | Platform code | cuevanacom |
| Url | string | Platform URL | https://cuevana.com |
| AccessMethod | string | How content is made available on the platform | VOD |
| ContentType | string | Available content types (Movies/Series/Anime) | Movie |
| OriginCountry | string | Platform country of origin | RU |
| Domain | string | Platform domain | cuevana.com |
| GlobalRanking | integer | Global website ranking by traffic | 1523 |
| CountryTrafficPenetration | array(dict) | Country-level web traffic penetration percentage (up to 5 countries) | [{"Country": "Brazil", "Percentage": 18.4}, {"Country": "Argentina", "Percentage": 14.7}, {"Country": "Mexico", "Percentage": 10.2}] |
| TotalVisits | integer | Total visits in the last month | 12500000 |
| Dns | string | DNS name servers, comma-separated | cosmin.ns.cloudflare.com, tani.ns.cloudflare.com |
| Ip | string | IP addresses, comma-separated | 111.11.11.111, 222.22.222.222 |
| ServerLocation | string | Server location country | US |
| HostServers | string | Hosting provider | ServerHostName, Inc. |
| CountryRanking | array(dict) | Country-level web traffic ranking | [{"Country": "Brazil", "Rank": 1}, {"Country": "Argentina", "Rank": 2}, {"Country": "Mexico", "Rank": 3}] |
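Two kinds of fields in this schema need extra handling when read: Dns and Ip pack several values into one comma-separated string, and CountryTrafficPenetration and CountryRanking are arrays of small objects. A minimal sketch of parsing one line of piracy_platforms_data.jsonl, assuming the key names shown in the table; the PlatformRecord class, the subset of fields it keeps, and the helper names are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlatformRecord:
    """Illustrative subset of a piracy_platforms_data.jsonl row."""
    date: str
    platform_code: str
    domain: str
    global_ranking: Optional[int]
    total_visits: Optional[int]
    dns: list
    ips: list
    traffic_share: dict  # Country -> Percentage, from CountryTrafficPenetration
    country_rank: dict   # Country -> Rank, from CountryRanking

    def top_country(self) -> Optional[str]:
        """Country with the highest traffic penetration percentage, if any."""
        if not self.traffic_share:
            return None
        return max(self.traffic_share, key=self.traffic_share.get)


def split_csv(value: Optional[str]) -> list:
    """Split comma-separated string fields such as Dns and Ip into a list."""
    return [part.strip() for part in (value or "").split(",") if part.strip()]


def to_int(value) -> Optional[int]:
    """Convert to int, keeping missing values as None."""
    return None if value is None else int(value)


def parse_platform_line(line: str) -> PlatformRecord:
    row = json.loads(line)
    return PlatformRecord(
        date=row["Date"],
        platform_code=row["PlatformCode"],
        domain=row.get("Domain", ""),
        global_ranking=to_int(row.get("GlobalRanking")),
        total_visits=to_int(row.get("TotalVisits")),
        dns=split_csv(row.get("Dns")),
        ips=split_csv(row.get("Ip")),
        traffic_share={e["Country"]: float(e["Percentage"])
                       for e in row.get("CountryTrafficPenetration") or []},
        country_rank={e["Country"]: int(e["Rank"])
                      for e in row.get("CountryRanking") or []},
    )


# Sample row assembled from the Example column of the Piracy Platforms Data table.
SAMPLE = json.dumps({
    "Date": "2025-06-01", "Platform": "cuevana", "PlatformCode": "cuevanacom",
    "Domain": "cuevana.com", "GlobalRanking": 1523, "TotalVisits": 12500000,
    "Dns": "cosmin.ns.cloudflare.com, tani.ns.cloudflare.com",
    "Ip": "111.11.11.111, 222.22.222.222",
    "CountryTrafficPenetration": [
        {"Country": "Brazil", "Percentage": 18.4},
        {"Country": "Argentina", "Percentage": 14.7},
        {"Country": "Mexico", "Percentage": 10.2},
    ],
    "CountryRanking": [
        {"Country": "Brazil", "Rank": 1},
        {"Country": "Argentina", "Rank": 2},
        {"Country": "Mexico", "Rank": 3},
    ],
})

if __name__ == "__main__":
    record = parse_platform_line(SAMPLE)
    print(record.platform_code, record.top_country(), record.dns)
```

Turning the country arrays into dictionaries keyed by country makes lookups such as "rank in Brazil" a single access, at the cost of losing the original array order.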

Piracy Platforms Scraping

| Field | Type | Description | Example |
|---|---|---|---|
| Date | string | Data creation date (YYYY-MM-DD) | 2025-06-01 |
| PlatformCode | string | Platform code | cuevanacom |
| Permalink | string | Title URL on the platform | https://cuevana.com/movie/avengers-endgame |
| UID | string | Unique identifier | 5f4d9e2a3ac0479d8b6c |
| Title | string | Title | Avengers: Endgame |
| OriginalTitle | string | Original title | Avengers: Endgame |
| ContentType | string | Content type of the title (Movie or Series) | Movie |
| Year | integer | Release year | 2019 |
| IMDbId | string | IMDb identifier | tt4154796 |
| Genres | string | Associated genres, comma-separated | Action, Sci-Fi |
| ReleaseDate | string | Title release date (YYYY-MM-DD) | 2019-04-26 |
| PrimaryCountry | string | Primary production country | US |
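Before loading scraping rows into downstream systems, it can help to check them against the types and formats documented above and to count titles per platform. A minimal sketch, assuming the key names in the table; the set of required fields, the IMDb pattern (tt followed by digits), and the function names are hypothetical choices for illustration.

```python
import json
import re
from collections import Counter
from datetime import date

DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")
IMDB_ID_RE = re.compile(r"tt\d+")
# Hypothetical choice of fields treated as mandatory.
REQUIRED = ("Date", "PlatformCode", "Permalink", "UID", "Title")


def is_iso_date(value: str) -> bool:
    """True if value is a real calendar date written as YYYY-MM-DD."""
    if not DATE_RE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def validate_scraping_row(row: dict) -> list:
    """Return the problems found in one piracy_scraping.jsonl record (empty if none)."""
    problems = [f"missing {field}" for field in REQUIRED if not row.get(field)]
    for field in ("Date", "ReleaseDate"):
        value = row.get(field)
        if value and not is_iso_date(value):
            problems.append(f"{field} is not YYYY-MM-DD: {value!r}")
    if row.get("Year") is not None and not isinstance(row["Year"], int):
        problems.append("Year is not an integer")
    imdb = row.get("IMDbId")
    if imdb and not IMDB_ID_RE.fullmatch(imdb):
        problems.append(f"unexpected IMDbId format: {imdb!r}")
    return problems


def titles_per_platform(lines) -> Counter:
    """Count distinct UIDs per PlatformCode across JSONL lines, skipping blank lines."""
    seen = set()
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue
        row = json.loads(line)
        key = (row["PlatformCode"], row["UID"])
        if key not in seen:
            seen.add(key)
            counts[row["PlatformCode"]] += 1
    return counts
```

Counting distinct (PlatformCode, UID) pairs rather than raw lines avoids inflating totals if the same title appears more than once in a file. The resulting counts can be matched to piracy_platforms_data.jsonl through the shared PlatformCode field.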