Online Audits - BB-Media-IT/Data-Hub GitHub Wiki
This document details the schemas used to manage the Online Audits product data. During the delivery process, a specific S3 Bucket is created for each client. In each of these buckets, we have a main folder containing files in JSONL format. This folder is named `OnlineAudits`. Additionally, within this folder, there is a subfolder named `latest`.
Document Structure
This document is organized into the following sections:
- Details of the S3 Buckets
- Update frequency and scope
- File Description
Details of the S3 Buckets
Each client receives an S3 Bucket where data is organized into specific folders to ensure efficient management and quick access. The `OnlineAudits/latest` folder contains two main files, one with platform data and one with scraping data:
Example
s3://bucket-client/OnlineAudits/latest/piracy_platforms_data.jsonl
s3://bucket-client/OnlineAudits/latest/piracy_scraping.jsonl
The `OnlineAudits` folder contains a `latest` subfolder, which is periodically updated with the most recent snapshot, so customers always have access to current information.
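As a minimal sketch, the two object locations can be derived from the bucket name alone. The helper name `latest_snapshot_uris` and the bucket name `bucket-client` are illustrative assumptions; only the folder layout and file names come from this page.

```python
# Folder layout and file names as documented on this page.
BASE_FOLDER = "OnlineAudits"
LATEST_FOLDER = "latest"
SNAPSHOT_FILES = ("piracy_platforms_data.jsonl", "piracy_scraping.jsonl")


def latest_snapshot_uris(bucket: str) -> list[str]:
    """Return the S3 URIs of the two files in the latest snapshot.

    `bucket` is the client-specific bucket name (hypothetical here).
    """
    prefix = f"s3://{bucket}/{BASE_FOLDER}/{LATEST_FOLDER}"
    return [f"{prefix}/{name}" for name in SNAPSHOT_FILES]


if __name__ == "__main__":
    for uri in latest_snapshot_uris("bucket-client"):
        print(uri)
```

Keeping the folder and file names in constants means a client integration only needs its own bucket name to locate the current snapshot.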
Update frequency and scope
- Data in the S3 Bucket is updated monthly.
File Description
We provide a detailed description of the files contained in the `OnlineAudits` folder, explaining the structure and type of data handled by each one. If you wish to see the schemas in YAML, click here.
Piracy Platforms Data
Field | Type | Description | Example |
---|---|---|---|
Date | string | Data creation date, YYYY-MM-DD | 2025-06-01 |
Platform | string | Platform commercial name | cuevana |
PlatformCode | string | Platform code | cuevanacom |
Url | string | Platform URL | https://cuevanacom.com |
AccessMethod | string | How data is available | VOD |
ContentType | string | Available content types (Movies/Series/Anime) | Movie |
OriginCountry | string | Platform origin country | RU |
Domain | string | Platform domain | cuevana.com |
GlobalRanking | integer | Web global ranking by traffic | 1523 |
CountryTrafficPenetration | array(dict) | Country-level web traffic penetration percentage (max 5 countries) | [{Country:Brazil, Percentage:18.4}, {Country:Argentina, Percentage:14.7}, {Country:Mexico, Percentage:10.2}] |
TotalVisits | integer | Visits in the last month | 12500000 |
Dns | string | DNS info | cosmin.ns.cloudflare.com, tani.ns.cloudflare.com |
Ip | string | IP | 111.11.11.111, 222.22.222.222 |
ServerLocation | string | Server location country | US |
HostServers | string | Host | ServerHostName, Inc. |
CountryRanking | array(dict) | Country-level web traffic ranking | [{Country:Brazil, Rank:1}, {Country:Argentina, Rank:2}, {Country:Mexico, Rank:3}] |
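
The following is a rough sketch of how one line of `piracy_platforms_data.jsonl` might be parsed and normalized. It assumes that the JSON keys match the field names in the table exactly, that `Dns` is delivered as a single comma-separated string, and that each `CountryTrafficPenetration` entry uses the keys `Country` and `Percentage`. The sample line is built from the example values above, and the function name `parse_platform` is hypothetical.

```python
import json

# One illustrative JSONL line built from the example values in the table.
# Assumption: keys in the real file match the field names exactly.
SAMPLE_LINE = json.dumps({
    "Date": "2025-06-01",
    "Platform": "cuevana",
    "PlatformCode": "cuevanacom",
    "GlobalRanking": 1523,
    "TotalVisits": 12500000,
    "Dns": "cosmin.ns.cloudflare.com, tani.ns.cloudflare.com",
    "CountryTrafficPenetration": [
        {"Country": "Brazil", "Percentage": 18.4},
        {"Country": "Argentina", "Percentage": 14.7},
        {"Country": "Mexico", "Percentage": 10.2},
    ],
})


def parse_platform(line: str) -> dict:
    """Parse one JSONL line and normalize the multi-valued fields."""
    record = json.loads(line)
    # Dns holds several values in one string; split it into a list.
    dns_raw = record.get("Dns") or ""
    record["Dns"] = [part.strip() for part in dns_raw.split(",") if part.strip()]
    # Turn the list of {Country, Percentage} dicts into a lookup table.
    record["PenetrationByCountry"] = {
        entry["Country"]: entry["Percentage"]
        for entry in record.get("CountryTrafficPenetration") or []
    }
    return record


platform = parse_platform(SAMPLE_LINE)
print(platform["Dns"])
print(platform["PenetrationByCountry"]["Brazil"])
```

The same splitting approach would apply to `Ip`, which the table also shows as a comma-separated string.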
Piracy Platforms Scraping
Field | Type | Description | Example |
---|---|---|---|
Date | string | Data creation date, YYYY-MM-DD | 2025-06-01 |
PlatformCode | string | Platform code | cuevanacom |
Permalink | string | Title URL in platform | https://cuevana.com/movie/avengers-endgame |
UID | string | Unique identifier | 5f4d9e2a3ac0479d8b6c |
Title | string | Title | Avengers: Endgame |
OriginalTitle | string | Original title | Avengers: Endgame |
ContentType | string | Content type of the title (Movie or Series) | Movie |
Year | integer | Release year | 2019 |
IMDbId | string | IMDb identifier | tt4154796 |
Genres | string | Associated genres | Action, Sci-Fi |
ReleaseDate | string | Title release date, YYYY-MM-DD | 2019-04-26 |
PrimaryCountry | string | Primary production country | US |
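
Both files carry a `PlatformCode` field, so one plausible use is to attach platform details to each scraped title. The sketch below assumes `PlatformCode` is a valid join key between the two files and that `Genres` is a comma-separated string, as the example column suggests. The inline JSONL strings stand in for the real files, and the function names are hypothetical.

```python
import json
from collections import Counter

# Illustrative JSONL content; values are taken from the example columns.
# Assumption: both files share PlatformCode as the join key.
PLATFORMS_JSONL = '{"PlatformCode": "cuevanacom", "Platform": "cuevana"}\n'
SCRAPING_JSONL = (
    '{"PlatformCode": "cuevanacom", "UID": "5f4d9e2a3ac0479d8b6c", '
    '"Title": "Avengers: Endgame", "ContentType": "Movie", "Year": 2019, '
    '"IMDbId": "tt4154796", "Genres": "Action, Sci-Fi"}\n'
)


def read_jsonl(text: str) -> list[dict]:
    """Parse JSONL text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def titles_with_platform(platforms: list[dict], titles: list[dict]) -> list[dict]:
    """Attach the platform's commercial name to each scraped title."""
    names = {p["PlatformCode"]: p["Platform"] for p in platforms}
    enriched = []
    for title in titles:
        row = dict(title)  # copy so the input records stay untouched
        row["Platform"] = names.get(title["PlatformCode"])  # None if unmatched
        genres_raw = title.get("Genres") or ""
        row["GenreList"] = [g.strip() for g in genres_raw.split(",") if g.strip()]
        enriched.append(row)
    return enriched


rows = titles_with_platform(read_jsonl(PLATFORMS_JSONL), read_jsonl(SCRAPING_JSONL))
genre_counts = Counter(g for row in rows for g in row["GenreList"])
print(rows[0]["Platform"], rows[0]["GenreList"], dict(genre_counts))
```

Building the lookup dictionary once keeps the join linear in the number of titles, which matters when the scraping file is much larger than the platforms file.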