Streaming Availability & Content Pulse - BB-Media-IT/Data-Hub GitHub Wiki
This document details the schemas used to manage the data for the Streaming Availability product. During the delivery process, a dedicated S3 bucket is created for each client. Each bucket has four main folders that contain JSONL files: Contents, Episodes, Stats and Platforms. Within each of these main folders there is a subfolder called latest, which stores the latest snapshot from each streaming platform surveyed and contracted by the client.
Each client receives an S3 Bucket where data is organized into specific folders to ensure efficient management and quick access. The folder names reflect the types of data they contain, facilitating the identification and specific processing of each data set:
Example
s3://bucket-client/Contents
s3://bucket-client/Episodes
s3://bucket-client/Stats
s3://bucket-client/Platforms
Each of these folders contains a latest subfolder, which is periodically updated with the latest available snapshot, ensuring that clients always have access to the most recent information.
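As a minimal sketch of the layout described above, the helper below builds the location of the latest subfolder for each main folder. The folder names come from this document; the exact path form (`<folder>/latest/`) and the function name `latest_prefixes` are assumptions made for illustration.

```python
# Folder names are taken from this document. The "<folder>/latest/" path
# form is an assumption about how the latest subfolder is addressed.
FOLDERS = ("Contents", "Episodes", "Stats", "Platforms")


def latest_prefixes(bucket: str) -> dict[str, str]:
    """Return the assumed s3:// URI of the latest subfolder per main folder."""
    return {folder: f"s3://{bucket}/{folder}/latest/" for folder in FOLDERS}


if __name__ == "__main__":
    # "bucket-client" is the placeholder bucket name used in the example above.
    for folder, uri in latest_prefixes("bucket-client").items():
        print(f"{folder}: {uri}")
```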
Update frequency and scope
Both the update frequency of the data in the S3 bucket and the scope of the data are defined according to the needs of each client.
🚀 You can get a weekly updated demo by connecting to the bucket s3://bb-media-data/streaming-availability/ with the AWS CLI or any other S3-compatible software, setting the endpoint to https://nyc3.digitaloceanspaces.com (in the AWS CLI, the parameter is --endpoint-url https://nyc3.digitaloceanspaces.com).
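The sketch below assembles an AWS CLI listing command for the demo bucket so it can be printed or passed to `subprocess.run`. The bucket URI and endpoint come from this document; `--endpoint-url` is the AWS CLI's global option for a custom endpoint. The function name `demo_ls_command` is hypothetical, and this document does not say whether credentials are required, so none are configured here.

```python
import shlex

# Values taken from this document.
DEMO_URI = "s3://bb-media-data/streaming-availability/"
DEMO_ENDPOINT = "https://nyc3.digitaloceanspaces.com"


def demo_ls_command(uri: str = DEMO_URI, endpoint: str = DEMO_ENDPOINT) -> list[str]:
    """Build (but do not run) an `aws s3 ls` command against a custom endpoint."""
    return ["aws", "s3", "ls", uri, "--endpoint-url", endpoint]


if __name__ == "__main__":
    # Print the command; to run it, pass the list to subprocess.run(...)
    # on a machine where the AWS CLI is installed.
    print(shlex.join(demo_ls_command()))
```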
The following sections describe the files contained in the Contents, Episodes, Stats and Platforms folders, explaining the structure and the type of data each one holds. The schemas are also available in YAML.
Contents
| Field | Type | Description | Example |
| --- | --- | --- | --- |
| PlatformId | integer | ID for the platform. | 266 |
| PlatformCode | string | Code identifying the platform and the territory. | us.disneyplus |
| PlatformName | string | Official name of the platform. | Disney+ |
| PlatformCountry | string | Country ISO 3166-1 alpha-2 code. | US |
| HashUnique | string | Hash identifying the movie or series in a specific platform and territory. | ed3f48fd23236363df52e65b32c82e00 |
| UID | string | Hash identifying the movie or series universally. | a646fad6731af909f1a7d4309c2cd069 |
| Id | string | Identifier of the movie or series used in the platform itself. | 498649d2-a45d-4d77-910c-f5fe1e837a90 |
| OtherIds | array | Only for Amazon platforms. The ASIN, GTI and CompactGTI IDs, when available. | |
| New | boolean | Indicates whether the content is new in this platform and region. | true / false |
| FirstDetection | string | Indicates the time the content was detected for the first time in the platform. | 2024-04-18T00:00:00Z |
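To illustrate how a Contents file might be consumed, the sketch below parses one JSONL line and converts FirstDetection into a timezone-aware datetime. The sample record is assembled from the example values listed above and is not a real delivered record; the function name `parse_contents_line` is hypothetical, and real files may contain more fields than shown here.

```python
import json
from datetime import datetime

# Sample line assembled from the example values in this document
# (illustrative only; not an actual record from a delivery).
SAMPLE_LINE = json.dumps({
    "PlatformId": 266,
    "PlatformCode": "us.disneyplus",
    "PlatformName": "Disney+",
    "PlatformCountry": "US",
    "HashUnique": "ed3f48fd23236363df52e65b32c82e00",
    "UID": "a646fad6731af909f1a7d4309c2cd069",
    "Id": "498649d2-a45d-4d77-910c-f5fe1e837a90",
    "New": True,
    "FirstDetection": "2024-04-18T00:00:00Z",
})


def parse_contents_line(line: str) -> dict:
    """Parse one JSONL line and convert FirstDetection to a datetime."""
    record = json.loads(line)
    # Assumes the timestamp format shown in the example (UTC with a "Z" suffix).
    record["FirstDetection"] = datetime.strptime(
        record["FirstDetection"], "%Y-%m-%dT%H:%M:%S%z"
    )
    return record


if __name__ == "__main__":
    parsed = parse_contents_line(SAMPLE_LINE)
    print(parsed["PlatformCode"], parsed["FirstDetection"].isoformat())
```

A whole file can be processed the same way by calling `parse_contents_line` on each non-empty line.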
Episodes
| Field | Type | Description | Example |
| --- | --- | --- | --- |
| PlatformId | integer | ID for the platform. The ID refers to the platform across all available territories. | 34 |
| PlatformCode | string | Code identifying the platform and the territory. | ar.amazonprimevideo |
| PlatformName | string | Official name of the platform. | Amazon Prime Video |
| PlatformCountry | string | Country ISO 3166-1 alpha-2 code. | AR |
| Season | integer | Season number. | 1 |
| Episode | integer | Episode number. | 8 |
| HashUnique | string | Hash identifying the episode in a specific platform and territory. | cd33672d85f6eafb3e0f1b665882424a |
| Id | string | Identifier of the episode used by the platform itself. If no ID is found in the platform, it may be the slug or a hash generated during the scraping of the data. | amzn1.dv.gti.8276269a-402e-4ece-a2b0-4eb5e2504a05 |
| OtherIds | array | Only for Amazon platforms. The ASIN, GTI and CompactGTI IDs, when available. | |
| | | Identifier of the parent series used by the platform itself. If no ID is found in the platform, it may be the slug or a hash generated during the scraping of the data. | 0HAQAA7JM43QWX0H6GUD3IOF70 |
| ParentTitle | string | The title of the parent series as it is found in the platform. | Fallout |
| ParentHashUnique | string | Hash identifying the parent series in a specific platform and territory. | cd7563ce07c1e2b4aa846d244acf0861 |
| ParentUID | string | Hash identifying the parent series universally. | 07a38fa2fa19b134ca248295e9976752 |
| Title | string | The episode title as it is found in the platform. If none is found, one is created using the 'Episode #' format. | El comienzo |
| OriginalTitle | string | Title in the original language. | null |
| Type | string | Type of the content. | Episode |
| Year | integer | Year of the episode. | 2024 |
| Duration | integer | Runtime in minutes. | 62 |
| ExternalIds | array | IDs from external databases that are mapped to the episode. | |
| | | Image URLs that point to the image assets found in the platform belonging to the episode. Not classified. They can be posters, stills or promotional pictures. | |
| Rating | string | Age rating information as found in the platform. | 16+ |
| Provider | array | Producer companies, if indicated by the platform. | ["Amazon Studios"] |
| Genres | array | Genres in English, parsed from the genres provided by the platform. | |
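Because each episode carries the hash of its parent series, episode records can be grouped per series. The sketch below groups already-parsed episode records by ParentHashUnique and orders them by Season and Episode. The first sample record uses the example values above; the second is hypothetical and exists only to show the ordering. The function name `group_episodes_by_parent` is also an illustration, not part of the product.

```python
from collections import defaultdict


def group_episodes_by_parent(episodes: list[dict]) -> dict[str, list[dict]]:
    """Group episode records by ParentHashUnique, sorted by season and episode."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for episode in episodes:
        grouped[episode["ParentHashUnique"]].append(episode)
    for items in grouped.values():
        items.sort(key=lambda e: (e["Season"], e["Episode"]))
    return dict(grouped)


if __name__ == "__main__":
    parent = "cd7563ce07c1e2b4aa846d244acf0861"  # example value from this document
    episodes = [
        {"ParentHashUnique": parent, "Season": 1, "Episode": 8, "Title": "El comienzo"},
        # Hypothetical record, added only to demonstrate sorting.
        {"ParentHashUnique": parent, "Season": 1, "Episode": 1, "Title": "Episode 1"},
    ]
    for parent_hash, items in group_episodes_by_parent(episodes).items():
        print(parent_hash, [(e["Season"], e["Episode"]) for e in items])
```

The field descriptions suggest that ParentHashUnique corresponds to the HashUnique of the series record in Contents, which would allow the grouped episodes to be attached to their series; this document does not state that explicitly, so treat it as an assumption to verify against real data.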
We include tables to clearly visualize the relationships and key fields in each JSONL file and analyze how the various files interrelate to provide a complete view of the overall Streaming Availability product data model.
Description of the parquet file fields
If you would like to receive these files, please contact us.
Field
Type
Description
Example
BB UID
string
Hash identifying the movie or series universally.
a646fad6731af909f1a7d4309c2cd069
SQL Unique
string
Hash identifying the movie or series by packages.
a646fad6731af909f1a7d4309c2cd069
Platform Content ID
string
Identifier of the movie or series used in the platform itself.
498649d2-a45d-4d77-910c-f5fe1e837a90
BB Hash Unique
string
Hash identifying the movie or series in a specific platform and territory.
ed3f48fd23236363df52e65b32c82e00
Platform Name
string
Official name of the platform.
Disney+
Platform Country
string
Country ISO 3166-1 alpha-2 code.
US
Package
string
Business models available for the movie or series.
Subscription VOD
Platform Title
string
The title as it is found in the platform.
Ant-Man and the Wasp: Quantumania
Type
string
The content type as it is assigned by the platform.
Movie, Tv Show
Deeplink
string
URLs pointing to the movie or series in the platform.