This document details the schemas used to manage the data for the Livestreaming & Fast Channels
product. During the delivery process, a specific S3 Bucket is generated for each client. In each of these buckets, we have three main folders that contain
JSONL format files. These folders are organized as follows: channels
, contents
, schedule
, events
and sports
. Additionally, within each of these main folders, there is a subfolder called latest
, where the latest snapshot from each streaming platform surveyed and
contracted by the client is stored.
Document Structure
This document is organized into the following sections:
Details of the S3 Buckets
Each client receives an S3 Bucket where data is organized into specific folders to ensure efficient management and quick access. The folder names reflect the types of data they contain, facilitating the identification and specific processing of each data set:
Example
s3://bucket-client/LiveStreaming/channels
s3://bucket-client/LiveStreaming/contents
s3://bucket-client/LiveStreaming/schedule
s3://bucket-client/LiveStreaming/events
s3://bucket-client/LiveStreaming/sports
This folder contains a latest
subfolder, which is periodically updated with the latest available snapshot, ensuring that customers always have access to the latest information.
Update frequency and scope
- The update of the data in S3 Bucket is defined according to the needs of each customer.
- The scope is defined according to the needs of each customer.
🚀 You can get a weekly updated demo by connecting to the following Bucket s3://bb-media-data/live-streaming/
using AWS CLI or any software and the endpoint parameter --endpoint https://nyc3.digitaloceanspaces.com
.
Command example aws s3 --endpoint https://nyc3.digitaloceanspaces.com cp s3://bb-media-data/live-streaming/ /Demo/BB-Media --recursive
File Description
We provide a detailed description of the files contained in the Sources
folder, explaining the structure and type of data handled by each one. If you wish to see the schemas in YAML, click here.
channels
Field |
Type |
Description |
Example |
PlatformNumberId |
integer |
ID for the platform. The ID refers to the platform across all available territories. |
755 |
PlatformCountry |
string |
Country ISO 3166-1 alpha-2 code. |
US |
PlatformCode |
string |
Code identifying the platform and the territory. |
us.plutotv |
CreatedAt |
string |
Date when the data was scraped. |
2023-09-24 |
Timestamp |
date |
Timestamp when the scraping started. |
2023-09-24T15:27:37.238000 |
Id |
string |
Channel ID either brought from the platform or hashed. |
569546031a619b8f07ce6e25 |
Title |
string |
Channel title. |
Pluto TV Horror |
Type |
string |
Channel type. Value assigned to the channel (not scraped). |
TV |
Deeplink |
string |
Channel deeplink. |
https://pluto.tv/es/live-tv/pluto-tv-horror |
Broadcast |
string |
MRL link if found. |
https://example.com/live/index.m3u8 |
Number |
integer |
Channel number if found. |
96 |
Images |
array |
Channel images. |
View More In Images |
Genres |
array |
Channel genres. |
["News","Sports"] |
Subtitles |
array |
Subtitles available. |
["English","Spanish"] |
Dubbed |
array |
Spoken languages available. |
["English","Spanish"] |
FullDay |
boolean |
Channel has no schedule defined but has 24 hours transmission. |
false true |
OnlyEvents |
boolean |
Channel has no schedule defined but has events related to it. |
false true |
IsOriginal |
boolean |
Platform original content. |
false true |
IsAdult |
boolean |
Adult content or with age restriction. |
false true |
IsKids |
boolean |
Kids content. |
false true |
IsFast |
boolean |
Provides FAST content. Value assigned to the channel (not scraped). |
false true |
FastBy |
string |
FAST type. Value assigned to the channel (not scraped). |
By Content |
Owner |
string |
Channel owner. Value assigned to the channel (not scraped). |
Paramount Global |
Packages |
array |
Channel revenue model. |
View More In Packages |
GMT |
integer |
GMT timezone offset. |
-4 |
IsActive |
boolean |
Indicates whether the channel was found active during the scraping. |
false true |
Order |
integer |
Which order the channel had in the webpage's grid in relation to the other channels. |
10 |
Images
Field |
Type |
Description |
Example |
Type |
string |
Image type. |
Logo |
Url |
string |
Direct URL for the image. |
|
Packages
Field |
Type |
Description |
Example |
Type |
string |
Type of business model (transaction-vod, subscription-vod, validated-vod, tv-everywhere, free-vod). |
free-vod |
contents
Field |
Type |
Description |
Example |
PlatformNumberId |
integer |
ID for the platform. The ID refers to the platform across all available territories. |
755 |
PlatformCountry |
string |
Country ISO 3166-1 alpha-2 code. |
US |
PlatformCode |
string |
Code identifying the platform and the territory. |
us.plutotv |
CreatedAt |
string |
Date when the data was scraped. |
2023-08-28 |
Timestamp |
date |
Timestamp when the scraping started. |
2023-08-28T04:19:39.052000 |
Id |
string |
Unique identifier for the content. |
58c058af77914d67bfdbdec9 |
Title |
string |
Title of the content. |
Texas Chainsaw |
Type |
string |
Type of the content. undefined: Type cannot be determined. placeholder: No content data available. One of: movie, episode, undefined, placeholder |
movie |
Deeplink |
string |
Deeplink for the content. |
https://pluto.tv/es/live-tv/pluto-tv-terror/details/texas-chainsaw-massacre-3d-1-1 |
Year |
integer |
Year of the content. |
1970 |
Synopsis |
string |
Synopsis of the content. |
A young woman inherits an isolated mansion... |
Images |
array |
Images associated with the content. |
|
Duration |
integer |
Duration of the content in minutes. |
120 |
Genres |
array |
Genres associated with the content. |
["Horror"] |
Rating |
string |
Content rating. |
R |
OnDemand |
boolean |
Whether the content is available on-demand. |
false true null |
IsAdult |
boolean |
Whether the content is adult-oriented. |
false true null |
IsKids |
boolean |
Whether the content is kids-oriented. |
false true null |
SerieTitle |
string |
Title of the series if the content is part of a series. |
null |
Season |
integer |
Season number if the content is part of a series. |
null |
Episode |
integer |
Episode number if the content is part of a series. |
null |
Packages |
array |
Revenue models assiciated with the content. |
View More In Packages |
schedule
Field |
Type |
Description |
Example |
PlatformNumberId |
integer |
ID for the platform. The ID refers to the platform across all available territories. |
755 |
PlatformCountry |
string |
Country ISO 3166-1 alpha-2 code. |
US |
PlatformCode |
string |
Code identifying the platform and the territory. |
us.plutotv |
CreatedAt |
string |
Date when the data was scraped. |
2024-04-13 |
Timestamp |
date |
Timestamp when the scraping started. |
2024-04-13T08:18:00.484000 |
Id |
string |
Unique identifier for the payload schedule. |
a84aeb8ff4e696fbfab88756e7fbf6ec |
ChannelId |
string |
Unique identifier for the channel associated with the payload. |
5dc481cda1d430000948a1b4 |
ChannelTitle |
string |
Title of the channel. |
CBS News Los Angeles |
ContentId |
string |
Unique identifier for the content associated with the payload. |
63b67c9bbc65210013550c75 |
ContentTitle |
string |
Title of Content |
KCAL News 5pm |
ContentMetadata |
bool |
This Content payload exists in the contents collection if it has more metadata than just ContentId and ContentTitle. |
false true |
Type |
string |
Type of the payload. placeholder: No schedule data available. |
schedule |
Subtitles |
array |
Subtitles available for the content. |
["English","Spanish"] |
Dubbed |
array |
Spoken languages available for the content. |
["English","Spanish"] |
ClosedCaption |
boolean |
Whether closed captions are available. |
false true null |
Accessibility |
boolean |
Has accesibility options. |
false true null |
IsFast |
boolean |
Whether the schedule's channel is FAST. Value assigned to the channel (not scraped). |
false true null |
StartDate |
date |
Start time of the payload schedule. |
2024-04-13T20:00:00 |
EndDate |
date |
End time of the payload schedule. |
2024-04-13T21:00:00 |
Duration |
integer |
Duration of the content in minutes. |
60 |
Seconds |
integer |
Duration of the content in seconds. |
3600 |
GMT |
integer |
GMT timezone offset. |
-4 |
events
Field |
Type |
Description |
Example |
PlatformNumberId |
integer |
ID for the platform. The ID refers to the platform across all available territories. |
736 |
PlatformCountry |
string |
Country ISO 3166-1 alpha-2 code. |
US |
PlatformCode |
string |
Code identifying the platform and the territory. |
us.peacock |
CreatedAt |
string |
Date when the data was scraped. |
2024-04-13 |
Timestamp |
date |
Timestamp when the scraping started. |
2024-04-13T08:18:00.484000 |
Id |
string |
Unique identifier for the payload event. |
841bed64-28a5-3634-be8a-fd30ec77b6ec |
Title |
string |
Title of the Event |
Countdown to LA28 |
Type |
string |
Type of the event payload. Can be sport , concert , program or unknown . |
sport |
Deeplink |
string |
Deeplink for the event. |
https://www.peacocktv.com/watch/asset/sports/la-2028-handover/8400ffd1-a569-3737-86a6-acccd061b32f |
Broadcast |
string |
MRL link if found. |
https://example.com/live/index.m3u8 |
Synopsis |
string |
Synopsis of the event. |
NBC's coverage of the Autodesk Countdown to LA28 Presented by... |
Image |
array |
Images associated with the content. |
|
Season |
string |
Name of the season in which the event takes place. |
UEFA Euro 2024 |
ParentId |
string |
Id of the channel or other event which parents this event. |
65d64e571b00b5e58fb6bdbf |
ParentTitle |
string |
Name of the channel or other event which parents this event. |
WWE Raw |
Genre |
string |
Genre associated with the event. |
["Ceremony"] |
Language |
array |
List of languages in which the event can be played. |
["English", "Spanish"] |
Crew |
array |
List of the people and its roles associated with the event. |
[{"Role":"Singer", "Name":"James Hetfield"},{"Role":"Drummer", "Name":"Lars Ulrich"}] |
Date |
date |
Start time of the payload event. |
2024-08-08T20:00:00 |
Duration |
integer |
Duration of the event in minutes. |
120 |
Venue |
string |
Venue in which the event will take place. |
Estádio do Maracanã |
Location |
string |
Country or city in which the event will take place. |
Texas |
Provider |
string |
Who distributes this event. |
ESPN+ |
GMT |
integer |
GMT timezone offset. |
-4 |
Packages |
array |
Revenue models assiciated with the event. |
View More In Packages |
sports
We are moving forward in our live sporting events project. We are currently adapting our schemas to align with schema.org standards, ensuring easy integration for clients. Focusing on major sports and leagues, we plan to release the initial version of the schema in May 2024
. We will make every effort to ensure that this version will also include additional data, such as lineups and detailed information about teams and players.
Field |
Type |
Description |
Example |
@context |
String |
The schema that defines the data context, typically a URL pointing to a definition like Schema.org. |
http://schema.org |
@type |
String |
The type of this entity, here it is a BroadcastEvent . |
BroadcastEvent |
event_id |
String |
Unique event HashId |
5e54254cbc74b8562a16619622d3be41333383e128d72c7a24dab9e9a3e57db0 |
isLiveBroadcast |
Boolean |
Indicates if the event is a live broadcast. |
false true |
eventAttendanceMode |
String |
Mode of attendance, here specified as OnlineEventAttendanceMode indicating an online event. |
OnlineEventAttendanceMode |
broadcastOfEvent |
Object |
An embedded object that describes the sports event being broadcast. Details follow in nested dictionary. |
View More In broadcastOfEvent |
Nested within broadcastOfEvent
Field |
Type |
Description |
Example |
@type |
String |
Specifies the type |
SportsEvent |
sport |
String |
The type of sport. |
Baseball |
organizer |
Object |
Object detailing the organizer. |
View More In organizer |
eventStatus |
String |
The status of the event. |
EventScheduled |
name |
String |
Name of the sports event, same as the broadcast event. |
Seattle Mariners vs. Texas Rangers |
description |
String |
Description of the sports event, same as the broadcast event. |
Seattle Mariners vs. Texas Rangers MLB 2024 |
awayTeam |
Object |
Details about the away team, specified further in nested dictionary. |
View More In awayTeam |
homeTeam |
Object |
Details about the home team, specified further in nested dictionary. |
View More In awayTeam |
startDate |
DateTime |
Start date and time for the sports event, matches the broadcast event. |
2024-04-24T00:05:00Z |
location |
Object |
The location of the event, described further in nested dictionary. |
View More In location |
offers |
Object |
Offers related to the event, including pricing and URLs, described further in nested dictionary. |
View More In offers |
audience |
Object |
The intended audience for the event. |
Sports Fans |
Nested within organizer
Field |
Type |
Description |
Example |
@type |
String |
Type of the entity |
Organization |
name |
String |
Name of the Organizer |
MLB |
year |
Integer |
Year of the Competition |
2024 |
description |
String |
Description of the Competion |
2024 World Series |
logo |
String |
Logo of the Organizer |
|
Nested within awayTeam
and homeTeam
Field |
Type |
Description |
Example |
@type |
String |
Type of the entity |
SportsTeam |
name |
Array |
Name of the team or players |
Seattle Mariners |
sport |
String |
Sport played by the team |
Baseball |
logo |
String |
Logo the team or player |
|
Nested within location
Field |
Type |
Description |
Example |
@type |
String |
Type. |
SportsActivityLocation |
name |
String |
The name of the location |
Globe Life Field |
Nested within offers
Field |
Type |
Description |
Example |
@type |
String |
Type. |
AggregateOffer |
highPrice |
Number |
The highest price of the offers, if available. |
null |
lowPrice |
Number |
The lowest price of the offers, if available. |
null |
offerCount |
Number |
The count of distinct offers available. |
2 |
offers |
Array |
Array of individual offers, each described further. |
View More In offers |
Nested within individual offers
in offers
Table Visualization & Data Relationships
We include tables to clearly visualize the relationships and key fields in each JSONL file and analyze how the various files interrelate to provide a complete view of the overall Livestreaming & Fast Channels
product data model.
