Policies - fidlabs/Open-Data-Pathway GitHub Wiki
Source Dataset
The source dataset must already be online: the allocator must be able to browse to and inspect any part of that dataset at their discretion. The dataset must be accessible to any web visitor without gating on credentials.
Accessibility of Stored Data
- The stored data must be available via Spark checks and through manual retrieval attempts by the pathway operators.
- The data must be accessible (for retrieval) from at least two distinct geographic locations.
- The data must be retrievable free of charge by anyone who wants to access it.
Formatting of Stored Data
- Clear documentation must be provided on how the data is prepared / transformed between the source dataset and the data stored in Filecoin.
- Deals / Pieces stored to Filecoin should be full - there should not be significant 'slack' or 'padding' in stored data.
- It should be possible to reference / access each semantic item of the data individually.
- When a piece is downloaded from the stored dataset, there must be a way to identify which part of the original dataset that piece represents and to confirm that it is indeed valid data from the dataset (for example, via a log file that maps offsets / files to stored deals).
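The mapping described in the last item could take many forms. As a minimal sketch only, assuming a hypothetical manifest in which each entry records a piece identifier, a byte offset and length within that piece, the source file path, and a SHA-256 digest of the source bytes (none of these field names are prescribed by the pathway), a downloaded piece could be checked like this:

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestEntry:
    """One row of a hypothetical piece manifest (field names are assumptions)."""
    piece_id: str     # identifier of the stored piece, e.g. a piece CID
    offset: int       # byte offset of the item inside the piece payload
    length: int       # number of bytes the item occupies
    source_path: str  # path of the item in the original dataset
    sha256: str       # hex SHA-256 digest of the original item bytes


def entries_for_piece(manifest, piece_id):
    """Return the manifest entries that describe the given piece."""
    return [e for e in manifest if e.piece_id == piece_id]


def verify_piece(piece_bytes, entries):
    """Return the source paths whose bytes in the piece do not match the manifest."""
    failures = []
    for e in entries:
        chunk = piece_bytes[e.offset:e.offset + e.length]
        # A short read or a digest mismatch both count as a failure.
        if len(chunk) != e.length or hashlib.sha256(chunk).hexdigest() != e.sha256:
            failures.append(e.source_path)
    return failures
```

An empty list from `verify_piece` means every item listed for that piece was found at its recorded offset with the expected digest. A real data preparation tool may use a different layout (for example CAR files with their own indexing), so the manifest format here is illustrative rather than required.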
Uniqueness
Applications from a client that the pathway has not already worked with must be for a dataset that is not currently stored on Filecoin. Once a client has demonstrated their ability to host a dataset that is retrievable and meets pathway policies, they may be approved to store a dataset that is already on Filecoin, provided the currently stored copies are not accessible or the client demonstrates other deficiencies in the existing copies.
Usage of DataCap
- Every allocation of DataCap expires after three months. We measure three months from the allocation date; if the allocation has not been used by then (whether the application's status is open or closed), the application will be closed and the remaining DataCap removed.
- By the time the complete amount of requested DataCap has been allocated, the client is expected to have completely finished onboarding their dataset and replicas. If a client receives a DataCap allocation and then closes their application before completion, they will be asked to explain why. Likewise, if a client receives a DataCap allocation, abandons the application, and becomes completely unresponsive, their GitHub ID will be flagged and excluded from any future participation in the pathway.
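The three-month expiry window can be made concrete with a little date arithmetic. The sketch below is an illustration only: it assumes "three months" means three calendar months, that a day-of-month that does not exist in the target month is clamped to that month's last day, and that the allocation counts as expired on the deadline date itself. The function names are hypothetical and not part of any pathway tooling.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Return d shifted forward by whole calendar months, clamping the day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    # Clamp e.g. the 30th or 31st to the last day of a shorter month.
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def allocation_expired(allocated_on, today, used, months=3):
    """True if an unused allocation has reached its assumed expiry date."""
    return (not used) and today >= add_months(allocated_on, months)
```

For example, under these assumptions an allocation made on 30 November 2024 would reach its deadline on 28 February 2025.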
Definitions
- Retrieval refers to being able to view and download the whole dataset.
- Open Data refers to data that is not encrypted.