Lake Formation - YakDriver/fardvag GitHub Wiki
NOTE: AWS consistently uses "Lake<space>Formation" rather than "LakeFormation".
- AWS SDK Go Lake Formation
- AWS CLI Lake Formation
- AWS Lake Formation Documentation
- AWS Lake Formation Developer Guide
- AWS Main Landing Page
- Update AWS Glue Data Permissions to the AWS Lake Formation Model (incompatibility between existing Glue data permissions and Lake Formation model)
- Lake Formation available on GovCloud
- Lake Formation announcement
- Lake Formation blog post
- [VIDEO] IAMAllowedPrincipals
- Related: AWS SDK Go Glue
- Related: AWS SDK Go IAM
- Feature Request Issue #9700
- Lake Formation Service Client #9701
- Lake Formation DataLake settings #13250
- Lake Formation resource #13267
- Lake Formation permissions #13396
- Data Lake = S3 path
- Data Catalog
- Blueprint
- Workflow
- Write Permission on Data Catalog, S3
- Manage Permission on Data Catalog, S3
- Athena to query data
- Redshift Spectrum to query data
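A data lake location is written as an S3 path, but Lake Formation API actions such as `RegisterResource` take the location as an ARN. The Go sketch below converts an `s3://bucket/prefix` URI into an S3 ARN. It assumes the standard S3 ARN layout (`arn:<partition>:s3:::bucket/key`, with empty Region and account fields). The partition is a parameter because GovCloud uses `aws-us-gov` instead of `aws`. The helper name `s3PathToARN` and the trailing-slash trimming are illustrative choices and are not part of any AWS SDK.

```go
package main

import (
	"fmt"
	"strings"
)

// s3PathToARN (hypothetical helper) converts an S3 URI such as
// "s3://bucket/prefix" into the ARN form "arn:<partition>:s3:::bucket/prefix".
// Pass "aws" for commercial Regions or "aws-us-gov" for GovCloud.
func s3PathToARN(partition, s3Path string) (string, error) {
	const scheme = "s3://"
	if !strings.HasPrefix(s3Path, scheme) {
		return "", fmt.Errorf("not an S3 URI: %q", s3Path)
	}
	// Drop the scheme and a trailing slash (an assumed normalization).
	rest := strings.TrimSuffix(strings.TrimPrefix(s3Path, scheme), "/")
	// The first path segment is the bucket, and it must be non-empty.
	bucket := strings.SplitN(rest, "/", 2)[0]
	if bucket == "" {
		return "", fmt.Errorf("missing bucket in %q", s3Path)
	}
	// S3 ARNs leave the Region and account fields empty, hence ":::".
	return fmt.Sprintf("arn:%s:s3:::%s", partition, rest), nil
}

func main() {
	for _, p := range []string{"s3://my-data-lake/raw", "s3://my-data-lake/"} {
		arn, err := s3PathToARN("aws", p)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(arn)
	}
}
```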
The following are the general steps to create and use a data lake:
- Register an Amazon Simple Storage Service (Amazon S3) path as a data lake.
- Grant Lake Formation permissions to write to the Data Catalog and to Amazon S3 locations in the data lake.
- Create a database to organize the metadata tables in the Data Catalog.
- Use a blueprint to create a workflow. Run the workflow to ingest data from a data source.
- Set up your Lake Formation permissions to allow others to manage data in the Data Catalog and the data lake.
- Set up Amazon Athena to query the data that you imported into your Amazon S3 data lake.
- For some data store types, set up Amazon Redshift Spectrum to query the data that you imported into your Amazon S3 data lake.
AWS Service | How It Integrates with Lake Formation |
---|---|
AWS Glue | AWS Glue and Lake Formation share the same Data Catalog. For console operations (such as viewing a list of tables) and all API operations, AWS Glue users can access only the databases and tables on which they have Lake Formation permissions. AWS Glue does not support Lake Formation column permissions. |
Amazon Athena | When Amazon Athena users select the AWS Glue catalog in the query editor, they can query only the databases, tables, and columns on which they have Lake Formation permissions. Queries using manifests are not supported. In addition to principals who authenticate with Athena through AWS Identity and Access Management (IAM), Lake Formation supports Athena users who connect through the JDBC or ODBC driver and authenticate through SAML. Supported SAML providers include Okta and Microsoft Active Directory Federation Services (AD FS). For more information, see "Using Lake Formation and the Athena JDBC and ODBC Drivers for Federated Access to Athena" in the Amazon Athena User Guide. |
Amazon Redshift Spectrum | When Amazon Redshift users create an external schema on a database in the AWS Glue catalog, they can query only the tables and columns in that schema on which they have Lake Formation permissions. Queries using manifests are not supported. |
Amazon EMR | Lake Formation permissions are enforced when Apache Spark applications are submitted using Apache Zeppelin or EMR Notebooks. |
Amazon QuickSight Enterprise Edition | When an Amazon QuickSight Enterprise Edition user queries a dataset in an Amazon S3 location that is registered with Lake Formation, the user must have the Lake Formation SELECT permission on the data. |
AWS Glue DataBrew | To create a dataset for an Amazon S3 location that is registered with Lake Formation, the DataBrew principal must have the DESCRIBE Lake Formation permission on the corresponding AWS Glue Data Catalog table. To access data in the dataset, the DataBrew principal must have the SELECT Lake Formation permission on the table. |
AWS KMS | Lake Formation also works with AWS Key Management Service (AWS KMS), which makes it easier to set up the integrated services above to encrypt and decrypt data in Amazon S3 locations. |
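The following toy Go model makes the per-service rules in the table more concrete. It is a hypothetical simplification for illustration and does not show how Lake Formation or the integrated services actually evaluate permissions. `selectableColumns` mimics the column filtering described for Athena and Redshift Spectrum. `hasPermission` covers checks such as DataBrew needing DESCRIBE to create a dataset and SELECT to read it. According to the table, AWS Glue does not support column permissions, so the column filter would not apply there.

```go
package main

import "fmt"

// grant is a hypothetical, simplified Lake Formation grant on one table.
// A nil Columns slice means the grant covers every column.
type grant struct {
	Principal   string
	Database    string
	Table       string
	Columns     []string
	Permissions map[string]bool // e.g. "SELECT", "DESCRIBE"
}

// hasPermission reports whether principal holds perm on db.table via any grant.
func hasPermission(grants []grant, principal, db, table, perm string) bool {
	for _, g := range grants {
		if g.Principal == principal && g.Database == db && g.Table == table && g.Permissions[perm] {
			return true
		}
	}
	return false
}

// selectableColumns returns the columns of db.table (in the order of all)
// that principal may read, based only on SELECT grants.
func selectableColumns(grants []grant, principal, db, table string, all []string) []string {
	allowed := map[string]bool{}
	for _, g := range grants {
		if g.Principal != principal || g.Database != db || g.Table != table || !g.Permissions["SELECT"] {
			continue
		}
		if g.Columns == nil {
			return append([]string(nil), all...) // whole-table SELECT grant
		}
		for _, c := range g.Columns {
			allowed[c] = true
		}
	}
	var out []string
	for _, c := range all {
		if allowed[c] {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	grants := []grant{
		{Principal: "analyst", Database: "sales", Table: "orders",
			Columns: []string{"order_id", "amount"}, Permissions: map[string]bool{"SELECT": true}},
		{Principal: "brew", Database: "sales", Table: "orders",
			Permissions: map[string]bool{"DESCRIBE": true}},
	}
	cols := []string{"order_id", "customer_email", "amount"}
	// Athena/Redshift Spectrum-style filtering: prints [order_id amount]
	fmt.Println(selectableColumns(grants, "analyst", "sales", "orders", cols))
	// DataBrew-style checks: DESCRIBE is held, SELECT is not, so this prints "true false"
	fmt.Println(hasPermission(grants, "brew", "sales", "orders", "DESCRIBE"),
		hasPermission(grants, "brew", "sales", "orders", "SELECT"))
}
```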